Deep Technical Docs
System Architecture & Execution Flow
A ground-level view of how Evida ingests, validates, and commits versioned AI knowledge at production scale.
Pipeline Overview
01 — Execution Model
Durable Execution
Every generative task is modelled as a persistent state machine. Before any inference step executes, its preconditions and inputs are written to a durable store. If the worker process crashes, is preempted, or loses its network connection mid-run, the task is automatically re-enqueued and resumes from the last committed checkpoint — not from the beginning.
Long-running tasks — multi-step RAG pipelines, recursive Judge–Gen loops iterating three or four times — are never lost to transient infrastructure failures. Every transition is recorded: node entered, timestamp, inputs, and output hash. Replay is deterministic.
A task scheduled at t=0 and interrupted at t=45s resumes from its last checkpoint at t=47s — two seconds of re-queue overhead, zero data loss.
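The write-ahead pattern described above can be sketched as a loop that commits every step's output before advancing, so a restarted worker replays committed results instead of re-running them. This is a minimal illustration; the store interface and step names are assumptions, not Evida's actual API.

```typescript
// Sketch of durable execution: each step's output is committed to the
// store before the next step runs. On restart, committed steps are
// replayed from the store rather than re-executed.

interface CheckpointStore {
  load(taskId: string): Map<string, unknown>;          // committed step outputs
  commit(taskId: string, step: string, out: unknown): void;
}

type Step = { name: string; run: (prev: unknown) => unknown };

function runDurable(taskId: string, steps: Step[], store: CheckpointStore): unknown {
  const done = store.load(taskId);
  let prev: unknown = undefined;
  for (const step of steps) {
    if (done.has(step.name)) {      // already committed: replay, don't re-run
      prev = done.get(step.name);
      continue;
    }
    prev = step.run(prev);          // execute, then durably commit before advancing
    store.commit(taskId, step.name, prev);
  }
  return prev;
}
```

A crashed run re-enqueued against the same store skips every step that already committed, which is what keeps re-queue overhead to seconds regardless of how much work preceded the interruption.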
02 — Quality Gate
Comparative Scoring
score_candidate(output, rubric) → [correctness, completeness, clarity, usefulness]
Four independent judge passes. Each dimension scored 0.0 – 1.0.
composite = weighted_sum(scores) vs. current_version.composite
Candidate must exceed both global threshold AND the current live version.
if composite ≥ max(0.96, current_version.composite): seal(output)
SHA-256 hash sealed. Full lineage chain written to version tree.
else: re_enqueue(task, context=failure_report)
Failure context injected into next Re-gen prompt. Budget decremented.
4-Dimension Scoring Rubric
| DIMENSION | DEFINITION | WEIGHT | MIN PASS |
|---|---|---|---|
| Correctness | Factual accuracy against ground-truth corpus | 40% | 0.90 |
| Completeness | Full coverage of all required topics and sub-claims | 25% | 0.85 |
| Clarity | Structural coherence, readability, and concision | 20% | 0.80 |
| Usefulness | Actionability, specificity, and practical depth | 15% | 0.75 |
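The gate logic and rubric above combine into a single decision: every dimension must clear its minimum pass mark, and the weighted composite must beat both the global threshold and the live version. The sketch below mirrors those published weights and thresholds; the function names and the `DIMENSION_FAILED` verdict are illustrative, not the production API.

```typescript
// Comparative scoring gate, using the weights and minimum pass marks
// from the rubric above. A candidate seals only if it clears every
// per-dimension floor AND beats max(0.96, current live composite).

type Dimension = "correctness" | "completeness" | "clarity" | "usefulness";
type Scores = Record<Dimension, number>;

const WEIGHTS: Record<Dimension, number> = {
  correctness: 0.40, completeness: 0.25, clarity: 0.20, usefulness: 0.15,
};
const MIN_PASS: Record<Dimension, number> = {
  correctness: 0.90, completeness: 0.85, clarity: 0.80, usefulness: 0.75,
};
const GLOBAL_THRESHOLD = 0.96;

function composite(scores: Scores): number {
  return (Object.keys(WEIGHTS) as Dimension[])
    .reduce((sum, d) => sum + WEIGHTS[d] * scores[d], 0);
}

type Verdict = "SEAL" | "DIMENSION_FAILED" | "REGRESSION_BLOCKED";

function gate(scores: Scores, currentComposite: number): Verdict {
  // Every dimension must clear its minimum pass mark...
  const dimFail = (Object.keys(MIN_PASS) as Dimension[])
    .some((d) => scores[d] < MIN_PASS[d]);
  if (dimFail) return "DIMENSION_FAILED";
  // ...and the composite must exceed both the global threshold
  // and the currently live version.
  return composite(scores) >= Math.max(GLOBAL_THRESHOLD, currentComposite)
    ? "SEAL"
    : "REGRESSION_BLOCKED";
}
```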
Candidates that fail the regression gate are rejected with a REGRESSION_BLOCKED status code.
03 — Orchestration Layer
Durable Workflow
✗ Stateless Serverless
✓ Persistent Orchestration
Stateful Execution Timeline — Multi-Minute Task
Initial state written to durable store. Task ID, model config, and RAG context sealed.
Context retrieval complete. Embedding vectors and source document hashes persisted.
First inference pass complete. Candidate output written. Worker process terminated (preempted).
Re-queued from last checkpoint. Execution continues at GEN_v1 evaluation — no re-inference.
Evaluation complete. Composite score 0.91 — below regression threshold (current: 0.94). Retry.
Second inference with failure context injected. Composite 0.974 — exceeds threshold.
State sealed. SHA-256 hash written to version tree. Execution record archived.
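Each transition in a timeline like this is recorded as an append-only entry (node entered, timestamp, input and output hashes), which is what makes replay deterministic. A sketch of such a record, with illustrative type and field names:

```typescript
// Append-only transition record: one entry per node in the execution
// timeline, with inputs and outputs sealed by SHA-256 hashes.
import { createHash } from "node:crypto";

interface Transition {
  task_id: string;
  node: string;        // e.g. "GEN_v1", "EVAL_v1"
  entered_at: string;  // ISO 8601
  input_hash: string;  // "sha256:..."
  output_hash: string; // "sha256:..."
}

function sha256(data: string): string {
  return "sha256:" + createHash("sha256").update(data).digest("hex");
}

function record(
  log: Transition[],
  taskId: string,
  node: string,
  input: string,
  output: string,
): Transition {
  const t: Transition = {
    task_id: taskId,
    node,
    entered_at: new Date().toISOString(),
    input_hash: sha256(input),
    output_hash: sha256(output),
  };
  log.push(t); // append-only: entries are never mutated or removed
  return t;
}
```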
04 — Data Model
Version-Aware Execution State
interface ExecutionState {
  // ── Identity ──────────────────────────────────────────────
  task_id: string                  // "task_7f3a9b2c"
  schema_version: string           // "evida/state/v2"
  status: "pending" | "running" | "committed" | "rejected"

  // ── Lineage ───────────────────────────────────────────────
  parent_version_hash: string      // "sha256:9f86d081…"
  committed_hash: string | null    // "sha256:e3b0c442…" | null
  checkpoint_at: string            // ISO 8601 — last durable write

  // ── Model Config ──────────────────────────────────────────
  model_config: {
    id: string                     // "evida-gen-turbo-01"
    quantization: "int8" | "fp16"
    temperature: number            // 0.7
    max_tokens: number             // 2048
  }

  // ── Evaluation ────────────────────────────────────────────
  evaluation_score: {
    correctness: number            // 0.94 — 40% weight
    completeness: number           // 0.91 — 25% weight
    clarity: number                // 0.88 — 20% weight
    usefulness: number             // 0.86 — 15% weight
    composite: number              // 0.909 — weighted sum
  }

  // ── Loop Control ──────────────────────────────────────────
  recursive_depth: number          // 2 — increments on each retry
  retry_budget: number             // 3 — max allowed retries
}

05 — Hardware
Infrastructure Note
Inference nodes are sized and scheduled specifically for NVIDIA Tensor Core workloads. All model weights are quantized to INT8 for Tensor Core execution, reducing memory bandwidth pressure and increasing throughput on A100 and H100 class GPUs without measurable quality degradation on the task distribution we target.
Worker concurrency is tuned to maximise SM (Streaming Multiprocessor) utilisation rather than wall-clock latency. A single inference node handles multiple concurrent pipeline executions, batching Judge scoring passes to take advantage of CUDA kernel fusion where the scheduler allows.
Target hardware: NVIDIA A100 80 GB SXM / H100 NVLink. INT8 quantization via TensorRT. Batch size & sequence length are runtime-configurable per deployment tier.
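The runtime-configurable knobs named above can be pictured as a small per-tier config. The tier names and every numeric value below are assumptions for illustration, not published defaults; only the GPU targets and INT8-via-TensorRT choice come from this page.

```typescript
// Hypothetical per-deployment-tier inference config. Values are
// illustrative assumptions; the GPU targets and INT8 quantization
// mirror the hardware note above.

type Quantization = "int8" | "fp16";

interface InferenceTierConfig {
  gpu: "A100-80GB-SXM" | "H100-NVLink";
  quantization: Quantization;    // INT8 via TensorRT in production
  max_batch_size: number;        // Judge scoring passes batched up to this size
  max_sequence_length: number;   // tokens
  concurrent_pipelines: number;  // tuned for SM utilisation, not latency
}

const TIERS: Record<string, InferenceTierConfig> = {
  standard: {
    gpu: "A100-80GB-SXM",
    quantization: "int8",
    max_batch_size: 16,
    max_sequence_length: 2048,
    concurrent_pipelines: 4,
  },
  premium: {
    gpu: "H100-NVLink",
    quantization: "int8",
    max_batch_size: 32,
    max_sequence_length: 4096,
    concurrent_pipelines: 8,
  },
};
```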