Deep Technical Docs
System Architecture & Execution Flow
A ground-level view of how Evida ingests, validates, and commits versioned AI knowledge at production scale.
Pipeline Overview
01 — Execution Model
Durable Execution
Every generative task is modelled as a persistent state machine. Before any inference step executes, its preconditions and inputs are written to a durable store. If the worker process crashes, is preempted, or loses its network connection mid-run, the task is automatically re-enqueued and resumes from the last committed checkpoint — not from the beginning.
Long-running tasks — multi-step RAG pipelines, recursive Judge–Gen loops iterating three or four times — are never lost to transient infrastructure failures. Every transition is recorded: node entered, timestamp, inputs, and output hash. Replay is deterministic.
A task scheduled at t=0 and interrupted at t=45s resumes from its last checkpoint at t=47s — two seconds of re-queue overhead, zero data loss.
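The write-ahead pattern described above can be sketched as a loop that commits every step's output before advancing, so a restarted worker replays committed results instead of re-running them. This is a minimal illustration; the store interface and step names are assumptions, not Evida's actual API.

```typescript
// Sketch of durable execution: each step's output is committed to the
// store before the next step runs. On restart, committed steps are
// replayed from the store rather than re-executed.

interface CheckpointStore {
  load(taskId: string): Map<string, unknown>;          // committed step outputs
  commit(taskId: string, step: string, out: unknown): void;
}

type Step = { name: string; run: (prev: unknown) => unknown };

function runDurable(taskId: string, steps: Step[], store: CheckpointStore): unknown {
  const done = store.load(taskId);
  let prev: unknown = undefined;
  for (const step of steps) {
    if (done.has(step.name)) {      // already committed: replay, don't re-run
      prev = done.get(step.name);
      continue;
    }
    prev = step.run(prev);          // execute, then durably commit before advancing
    store.commit(taskId, step.name, prev);
  }
  return prev;
}
```

A crashed run re-enqueued against the same store skips every step that already committed, which is what keeps re-queue overhead to seconds regardless of how much work preceded the interruption.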
02 — Quality Gate
Comparative Scoring
score_candidate(output, rubric) → [correctness, completeness, clarity, usefulness]
Four independent judge passes. Each dimension scored 0.0 – 1.0.
composite = weighted_sum(scores) vs. current_version.composite
Candidate must exceed both global threshold AND the current live version.
if composite ≥ max(0.96, current_version.composite): seal(output)
SHA-256 hash sealed. Full lineage chain written to version tree.
else: re_enqueue(task, context=failure_report)
Failure context injected into next Re-gen prompt. Budget decremented.
4-Dimension Scoring Rubric
| DIMENSION | DEFINITION | WEIGHT | MIN PASS |
|---|---|---|---|
| Correctness | Factual accuracy against ground-truth corpus | 40% | 0.90 |
| Completeness | Full coverage of all required topics and sub-claims | 25% | 0.85 |
| Clarity | Structural coherence, readability, and concision | 20% | 0.80 |
| Usefulness | Actionability, specificity, and practical depth | 15% | 0.75 |
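The gate logic and rubric above combine into a single decision: every dimension must clear its minimum pass mark, and the weighted composite must beat both the global threshold and the live version. The sketch below mirrors those published weights and thresholds; the function names and the `DIMENSION_FAILED` verdict are illustrative, not the production API.

```typescript
// Comparative scoring gate, using the weights and minimum pass marks
// from the rubric above. A candidate seals only if it clears every
// per-dimension floor AND beats max(0.96, current live composite).

type Dimension = "correctness" | "completeness" | "clarity" | "usefulness";
type Scores = Record<Dimension, number>;

const WEIGHTS: Record<Dimension, number> = {
  correctness: 0.40, completeness: 0.25, clarity: 0.20, usefulness: 0.15,
};
const MIN_PASS: Record<Dimension, number> = {
  correctness: 0.90, completeness: 0.85, clarity: 0.80, usefulness: 0.75,
};
const GLOBAL_THRESHOLD = 0.96;

function composite(scores: Scores): number {
  return (Object.keys(WEIGHTS) as Dimension[])
    .reduce((sum, d) => sum + WEIGHTS[d] * scores[d], 0);
}

type Verdict = "SEAL" | "DIMENSION_FAILED" | "REGRESSION_BLOCKED";

function gate(scores: Scores, currentComposite: number): Verdict {
  // Every dimension must clear its minimum pass mark...
  const dimFail = (Object.keys(MIN_PASS) as Dimension[])
    .some((d) => scores[d] < MIN_PASS[d]);
  if (dimFail) return "DIMENSION_FAILED";
  // ...and the composite must exceed both the global threshold
  // and the currently live version.
  return composite(scores) >= Math.max(GLOBAL_THRESHOLD, currentComposite)
    ? "SEAL"
    : "REGRESSION_BLOCKED";
}
```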
Candidates that fail the regression gate are rejected with a REGRESSION_BLOCKED status code.
03 — Orchestration Layer
Durable Workflow
✗ Stateless Serverless
✓ Persistent Orchestration
Stateful Execution Timeline — Multi-Minute Task
Initial state written to durable store. Task ID, model config, and RAG context sealed.
Context retrieval complete. Embedding vectors and source document hashes persisted.
First inference pass complete. Candidate output written. Worker process terminated (preempted).
Re-queued from last checkpoint. Execution continues at GEN_v1 evaluation — no re-inference.
Evaluation complete. Composite score 0.91 — below regression threshold (current: 0.94). Retry.
Second inference with failure context injected. Composite 0.974 — exceeds threshold.
State sealed. SHA-256 hash written to version tree. Execution record archived.
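Each transition in a timeline like this is recorded as an append-only entry (node entered, timestamp, input and output hashes), which is what makes replay deterministic. A sketch of such a record, with illustrative type and field names:

```typescript
// Append-only transition record: one entry per node in the execution
// timeline, with inputs and outputs sealed by SHA-256 hashes.
import { createHash } from "node:crypto";

interface Transition {
  task_id: string;
  node: string;        // e.g. "GEN_v1", "EVAL_v1"
  entered_at: string;  // ISO 8601
  input_hash: string;  // "sha256:..."
  output_hash: string; // "sha256:..."
}

function sha256(data: string): string {
  return "sha256:" + createHash("sha256").update(data).digest("hex");
}

function record(
  log: Transition[],
  taskId: string,
  node: string,
  input: string,
  output: string,
): Transition {
  const t: Transition = {
    task_id: taskId,
    node,
    entered_at: new Date().toISOString(),
    input_hash: sha256(input),
    output_hash: sha256(output),
  };
  log.push(t); // append-only: entries are never mutated or removed
  return t;
}
```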
04 — Data Model
Version-Aware Execution State
interface ExecutionState {
  // ── Identity ──────────────────────────────────────────────
  task_id: string                  // "task_7f3a9b2c"
  schema_version: string           // "evida/state/v2"
  status: "pending" | "running" | "committed" | "rejected"

  // ── Lineage ───────────────────────────────────────────────
  parent_version_hash: string      // "sha256:9f86d081…"
  committed_hash: string | null    // "sha256:e3b0c442…" | null
  checkpoint_at: string            // ISO 8601 — last durable write

  // ── Model Config ──────────────────────────────────────────
  model_config: {
    id: string                     // "evida-gen-turbo-01"
    quantization: "int8" | "fp16"
    temperature: number            // 0.7
    max_tokens: number             // 2048
  }

  // ── Evaluation ────────────────────────────────────────────
  evaluation_score: {
    correctness: number            // 0.94 — 40% weight
    completeness: number           // 0.91 — 25% weight
    clarity: number                // 0.88 — 20% weight
    usefulness: number             // 0.86 — 15% weight
    composite: number              // 0.909 — weighted sum
  }

  // ── Loop Control ──────────────────────────────────────────
  recursive_depth: number          // 2 — increments on each retry
  retry_budget: number             // 3 — max allowed retries
}

05 — Hardware
Infrastructure Note
Inference nodes are sized and scheduled specifically for NVIDIA Tensor Core workloads. All model weights are quantized to INT8 for Tensor Core execution, reducing memory bandwidth pressure and increasing throughput on A100 and H100 class GPUs without measurable quality degradation on the task distribution we target.
Worker concurrency is tuned to maximise SM (Streaming Multiprocessor) utilisation rather than wall-clock latency. A single inference node handles multiple concurrent pipeline executions, batching Judge scoring passes to take advantage of CUDA kernel fusion where the scheduler allows.
Target hardware: NVIDIA A100 80 GB SXM / H100 NVLink. INT8 quantization via TensorRT. Batch size & sequence length are runtime-configurable per deployment tier.
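The runtime-configurable knobs named above can be pictured as a small per-tier config. The tier names and every numeric value below are assumptions for illustration, not published defaults; only the GPU targets and INT8-via-TensorRT choice come from this page.

```typescript
// Hypothetical per-deployment-tier inference config. Values are
// illustrative assumptions; the GPU targets and INT8 quantization
// mirror the hardware note above.

type Quantization = "int8" | "fp16";

interface InferenceTierConfig {
  gpu: "A100-80GB-SXM" | "H100-NVLink";
  quantization: Quantization;    // INT8 via TensorRT in production
  max_batch_size: number;        // Judge scoring passes batched up to this size
  max_sequence_length: number;   // tokens
  concurrent_pipelines: number;  // tuned for SM utilisation, not latency
}

const TIERS: Record<string, InferenceTierConfig> = {
  standard: {
    gpu: "A100-80GB-SXM",
    quantization: "int8",
    max_batch_size: 16,
    max_sequence_length: 2048,
    concurrent_pipelines: 4,
  },
  premium: {
    gpu: "H100-NVLink",
    quantization: "int8",
    max_batch_size: 32,
    max_sequence_length: 4096,
    concurrent_pipelines: 8,
  },
};
```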