Deep Technical Docs

System Architecture & Execution Flow

A ground-level view of how Evida ingests, validates, and commits versioned AI knowledge at production scale.

Pipeline Overview

INGESTION: API / Webhook → Redis Queue
ORCHESTRATION: Durable Worker → RAG Retrieval
EXECUTION: Inference Node (NVIDIA Tensor Core — optimized)
VALIDATION: Recursive Judge Loop — ✓ confidence ≥ 0.96 → Commit (state sealed); ✗ confidence < 0.96 → Re-gen (recursive retry)
STORAGE: Immutable Version Tree + Vector Index

01 Execution Model

Durable Execution

Every generative task is modelled as a persistent state machine. Before any inference step executes, its preconditions and inputs are written to a durable store. If the worker process crashes, is preempted, or loses its network connection mid-run, the task is automatically re-enqueued and resumes from the last committed checkpoint — not from the beginning.

Long-running tasks — multi-step RAG pipelines, recursive Judge–Gen loops iterating three or four times — are never lost to transient infrastructure failures. Every transition is recorded: node entered, timestamp, inputs, and output hash. Replay is deterministic.

A task scheduled at t=0 and interrupted at t=45s completes at t=47s — two seconds of re-queue overhead, zero data loss.
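The resume behaviour described above can be sketched as a checkpointed step runner. This is a minimal illustration, not Evida's actual API: the `DurableStore` interface, the in-memory stand-in, and the step shapes are all assumptions.

```typescript
// Illustrative checkpoint-and-resume sketch. `DurableStore` and the
// step/payload shapes are hypothetical stand-ins for the real durable store.
type Checkpoint = { step: number; output: unknown };

interface DurableStore {
  load(taskId: string): Checkpoint | null;
  save(taskId: string, cp: Checkpoint): void;
}

// In-memory stand-in for the durable store, for demonstration only.
function memoryStore(): DurableStore {
  const db = new Map<string, Checkpoint>();
  return {
    load: (id) => db.get(id) ?? null,
    save: (id, cp) => db.set(id, cp),
  };
}

// Runs steps in order, skipping any step already committed to the store,
// so a re-enqueued task resumes at the last checkpoint rather than step 0.
function runTask(
  store: DurableStore,
  taskId: string,
  steps: Array<(prev: unknown) => unknown>
): unknown {
  const cp = store.load(taskId);
  const from = cp ? cp.step + 1 : 0; // resume after the last committed step
  let prev: unknown = cp ? cp.output : null;
  for (let i = from; i < steps.length; i++) {
    prev = steps[i](prev);                          // execute the step
    store.save(taskId, { step: i, output: prev });  // durable write before advancing
  }
  return prev;
}
```

Re-running `runTask` with the same task ID after a crash skips every step already checkpointed, which is the "two seconds of re-queue overhead, zero re-inference" behaviour the timeline in section 03 shows.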

02 Quality Gate

Comparative Scoring

EVALUATE

score_candidate(output, rubric) → [correctness, completeness, clarity, usefulness]

Four independent judge passes. Each dimension scored 0.0 – 1.0.

COMPARE

composite = weighted_sum(scores) vs. current_version.composite

Candidate must exceed both global threshold AND the current live version.

COMMIT

if composite ≥ max(0.96, current_version.composite): seal(output)

SHA-256 hash sealed. Full lineage chain written to version tree.

REJECT

else: re_enqueue(task, context=failure_report)

Failure context injected into next Re-gen prompt. Budget decremented.
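The four stages above reduce to a small decision function. The weights and thresholds come from this section; the function and type names are illustrative, not Evida's actual code.

```typescript
// Sketch of the COMPARE → COMMIT / REJECT decision. Weights match the
// 4-dimension rubric; all identifiers here are hypothetical.
const WEIGHTS = { correctness: 0.40, completeness: 0.25, clarity: 0.20, usefulness: 0.15 };

type Scores = { correctness: number; completeness: number; clarity: number; usefulness: number };

// Weighted sum over the four judge dimensions.
function composite(s: Scores): number {
  return (
    s.correctness * WEIGHTS.correctness +
    s.completeness * WEIGHTS.completeness +
    s.clarity * WEIGHTS.clarity +
    s.usefulness * WEIGHTS.usefulness
  );
}

const GLOBAL_THRESHOLD = 0.96;

// Candidate must beat both the global threshold AND the live version,
// i.e. composite ≥ max(0.96, current_version.composite).
function gate(candidate: Scores, currentComposite: number): "commit" | "reject" {
  return composite(candidate) >= Math.max(GLOBAL_THRESHOLD, currentComposite)
    ? "commit"
    : "reject";
}
```

Taking the max of the global threshold and the live composite is what makes the gate double as the regression guard described below: a candidate that merely "passes" 0.96 still loses to a stronger incumbent.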


4-Dimension Scoring Rubric

DIMENSION      DEFINITION                                            WEIGHT   MIN PASS
Correctness    Factual accuracy against ground-truth corpus          40%      0.90
Completeness   Full coverage of all required topics and sub-claims   25%      0.85
Clarity        Structural coherence, readability, and concision      20%      0.80
Usefulness     Actionability, specificity, and practical depth       15%      0.75
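One plausible way to apply the MIN PASS column is a per-dimension floor check before the composite is even computed. Whether Evida enforces these floors separately from the composite gate is an assumption; the helper below is a sketch.

```typescript
// Hypothetical per-dimension floor check using the rubric's MIN PASS column.
const MIN_PASS = { correctness: 0.90, completeness: 0.85, clarity: 0.80, usefulness: 0.75 };

type Dimension = keyof typeof MIN_PASS;

// Returns the dimensions that fall below their rubric floor (empty = pass).
function failedDimensions(scores: Record<Dimension, number>): Dimension[] {
  return (Object.keys(MIN_PASS) as Dimension[]).filter((d) => scores[d] < MIN_PASS[d]);
}
```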

REGRESSION GUARD
The engine maintains a live composite score for every knowledge artifact in the version tree. A new candidate is eligible for commit only if its weighted composite strictly exceeds the current production score — not just the global pass threshold. This prevents a technically “passing” but inferior candidate from silently displacing a higher-quality version. Regressions are automatically rejected and surfaced in the audit log with a REGRESSION_BLOCKED status code.

03 Orchestration Layer

Durable Workflow

✗ Stateless Serverless

15-min hard execution ceilings
No task state between invocations
Cold start latency on every retry
Failure = full restart from zero
Audit trail requires external glue

✓ Persistent Orchestration

Unbounded execution duration
Full state persisted between activities
Resume at exact checkpoint on failure
Deterministic replay of any past run
Native lineage — no external tooling

Stateful Execution Timeline — Multi-Minute Task

t +0:00TASK_ENQUEUED

Initial state written to durable store. Task ID, model config, and RAG context sealed.

t +0:12CHECKPOINT — RAG

Context retrieval complete. Embedding vectors and source document hashes persisted.

t +0:45CHECKPOINT — GEN_v1

First inference pass complete. Candidate output written. Worker process terminated (preempted).

t +0:47TASK_RESUMED

Re-queued from last checkpoint. Execution continues at GEN_v1 evaluation — no re-inference.

t +1:03CHECKPOINT — JUDGE

Evaluation complete. Composite score 0.91 — below regression threshold (current: 0.94). Retry.

t +1:28CHECKPOINT — GEN_v2

Second inference with failure context injected. Composite 0.974 — exceeds threshold.

t +1:34TASK_COMMITTED

State sealed. SHA-256 hash written to version tree. Execution record archived.
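The Judge–Gen retry cycle in the timeline (GEN_v1 rejected, failure context injected, GEN_v2 committed) can be sketched as a bounded loop. `generate` and `judge` are stand-ins for the real inference and scoring calls; the failure-context string format is an assumption.

```typescript
// Hypothetical sketch of the recursive Judge–Gen retry loop.
// `generate` and `judge` are placeholders for the real pipeline calls.
function retryLoop(
  generate: (failureContext: string | null) => string,
  judge: (output: string) => number,
  threshold: number,
  retryBudget: number
): { status: "committed" | "rejected"; attempts: number } {
  let failureContext: string | null = null;
  for (let attempt = 1; attempt <= retryBudget + 1; attempt++) {
    const output = generate(failureContext); // failure context injected on retries
    const score = judge(output);
    if (score >= threshold) {
      return { status: "committed", attempts: attempt };
    }
    // Failure report becomes context for the next Re-gen; budget shrinks by one.
    failureContext = `composite ${score} below threshold ${threshold}`;
  }
  return { status: "rejected", attempts: retryBudget + 1 };
}
```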

04 Data Model

Version-Aware Execution State

ExecutionState.ts
interface ExecutionState {
  // ── Identity ──────────────────────────────────────────────
  task_id:              string          // "task_7f3a9b2c"
  schema_version:       string          // "evida/state/v2"
  status:               "pending" | "running" | "committed" | "rejected"

  // ── Lineage ───────────────────────────────────────────────
  parent_version_hash:  string          // "sha256:9f86d081…"
  committed_hash:       string | null   // "sha256:e3b0c442…" | null
  checkpoint_at:        string          // ISO 8601 — last durable write

  // ── Model Config ──────────────────────────────────────────
  model_config: {
    id:            string       // "evida-gen-turbo-01"
    quantization:  "int8" | "fp16"
    temperature:   number       // 0.7
    max_tokens:    number       // 2048
  }

  // ── Evaluation ────────────────────────────────────────────
  evaluation_score: {
    correctness:   number       // 0.94  — 40% weight
    completeness:  number       // 0.91  — 25% weight
    clarity:       number       // 0.88  — 20% weight
    usefulness:    number       // 0.86  — 15% weight
    composite:     number       // 0.912 — weighted sum
  }

  // ── Loop Control ──────────────────────────────────────────
  recursive_depth:     number       // 2  — increments on each retry
  retry_budget:        number       // 3  — max allowed retries
}
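The loop-control fields suggest a simple eligibility check before re-enqueueing. The helper below is illustrative; the field names follow the interface above, but the predicate itself is an assumption about how the budget is consumed.

```typescript
// Hypothetical helper over ExecutionState's loop-control fields:
// is this task still allowed another Re-gen attempt?
function canRetry(state: { recursive_depth: number; retry_budget: number; status: string }): boolean {
  // Retries remain only while the task is live and depth hasn't hit the budget.
  return state.status === "running" && state.recursive_depth < state.retry_budget;
}
```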

05 Hardware

Infrastructure Note

Inference nodes are sized and scheduled specifically for NVIDIA Tensor Core workloads. All model weights are quantized to INT8 for Tensor Core execution, reducing memory bandwidth pressure and increasing throughput on A100 and H100 class GPUs without measurable quality degradation on the task distribution we target.

Worker concurrency is tuned to maximise SM (Streaming Multiprocessor) utilisation rather than wall-clock latency. A single inference node handles multiple concurrent pipeline executions, batching Judge scoring passes to take advantage of CUDA kernel fusion where the scheduler allows.

Target hardware: NVIDIA A100 80 GB SXM / H100 NVLink. INT8 quantization via TensorRT. Batch size & sequence length are runtime-configurable per deployment tier.
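Per-tier runtime configuration could look like the sketch below. The tier names, batch sizes, and sequence lengths are invented for illustration, not Evida's published defaults; only the GPU targets and the `"int8" | "fp16"` quantization options come from this document.

```typescript
// Illustrative per-tier inference config. Tier names and numeric values
// are assumptions; quantization mirrors ExecutionState.model_config.
interface InferenceTierConfig {
  gpu: "A100-80GB-SXM" | "H100-NVLink";
  quantization: "int8" | "fp16";
  batchSize: number;          // runtime-configurable per deployment tier
  maxSequenceLength: number;  // runtime-configurable per deployment tier
}

const TIERS: Record<"standard" | "premium", InferenceTierConfig> = {
  standard: { gpu: "A100-80GB-SXM", quantization: "int8", batchSize: 16, maxSequenceLength: 2048 },
  premium:  { gpu: "H100-NVLink",   quantization: "int8", batchSize: 32, maxSequenceLength: 4096 },
};
```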