Feat/rag p1 runner eval #444

Merged: cukas merged 3 commits into main from feat/rag-p1-runner-eval, Jun 16, 2026
@cukas cukas commented Jun 16, 2026

Checklist

  • tsc -b passes
  • pnpm test passes
  • pnpm test:kern passes
  • pnpm lint passes
  • kern review packages/ --recursive checked

RAG was a contract+validator+MCP layer with no execution: codegen emitted
nothing, and the only retriever was a lexical Jaccard one, exported but never
run. This makes a `.kern` RAG spec executable end-to-end in the toolchain.

Architecture (retrieve/verify split): retrieval runs in ONE engine
(toolchain-side, dbt-test shape), so there is no TS↔Python surface and the
parity-lockstep rule does not apply — nothing is emitted to either target.
The seam signature `RagContractRetriever` already existed; this fills it in.

- rag-embedding.ts: `Embedder` seam + `DeterministicHashEmbedder`
  (zero-dep, FNV-1a-32 feature hash, L2-normalised, byte-reproducible) and
  `EmbeddingRagIndex` (cosine; ordering/filtering/citation-defaulting mirror
  `InMemoryRagCorpus`; integerised fixed-point scores, signed-zero normalised).
- rag-eval-runner.ts: `evaluateRagEvalDocument` — parse →
  collectRagSemanticFacts → run each `ragEval` against the real retriever via
  the existing `evaluateRagEvalContract`.
- cli rag.ts: `kern rag eval <file.kern> --corpus <chunks.json>` (PASS exit 0 /
  FAIL exit 1 with per-assertion diagnostics). On-disk ingestion is P1.5.
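
To make the embedder bullet concrete, here is a minimal sketch of a zero-dependency FNV-1a-32 feature-hash embedder with L2 normalisation, plus cosine as a dot product. It is not the code in rag-embedding.ts: the names `HashEmbedderSketch`, `EmbedderSketch`, `fnv1a32` and `cosine`, the tokenisation rule, and the default of 64 buckets are all assumptions made for illustration.

```typescript
// Illustrative sketch only: class and function names here are hypothetical,
// and the tokenisation and bucket count are assumptions, not the PR's code.

const FNV_OFFSET_BASIS = 0x811c9dc5;
const FNV_PRIME = 0x01000193;

// FNV-1a over the UTF-8 bytes of `text`, returned as an unsigned 32-bit integer.
function fnv1a32(text: string): number {
  const bytes = new TextEncoder().encode(text);
  let hash = FNV_OFFSET_BASIS;
  for (let i = 0; i < bytes.length; i++) {
    hash ^= bytes[i];
    hash = Math.imul(hash, FNV_PRIME) >>> 0;
  }
  return hash >>> 0;
}

interface EmbedderSketch {
  embed(text: string): number[];
}

class HashEmbedderSketch implements EmbedderSketch {
  private readonly dims: number;

  constructor(dims: number = 64) {
    this.dims = dims;
  }

  embed(text: string): number[] {
    const vec: number[] = new Array<number>(this.dims).fill(0);
    const tokens = text
      .toLowerCase()
      .split(/[^a-z0-9]+/)
      .filter((t) => t.length > 0);
    // Feature hashing: each token increments the bucket chosen by its hash.
    tokens.forEach((token) => {
      vec[fnv1a32(token) % this.dims] += 1;
    });
    const norm = Math.sqrt(vec.reduce((sum, v) => sum + v * v, 0));
    // L2-normalise so that cosine similarity reduces to a dot product.
    return norm === 0 ? vec : vec.map((v) => v / norm);
  }
}

// Cosine similarity for vectors that are already L2-normalised.
function cosine(a: number[], b: number[]): number {
  return a.reduce((sum, v, i) => sum + v * b[i], 0);
}
```

Because the hash is pure integer arithmetic and the tokens are counted as a bag of words, the same text should produce the same vector on every run and platform, which is the property the determinism claim rests on. Bucket collisions are possible with any fixed `dims`; the sketch does not try to mitigate them.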

Tests:
- rag-eval-mutants.test.ts: discriminating-oracle gate — 5 mutant retrievers
  (wrong-L2, sign-flip, off-by-one-topK, wrong-tie-break, jaccard-imposter),
  each provably killed by a specific case; the real retriever passes all.
- rag-embedding.test.ts: embedder/index/determinism unit coverage.
- rag-eval-document.test.ts: parsed `.kern` ragEval → real cosine retrieval.
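
The discriminating-oracle idea can be sketched as follows: an eval case is only useful if a subtly broken retriever fails it. This is a simplified illustration, not the contents of rag-eval-mutants.test.ts; the pre-scored fixture, the `Retriever` signature, and the helper `passesAll` are assumptions, and only two of the five mutants are modelled.

```typescript
// Hypothetical sketch: the types, fixture, and helper names are assumptions.
interface Scored {
  id: string;
  score: number;
}

type Retriever = (chunks: Scored[], topK: number) => string[];

// Reference ordering: score descending, then id ascending to break ties.
const byScoreThenId = (a: Scored, b: Scored): number =>
  b.score - a.score || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0);

const realRetriever: Retriever = (chunks, topK) =>
  chunks.slice().sort(byScoreThenId).slice(0, topK).map((c) => c.id);

// Mutant 1: returns one result too few.
const offByOneTopK: Retriever = (chunks, topK) =>
  chunks.slice().sort(byScoreThenId).slice(0, Math.max(0, topK - 1)).map((c) => c.id);

// Mutant 2: breaks ties by id descending instead of ascending.
const wrongTieBreak: Retriever = (chunks, topK) =>
  chunks
    .slice()
    .sort((a, b) => b.score - a.score || (a.id < b.id ? 1 : a.id > b.id ? -1 : 0))
    .slice(0, topK)
    .map((c) => c.id);

interface EvalCase {
  chunks: Scored[];
  topK: number;
  expected: string[];
}

// One case with a tie at the topK boundary discriminates both mutants.
const cases: EvalCase[] = [
  {
    chunks: [
      { id: "c", score: 0.5 },
      { id: "a", score: 0.9 },
      { id: "b", score: 0.5 },
    ],
    topK: 2,
    expected: ["a", "b"],
  },
];

// A retriever survives only if every case matches exactly, order included.
function passesAll(retriever: Retriever, evalCases: EvalCase[]): boolean {
  return evalCases.every(
    (c) => JSON.stringify(retriever(c.chunks, c.topK)) === JSON.stringify(c.expected),
  );
}
```

The gate then asserts two things: the real retriever passes every case, and each mutant fails at least one. A suite in which a mutant survives is not discriminating enough for that defect.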

Determinism substrate only; semantic embedder + ingestion land in P1.5.

⚔️ Forged by [Agon](https://github.com/KERNlang/agon)

Co-Authored-By: agon (KERN) <292465531+KERN-Agon@users.noreply.github.com>
… hardening

Agon review (5/6 engines): 1 blocking + several important findings, folded in:

- EmbeddingRagIndex now keys chunks by id (Map) → duplicate ids upsert
  last-write-wins instead of returning duplicates that trip the
  retrieve-result duplicate-id guard [codex 0.98].
- Defensive deep copy (structuredClone) on add + on metadata output, so
  caller mutation of corpus chunks can't leak into retrieval [codex 0.95].
- Tie-break uses a pinned code-point comparator, not locale-dependent
  localeCompare — supports the cross-environment determinism claim [kimi].
- roundScore guards non-finite input (NaN/Infinity → 0) [kimi].
- evaluateRagEvalDocument fails closed on invalid RAG semantics
  (validateRagSemantics) — a contract over a broken spec is meaningless;
  report now carries `diagnostics` [codex 0.84].
- CLI: validate corpus element shapes with a clear error (was a raw cast)
  [kimi blocking]; catch evaluation throws; zero ragEval → exit 0
  ("nothing to run" is not a CI failure) [kimi]; surface INVALID specs;
  drop the redundant parseFlag fallback (parseFlagOrNext handles both forms).
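
Several of the index-level hardening points above can be sketched together: keying by id for last-write-wins upsert, defensive copies via `structuredClone`, a code-point comparator in place of `localeCompare`, and a score rounder that guards non-finite input and normalises signed zero. The names `ChunkStoreSketch`, `compareCodePoints`, `roundScoreSketch` and the six-decimal scale are assumptions for illustration, not the PR's implementation.

```typescript
// Hypothetical sketch: names and the 1e6 scale are assumptions, not the PR's code.
interface Chunk {
  id: string;
  text: string;
  metadata?: Record<string, unknown>;
}

// Compare strings by Unicode code point, independent of locale and ICU data.
function compareCodePoints(a: string, b: string): number {
  const pa = Array.from(a); // Array.from splits a string into code points
  const pb = Array.from(b);
  const n = Math.min(pa.length, pb.length);
  for (let i = 0; i < n; i++) {
    const d = pa[i].codePointAt(0)! - pb[i].codePointAt(0)!;
    if (d !== 0) return d < 0 ? -1 : 1;
  }
  return pa.length === pb.length ? 0 : pa.length < pb.length ? -1 : 1;
}

const SCALE = 1e6; // assumed fixed-point precision

// Round to a fixed number of decimals; NaN/Infinity become 0 and -0 becomes +0.
function roundScoreSketch(x: number): number {
  if (!Number.isFinite(x)) return 0;
  const rounded = Math.round(x * SCALE) / SCALE;
  return rounded === 0 ? 0 : rounded;
}

class ChunkStoreSketch {
  private readonly byId = new Map<string, Chunk>();

  // Upsert: a repeated id overwrites the earlier chunk (last write wins).
  add(chunk: Chunk): void {
    this.byId.set(chunk.id, structuredClone(chunk));
  }

  // Return copies in a pinned order so callers cannot mutate stored state.
  list(): Chunk[] {
    return Array.from(this.byId.values())
      .map((c) => structuredClone(c))
      .sort((x, y) => compareCodePoints(x.id, y.id));
  }
}
```

Keying by id means a retrieval result can never contain the same id twice, so the duplicate-id guard cannot be tripped by corpus input alone. The comparator matters because `localeCompare` can order strings differently depending on the runtime's locale data, which would undermine byte-reproducible output.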

Disproven findings skipped: createInMemoryRetriever import (tests green);
zero-ragCase vacuous pass (evaluateRagEvalContract already fails closed).

New tests: duplicate-id upsert, defensive-copy isolation, no-diagnostics on
valid spec, fail-closed on unresolved refs.

⚔️ Forged by [Agon](https://github.com/KERNlang/agon)

Co-Authored-By: agon (KERN) <292465531+KERN-Agon@users.noreply.github.com>
…guard ts2534 FP)

kern-guard flagged ts2534 on fail() — a false positive: the real build
(`tsc -b`, same as CI) compiles cleanly because @types/node types
process.exit as `never`, so the end is unreachable. kern-guard's isolated
ts-morph analysis didn't resolve that and saw a reachable end under a
`never` return.

fail() now throws; cli.ts main() already catches and exits 1 (identical
message + exit code). A function that always throws is unambiguously `never`
in every toolchain, so the finding no longer depends on type resolution.
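
The shape of that change can be sketched as below. This is an illustration under assumptions rather than the code in cli rag.ts or cli.ts: `requireFlag` and `runMain` are hypothetical helpers, and `runMain` returns the exit code instead of calling `process.exit` so the sketch stays self-contained.

```typescript
// Hypothetical sketch: requireFlag and runMain are illustrative names only.

// Always throws, so `never` holds without resolving any external typings.
function fail(message: string): never {
  throw new Error(message);
}

// Returning fail(...) keeps the function total for the type checker.
function requireFlag(value: string | undefined, name: string): string {
  if (value === undefined) {
    return fail(`missing required flag ${name}`);
  }
  return value;
}

// Stand-in for main(): one catch site maps any failure to exit code 1.
function runMain(argv: string[]): number {
  try {
    const corpusPath = requireFlag(argv[0], "--corpus");
    console.log(`would evaluate against ${corpusPath}`);
    return 0;
  } catch (err) {
    console.error(err instanceof Error ? err.message : String(err));
    return 1;
  }
}
```

With this structure the message and exit code are decided in one place, and an analyser that cannot see `@types/node` still sees a function body whose end is unreachable.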

⚔️ Forged by [Agon](https://github.com/KERNlang/agon)

Co-Authored-By: agon (KERN) <292465531+KERN-Agon@users.noreply.github.com>
@cukas cukas merged commit 1fd27b0 into main Jun 16, 2026
4 checks passed
@cukas cukas deleted the feat/rag-p1-runner-eval branch June 16, 2026 19:13