Feat/rag p1 runner eval #444
Merged
RAG was a contract + validator + MCP layer with no execution: codegen emitted nothing, and the only retriever was lexical Jaccard, which was exported but never run. This PR makes a `.kern` RAG spec executable end-to-end in the toolchain.

Architecture (retrieve/verify split): retrieval runs in ONE engine (toolchain-side, dbt-test shape), so there is no TS↔Python surface and the parity-lockstep rule does not apply; nothing is emitted to either target. The seam signature `RagContractRetriever` already existed; this PR fills it in.

- `rag-embedding.ts`: the `Embedder` seam plus `DeterministicHashEmbedder` (zero-dep, FNV-1a-32 feature hash, L2-normalised, byte-reproducible) and `EmbeddingRagIndex` (cosine; ordering, filtering, and citation-defaulting mirror `InMemoryRagCorpus`; integerised fixed-point scores, signed-zero normalised).
- `rag-eval-runner.ts`: `evaluateRagEvalDocument` runs parse → `collectRagSemanticFacts` → each `ragEval` against the real retriever via the existing `evaluateRagEvalContract`.
- CLI `rag.ts`: `kern rag eval <file.kern> --corpus <chunks.json>` (PASS exits 0, FAIL exits 1, with per-assertion diagnostics). On-disk ingestion is P1.5.

Tests:

- `rag-eval-mutants.test.ts`: a discriminating-oracle gate. Five mutant retrievers (wrong-L2, sign-flip, off-by-one-topK, wrong-tie-break, jaccard-imposter) are each provably killed by a specific case; the real retriever passes all of them.
- `rag-embedding.test.ts`: embedder, index, and determinism unit coverage.
- `rag-eval-document.test.ts`: a parsed `.kern` `ragEval` → real cosine retrieval.

This is the determinism substrate only; the semantic embedder and ingestion land in P1.5.

⚔️ Forged by [Agon](https://github.com/KERNlang/agon)
Co-Authored-By: agon (KERN) <292465531+KERN-Agon@users.noreply.github.com>
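The embedder and cosine scoring described above can be sketched as follows. This is a minimal illustration, not the code in `rag-embedding.ts`: `HashEmbedder`, `tokenize`, `cosine`, and `toFixedPoint` are hypothetical names, and the lowercase ASCII tokenizer, unsigned bucket counts, and 10^6 fixed-point scale are all assumptions the PR does not specify. Only the FNV-1a-32 offset basis and prime are standard constants.

```typescript
// Sketch only: names, tokenizer, and scale are hypothetical, not rag-embedding.ts.

// Assumed shape of the `Embedder` seam.
interface Embedder {
  embed(text: string): number[];
}

// FNV-1a, 32-bit, over the UTF-8 bytes of `text`.
function fnv1a32(text: string): number {
  let hash = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (const byte of new TextEncoder().encode(text)) {
    hash ^= byte;
    hash = Math.imul(hash, 0x01000193) >>> 0; // FNV prime, wrap to unsigned 32-bit
  }
  return hash;
}

// Assumed tokenizer: lowercase ASCII alphanumeric runs.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

class HashEmbedder implements Embedder {
  readonly dimensions: number;

  constructor(dimensions = 256) {
    this.dimensions = dimensions;
  }

  // Count each token in bucket (hash % dimensions), then L2-normalise.
  embed(text: string): number[] {
    const v = new Array<number>(this.dimensions).fill(0);
    for (const token of tokenize(text)) {
      const bucket = fnv1a32(token) % this.dimensions;
      v[bucket] = (v[bucket] ?? 0) + 1;
    }
    // Math.hypot is implementation-approximated in ECMAScript; Math.sqrt is
    // not, so summing squares in a fixed order keeps the norm reproducible.
    const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
    return norm === 0 ? v : v.map((x) => x / norm);
  }
}

// For unit vectors the dot product equals the cosine similarity.
function cosine(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
}

// Integer fixed-point score; `n === 0 ? 0 : n` maps -0 to +0.
function toFixedPoint(score: number, scale = 1_000_000): number {
  const n = Math.round(score * scale);
  return n === 0 ? 0 : n;
}

const embedder = new HashEmbedder(64);
const score = toFixedPoint(cosine(embedder.embed("retrieve the chunk"), embedder.embed("chunk retrieval")));
console.log(score);
```

Rounding to integers means two scores that differ only by floating-point noise usually compare as equal, so an explicit tie-break, rather than accumulated rounding error, decides their order.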
… hardening
Agon review (5/6 engines): 1 blocking + several important findings, folded in:
- EmbeddingRagIndex now keys chunks by id (Map) → duplicate ids upsert
last-write-wins instead of returning duplicates that trip the
retrieve-result duplicate-id guard [codex 0.98].
- Defensive deep copy (structuredClone) on add + on metadata output, so
caller mutation of corpus chunks can't leak into retrieval [codex 0.95].
- Tie-break uses a pinned code-point comparator, not locale-dependent
localeCompare — supports the cross-environment determinism claim [kimi].
- roundScore guards non-finite input (NaN/Infinity → 0) [kimi].
- evaluateRagEvalDocument fails closed on invalid RAG semantics
(validateRagSemantics) — a contract over a broken spec is meaningless;
report now carries `diagnostics` [codex 0.84].
- CLI: validate corpus element shapes with a clear error (was a raw cast)
[kimi blocking]; catch evaluation throws; zero ragEval → exit 0
("nothing to run" is not a CI failure) [kimi]; surface INVALID specs;
drop the redundant parseFlag fallback (parseFlagOrNext handles both forms).
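The index-side hardening items above (id-keyed upsert, defensive copies, a locale-independent tie-break, and the non-finite score guard) can be sketched as below. `SketchIndex`, `roundScore`, `compareCodePoints`, and the `Chunk`/`Hit` shapes are hypothetical stand-ins, not `EmbeddingRagIndex` itself; filtering and citation-defaulting are omitted, and the injected `embed` function is assumed to return L2-normalised vectors.

```typescript
// Sketch only: hypothetical names and shapes, not EmbeddingRagIndex.

interface Chunk {
  id: string;
  text: string;
  metadata?: Record<string, unknown>;
}

interface Hit {
  id: string;
  score: number; // fixed-point integer
  metadata?: Record<string, unknown>;
}

// NaN/±Infinity -> 0; -0 -> +0; otherwise an integer at `scale` precision.
function roundScore(score: number, scale = 1_000_000): number {
  if (!Number.isFinite(score)) return 0;
  const n = Math.round(score * scale);
  return n === 0 ? 0 : n;
}

// Orders by Unicode code point, independent of locale. Plain `<` on JS strings
// compares UTF-16 code units, which orders astral characters differently.
function compareCodePoints(a: string, b: string): number {
  const ca = Array.from(a, (ch) => ch.codePointAt(0) ?? 0);
  const cb = Array.from(b, (ch) => ch.codePointAt(0) ?? 0);
  for (let i = 0; i < Math.min(ca.length, cb.length); i++) {
    const d = (ca[i] ?? 0) - (cb[i] ?? 0);
    if (d !== 0) return d;
  }
  return ca.length - cb.length;
}

class SketchIndex {
  private readonly entries = new Map<string, { chunk: Chunk; vector: number[] }>();
  private readonly embed: (text: string) => number[];

  // `embed` is assumed to return unit vectors, so the dot product is the cosine.
  constructor(embed: (text: string) => number[]) {
    this.embed = embed;
  }

  add(chunk: Chunk): void {
    const copy = structuredClone(chunk); // later caller mutation can't leak in
    // Keyed by id: a duplicate id replaces the earlier chunk (last write wins).
    this.entries.set(copy.id, { chunk: copy, vector: this.embed(copy.text) });
  }

  retrieve(query: string, topK: number): Hit[] {
    const q = this.embed(query);
    const hits: Hit[] = [];
    for (const { chunk, vector } of this.entries.values()) {
      const cosine = vector.reduce((sum, x, i) => sum + x * (q[i] ?? 0), 0);
      // Clone metadata on the way out so callers can't mutate stored state.
      hits.push({ id: chunk.id, score: roundScore(cosine), metadata: structuredClone(chunk.metadata) });
    }
    // Highest score first; equal scores break ties by id in code-point order.
    hits.sort((a, b) => b.score - a.score || compareCodePoints(a.id, b.id));
    return hits.slice(0, topK);
  }
}
```

Keying by id makes the duplicate-id guard on retrieve results unreachable by construction, instead of relying on callers to deduplicate.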
Disproven findings skipped: createInMemoryRetriever import (tests green);
zero-ragCase vacuous pass (evaluateRagEvalContract already fails closed).
New tests: duplicate-id upsert, defensive-copy isolation, no-diagnostics on
valid spec, fail-closed on unresolved refs.
⚔️ Forged by [Agon](https://github.com/KERNlang/agon)
Co-Authored-By: agon (KERN) <292465531+KERN-Agon@users.noreply.github.com>
…guard ts2534 FP)

kern-guard flagged ts2534 on `fail()`, a false positive: the real build (`tsc -b`, same as CI) compiles cleanly because @types/node types `process.exit` as `never`, so the end of the function is unreachable. kern-guard's isolated ts-morph analysis didn't resolve that type and saw a reachable end point in a function declared to return `never`. `fail()` now throws; `main()` in cli.ts already catches and exits 1 (identical message and exit code). A function that always throws is unambiguously `never` in every toolchain, so the finding no longer depends on type resolution.

⚔️ Forged by [Agon](https://github.com/KERNlang/agon)
Co-Authored-By: agon (KERN) <292465531+KERN-Agon@users.noreply.github.com>
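The throw-based `fail()` pattern, together with the corpus shape validation from the hardening commit, can be sketched like this. Only the behaviour (`fail()` throws, `main()` catches and exits 1) comes from the commit; `CliError`, `parseCorpus`, the `{ id, text }` chunk shape, the `error:` message format, and `main` returning an exit code instead of calling `process.exit` are all hypothetical simplifications.

```typescript
// Sketch only: hypothetical names, not the code in cli.ts / rag.ts.

class CliError extends Error {}

// Always throws, so the `never` return type holds regardless of how
// process.exit is typed in a given environment.
function fail(message: string): never {
  throw new CliError(message);
}

// Assumed minimal chunk shape for the corpus file.
interface CorpusChunk {
  id: string;
  text: string;
}

// Validates each element instead of casting the parsed JSON blindly.
function parseCorpus(json: string): CorpusChunk[] {
  let data: unknown;
  try {
    data = JSON.parse(json);
  } catch {
    fail("corpus is not valid JSON");
  }
  if (!Array.isArray(data)) fail("corpus must be a JSON array of chunks");
  return data.map((item: unknown, i: number) => {
    if (typeof item !== "object" || item === null) fail(`corpus[${i}] must be an object`);
    const { id, text } = item as Record<string, unknown>;
    if (typeof id !== "string" || typeof text !== "string") fail(`corpus[${i}] needs string "id" and "text"`);
    return { id, text };
  });
}

// Returns the exit code so the sketch stays testable; a real entry point
// would hand this value to process.exit. Retrieval/evaluation is omitted.
function main(corpusJson: string): number {
  try {
    const corpus = parseCorpus(corpusJson);
    console.log(`loaded ${corpus.length} chunk(s)`);
    return 0;
  } catch (err) {
    console.error(`error: ${err instanceof Error ? err.message : String(err)}`);
    return 1;
  }
}
```

Because TypeScript treats a call to an explicitly annotated `never` function as ending control flow, the `if (...) fail(...)` guards also narrow `data`, `id`, and `text` for the code that follows.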
What
Why
How
Checklist
- `tsc -b` passes
- `pnpm test` passes
- `pnpm test:kern` passes
- `pnpm lint` passes
- `kern review packages/ --recursive` checked