L1: effect-size provenance gate (schema + fail-closed validator + six-PMID contract) by manuelcorpas · Pull Request #4 · ClawBio/ClawBench

manuelcorpas · 2026-06-30T15:45:46Z

Layer 1 of the scientific-correctness CI gate (#3) — now wired to live data

The deterministic, permanent fix for the fabricated/mismatched-citation failure class that has shipped past the keyword scanner three review rounds running (ancestry-risk-profiler #297). A citation is a falsifiable claim, not a label; this resolves every effect-size entry's PMID against GWAS Catalog + PubMed and fails closed when the cited paper does not support the (variant, trait, ancestry, effect) claim.

What's here

SCHEMAS/effect_size_provenance_schema.json — the contract every effect-size/ancestry-risk skill must satisfy: {variant.rsid, trait.{label,efo_id}, ancestry, effect, source.pmid}. Modelled on acmg_evidence_schema.json (Draft 2020-12, x-provenance).
HARNESS/validate_provenance.py — fail-closed validator mirroring validate_evidence.py: machine-readable ERROR_CODES, exception-guarded, injected Oracle. Ships CachedOracle (offline, reads the committed snapshot) and LiveOracle (GWAS Catalog REST + PubMed E-utilities).
scripts/build_provenance_snapshot.py + TRUTH/gwas_catalog/snapshot.json — live-built, committed freeze (413 records, 5 variants) so CI runs deterministically offline. Regenerate with the script.
tests/ — 21 tests: 14 deterministic fixture + 7 against the real snapshot.
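
For orientation, one panel entry that satisfies the field list above might look like the sketch below. The field names come from the summary above. The nested key layout, the `effect.measure` key, the example values (including the placeholder EFO id and effect size) and the `missing_fields` helper are assumptions for illustration, not the committed schema or validator.

```python
# Hypothetical entry: key layout and values are placeholders, not the committed schema.
EXAMPLE_ENTRY = {
    "variant": {"rsid": "rs73885319"},
    "trait": {"label": "kidney disease", "efo_id": "EFO_0000000"},  # placeholder EFO id
    "ancestry": "AFR",
    "effect": {"measure": "OR", "value": 1.5},  # placeholder effect size
    "source": {"pmid": "20647424"},
}

# Required fields as listed in the PR summary, written as key paths.
REQUIRED_PATHS = [
    ("variant", "rsid"),
    ("trait", "label"),
    ("trait", "efo_id"),
    ("ancestry",),
    ("effect",),
    ("source", "pmid"),
]


def missing_fields(entry):
    """Return the dotted paths of required fields absent from the entry."""
    missing = []
    for path in REQUIRED_PATHS:
        node = entry
        for key in path:
            if not isinstance(node, dict) or key not in node:
                missing.append(".".join(path))
                break
            node = node[key]
    return missing
```

A structural check like this only covers shape. Whether the cited PMID supports the claim is the oracle's job.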

Proven on real GWAS Catalog/PubMed data

| Citation | Reality | Gate verdict |
| --- | --- | --- |
| `20566908` cited as Genovese/APOL1/kidney | head-and-neck-cancer survey | 🔴 `PMID_STUDY_MISMATCH` (block) |
| `20647424` (the real Genovese 2010) for APOL1/kidney | correct, but GWAS Catalog indexes the variant via PAGE/MVP | 🟡 `TOPIC_MATCH_LOW` (sign-off, not false-rejected) |
| a genuinely catalogued (variant,trait,ancestry,pmid,OR) tuple | matches | ✅ pass |

The five wrong ancestry-risk-profiler PMIDs (20566908, 22158537, 27005778, 23945395, 17478679) all block; the correct 16415884 passes. Other codes covered: PMID_UNRESOLVABLE, ANCESTRY_MISMATCH, EFFECT_OUT_OF_RANGE, ASSOC_NOT_FOUND, SCHEMA_INVALID.

Design note discovered by wiring the live API

GWAS Catalog does not index every primary paper. So a cited PMID that is not a registered catalog study is not auto-blocked: the gate falls back to a PubMed topic check and flags on-topic-but-uncatalogued papers for human sign-off (TOPIC_MATCH_LOW), reserving hard blocks for genuinely off-topic citations. This keeps correct citations from being false-rejected.
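
The fallback described above can be sketched as a small decision function. This is a simplified illustration, not the code in `validate_provenance.py`: the function name, its arguments, the `"OK"` return code and the severity assigned to `PMID_UNRESOLVABLE` are assumptions. Only the code names `TOPIC_MATCH_LOW`, `PMID_STUDY_MISMATCH` and `PMID_UNRESOLVABLE` come from this PR.

```python
def classify_citation(pmid, catalog_pmids, pubmed_on_topic):
    """Return (code, severity) for one cited PMID.

    catalog_pmids:   PMIDs the catalog registers for this (variant, trait).
    pubmed_on_topic: True/False from a PubMed topic check, or None when the
                     PMID could not be resolved at all.
    """
    if pmid in catalog_pmids:
        return ("OK", "pass")  # catalogued study: continue with other checks
    if pubmed_on_topic is None:
        return ("PMID_UNRESOLVABLE", "block")  # assumed to fail closed
    if pubmed_on_topic:
        return ("TOPIC_MATCH_LOW", "warn")  # on-topic but uncatalogued: human sign-off
    return ("PMID_STUDY_MISMATCH", "block")  # off-topic citation: hard block
```

Under these assumptions, the real Genovese paper (on-topic, not catalogued for the variant) lands on the warning branch, while the head-and-neck-cancer PMID lands on the hard block.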

Correctness note

While building this I asserted the real Genovese PMID as 20413513 from memory; that is wrong (it is an unrelated diabetes trial). Corrected to 20647424 after resolving against PubMed, and the #297 review was corrected too — the exact "typed a PMID from memory" failure this gate exists to prevent.

Verification

pytest tests/test_validate_provenance.py tests/test_provenance_snapshot.py → 21 passed. Full suite collects 250; neighbouring test_evidence_schema/test_acmg_points green. No import-time I/O.

Remaining (separate follow-ups, tracked in #3)

Backfill ancestry-risk-profiler's panel to the schema and run the gate on it in its own repo
GitHub Action that blocks skills/* PRs declaring emits_effect_sizes
Scale the snapshot panel beyond the audited variants

Refs #3.

🤖 Generated with Claude Code

Note

Medium Risk
Gate logic affects which skill PRs pass CI for health-related effect-size claims; incorrect rules or stale snapshot could block or warn wrongly, though behavior is heavily tested and advisory for catalog gaps.

Overview
Adds Layer 1 of the scientific-correctness CI gate: skills that declare emits_effect_sizes must ship data/provenance.json entries that match GWAS Catalog + PubMed, or the check fails closed.

Contract and validation: New SCHEMAS/effect_size_provenance_schema.json defines each association (variant, trait/EFO, ancestry, effect, PMID). HARNESS/validate_provenance.py validates entries via an injectable oracle with machine-readable codes (mismatch, ancestry, effect range, unresolvable PMID, etc.). When the catalog lacks the cited study, a PubMed topic overlap yields advisory TOPIC_MATCH_LOW instead of a hard block so correct primary papers are not auto-rejected.

CI runner: HARNESS/run_provenance_gate.py discovers skills from SKILL.md frontmatter, validates panels with CachedOracle, supports --changed path filtering, and emits GitHub Actions annotations plus a step summary.

Offline truth: scripts/build_provenance_snapshot.py refreshes TRUTH/gwas_catalog/snapshot.json; tests cover fixture oracle, real snapshot (including flagship wrong-PMID cases), and the runner.

^{Reviewed by Cursor Bugbot for commit 8d73bfa. Bugbot is set up for automated code reviews on this repo. Configure here.}

…t test

Layer 1 of the scientific-correctness CI gate (#3). Resolves every effect-size association entry's cited PMID against an injected GWAS Catalog + PubMed oracle and fails closed on variant/trait/ancestry/PMID/effect mismatch. Encodes the six audited ancestry-risk-profiler PMIDs as a deterministic contract (5 wrong, 1 correct). LiveOracle (network) + TRUTH/gwas_catalog snapshot remain as the build target; the fixture-backed unit test is green (14 passed). Mirrors validate_evidence.py.

Refs #3.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…n-off path

Takes the provenance gate from fixture-only to real-data:
- LiveOracle (GWAS Catalog REST + PubMed E-utilities) + CachedOracle reading a committed TRUTH/gwas_catalog/snapshot.json freeze; scripts/build_provenance_snapshot.py rebuilds it.
- Refined validator: when a cited PMID is not a registered catalog study for a (variant, trait), fall back to a PubMed topic check. On-topic-but-uncatalogued -> TOPIC_MATCH_LOW (human sign-off), not a hard block. This was discovered by wiring the live API: GWAS Catalog links rs73885319/kidney to PAGE/MVP, not the original Genovese 2010 APOL1 paper.
- Corrected Genovese APOL1 PMID 20413513 -> 20647424 (20413513 is an unrelated diabetes trial; asserted from memory in error and now resolved against PubMed).
- 21 provenance tests (14 fixture + 7 against the real snapshot): flagship 20566908 (head-and-neck cancer) blocks; correct-but-uncatalogued 20647424 warns; a real catalogued tuple passes.

Full suite collects 250. Refs #3.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

cursor · 2026-06-30T16:10:54Z

+            pmid = str(study.get("publicationInfo", {}).get("pubmedId") or "")
+            for t in a.get("efoTraits", []) or []:
+                out.append({"efo_id": t.get("shortForm"), "trait": t.get("trait"),
+                            "pmid": pmid, "ancestries": [], "or_value": a.get("orPerCopyNum")})


LiveOracle skips ancestry checks

High Severity

LiveOracle.associations_for always sets ancestries to an empty list, so validate_entry never raises ANCESTRY_MISMATCH when the CLI runs with --live. Ancestry enforcement only works for CachedOracle / the committed snapshot, not for the documented production oracle path.

^{Reviewed by Cursor Bugbot for commit 8f287ec. Configure here.}

cursor · 2026-06-30T16:10:54Z

+                if not (EFFECT_RATIO_LO <= ratio <= EFFECT_RATIO_HI):
+                    return _finding(index, rsid, "EFFECT_OUT_OF_RANGE", "block", "effect.value",
+                                    f"Effect {value} deviates from catalog {cat} (ratio {ratio:.2f}, "
+                                    f"allowed {EFFECT_RATIO_LO}-{EFFECT_RATIO_HI}x)", False)


Effect check ignores measure type

Medium Severity

Effect validation always compares effect.value to GWAS or_value and never reads effect.measure. Entries with beta, HR, or RR can be wrongly blocked or pass without a meaningful catalog comparison, and non-positive values skip the check entirely.

^{Reviewed by Cursor Bugbot for commit 8f287ec. Configure here.}
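
One way to address the finding above is to branch on `effect.measure` before comparing against the catalog odds ratio. The sketch below is an assumption-laden illustration: the bounds assigned to `EFFECT_RATIO_LO` and `EFFECT_RATIO_HI`, the `RATIO_MEASURES` set, the function name and the returned status strings are all hypothetical, and only the constant names and the OR comparison come from the diff.

```python
EFFECT_RATIO_LO, EFFECT_RATIO_HI = 0.5, 2.0  # assumed bounds, not the repo's values

RATIO_MEASURES = {"OR"}  # measures directly comparable to the catalog or_value


def check_effect(measure, value, catalog_or):
    """Classify one effect claim against the catalog odds ratio."""
    if measure not in RATIO_MEASURES:
        # beta, HR, RR, ...: no meaningful comparison with an odds ratio
        return "not_comparable"
    if value is None or catalog_or is None or value <= 0 or catalog_or <= 0:
        # non-positive or missing values are reported, not silently skipped
        return "invalid"
    ratio = value / catalog_or
    if EFFECT_RATIO_LO <= ratio <= EFFECT_RATIO_HI:
        return "ok"
    return "out_of_range"
```

How `not_comparable` and `invalid` map onto blocking or warning findings is a policy decision left open here.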

cursor · 2026-06-30T16:10:54Z

+    if errs:
+        e = errs[0]
+        return f"{'/'.join(str(p) for p in e.path) or '<root>'}: {e.message}"
+    return None


Schema validation fails open

Medium Severity

If jsonschema cannot be imported, _schema_error returns None and validation continues as if the entry were schema-valid. Invalid panels may reach oracle checks or surface as generic PARSE_ERROR instead of SCHEMA_INVALID.

^{Reviewed by Cursor Bugbot for commit 8f287ec. Configure here.}

…tions

HARNESS/run_provenance_gate.py: walks (changed) skills, and for any that declare the emits_effect_sizes capability in SKILL.md, requires data/provenance.json and validates every entry with the provenance gate against the committed snapshot. Emits GitHub Actions annotations + step summary; exits 1 on any blocking finding or missing panel. Skills that do not emit effect sizes are skipped. 8 tests (good passes, fabricated cite fails with PMID_STUDY_MISMATCH, missing panel errors, non-emitting skipped, --changed scoping). Consumed by the provenance-gate workflow in the skills repo.

Refs #3.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes using high effort and found 4 potential issues.

There are 7 total unresolved issues (including 3 from previous reviews).


^{Reviewed by Cursor Bugbot for commit 8d73bfa. Configure here.}

cursor · 2026-06-30T18:50:35Z

+        return {"passed": False, "n_entries": 0, "n_blocking": 0, "n_warnings": 0,
+                "findings": [_finding(0, None, "PARSE_ERROR", "block", "<root>",
+                                      "panel is not iterable", False)]}
+    findings = [validate_entry(e, oracle, i) for i, e in enumerate(items)]


Dict panel validates JSON keys

Medium Severity

validate_panel coerces input with list(entries) without requiring a JSON array. A single object (or wrapper object) in provenance.json iterates top-level keys as fake “entries”, producing bogus PARSE_ERROR rows instead of rejecting the panel shape.

Additional Locations (1)

HARNESS/run_provenance_gate.py#L56-L65

^{Reviewed by Cursor Bugbot for commit 8d73bfa. Configure here.}
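
The panel-shape problem above could be caught with an explicit type check before any per-entry validation. This is a minimal sketch: `panel_shape_error` is a hypothetical helper, not a function in the PR.

```python
import json


def panel_shape_error(panel):
    """Return an error string unless the panel is a JSON array of objects."""
    if not isinstance(panel, list):
        return f"panel must be a JSON array, got {type(panel).__name__}"
    for i, entry in enumerate(panel):
        if not isinstance(entry, dict):
            return f"entry {i} must be a JSON object, got {type(entry).__name__}"
    return None


# A wrapper object is rejected as a whole; its keys are never iterated.
wrapped = json.loads('{"entries": []}')
print(panel_shape_error(wrapped))
```

Running the check once at the top of both the validator and the runner would cover the additional location listed in the finding.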

cursor · 2026-06-30T18:50:35Z

+    oracle = oracle or VP.CachedOracle()
+    results = [gate_skill(d, oracle) for d in discover_skills(skills_root, changed)]
+    blocking = sum(1 for r in results if r["status"] in ("fail", "error"))
+    return {"results": results, "blocking": blocking, "exit_code": 1 if blocking else 0}


Changed paths skip all skills

Medium Severity

When the --changed argument is used, if the provided paths don't contain any skills/<name>/ directories (or if the parsed list is empty), discover_skills returns an empty list. This causes the gate to process no skills and exit successfully (code 0), effectively bypassing validation for any emitting skills.

^{Reviewed by Cursor Bugbot for commit 8d73bfa. Configure here.}
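
One possible policy for the `--changed` finding above is to treat an empty or unparsable scope as "check everything" and to narrow the scope only when at least one path was supplied. The sketch below assumes repo-relative POSIX paths and a `skills/<name>/` layout. The function name and signature are hypothetical and do not mirror `discover_skills`.

```python
from pathlib import PurePosixPath


def skills_in_scope(changed, all_skills):
    """Pick the skills to gate.

    changed:    list of repo-relative paths, or None when --changed was not given.
    all_skills: names of every skill directory under skills/.
    """
    if not changed:
        # None or an empty list gives no usable scope, so check every skill.
        return sorted(all_skills)
    touched = set()
    for raw in changed:
        parts = PurePosixPath(raw).parts
        if len(parts) >= 2 and parts[0] == "skills":
            touched.add(parts[1])
    # Ignore names that are not real skill directories (e.g. skills/README.md).
    return sorted(touched & set(all_skills))
```

A non-empty change list that touches nothing under `skills/` still yields an empty scope here. Whether that case should pass or force a full run is a separate decision.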

cursor · 2026-06-30T18:50:35Z

+    skill_md = Path(skill_dir) / "SKILL.md"
+    if not skill_md.exists():
+        return False
+    return CAPABILITY in _frontmatter(skill_md.read_text())


Capability detected by substring

Medium Severity

declares_effect_sizes treats any frontmatter substring match for emits_effect_sizes as opt-in, so names like not_emits_effect_sizes or comment text containing that token can force provenance gating on skills that did not intend to declare the capability.

^{Reviewed by Cursor Bugbot for commit 8d73bfa. Configure here.}
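
The substring problem above can be avoided by matching the capability as a whole token and ignoring comment text. The sketch below assumes YAML-style frontmatter where `#` starts a comment. The function name and the regex are illustrative, and a real fix might parse the frontmatter with a YAML library instead.

```python
import re

CAPABILITY = "emits_effect_sizes"

# Match the capability only when it is not part of a longer identifier.
_TOKEN = re.compile(rf"(?<![\w-]){re.escape(CAPABILITY)}(?![\w-])")


def declares_capability(frontmatter):
    """True when the frontmatter names the capability as a standalone token."""
    for line in frontmatter.splitlines():
        code = line.split("#", 1)[0]  # naive comment strip; ignores quoted '#'
        if _TOKEN.search(code):
            return True
    return False
```

The naive comment strip would mis-handle a `#` inside a quoted string, which is one reason a proper parser may be the better choice.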

cursor · 2026-06-30T18:50:35Z

+    g = (group or "").lower()
+    for needle, code in ANCESTRY_MAP:
+        if needle in g:
+            return code


Non-African labels map to AFR

High Severity

_map_ancestry uses substring matching, so ancestral group strings like non-african american still contain the needle african american and are mapped to AFR. Regenerating the snapshot can attach the wrong super-population codes to catalog records.

^{Reviewed by Cursor Bugbot for commit 8d73bfa. Configure here.}
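
For the ancestry finding above, exact matching on a normalised label avoids the `non-african american` collision. The table contents below are hypothetical placeholders, since the real `ANCESTRY_MAP` is not shown in the diff, and `map_ancestry` is an illustrative name, not the PR's `_map_ancestry`.

```python
# Hypothetical label -> super-population table; the PR's ANCESTRY_MAP may differ.
ANCESTRY_MAP = {
    "african american": "AFR",
    "european": "EUR",
}


def map_ancestry(group):
    """Map a catalog ancestry label to a code, or None when it is not recognised."""
    label = " ".join((group or "").lower().split())  # lowercase, collapse whitespace
    return ANCESTRY_MAP.get(label)
```

Returning `None` for unrecognised labels lets the snapshot builder surface unmapped groups instead of guessing a code.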

manuelcorpas mentioned this pull request Jun 30, 2026

feat(skill): add ancestry-risk-profiler ClawBio/ClawBio#297

Open

5 tasks

familygenome and others added 2 commits June 30, 2026 17:06

Fix Genovese APOL1 PMID in schema provenance note (20413513 -> 20647424)

8f287ec

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

manuelcorpas marked this pull request as ready for review June 30, 2026 16:08

cursor Bot reviewed Jun 30, 2026

manuelcorpas wants to merge 4 commits into main from feat/provenance-gate-l1
