L1: effect-size provenance gate (schema + fail-closed validator + six-PMID contract) #4
manuelcorpas wants to merge 4 commits into
…t test Layer 1 of the scientific-correctness CI gate (#3). Resolves every effect-size association entry's cited PMID against an injected GWAS Catalog + PubMed oracle and fails closed on variant/trait/ancestry/PMID/effect mismatch. Encodes the six audited ancestry-risk-profiler PMIDs as a deterministic contract (5 wrong, 1 correct). LiveOracle (network) + TRUTH/gwas_catalog snapshot remain as the build target; the fixture-backed unit test is green (14 passed). Mirrors validate_evidence.py. Refs #3. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…n-off path

Takes the provenance gate from fixture-only to real-data:
- LiveOracle (GWAS Catalog REST + PubMed E-utilities) + CachedOracle reading a committed TRUTH/gwas_catalog/snapshot.json freeze; scripts/build_provenance_snapshot.py rebuilds it.
- Refined validator: when a cited PMID is not a registered catalog study for a (variant, trait), fall back to a PubMed topic check. On-topic-but-uncatalogued -> TOPIC_MATCH_LOW (human sign-off), not a hard block. This was discovered by wiring the live API: GWAS Catalog links rs73885319/kidney to PAGE/MVP, not the original Genovese 2010 APOL1 paper.
- Corrected Genovese APOL1 PMID 20413513 -> 20647424 (20413513 is an unrelated diabetes trial; asserted from memory in error and now resolved against PubMed).
- 21 provenance tests (14 fixture + 7 against the real snapshot): flagship 20566908 (head-and-neck cancer) blocks; correct-but-uncatalogued 20647424 warns; a real catalogued tuple passes. Full suite collects 250.

Refs #3.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
    pmid = str(study.get("publicationInfo", {}).get("pubmedId") or "")
    for t in a.get("efoTraits", []) or []:
        out.append({"efo_id": t.get("shortForm"), "trait": t.get("trait"),
                    "pmid": pmid, "ancestries": [], "or_value": a.get("orPerCopyNum")})
LiveOracle skips ancestry checks
High Severity
LiveOracle.associations_for always sets ancestries to an empty list, so validate_entry never raises ANCESTRY_MISMATCH when the CLI runs with --live. Ancestry enforcement only works for CachedOracle / the committed snapshot, not for the documented production oracle path.
Reviewed by Cursor Bugbot for commit 8f287ec.
    if not (EFFECT_RATIO_LO <= ratio <= EFFECT_RATIO_HI):
        return _finding(index, rsid, "EFFECT_OUT_OF_RANGE", "block", "effect.value",
                        f"Effect {value} deviates from catalog {cat} (ratio {ratio:.2f}, "
                        f"allowed {EFFECT_RATIO_LO}-{EFFECT_RATIO_HI}x)", False)
Effect check ignores measure type
Medium Severity
Effect validation always compares effect.value to GWAS or_value and never reads effect.measure. Entries with beta, HR, or RR can be wrongly blocked or pass without a meaningful catalog comparison, and non-positive values skip the check entirely.
Reviewed by Cursor Bugbot for commit 8f287ec.
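One possible shape for a measure-aware check, as a minimal sketch rather than the PR's code: it compares against the catalog odds ratio only when the entry's measure is an odds ratio, and returns a distinct code otherwise. The function name check_effect, the "OR" measure label, the EFFECT_NOT_COMPARABLE code, and the bound values are all assumptions made for illustration.

```python
from typing import Optional

# Hypothetical bounds; the actual EFFECT_RATIO_LO/HI values are not shown in the PR.
EFFECT_RATIO_LO = 0.5
EFFECT_RATIO_HI = 2.0


def check_effect(measure: str, value: float, catalog_or: Optional[float]) -> Optional[str]:
    """Return a finding code, or None when no effect problem is detected."""
    if catalog_or is None or catalog_or <= 0:
        # No usable catalog value; how to treat this is left to the caller.
        return None
    if measure != "OR":
        # beta / HR / RR are not on the same scale as the catalog odds ratio,
        # so they are flagged instead of being compared (hypothetical code).
        return "EFFECT_NOT_COMPARABLE"
    if value <= 0:
        # An odds ratio must be positive; fail closed rather than skip.
        return "EFFECT_OUT_OF_RANGE"
    ratio = value / catalog_or
    if not (EFFECT_RATIO_LO <= ratio <= EFFECT_RATIO_HI):
        return "EFFECT_OUT_OF_RANGE"
    return None
```

Whether a non-OR measure should block or only warn is a policy decision this sketch does not settle.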
    if errs:
        e = errs[0]
        return f"{'/'.join(str(p) for p in e.path) or '<root>'}: {e.message}"
    return None
Schema validation fails open
Medium Severity
If jsonschema cannot be imported, _schema_error returns None and validation continues as if the entry were schema-valid. Invalid panels may reach oracle checks or surface as generic PARSE_ERROR instead of SCHEMA_INVALID.
Reviewed by Cursor Bugbot for commit 8f287ec.
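A fail-closed variant could look like the sketch below, which assumes the jsonschema package (version 4 or later, for Draft202012Validator) is the intended dependency. The function name schema_error and the wording of the returned message are hypothetical; the point is only that a missing dependency yields an error string instead of None.

```python
from typing import Any, Optional


def schema_error(entry: Any, schema: dict) -> Optional[str]:
    """Return a description of the first schema violation, or None if valid.

    Fails closed: if jsonschema cannot be imported, that is reported as an
    error rather than being treated as "schema-valid".
    """
    try:
        from jsonschema import Draft202012Validator
    except ImportError:
        return "<root>: jsonschema is not installed; schema cannot be checked"
    # iter_errors yields validation errors lazily; take the first one, if any.
    first = next(Draft202012Validator(schema).iter_errors(entry), None)
    if first is None:
        return None
    path = "/".join(str(p) for p in first.path) or "<root>"
    return f"{path}: {first.message}"
```

The caller would then map any non-None result to SCHEMA_INVALID, so an environment without jsonschema blocks instead of passing silently.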
…tions HARNESS/run_provenance_gate.py: walks (changed) skills, and for any that declare the emits_effect_sizes capability in SKILL.md, requires data/provenance.json and validates every entry with the provenance gate against the committed snapshot. Emits GitHub Actions annotations + step summary; exits 1 on any blocking finding or missing panel. Skills that do not emit effect sizes are skipped. 8 tests (good passes, fabricated cite fails with PMID_STUDY_MISMATCH, missing panel errors, non-emitting skipped, --changed scoping). Consumed by the provenance-gate workflow in the skills repo. Refs #3. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes using high effort and found 4 potential issues.
There are 7 total unresolved issues (including 3 from previous reviews).
Reviewed by Cursor Bugbot for commit 8d73bfa.
        return {"passed": False, "n_entries": 0, "n_blocking": 0, "n_warnings": 0,
                "findings": [_finding(0, None, "PARSE_ERROR", "block", "<root>",
                                      "panel is not iterable", False)]}
    findings = [validate_entry(e, oracle, i) for i, e in enumerate(items)]
Dict panel validates JSON keys
Medium Severity
validate_panel coerces input with list(entries) without requiring a JSON array. A single object (or wrapper object) in provenance.json iterates top-level keys as fake “entries”, producing bogus PARSE_ERROR rows instead of rejecting the panel shape.
Reviewed by Cursor Bugbot for commit 8d73bfa.
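A shape check ahead of iteration would reject object-shaped panels outright. The sketch below is illustrative only; panel_shape_error is a hypothetical helper, not a function from the PR.

```python
import json
from typing import Any, Optional


def panel_shape_error(panel: Any) -> Optional[str]:
    """Return an error message unless the parsed panel is a JSON array."""
    if not isinstance(panel, list):
        # A dict would otherwise be iterated key by key as fake "entries".
        return f"panel must be a JSON array of entries, got {type(panel).__name__}"
    return None


# An array passes; a wrapper object is rejected before any entry is validated.
ok = panel_shape_error(json.loads("[]"))
bad = panel_shape_error(json.loads('{"entries": []}'))
```

Here ok is None and bad carries the message, which the validator could surface as a single blocking finding for the whole panel.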
    oracle = oracle or VP.CachedOracle()
    results = [gate_skill(d, oracle) for d in discover_skills(skills_root, changed)]
    blocking = sum(1 for r in results if r["status"] in ("fail", "error"))
    return {"results": results, "blocking": blocking, "exit_code": 1 if blocking else 0}
Changed paths skip all skills
Medium Severity
When the --changed argument is used, if the provided paths don't contain any skills/<name>/ directories (or if the parsed list is empty), discover_skills returns an empty list. This causes the gate to process no skills and exit successfully (code 0), effectively bypassing validation for any emitting skills.
Reviewed by Cursor Bugbot for commit 8d73bfa.
    skill_md = Path(skill_dir) / "SKILL.md"
    if not skill_md.exists():
        return False
    return CAPABILITY in _frontmatter(skill_md.read_text())
Capability detected by substring
Medium Severity
declares_effect_sizes treats any frontmatter substring match for emits_effect_sizes as opt-in, so names like not_emits_effect_sizes or comment text containing that token can force provenance gating on skills that did not intend to declare the capability.
Reviewed by Cursor Bugbot for commit 8d73bfa.
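A stricter match could require the capability to appear as a whole token outside comments. This is a sketch under assumptions: the frontmatter is taken to be YAML-like text, declares_capability is a hypothetical name, and the comment stripping is naive (it ignores '#' inside quoted strings). Parsing the frontmatter with a real YAML parser and checking a specific key would be stricter still.

```python
import re

CAPABILITY = "emits_effect_sizes"

# Whole-token match: no identifier character or hyphen on either side,
# so "not_emits_effect_sizes" does not count.
_TOKEN = re.compile(rf"(?<![\w-]){re.escape(CAPABILITY)}(?![\w-])")


def declares_capability(frontmatter: str) -> bool:
    """True only if the capability appears as a standalone token outside comments."""
    for line in frontmatter.splitlines():
        code = line.split("#", 1)[0]  # naive YAML comment stripping
        if _TOKEN.search(code):
            return True
    return False
```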
    g = (group or "").lower()
    for needle, code in ANCESTRY_MAP:
        if needle in g:
            return code
Non-African labels map to AFR
High Severity
_map_ancestry uses substring matching, so ancestral group strings like non-african american still contain the needle african american and are mapped to AFR. Regenerating the snapshot can attach the wrong super-population codes to catalog records.
Reviewed by Cursor Bugbot for commit 8d73bfa.
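Exact matching on a normalised label avoids the substring problem. In the sketch below, the table contents and the names ANCESTRY_EXACT and map_ancestry are hypothetical, since the PR's ANCESTRY_MAP entries are not shown; unknown labels map to None so they can be reviewed rather than guessed.

```python
from typing import Optional

# Hypothetical label table; the real ANCESTRY_MAP entries are not shown in the PR.
ANCESTRY_EXACT = {
    "african american": "AFR",
    "african american or afro-caribbean": "AFR",
    "european": "EUR",
    "east asian": "EAS",
}


def map_ancestry(group: Optional[str]) -> Optional[str]:
    """Map an ancestral-group label to a super-population code by exact match."""
    # Lowercase and collapse whitespace so only formatting differences are ignored.
    label = " ".join((group or "").lower().split())
    return ANCESTRY_EXACT.get(label)
```

The cost of exact matching is that every label variant has to be listed explicitly, which seems an acceptable trade for a snapshot that is meant to be trusted offline.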


Layer 1 of the scientific-correctness CI gate (#3), now wired to live data

The deterministic, permanent fix for the fabricated/mismatched-citation failure class that has shipped past the keyword scanner three review rounds running (ancestry-risk-profiler#297). A citation is a falsifiable claim, not a label; this resolves every effect-size entry's PMID against GWAS Catalog + PubMed and fails closed when the cited paper does not support the (variant, trait, ancestry, effect) claim.

What's here

- SCHEMAS/effect_size_provenance_schema.json: the contract every effect-size/ancestry-risk skill must satisfy, {variant.rsid, trait.{label,efo_id}, ancestry, effect, source.pmid}. Modelled on acmg_evidence_schema.json (Draft 2020-12, x-provenance).
- HARNESS/validate_provenance.py: fail-closed validator mirroring validate_evidence.py, with machine-readable ERROR_CODES, exception-guarded, injected Oracle. Ships CachedOracle (offline, reads the committed snapshot) and LiveOracle (GWAS Catalog REST + PubMed E-utilities).
- scripts/build_provenance_snapshot.py + TRUTH/gwas_catalog/snapshot.json: live-built, committed freeze (413 records, 5 variants) so CI runs deterministically offline. Regenerate with the script.
- tests/: 21 tests (14 deterministic fixture + 7 against the real snapshot).

Proven on real GWAS Catalog/PubMed data

- 20566908 cited as Genovese/APOL1/kidney: PMID_STUDY_MISMATCH (block)
- 20647424 (the real Genovese 2010) for APOL1/kidney: TOPIC_MATCH_LOW (sign-off, not false-rejected)

The five wrong ancestry-risk-profiler PMIDs (20566908, 22158537, 27005778, 23945395, 17478679) all block; the correct 16415884 passes. Other codes covered: PMID_UNRESOLVABLE, ANCESTRY_MISMATCH, EFFECT_OUT_OF_RANGE, ASSOC_NOT_FOUND, SCHEMA_INVALID.

Design note discovered by wiring the live API

GWAS Catalog does not index every primary paper. So a cited PMID that is not a registered catalog study is not auto-blocked: the gate falls back to a PubMed topic check and flags on-topic-but-uncatalogued papers for human sign-off (TOPIC_MATCH_LOW), reserving hard blocks for genuinely off-topic citations. This keeps correct citations from being false-rejected.

Correctness note

While building this I asserted the real Genovese PMID as 20413513 from memory; that is wrong (it is an unrelated diabetes trial). Corrected to 20647424 after resolving against PubMed, and the #297 review was corrected too: the exact "typed a PMID from memory" failure this gate exists to prevent.

Verification

pytest tests/test_validate_provenance.py tests/test_provenance_snapshot.py → 21 passed. Full suite collects 250; neighbouring test_evidence_schema/test_acmg_points green. No import-time I/O.

Remaining (separate follow-ups, tracked in #3)

- ancestry-risk-profiler's panel to the schema and run the gate on it in its own repo
- skills/* PRs declaring emits_effect_sizes

Refs #3.
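The catalog-then-PubMed fallback from the design note can be sketched as a small decision function. This is an illustration, not the validator's code: classify_citation, the "OK" code, the "pass"/"warn" severity strings, and the placeholder catalog PMID are assumptions, and a catalog hit would still be followed by the ancestry and effect checks.

```python
from typing import Optional, Set, Tuple


def classify_citation(pmid: str,
                      catalog_pmids: Set[str],
                      pubmed_on_topic: Optional[bool]) -> Tuple[str, str]:
    """Return (code, severity) for one cited PMID.

    catalog_pmids: PMIDs the catalog registers for this (variant, trait).
    pubmed_on_topic: result of the PubMed topic check, or None if the PMID
    could not be resolved at all.
    """
    if pmid in catalog_pmids:
        return ("OK", "pass")  # catalogued study; later checks still apply
    if pubmed_on_topic is None:
        return ("PMID_UNRESOLVABLE", "block")
    if pubmed_on_topic:
        # Real, on-topic paper the catalog does not index: human sign-off.
        return ("TOPIC_MATCH_LOW", "warn")
    return ("PMID_STUDY_MISMATCH", "block")
```

With a placeholder catalog set such as {"00000001"}, the on-topic 20647424 would come back as a warning and an off-topic PMID as a block, matching the two outcomes listed above.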
🤖 Generated with Claude Code
Note
Medium Risk
Gate logic affects which skill PRs pass CI for health-related effect-size claims; incorrect rules or stale snapshot could block or warn wrongly, though behavior is heavily tested and advisory for catalog gaps.
Overview
Adds Layer 1 of the scientific-correctness CI gate: skills that declare emits_effect_sizes must ship data/provenance.json entries that match GWAS Catalog + PubMed, or the check fails closed.

Contract and validation: New SCHEMAS/effect_size_provenance_schema.json defines each association (variant, trait/EFO, ancestry, effect, PMID). HARNESS/validate_provenance.py validates entries via an injectable oracle with machine-readable codes (mismatch, ancestry, effect range, unresolvable PMID, etc.). When the catalog lacks the cited study, a PubMed topic overlap yields advisory TOPIC_MATCH_LOW instead of a hard block so correct primary papers are not auto-rejected.

CI runner: HARNESS/run_provenance_gate.py discovers skills from SKILL.md frontmatter, validates panels with CachedOracle, supports --changed path filtering, and emits GitHub Actions annotations plus a step summary.

Offline truth: scripts/build_provenance_snapshot.py refreshes TRUTH/gwas_catalog/snapshot.json; tests cover fixture oracle, real snapshot (including flagship wrong-PMID cases), and the runner.

Reviewed by Cursor Bugbot for commit 8d73bfa. Bugbot is set up for automated code reviews on this repo.
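For readers unfamiliar with how a runner surfaces findings in GitHub Actions, the sketch below shows the general mechanism: workflow-command lines on stdout for annotations, and Markdown appended to the file named by GITHUB_STEP_SUMMARY. The finding keys (code, severity, message) and both function names are assumptions; the runner's actual data shape is not shown in this PR page.

```python
import os
from typing import Dict, List


def _escape(message: str) -> str:
    """Escape characters that workflow commands treat specially in message data."""
    return message.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")


def annotation_lines(findings: List[Dict[str, str]], panel_path: str) -> List[str]:
    """Build one ::error / ::warning workflow command per finding."""
    lines = []
    for f in findings:
        level = "error" if f["severity"] == "block" else "warning"
        lines.append(
            f"::{level} file={panel_path},title={f['code']}::{_escape(f['message'])}"
        )
    return lines


def append_step_summary(markdown: str) -> bool:
    """Append Markdown to the job summary; return False outside GitHub Actions."""
    path = os.environ.get("GITHUB_STEP_SUMMARY")
    if not path:
        return False
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(markdown + "\n")
    return True
```

Printing each returned line to stdout is what makes the annotation appear on the PR; the exit code still has to be set separately for the check to fail.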