L1: effect-size provenance gate (schema + fail-closed validator + six-PMID contract) #4

Open. manuelcorpas wants to merge 4 commits into main from feat/provenance-gate-l1

manuelcorpas (Contributor) commented Jun 30, 2026

Layer 1 of the scientific-correctness CI gate (#3) — now wired to live data

The deterministic, permanent fix for the fabricated/mismatched-citation failure class that has shipped past the keyword scanner three review rounds running (ancestry-risk-profiler #297). A citation is a falsifiable claim, not a label; this resolves every effect-size entry's PMID against GWAS Catalog + PubMed and fails closed when the cited paper does not support the (variant, trait, ancestry, effect) claim.

What's here

  • SCHEMAS/effect_size_provenance_schema.json — the contract every effect-size/ancestry-risk skill must satisfy: {variant.rsid, trait.{label,efo_id}, ancestry, effect, source.pmid}. Modelled on acmg_evidence_schema.json (Draft 2020-12, x-provenance).
  • HARNESS/validate_provenance.py — fail-closed validator mirroring validate_evidence.py: machine-readable ERROR_CODES, exception-guarded, injected Oracle. Ships CachedOracle (offline, reads the committed snapshot) and LiveOracle (GWAS Catalog REST + PubMed E-utilities).
  • scripts/build_provenance_snapshot.py + TRUTH/gwas_catalog/snapshot.json — live-built, committed freeze (413 records, 5 variants) so CI runs deterministically offline. Regenerate with the script.
  • tests/ — 21 tests: 14 deterministic fixture + 7 against the real snapshot.
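For orientation, a single panel entry satisfying the contract above might look like the sketch below. The field names follow the schema bullet; the EFO ID, trait label, effect value, and the `REQUIRED_PATHS` / `missing_fields` helpers are illustrative assumptions, not the committed schema or validator.

```python
# Illustrative entry shape; values marked "placeholder" are not real data.
EXAMPLE_ENTRY = {
    "variant": {"rsid": "rs73885319"},
    "trait": {"label": "kidney disease", "efo_id": "EFO_0000000"},  # placeholder label and EFO ID
    "ancestry": "AFR",
    "effect": {"measure": "OR", "value": 1.5},  # placeholder effect size
    "source": {"pmid": "20647424"},
}

# Hypothetical list of required dotted paths, derived from the contract above.
REQUIRED_PATHS = [
    "variant.rsid", "trait.label", "trait.efo_id",
    "ancestry", "effect", "source.pmid",
]


def missing_fields(entry: dict) -> list[str]:
    """Return the required dotted paths that are absent or empty in entry."""
    missing = []
    for path in REQUIRED_PATHS:
        node = entry
        for key in path.split("."):
            # Walk one level down; anything that is not a dict ends the walk.
            node = node.get(key) if isinstance(node, dict) else None
        if node in (None, "", {}):
            missing.append(path)
    return missing
```

The real gate validates against the JSON Schema; this helper only shows which fields an entry is expected to carry.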

Proven on real GWAS Catalog/PubMed data

| Citation | Reality | Gate verdict |
|---|---|---|
| 20566908 cited as Genovese/APOL1/kidney | head-and-neck-cancer survey | 🔴 PMID_STUDY_MISMATCH (block) |
| 20647424 (the real Genovese 2010) for APOL1/kidney | correct, but GWAS Catalog indexes the variant via PAGE/MVP | 🟡 TOPIC_MATCH_LOW (sign-off, not false-rejected) |
| a genuinely catalogued (variant,trait,ancestry,pmid,OR) tuple | matches | ✅ pass |

The five wrong ancestry-risk-profiler PMIDs (20566908, 22158537, 27005778, 23945395, 17478679) all block; the correct 16415884 passes. Other codes covered: PMID_UNRESOLVABLE, ANCESTRY_MISMATCH, EFFECT_OUT_OF_RANGE, ASSOC_NOT_FOUND, SCHEMA_INVALID.

Design note discovered by wiring the live API

GWAS Catalog does not index every primary paper. So a cited PMID that is not a registered catalog study is not auto-blocked: the gate falls back to a PubMed topic check and flags on-topic-but-uncatalogued papers for human sign-off (TOPIC_MATCH_LOW), reserving hard blocks for genuinely off-topic citations. This keeps correct citations from being false-rejected.
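The fallback described in this note can be sketched as a three-way decision. This is a simplified illustration, not the validator's code: `classify_citation`, the `"OK"`/`"pass"`/`"warn"` labels, and the `is_on_topic` callback (standing in for the PubMed topic check) are assumptions; only the two error codes come from the PR.

```python
from typing import Callable


def classify_citation(
    pmid: str,
    catalog_pmids: set[str],
    is_on_topic: Callable[[str], bool],
) -> tuple[str, str]:
    """Return (code, severity) for a cited PMID.

    catalog_pmids: PMIDs the catalog registers for this (variant, trait).
    is_on_topic:   hypothetical stand-in for the PubMed topic check.
    """
    if pmid in catalog_pmids:
        # Catalogued study: ancestry and effect checks would still follow.
        return ("OK", "pass")
    if is_on_topic(pmid):
        # Plausible primary paper the catalog does not index: human sign-off.
        return ("TOPIC_MATCH_LOW", "warn")
    # Neither catalogued nor on-topic: hard block.
    return ("PMID_STUDY_MISMATCH", "block")
```

Keeping the topic check as an injected callable mirrors the injected-oracle design, so the same logic can run against the offline snapshot or the live APIs.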

Correctness note

While building this I asserted the real Genovese PMID as 20413513 from memory; that is wrong (it is an unrelated diabetes trial). Corrected to 20647424 after resolving against PubMed, and the #297 review was corrected too — the exact "typed a PMID from memory" failure this gate exists to prevent.

Verification

pytest tests/test_validate_provenance.py tests/test_provenance_snapshot.py → 21 passed. Full suite collects 250; neighbouring test_evidence_schema/test_acmg_points green. No import-time I/O.

Remaining (separate follow-ups, tracked in #3)

  • Backfill ancestry-risk-profiler's panel to the schema and run the gate on it in its own repo
  • GitHub Action that blocks skills/* PRs declaring emits_effect_sizes
  • Scale the snapshot panel beyond the audited variants

Refs #3.

🤖 Generated with Claude Code


Note

Medium Risk
Gate logic affects which skill PRs pass CI for health-related effect-size claims; incorrect rules or a stale snapshot could block or warn wrongly, though behavior is heavily tested and advisory for catalog gaps.

Overview
Adds Layer 1 of the scientific-correctness CI gate: skills that declare emits_effect_sizes must ship data/provenance.json entries that match GWAS Catalog + PubMed, or the check fails closed.

Contract and validation: New SCHEMAS/effect_size_provenance_schema.json defines each association (variant, trait/EFO, ancestry, effect, PMID). HARNESS/validate_provenance.py validates entries via an injectable oracle with machine-readable codes (mismatch, ancestry, effect range, unresolvable PMID, etc.). When the catalog lacks the cited study, a PubMed topic overlap yields advisory TOPIC_MATCH_LOW instead of a hard block so correct primary papers are not auto-rejected.

CI runner: HARNESS/run_provenance_gate.py discovers skills from SKILL.md frontmatter, validates panels with CachedOracle, supports --changed path filtering, and emits GitHub Actions annotations plus a step summary.
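As a rough illustration of the annotation and step-summary output mentioned above, the sketch below uses the standard GitHub Actions workflow-command syntax and the `GITHUB_STEP_SUMMARY` environment variable. The function names are hypothetical and the runner's actual output format is not shown in this PR.

```python
import os


def emit_annotation(level: str, file: str, message: str) -> str:
    """Print and return a workflow command such as ::error file=...::msg."""
    # Workflow commands are single-line, so %, CR and LF in the message are escaped.
    escaped = message.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")
    line = f"::{level} file={file}::{escaped}"
    print(line)
    return line


def append_step_summary(markdown: str) -> bool:
    """Append Markdown to the job summary; return False outside GitHub Actions."""
    path = os.environ.get("GITHUB_STEP_SUMMARY")
    if not path:
        return False
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(markdown + "\n")
    return True
```

The `file` value is passed through unescaped here for brevity; paths containing `,` or `:` would need property escaping as well.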

Offline truth: scripts/build_provenance_snapshot.py refreshes TRUTH/gwas_catalog/snapshot.json; tests cover fixture oracle, real snapshot (including flagship wrong-PMID cases), and the runner.

Reviewed by Cursor Bugbot for commit 8d73bfa. Bugbot is set up for automated code reviews on this repo. Configure here.

…t test

Layer 1 of the scientific-correctness CI gate (#3). Resolves every effect-size
association entry's cited PMID against an injected GWAS Catalog + PubMed oracle and
fails closed on variant/trait/ancestry/PMID/effect mismatch. Encodes the six audited
ancestry-risk-profiler PMIDs as a deterministic contract (5 wrong, 1 correct).

LiveOracle (network) + TRUTH/gwas_catalog snapshot remain as the build target; the
fixture-backed unit test is green (14 passed). Mirrors validate_evidence.py.

Refs #3.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
familygenome and others added 2 commits June 30, 2026 17:06
…n-off path

Takes the provenance gate from fixture-only to real-data:
- LiveOracle (GWAS Catalog REST + PubMed E-utilities) + CachedOracle reading a committed
  TRUTH/gwas_catalog/snapshot.json freeze; scripts/build_provenance_snapshot.py rebuilds it.
- Refined validator: when a cited PMID is not a registered catalog study for a (variant,
  trait), fall back to a PubMed topic check. On-topic-but-uncatalogued -> TOPIC_MATCH_LOW
  (human sign-off), not a hard block. This was discovered by wiring the live API: GWAS
  Catalog links rs73885319/kidney to PAGE/MVP, not the original Genovese 2010 APOL1 paper.
- Corrected Genovese APOL1 PMID 20413513 -> 20647424 (20413513 is an unrelated diabetes
  trial; asserted from memory in error and now resolved against PubMed).
- 21 provenance tests (14 fixture + 7 against the real snapshot): flagship 20566908
  (head-and-neck cancer) blocks; correct-but-uncatalogued 20647424 warns; a real catalogued
  tuple passes. Full suite collects 250.

Refs #3.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@manuelcorpas manuelcorpas marked this pull request as ready for review June 30, 2026 16:08
pmid = str(study.get("publicationInfo", {}).get("pubmedId") or "")
for t in a.get("efoTraits", []) or []:
    out.append({"efo_id": t.get("shortForm"), "trait": t.get("trait"),
                "pmid": pmid, "ancestries": [], "or_value": a.get("orPerCopyNum")})

LiveOracle skips ancestry checks

High Severity

LiveOracle.associations_for always sets ancestries to an empty list, so validate_entry never raises ANCESTRY_MISMATCH when the CLI runs with --live. Ancestry enforcement only works for CachedOracle / the committed snapshot, not for the documented production oracle path.
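One way to keep this path fail-closed, sketched under assumptions: treat an empty ancestry list as "could not verify" instead of "nothing to compare". `ancestry_finding` and the `ANCESTRY_UNVERIFIED` code are hypothetical; only `ANCESTRY_MISMATCH` exists in the PR.

```python
from typing import Optional


def ancestry_finding(claimed: str, catalog_ancestries: list[str]) -> Optional[str]:
    """Return an error code, or None when the claimed ancestry is supported."""
    if not catalog_ancestries:
        # The oracle returned no ancestry data: surface it rather than pass silently.
        return "ANCESTRY_UNVERIFIED"  # hypothetical code, not in the PR
    if claimed not in catalog_ancestries:
        return "ANCESTRY_MISMATCH"
    return None
```

Populating `ancestries` in `LiveOracle` would be the fuller fix; this guard only stops the silent pass in the meantime.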

Reviewed by Cursor Bugbot for commit 8f287ec. Configure here.

if not (EFFECT_RATIO_LO <= ratio <= EFFECT_RATIO_HI):
    return _finding(index, rsid, "EFFECT_OUT_OF_RANGE", "block", "effect.value",
                    f"Effect {value} deviates from catalog {cat} (ratio {ratio:.2f}, "
                    f"allowed {EFFECT_RATIO_LO}-{EFFECT_RATIO_HI}x)", False)

Effect check ignores measure type

Medium Severity

Effect validation always compares effect.value to GWAS or_value and never reads effect.measure. Entries with beta, HR, or RR can be wrongly blocked or pass without a meaningful catalog comparison, and non-positive values skip the check entirely.
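A measure-aware version of the check might look like the sketch below. The bounds, the `RATIO_MEASURES` set, and the `EFFECT_NOT_COMPARABLE` code are placeholders chosen for illustration; only `EFFECT_OUT_OF_RANGE` and the `EFFECT_RATIO_LO`/`EFFECT_RATIO_HI` names come from the PR.

```python
from typing import Optional

EFFECT_RATIO_LO, EFFECT_RATIO_HI = 0.5, 2.0  # placeholder bounds, not the PR's values
RATIO_MEASURES = {"OR"}  # measures assumed comparable to the catalog's or_value


def effect_check(measure: str, value: Optional[float],
                 catalog_or: Optional[float]) -> Optional[str]:
    """Return an error code, or None when the claimed effect is acceptable."""
    if measure not in RATIO_MEASURES:
        # beta, HR, RR: comparing against an odds ratio would be meaningless.
        return "EFFECT_NOT_COMPARABLE"  # hypothetical code, not in the PR
    if value is None or value <= 0:
        # Fail closed instead of skipping the check for non-positive values.
        return "EFFECT_OUT_OF_RANGE"
    if catalog_or is None or catalog_or <= 0:
        return "EFFECT_NOT_COMPARABLE"
    ratio = value / catalog_or
    if not (EFFECT_RATIO_LO <= ratio <= EFFECT_RATIO_HI):
        return "EFFECT_OUT_OF_RANGE"
    return None
```

Whether a non-comparable measure should block or only warn is a policy choice this sketch leaves open.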

Reviewed by Cursor Bugbot for commit 8f287ec. Configure here.

if errs:
    e = errs[0]
    return f"{'/'.join(str(p) for p in e.path) or '<root>'}: {e.message}"
return None

Schema validation fails open

Medium Severity

If jsonschema cannot be imported, _schema_error returns None and validation continues as if the entry were schema-valid. Invalid panels may reach oracle checks or surface as generic PARSE_ERROR instead of SCHEMA_INVALID.
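A fail-closed variant could report an error whenever validation cannot run at all. The sketch below is an assumption about how that might look, reusing the message format from the excerpt above; `schema_error` here is a stand-in for `_schema_error`, and the caller is assumed to map any non-`None` return to `SCHEMA_INVALID`.

```python
def schema_error(entry, schema):
    """Return a message when entry is invalid OR when validation cannot run."""
    try:
        import jsonschema  # imported lazily so a missing dependency is caught here

        validator = jsonschema.Draft202012Validator(schema)
        errs = list(validator.iter_errors(entry))
    except Exception as exc:  # missing or outdated jsonschema, or a malformed schema
        return f"<root>: schema validation unavailable ({type(exc).__name__})"
    if errs:
        e = errs[0]
        return f"{'/'.join(str(p) for p in e.path) or '<root>'}: {e.message}"
    return None
```

With this shape, an environment without `jsonschema` blocks every panel instead of waving it through.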

Reviewed by Cursor Bugbot for commit 8f287ec. Configure here.

…tions

HARNESS/run_provenance_gate.py: walks (changed) skills, and for any that declare the
emits_effect_sizes capability in SKILL.md, requires data/provenance.json and validates every
entry with the provenance gate against the committed snapshot. Emits GitHub Actions
annotations + step summary; exits 1 on any blocking finding or missing panel. Skills that do
not emit effect sizes are skipped. 8 tests (good passes, fabricated cite fails with
PMID_STUDY_MISMATCH, missing panel errors, non-emitting skipped, --changed scoping). Consumed
by the provenance-gate workflow in the skills repo.

Refs #3.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes using high effort and found 4 potential issues.

There are 7 total unresolved issues (including 3 from previous reviews).

Reviewed by Cursor Bugbot for commit 8d73bfa. Configure here.

    return {"passed": False, "n_entries": 0, "n_blocking": 0, "n_warnings": 0,
            "findings": [_finding(0, None, "PARSE_ERROR", "block", "<root>",
                                  "panel is not iterable", False)]}
findings = [validate_entry(e, oracle, i) for i, e in enumerate(items)]

Dict panel validates JSON keys

Medium Severity

validate_panel coerces input with list(entries) without requiring a JSON array. A single object (or wrapper object) in provenance.json iterates top-level keys as fake “entries”, producing bogus PARSE_ERROR rows instead of rejecting the panel shape.
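A shape check before iteration would reject these cases up front. A minimal sketch, assuming the panel is expected to be a top-level JSON array of objects; `load_panel` is a hypothetical helper, not a function in this PR.

```python
import json


def load_panel(text: str) -> list:
    """Parse provenance.json text and insist on a top-level array of objects."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError(f"panel must be a JSON array, got {type(data).__name__}")
    for i, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"entry {i} must be a JSON object, got {type(item).__name__}")
    return data
```

The validator could catch the `ValueError` and emit a single panel-level finding instead of one bogus row per key.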

Reviewed by Cursor Bugbot for commit 8d73bfa. Configure here.

oracle = oracle or VP.CachedOracle()
results = [gate_skill(d, oracle) for d in discover_skills(skills_root, changed)]
blocking = sum(1 for r in results if r["status"] in ("fail", "error"))
return {"results": results, "blocking": blocking, "exit_code": 1 if blocking else 0}

Changed paths skip all skills

Medium Severity

When the --changed argument is used, if the provided paths don't contain any skills/<name>/ directories (or if the parsed list is empty), discover_skills returns an empty list. This causes the gate to process no skills and exit successfully (code 0), effectively bypassing validation for any emitting skills.
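One conservative option, sketched below, is to fall back to gating every skill whenever the `--changed` filter selects nothing. Whether that is the desired policy is a judgment call for the maintainers; `select_skills` and its name-based interface are assumptions for illustration, not the real `discover_skills` signature.

```python
from pathlib import PurePosixPath
from typing import Optional


def select_skills(all_skills: list[str], changed: Optional[list[str]]) -> list[str]:
    """Pick skill names to gate; gate everything when the filter matches nothing."""
    if changed is None:
        return sorted(all_skills)
    touched = set()
    for raw in changed:
        parts = PurePosixPath(raw).parts
        # Only paths of the form skills/<name>/... identify a skill.
        if len(parts) >= 2 and parts[0] == "skills":
            touched.add(parts[1])
    selected = sorted(touched & set(all_skills))
    # Fail safe: an empty selection means "could not scope", not "nothing to check".
    return selected or sorted(all_skills)
```

The trade-off is extra CI time on PRs that touch no skill, in exchange for never exiting 0 on an accidentally empty scope.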

Reviewed by Cursor Bugbot for commit 8d73bfa. Configure here.

skill_md = Path(skill_dir) / "SKILL.md"
if not skill_md.exists():
    return False
return CAPABILITY in _frontmatter(skill_md.read_text())

Capability detected by substring

Medium Severity

declares_effect_sizes treats any frontmatter substring match for emits_effect_sizes as opt-in, so names like not_emits_effect_sizes or comment text containing that token can force provenance gating on skills that did not intend to declare the capability.
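Matching whole tokens outside comments would avoid both false positives. The sketch below is a naive, dependency-free illustration; it assumes `---`-delimited frontmatter and does not know how the capability is actually spelled in SKILL.md (top-level key or list item), so a real YAML parser would be the sturdier choice. `frontmatter_lines` and `declares_capability` are hypothetical names.

```python
import re

CAPABILITY = "emits_effect_sizes"
_TOKEN = re.compile(r"[A-Za-z0-9_\-]+")


def frontmatter_lines(text: str) -> list[str]:
    """Return the lines between the leading '---' and the next '---'."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []
    body = []
    for line in lines[1:]:
        if line.strip() == "---":
            break
        body.append(line)
    return body


def declares_capability(text: str) -> bool:
    """True only when the capability appears as a whole token outside comments."""
    for line in frontmatter_lines(text):
        code = line.split("#", 1)[0]  # naive comment strip; ignores '#' inside quotes
        if CAPABILITY in _TOKEN.findall(code):
            return True
    return False
```

A value such as `emits_effect_sizes: false` would still count as declared here; handling that needs knowledge of the real frontmatter layout.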

Reviewed by Cursor Bugbot for commit 8d73bfa. Configure here.

g = (group or "").lower()
for needle, code in ANCESTRY_MAP:
    if needle in g:
        return code

Non-African labels map to AFR

High Severity

_map_ancestry uses substring matching, so ancestral group strings like non-african american still contain the needle african american and are mapped to AFR. Regenerating the snapshot can attach the wrong super-population codes to catalog records.
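A whole-phrase match that refuses a preceding "non-" would avoid this particular mislabel. The sketch below uses placeholder map contents, since the PR's real `ANCESTRY_MAP` is not shown, and returns `None` for negated labels so the caller can decide how to treat them.

```python
import re

# Placeholder needles; the real ANCESTRY_MAP in the PR is not shown.
ANCESTRY_MAP = [("african american", "AFR"), ("european", "EUR")]


def map_ancestry(group):
    """Map an ancestral-group label to a super-population code, or None."""
    g = (group or "").lower()
    for needle, code in ANCESTRY_MAP:
        # Match the whole phrase, and only when it is not preceded by "non-" or "non ".
        pattern = rf"(?<!non-)(?<!non )\b{re.escape(needle)}\b"
        if re.search(pattern, g):
            return code
    return None
```

Other negation styles ("not European", "excluding ...") are out of scope for this sketch and would need their own handling.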

Reviewed by Cursor Bugbot for commit 8d73bfa. Configure here.
