Add CI workflow to run pytest on push/PR to main #2

Open
camlloyd wants to merge 1 commit into ClawBio:main from camlloyd:ci/add-github-actions-workflow

@camlloyd

Summary

  • Adds .github/workflows/ci.yml: a single test job that runs the existing tests/ suite via pytest across Python 3.11 and 3.12 on push/PR to main.

Note

  • This workflow will fail until #1 (Pin jsonschema>=4.18 in requirements.txt) merges. main's requirements.txt currently does not list jsonschema, which causes 15 test failures with ModuleNotFoundError: No module named 'jsonschema'. Verified locally with act.

Test plan

Single test job across Python 3.11/3.12 running the existing tests/
suite via pip + requirements.txt.
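
The workflow file itself is not shown in this description. For orientation, a minimal sketch of what a .github/workflows/ci.yml matching the summary and test plan might look like follows. The runner image, action versions, `fail-fast` setting, step names, and the explicit `pip install pytest` line are assumptions and may differ from the file in this PR.

```yaml
# Illustrative sketch only; the actual ci.yml in this PR may differ.
name: CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest        # assumption: runner is not stated in the PR
    strategy:
      fail-fast: false            # assumption: let both Python versions report
      matrix:
        # Quoted so YAML keeps the versions as strings
        python-version: ["3.11", "3.12"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
          # assumption: only needed if requirements.txt does not list pytest
          pip install pytest
      - name: Run tests
        run: pytest tests/
```

With a layout like this, the 15 jsonschema failures described in the note would show up in both matrix entries until requirements.txt lists the package.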

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
manuelcorpas added a commit that referenced this pull request Jun 30, 2026
Deterministically re-scores the enriched-arm code sets the model actually
submitted, holding all non-PM2 codes at their submitted calibrated strength and
varying ONLY PM2 (supporting vs moderate) within one convention. Baseline
insufficiency 17/24 (Sonnet) and 18/24 (GPT-5.2) exactly match the paper's
enriched numbers, confirming faithfulness.

RESULT: under the clean frame, PM2->moderate recovers 2 pathogenic and regresses
ZERO benign in BOTH models (17->15, 18->16). The paper's 'direction-coupled'
benign regression (17->22) does NOT reproduce. Verified mechanism: the original
re-prompted calibration arm induced the model to apply PM2 to benign variants
(27/60 -> 59/60 reps); that instruction-induced over-application, not strength
arithmetic, caused the regression. The reviewer was right: 'direction-coupled,
cannot be globally optimised' is not supported; the corrected finding is that
the residual is partly recoverable and the apparent coupling was a prompt-
fragility artefact. 3 tests; suite green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>