DO-NOT-MERGE · model ratchet: default opus to 4.8 + deprecated-tier sweep + model-id allowlist guard #220
Open
Antawari wants to merge 1 commit into
Sweep every Claude model-id literal in src/, tests/, and the config surfaces (.env.example, README.md) up to the current model catalog:

- claude-opus-4-7 -> claude-opus-4-8 (default reasoning tier in ModelsConfig, README config example, and all test literals)
- bare claude-opus-4 (deprecated, retiring June 2026) -> claude-opus-4-8
- claude-sonnet-4-20250514 (deprecated dated variant) -> claude-sonnet-4-6
- bare claude-sonnet-4 -> claude-sonnet-4-6
- claude-sonnet-4-7-20260101 -> claude-sonnet-4-6: this id was never a real model (a fabricated date-suffixed string); any live call pinned to it would 404. Fixed in .env.example and the e2e runner prompt.

Add tests/unit/test_model_id_allowlist_sweep.py: a sweep test that extracts every model-family token (claude-opus*/sonnet*/haiku*/fable*/mythos*) from src/bonfire/**/*.py plus the config surfaces and asserts membership in an explicit allowlist of real current ids. Verified RED on the pre-bump tree (4 offenders) and GREEN after.

Frozen decision-history documents under docs/audit/ are intentionally left untouched.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
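Because the bare ids are prefixes of current ids (`claude-opus-4` is a prefix of both `claude-opus-4-7` and `claude-opus-4-8`), a naive find-and-replace over the mapping above can corrupt neighbouring ids, e.g. turning `claude-opus-4-7` into `claude-opus-4-8-7`, and it is not idempotent. A minimal sketch of a boundary-aware rewrite, assuming the sweep is a plain text substitution (the commit does not say how it was performed); `MODEL_ID_REWRITES` and `rewrite_model_ids` are hypothetical names, not code from this PR:

```python
import re

# Hypothetical rewrite map mirroring the commit message (old id -> new id).
MODEL_ID_REWRITES = {
    "claude-opus-4-7": "claude-opus-4-8",
    "claude-opus-4": "claude-opus-4-8",
    "claude-sonnet-4-20250514": "claude-sonnet-4-6",
    "claude-sonnet-4": "claude-sonnet-4-6",
    "claude-sonnet-4-7-20260101": "claude-sonnet-4-6",
}

# An id only matches when it is not glued to more id characters on either side.
# The lookahead is what stops the bare "claude-opus-4" from matching inside
# "claude-opus-4-8" (which would otherwise become "claude-opus-4-8-8").
# Longest alternatives are tried first so specific ids win over their prefixes.
_MODEL_ID = re.compile(
    r"(?<![\w-])("
    + "|".join(re.escape(old) for old in sorted(MODEL_ID_REWRITES, key=len, reverse=True))
    + r")(?![\w-])"
)


def rewrite_model_ids(text: str) -> str:
    """Replace every deprecated or nonexistent id in ``text`` with its successor."""
    return _MODEL_ID.sub(lambda m: MODEL_ID_REWRITES[m.group(1)], text)
```

Running it over an already-swept file is a no-op, so a sweep built this way can be re-run safely; ids outside the map (for example `claude-opus-4-20250514`) are left alone for the allowlist guard to flag.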
What
Model-id ratchet across the shipped surface, plus a permanent guard so ids can't rot silently again.
Maintainer merges — do not auto-merge.
Census — before → after
| Before | Where | After |
|---|---|---|
| `claude-opus-4-7` | `ModelsConfig.reasoning` default (`src/bonfire/models/config.py`), README config example, test literals across 12 files | `claude-opus-4-8` |
| `claude-opus-4` (bare; deprecated tier, retiring June 2026) | `test_config.py`, `test_dispatch_runner.py`, `test_persona_cli.py` | `claude-opus-4-8` |
| `claude-sonnet-4-20250514` (deprecated dated variant) | `src/bonfire/dispatch/pydantic_ai_backend.py`, test literals (`test_onboard_scanner_claude_memory.py`, `test_protocols.py`) | `claude-sonnet-4-6` |
| `claude-sonnet-4` (bare) | `test_dispatch_pydantic_ai_backend.py` | `claude-sonnet-4-6` |
| `claude-sonnet-4-7-20260101` (never a real model id) | `.env.example` model-pin example, e2e runner prompt | `claude-sonnet-4-6` |
| `claude-opus-4-6` / `claude-opus-4-1` / `claude-opus-4-20250514` | | |

Already-current ids (`claude-sonnet-4-6`, `claude-haiku-4-5`) untouched. Frozen decision-history documents under `docs/audit/` are intentionally left as-is: they are historical records, not live configuration.

Why
- The `claude-opus-4` tier and the `*-20250514` dated snapshots are on the deprecation track (June 2026 retirement); after retirement, every pinned call 404s at once.
- `claude-sonnet-4-7-20260101` does not exist in the model catalog and never did: there is no Sonnet 4.7, and current Sonnet ids carry no date suffix. Any user who uncommented the `.env.example` model pin got a guaranteed 404 on the first call. Archaeology (`git log -S`): the string entered with the v0.1.0a2 release-prep commit (`.env.example`) and with the release-gate runner-script work (PR #59, "complete release-gate Box runner script and supporting box artifacts"; e2e runner prompt). It is a plausible-looking constructed id that survived review precisely because it looks like a valid dated variant. Both sites are illustrative examples and neither is exercised by the test suite, so nothing ever 404'd in CI to expose it.
- `ModelsConfig.reasoning` is the BYOK default for the reasoning tier (researcher/reviewer/synthesizer/analyst roles). `claude-opus-4-8` is the current Opus-tier model with the same request surface as 4.7 (no breaking API changes in the 4.7→4.8 step), so the default swap is API-safe.

Guard (new test, TDD)
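A minimal sketch of how an allowlist sweep of this kind can work, assuming only what this PR states (the model-family list, the four allowlisted ids, and the scanned surfaces). `find_offenders`, `surfaces`, `scan_paths`, and the regex are hypothetical illustrations, not the contents of `tests/unit/test_model_id_allowlist_sweep.py`:

```python
import re
from pathlib import Path

# Allowlist copied from the PR description; every other model-family token fails.
ALLOWED_MODEL_IDS = frozenset(
    {"claude-fable-5", "claude-opus-4-8", "claude-sonnet-4-6", "claude-haiku-4-5"}
)

# Family filter: only "claude-" followed by a model family counts as a model id,
# so non-model tokens such as "claude-code" are never inspected.
# Ids are hyphen-separated alphanumeric chunks, which keeps trailing punctuation
# ("... claude-opus-4-8.") out of the captured token.
MODEL_TOKEN = re.compile(
    r"\bclaude-(?:opus|sonnet|haiku|fable|mythos)(?:-[A-Za-z0-9]+)*"
)

# Surfaces scanned besides src/bonfire/**/*.py, per the PR description.
CONFIG_SURFACES = (".env.example", "README.md", "pyproject.toml")


def find_offenders(text: str, allowed: frozenset[str] = ALLOWED_MODEL_IDS) -> list[str]:
    """Return the sorted, de-duplicated model-family tokens not in the allowlist."""
    return sorted({tok for tok in MODEL_TOKEN.findall(text) if tok not in allowed})


def surfaces(repo_root: Path) -> list[Path]:
    """Collect files to scan: package sources plus whichever config surfaces exist."""
    files = sorted((repo_root / "src" / "bonfire").rglob("*.py"))
    files += [repo_root / name for name in CONFIG_SURFACES if (repo_root / name).is_file()]
    return files


def scan_paths(paths: list[Path]) -> dict[str, list[str]]:
    """Map each offending file to the non-allowlisted ids it contains."""
    offenders: dict[str, list[str]] = {}
    for path in paths:
        bad = find_offenders(path.read_text(encoding="utf-8"))
        if bad:
            offenders[str(path)] = bad
    return offenders


def test_model_ids_are_allowlisted() -> None:
    # Assumes this file lives at tests/unit/, two levels below the repo root.
    repo_root = Path(__file__).resolve().parents[2]
    offenders = scan_paths(surfaces(repo_root))
    assert not offenders, f"non-allowlisted model ids: {offenders}"
```

Reporting the offending ids per file (rather than a bare boolean) is what makes a RED run on the pre-bump tree point straight at the sites to fix.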
`tests/unit/test_model_id_allowlist_sweep.py` extracts every model-family token (`claude-opus*`/`sonnet*`/`haiku*`/`fable*`/`mythos*`) from `src/bonfire/**/*.py` plus the config surfaces (`.env.example`, `README.md`, `pyproject.toml`) and asserts membership in an explicit allowlist of real current ids (`claude-fable-5`, `claude-opus-4-8`, `claude-sonnet-4-6`, `claude-haiku-4-5`). Non-model `claude-*` tokens (package names, CLI tool names, editor dirs) are exempt via a family filter; test fixtures' deliberately fake tier placeholders (`claude-sonnet`, `claude-opus` with no version) stay out of scope.

Verified RED on the pre-bump tree with 4 offenders, then GREEN after the bump:

- `src/bonfire/models/config.py` (opus-4-7)
- `src/bonfire/dispatch/pydantic_ai_backend.py` (dated sonnet)
- `.env.example` (fabricated id)
- `README.md` (opus-4-7)

Gate results (run from the branch worktree)
- `ruff check .`: All checks passed (313 files)
- `ruff format --check .`: 313 files already formatted
- `pytest` (full suite): 4419 passed, 79 skipped, 34 xfailed, 73 xpassed, 0 failed (100.8s)
- Baseline (`main` commit, separate clean worktree): 4417 passed, 79 skipped, 34 xfailed, 73 xpassed, 0 failed

Pre-registered prediction (branch counts minus the 2 new guard tests, all other buckets identical): hit exactly. No pre-existing failures on either side.

Notes on 4.8 re-tuning (out of scope here)
This PR is an id ratchet only; no agent or system prompts were rewritten. Opus 4.8 keeps 4.7's request surface, so the swap is API-safe, but 4.8 has behavioral shifts worth a follow-up tuning pass wherever Bonfire ships prompts:

- It narrates more between tool calls (consider a silence-default for terse agents).
- It asks more often on minor decisions (consider explicit small-decisions-don't-ask guidance).
- It is more conservative about reaching for subagents and custom tools (state trigger conditions in tool descriptions).
- It writes in a warmer, less hedged voice (re-check style prompts that countered 4.7's terseness).

Review-style harnesses should keep report-everything-filter-downstream phrasing.
🤖 Generated with Claude Code