DO-NOT-MERGE · model ratchet: default opus to 4.8 + deprecated-tier sweep + model-id allowlist guard #220

Open
Antawari wants to merge 1 commit into main from catrina/2026-06-12/model-ratchet

@Antawari

Contributor

What

Model-id ratchet across the shipped surface, plus a permanent guard so ids can't rot silently again.

Maintainer merges — do not auto-merge.

Census — before → after

| Old id | Where | New id | Lines |
| --- | --- | --- | --- |
| claude-opus-4-7 | ModelsConfig.reasoning default (src/bonfire/models/config.py), README config example, test literals across 12 files | claude-opus-4-8 | 55 |
| claude-opus-4 (bare; deprecated tier, retiring June 2026) | test literals (test_config.py, test_dispatch_runner.py, test_persona_cli.py) | claude-opus-4-8 | 8 |
| claude-sonnet-4-20250514 (deprecated dated variant) | docstring example (src/bonfire/dispatch/pydantic_ai_backend.py), test literals (test_onboard_scanner_claude_memory.py, test_protocols.py) | claude-sonnet-4-6 | 5 |
| claude-sonnet-4 (bare) | test_dispatch_pydantic_ai_backend.py | claude-sonnet-4-6 | 1 |
| claude-sonnet-4-7-20260101 (never a real model id) | .env.example model-pin example, e2e runner prompt | claude-sonnet-4-6 | 2 |
| claude-opus-4-6 / claude-opus-4-1 / claude-opus-4-20250514 | | | 0 found |

Already-current ids (claude-sonnet-4-6, claude-haiku-4-5) untouched. Frozen decision-history documents under docs/audit/ intentionally left as-is — they are historical records, not live configuration.
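For illustration only, the census can be read as a whole-token rewrite map. The sketch below is not necessarily how the sweep was carried out; `RATCHET`, `TOKEN`, and `ratchet_line` are hypothetical names. It shows why matching whole tokens matters: a naive substring replace of the bare `claude-opus-4` would also corrupt `claude-opus-4-8`.

```python
import re

# Old id -> new id, copied from the census table above.
RATCHET = {
    "claude-opus-4-7": "claude-opus-4-8",
    "claude-opus-4": "claude-opus-4-8",
    "claude-sonnet-4-20250514": "claude-sonnet-4-6",
    "claude-sonnet-4": "claude-sonnet-4-6",
    "claude-sonnet-4-7-20260101": "claude-sonnet-4-6",
}

# A whole "claude-..." token: hyphen-separated lowercase/digit segments,
# not preceded or followed by another id character. This keeps the bare
# "claude-opus-4" from matching inside "claude-opus-4-8".
TOKEN = re.compile(
    r"(?<![A-Za-z0-9-])claude-[a-z0-9]+(?:-[a-z0-9]+)*(?![A-Za-z0-9-])"
)


def ratchet_line(line: str) -> str:
    """Rewrite each whole-token old id in a line; leave everything else alone."""
    return TOKEN.sub(lambda m: RATCHET.get(m.group(0), m.group(0)), line)
```

Under these assumptions, `ratchet_line('reasoning = "claude-opus-4"')` yields `'reasoning = "claude-opus-4-8"'`, while a line that already contains `claude-opus-4-8` or `claude-haiku-4-5` comes back unchanged.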

Why

  • Deprecated tiers are retiring. The bare claude-opus-4 tier and the *-20250514 dated snapshots are on the deprecation track (June 2026 retirement); after retirement every pinned call 404s at once.
  • One id was never real. claude-sonnet-4-7-20260101 does not exist in the model catalog and never did: there is no Sonnet 4.7, and current Sonnet ids carry no date suffix. Any user who uncommented the .env.example model pin got a guaranteed 404 on first call. Archaeology (git log -S): the string entered in two places, the v0.1.0a2 release-prep commit (.env.example) and the release-gate runner-script work (PR #59, "complete release-gate Box runner script and supporting box artifacts"; e2e runner prompt). It is a plausible-looking constructed id that survived review precisely because it looks like a valid dated variant. Both sites are illustrative examples; neither is exercised by the test suite, so nothing ever 404'd in CI, which is exactly why it went unnoticed.
  • Default quality. ModelsConfig.reasoning is the BYOK default for the reasoning tier (researcher/reviewer/synthesizer/analyst roles); claude-opus-4-8 is the current Opus-tier model with the same request surface as 4.7 (no breaking API changes in the 4.7→4.8 step, so the default swap is API-safe).

Guard (new test, TDD)

tests/unit/test_model_id_allowlist_sweep.py — extracts every model-family token (claude-opus* / sonnet* / haiku* / fable* / mythos*) from src/bonfire/**/*.py + the config surfaces (.env.example, README.md, pyproject.toml) and asserts membership in an explicit allowlist of real current ids (claude-fable-5, claude-opus-4-8, claude-sonnet-4-6, claude-haiku-4-5). Non-model claude-* tokens (package names, CLI tool names, editor dirs) are exempt via a family filter; test fixtures' deliberately fake tier placeholders (claude-sonnet, claude-opus with no version) stay out of scope.

  • RED proven on the pre-bump tree — 4 offenders: src/bonfire/models/config.py (opus-4-7), src/bonfire/dispatch/pydantic_ai_backend.py (dated sonnet), .env.example (fabricated id), README.md (opus-4-7).
  • GREEN after the bump.
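A minimal sketch of what such a sweep could look like, assuming a regex-based family filter. The real tests/unit/test_model_id_allowlist_sweep.py may differ; `MODEL_TOKEN`, `scan_targets`, and `find_offenders` are hypothetical names, and requiring at least one version segment is this sketch's own way of keeping bare placeholders such as claude-opus out of scope.

```python
import re
from pathlib import Path

# Allowlist of real current ids, as listed in the description above.
ALLOWLIST = frozenset(
    {"claude-fable-5", "claude-opus-4-8", "claude-sonnet-4-6", "claude-haiku-4-5"}
)

FAMILIES = ("opus", "sonnet", "haiku", "fable", "mythos")

# Family filter: only claude-<family>-<version...> counts as a model id.
# Tokens such as "claude-code" (no family) or a bare "claude-opus"
# (no version segment) never match, so they are never checked.
MODEL_TOKEN = re.compile(
    r"(?<![A-Za-z0-9-])claude-(?:%s)(?:-[a-z0-9]+)+(?![A-Za-z0-9-])"
    % "|".join(FAMILIES)
)


def scan_targets(repo_root):
    """Collect src/bonfire/**/*.py plus the config surfaces that exist."""
    root = Path(repo_root)
    targets = sorted((root / "src" / "bonfire").rglob("*.py"))
    for name in (".env.example", "README.md", "pyproject.toml"):
        if (root / name).is_file():
            targets.append(root / name)
    return targets


def find_offenders(paths):
    """Return (path, line number, token) for every non-allowlisted model id."""
    offenders = []
    for path in paths:
        text = Path(path).read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for token in MODEL_TOKEN.findall(line):
                if token not in ALLOWLIST:
                    offenders.append((str(path), lineno, token))
    return offenders
```

A pytest test built on this would assert that `find_offenders(scan_targets(repo_root))` is empty and print the offender list on failure, so a stale id is reported with its file and line rather than discovered as a 404 at runtime.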

Gate results (run from the branch worktree)

  • ruff check . → All checks passed (313 files)
  • ruff format --check . → 313 files already formatted
  • pytest (full suite) — 4419 passed, 79 skipped, 34 xfailed, 73 xpassed, 0 failed (100.8s)
  • Baseline on the same main commit, separate clean worktree — 4417 passed, 79 skipped, 34 xfailed, 73 xpassed, 0 failed. Pre-registered prediction (branch counts minus the 2 new guard tests, all other buckets identical): hit exactly. No pre-existing failures on either side.
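The pre-registered prediction can be spelled out as a small check. This is an illustrative sketch using the counts reported above; `prediction_holds` is a hypothetical helper, not part of the repository.

```python
# Outcome buckets as reported in the gate results above.
baseline = {"passed": 4417, "skipped": 79, "xfailed": 34, "xpassed": 73, "failed": 0}
branch = {"passed": 4419, "skipped": 79, "xfailed": 34, "xpassed": 73, "failed": 0}

NEW_GUARD_TESTS = 2  # the two tests added by the allowlist guard


def prediction_holds(baseline, branch, new_tests):
    """True when branch equals baseline plus new_tests extra passes and nothing else moved."""
    predicted = dict(baseline, passed=baseline["passed"] + new_tests)
    return predicted == branch


assert prediction_holds(baseline, branch, NEW_GUARD_TESTS)
```

Comparing whole bucket dictionaries rather than only the pass count is what makes the prediction strict: a test that silently moved from xfailed to skipped would break the equality even if the pass total still matched.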

Notes on 4.8 re-tuning (out of scope here)

This PR is an id ratchet only — no agent/system prompts were rewritten. Opus 4.8 keeps 4.7's request surface, so the swap is API-safe, but 4.8 has behavioral shifts worth a follow-up tuning pass wherever Bonfire ships prompts: it narrates more between tool calls (consider a silence-default for terse agents), asks more often on minor decisions (consider explicit small-decisions-don't-ask guidance), is more conservative reaching for subagents/custom tools (state trigger conditions in tool descriptions), and writes in a warmer, less hedged voice (re-check style prompts that countered 4.7's terseness). Review-style harnesses should keep report-everything-filter-downstream phrasing.


🤖 Generated with Claude Code

Sweep every Claude model-id literal in src/, tests/, and the config
surfaces (.env.example, README.md) up to the current model catalog:

- claude-opus-4-7 -> claude-opus-4-8 (default reasoning tier in
  ModelsConfig, README config example, and all test literals)
- bare claude-opus-4 (deprecated, retiring June 2026) -> claude-opus-4-8
- claude-sonnet-4-20250514 (deprecated dated variant) -> claude-sonnet-4-6
- bare claude-sonnet-4 -> claude-sonnet-4-6
- claude-sonnet-4-7-20260101 -> claude-sonnet-4-6: this id was never a
  real model (a fabricated date-suffixed string); any live call pinned
  to it would 404. Fixed in .env.example and the e2e runner prompt.

Add tests/unit/test_model_id_allowlist_sweep.py: a sweep test that
extracts every model-family token (claude-opus*/sonnet*/haiku*/fable*/
mythos*) from src/bonfire/**/*.py plus the config surfaces and asserts
membership in an explicit allowlist of real current ids. Verified RED
on the pre-bump tree (4 offenders) and GREEN after.

Frozen decision-history documents under docs/audit/ are intentionally
left untouched.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>