DO-NOT-MERGE · model ratchet: default opus to 4.8 + deprecated-tier sweep + model-id allowlist guard by Antawari · Pull Request #220 · BonfireAI/bonfire

Antawari · 2026-06-13T00:10:40Z

What

Model-id ratchet across the shipped surface, plus a permanent guard so ids can't rot silently again.

Maintainer merges — do not auto-merge.

Census — before → after

| Old id | Where | New id | Lines changed |
| --- | --- | --- | --- |
| `claude-opus-4-7` | `ModelsConfig.reasoning` default (`src/bonfire/models/config.py`), README config example, test literals across 12 files | `claude-opus-4-8` | 55 |
| `claude-opus-4` (bare; deprecated tier, retiring June 2026) | test literals (`test_config.py`, `test_dispatch_runner.py`, `test_persona_cli.py`) | `claude-opus-4-8` | 8 |
| `claude-sonnet-4-20250514` (deprecated dated variant) | docstring example (`src/bonfire/dispatch/pydantic_ai_backend.py`), test literals (`test_onboard_scanner_claude_memory.py`, `test_protocols.py`) | `claude-sonnet-4-6` | 5 |
| `claude-sonnet-4` (bare) | `test_dispatch_pydantic_ai_backend.py` | `claude-sonnet-4-6` | 1 |
| `claude-sonnet-4-7-20260101` (never a real model id) | `.env.example` model-pin example, e2e runner prompt | `claude-sonnet-4-6` | 2 |
| `claude-opus-4-6` / `claude-opus-4-1` / `claude-opus-4-20250514` | none | none | 0 found |

Already-current ids (claude-sonnet-4-6, claude-haiku-4-5) untouched. Frozen decision-history documents under docs/audit/ intentionally left as-is — they are historical records, not live configuration.
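For reviewers who want to see why the bare ids need care, here is a minimal sketch of a mapping-driven rewrite. It is not the method used in this PR (the description does not say how the edits were made); `RATCHET`, `_TOKEN`, and `ratchet_ids` are hypothetical names. The point it illustrates is that bare `claude-opus-4` is a prefix of `claude-opus-4-7`, so a naive string replace would corrupt already-current ids, while matching whole tokens does not.

```python
import re

# Census mapping from the table above: retired or invalid id -> current id.
RATCHET = {
    "claude-opus-4-7": "claude-opus-4-8",
    "claude-opus-4": "claude-opus-4-8",
    "claude-sonnet-4-20250514": "claude-sonnet-4-6",
    "claude-sonnet-4": "claude-sonnet-4-6",
    "claude-sonnet-4-7-20260101": "claude-sonnet-4-6",
}

# Hypothetical token pattern: the family name plus every trailing "-<digits>"
# group, matched greedily so a bare tier never matches inside a longer id.
_TOKEN = re.compile(r"claude-(?:opus|sonnet)-\d+(?:-\d+)*")


def ratchet_ids(text: str) -> str:
    """Rewrite each whole model-id token listed in RATCHET; leave the rest."""
    return _TOKEN.sub(lambda m: RATCHET.get(m.group(0), m.group(0)), text)
```

With this shape, `claude-opus-4-8` and `claude-sonnet-4-6` pass through unchanged because the whole token is looked up, not its prefix.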

Why

Deprecated tiers are retiring. The bare claude-opus-4 tier and the *-20250514 dated snapshots are on the deprecation track (June 2026 retirement); after retirement every pinned call 404s at once.
One id was never real. claude-sonnet-4-7-20260101 does not exist in the model catalog and never did: there is no Sonnet 4.7, and current Sonnet ids carry no date suffix. Any user who uncommented the .env.example model pin got a guaranteed 404 on first call. Archaeology (git log -S): the string entered in two places, the v0.1.0a2 release-prep commit (.env.example) and the release-gate runner-script work (PR #59, "complete release-gate Box runner script and supporting box artifacts"; e2e runner prompt). It is a plausible-looking constructed id that survived review precisely because it looks like a valid dated variant. Both sites are illustrative examples and neither is exercised by the test suite, so nothing ever 404'd in CI, which is exactly why it went unnoticed.
Default quality. ModelsConfig.reasoning is the BYOK default for the reasoning tier (researcher/reviewer/synthesizer/analyst roles); claude-opus-4-8 is the current Opus-tier model with the same request surface as 4.7 (no breaking API changes in the 4.7→4.8 step, so the default swap is API-safe).

Guard (new test, TDD)

tests/unit/test_model_id_allowlist_sweep.py — extracts every model-family token (claude-opus* / sonnet* / haiku* / fable* / mythos*) from src/bonfire/**/*.py + the config surfaces (.env.example, README.md, pyproject.toml) and asserts membership in an explicit allowlist of real current ids (claude-fable-5, claude-opus-4-8, claude-sonnet-4-6, claude-haiku-4-5). Non-model claude-* tokens (package names, CLI tool names, editor dirs) are exempt via a family filter; test fixtures' deliberately fake tier placeholders (claude-sonnet, claude-opus with no version) stay out of scope.

RED proven on the pre-bump tree — 4 offenders: src/bonfire/models/config.py (opus-4-7), src/bonfire/dispatch/pydantic_ai_backend.py (dated sonnet), .env.example (fabricated id), README.md (opus-4-7).
GREEN after the bump.
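A minimal sketch of the guard's core check, for readers who have not opened the test file. This is an illustration under assumptions, not the contents of tests/unit/test_model_id_allowlist_sweep.py: `MODEL_TOKEN` and `find_offenders` are hypothetical names, the regex is one guess at the family filter, and requiring a version suffix is an assumption made here so that unversioned placeholders such as claude-sonnet are ignored. The allowlist itself is copied from the description above.

```python
import re

# Allowlist of current ids, as stated in the PR description.
ALLOWED_IDS = frozenset({
    "claude-fable-5",
    "claude-opus-4-8",
    "claude-sonnet-4-6",
    "claude-haiku-4-5",
})

# Hypothetical family filter: only tokens in a model family, followed by at
# least one "-<digits>" group, count as model ids. Tokens such as a CLI tool
# or package name never match, and neither do unversioned placeholders.
MODEL_TOKEN = re.compile(r"claude-(?:opus|sonnet|haiku|fable|mythos)(?:-\d+)+")


def find_offenders(text: str) -> list[str]:
    """Return the sorted, de-duplicated model ids in text that are not allowed."""
    return sorted({tok for tok in MODEL_TOKEN.findall(text) if tok not in ALLOWED_IDS})
```

A real sweep would read each file under the scanned paths and assert that `find_offenders` returns an empty list for every one; the file walking is left out here so the sketch stays self-contained.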

Gate results (run from the branch worktree)

ruff check . — All checks passed (313 files)
ruff format --check . — 313 files already formatted
pytest (full suite) — 4419 passed, 79 skipped, 34 xfailed, 73 xpassed, 0 failed (100.8s)
Baseline on the same main commit, separate clean worktree — 4417 passed, 79 skipped, 34 xfailed, 73 xpassed, 0 failed. Pre-registered prediction (branch counts minus the 2 new guard tests, all other buckets identical): hit exactly. No pre-existing failures on either side.
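The pre-registered prediction can be written down as a small check. This is only a sketch of the arithmetic, using the counts reported above; `predict_branch` and `diff_buckets` are hypothetical helpers, not part of the repository.

```python
# Outcome buckets reported in the gate results above.
BASELINE = {"passed": 4417, "skipped": 79, "xfailed": 34, "xpassed": 73, "failed": 0}
BRANCH = {"passed": 4419, "skipped": 79, "xfailed": 34, "xpassed": 73, "failed": 0}
NEW_GUARD_TESTS = 2


def predict_branch(baseline: dict[str, int], new_passing_tests: int) -> dict[str, int]:
    """Prediction: only 'passed' grows, by the number of new passing tests."""
    predicted = dict(baseline)
    predicted["passed"] += new_passing_tests
    return predicted


def diff_buckets(predicted: dict[str, int], observed: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Return {bucket: (predicted, observed)} for every bucket that differs."""
    keys = sorted(predicted.keys() | observed.keys())
    return {
        k: (predicted.get(k, 0), observed.get(k, 0))
        for k in keys
        if predicted.get(k, 0) != observed.get(k, 0)
    }
```

An empty diff means the branch differs from the baseline by exactly the two new guard tests and nothing else.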

Notes on 4.8 re-tuning (out of scope here)

This PR is an id ratchet only — no agent/system prompts were rewritten. Opus 4.8 keeps 4.7's request surface, so the swap is API-safe, but 4.8 has behavioral shifts worth a follow-up tuning pass wherever Bonfire ships prompts: it narrates more between tool calls (consider a silence-default for terse agents), asks more often on minor decisions (consider explicit small-decisions-don't-ask guidance), is more conservative reaching for subagents/custom tools (state trigger conditions in tool descriptions), and writes in a warmer, less hedged voice (re-check style prompts that countered 4.7's terseness). Review-style harnesses should keep report-everything-filter-downstream phrasing.

🤖 Generated with Claude Code

Sweep every Claude model-id literal in src/, tests/, and the config surfaces (.env.example, README.md) up to the current model catalog:

- claude-opus-4-7 -> claude-opus-4-8 (default reasoning tier in ModelsConfig, README config example, and all test literals)
- bare claude-opus-4 (deprecated, retiring June 2026) -> claude-opus-4-8
- claude-sonnet-4-20250514 (deprecated dated variant) -> claude-sonnet-4-6
- bare claude-sonnet-4 -> claude-sonnet-4-6
- claude-sonnet-4-7-20260101 -> claude-sonnet-4-6: this id was never a real model (a fabricated date-suffixed string); any live call pinned to it would 404. Fixed in .env.example and the e2e runner prompt.

Add tests/unit/test_model_id_allowlist_sweep.py: a sweep test that extracts every model-family token (claude-opus*/sonnet*/haiku*/fable*/mythos*) from src/bonfire/**/*.py plus the config surfaces and asserts membership in an explicit allowlist of real current ids. Verified RED on the pre-bump tree (4 offenders) and GREEN after.

Frozen decision-history documents under docs/audit/ are intentionally left untouched.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Antawari wants to merge 1 commit into main from catrina/2026-06-12/model-ratchet
