
fix(synthbench): refuse display-label model recs from --best-model-for (sy-kh3)#521

Merged
openclaw-dv merged 2 commits into main from polecat/garnet-mpiryq6k on May 23, 2026

Conversation

@openclaw-dv
Collaborator

Summary

SynthBench product/ensemble leaderboard rows can carry a human-readable
display label (e.g. "SynthPanel (Gemini Flash Lite)") in their model
field rather than a runnable provider model id. --best-model-for stamped
that label straight onto --model, deferring the failure to call time as
an opaque provider "not a valid model ID" error (#519).

Fix

  • is_runnable_model_id() — new structural check rejecting whitespace / paren display labels while accepting bare openai-compat / local ids.
  • recommend() now sets a runnable flag on Recommendation, and only adopts a config_id-derived base model when it's recognized (known provider prefix or registered alias) — never a hash fragment from a hyphenated config_id split.
  • _apply_best_model_for() refuses a non-runnable recommendation with an actionable message and falls back to the existing --model/default instead of stamping the label; --dry-run surfaces this upfront.
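The structural check in the first bullet might look roughly like the sketch below. The PR text only states that whitespace and parenthesised display labels are rejected and bare ids are accepted, so the regex and the handling of empty input are assumptions, not the merged implementation. The sample ids in the usage note are illustrative only.

```python
import re
from typing import Optional

# Assumption: a display label is anything containing whitespace or parentheses.
# The real check in the PR may apply additional rules.
_DISPLAY_LABEL_CHARS = re.compile(r"[\s()]")


def is_runnable_model_id(model: Optional[str]) -> bool:
    """Return True if `model` looks structurally like a provider model id."""
    if not model:
        # Empty or missing values can never be passed to a provider.
        return False
    # Bare ids such as "llama3:8b" or "vendor/some-model" contain neither
    # whitespace nor parentheses, so they pass.
    return _DISPLAY_LABEL_CHARS.search(model) is None
```

Under these assumptions, `is_runnable_model_id("SynthPanel (Gemini Flash Lite)")` is false, while a bare id such as `"llama3:8b"` is accepted. The check is purely structural, so it does not confirm that a provider actually serves the id.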

Test plan

  • New unit tests pin the runnable/non-runnable discriminator.
  • --best-model-for + --dry-run flow now exits clean on a display-label rec instead of producing a downstream provider error.
  • GitHub CI runs the full suite on this PR.
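The refuse-and-fall-back behaviour that the test plan pins down can be sketched as follows. `Recommendation` here is a stand-in dataclass with only the two fields needed for illustration, and `apply_best_model_for_sketch`, its parameters, the `"default-model"` value, and the message wording are all hypothetical. The real `_apply_best_model_for()` signature is not shown in this PR text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Recommendation:
    # Stand-in for the project's Recommendation; only these fields are assumed.
    model: str
    runnable: bool


def apply_best_model_for_sketch(
    rec: Recommendation,
    current_model: Optional[str],
    default_model: str = "default-model",  # hypothetical default
) -> Tuple[str, Optional[str]]:
    """Return (model to use, warning message or None)."""
    if rec.runnable:
        # A runnable recommendation is adopted as-is.
        return rec.model, None
    # Non-runnable: keep the existing --model value, else the default.
    fallback = current_model or default_model
    message = (
        f"Recommendation {rec.model!r} is a display label, not a runnable "
        f"model id; using {fallback!r} instead. Pass --model to override."
    )
    return fallback, message


# Test-style checks mirroring the two cases in the test plan.
label = Recommendation("SynthPanel (Gemini Flash Lite)", runnable=False)
assert apply_best_model_for_sketch(label, "my-model")[0] == "my-model"
assert apply_best_model_for_sketch(label, None)[0] == "default-model"
```

Returning the message instead of raising keeps the caller free to print it during `--dry-run` and continue, which matches the "exits clean" behaviour described above.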

References

openclaw-dv and others added 2 commits May 23, 2026 15:24
…r (sy-kh3)

SynthBench product/ensemble leaderboard rows can carry a human-readable
display label (e.g. "SynthPanel (Gemini Flash Lite)") in their model
field rather than a runnable provider model id. --best-model-for stamped
that label straight onto --model, deferring the failure to call time as
an opaque provider "not a valid model ID" error (gh-519).

- Add is_runnable_model_id(): structural check rejecting whitespace/paren
  display labels while accepting bare openai-compat/local ids.
- recommend() now sets a runnable flag on Recommendation, and only adopts
  a config_id-derived base model when it's recognized (known provider
  prefix or registered alias) — never a hash fragment from a hyphenated
  config_id split.
- _apply_best_model_for() refuses a non-runnable recommendation with an
  actionable message and falls back to the existing --model/default
  instead of stamping the label; --dry-run surfaces this upfront.

Closes #519.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Collapse multi-line calls that fit the project's line length so
`ruff format --check` passes in CI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@openclaw-dv openclaw-dv force-pushed the polecat/garnet-mpiryq6k branch from df11df8 to 0839f8e Compare May 23, 2026 20:25
@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented May 23, 2026

Deploying synthpanel with Cloudflare Pages

Latest commit: 0839f8e
Status: ✅  Deploy successful!
Preview URL: https://4e67da3e.synthpanel.pages.dev
Branch Preview URL: https://polecat-garnet-mpiryq6k.synthpanel.pages.dev


@openclaw-dv openclaw-dv added the semver:patch label (Bump patch version on merge) May 23, 2026
@openclaw-dv openclaw-dv merged commit 33b38d0 into main May 23, 2026
19 checks passed
@openclaw-dv openclaw-dv deleted the polecat/garnet-mpiryq6k branch May 23, 2026 20:59

Development

Successfully merging this pull request may close these issues.

--best-model-for should not pass SynthBench display names as provider model IDs
