fix(synthbench): refuse display-label model recs from --best-model-for (sy-kh3) #521
Merged
…r (sy-kh3)

SynthBench product/ensemble leaderboard rows can carry a human-readable display label (e.g. "SynthPanel (Gemini Flash Lite)") in their model field rather than a runnable provider model id. --best-model-for stamped that label straight onto --model, deferring the failure to call time as an opaque provider "not a valid model ID" error (gh-519).

- Add is_runnable_model_id(): structural check rejecting whitespace/paren display labels while accepting bare openai-compat/local ids.
- recommend() now sets a runnable flag on Recommendation, and only adopts a config_id-derived base model when it's recognized (known provider prefix or registered alias), never a hash fragment from a hyphenated config_id split.
- _apply_best_model_for() refuses a non-runnable recommendation with an actionable message and falls back to the existing --model/default instead of stamping the label; --dry-run surfaces this upfront.

Closes #519.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
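For reviewers who want a concrete picture of the behaviour described in this commit message, here is a minimal sketch, not the code from the PR. It assumes a "runnable" id is a single token with no whitespace or parentheses; the regex and the `pick_model()` helper are hypothetical, with `pick_model()` standing in for the fallback step that `_apply_best_model_for()` is described as performing.

```python
import re
from typing import Optional, Tuple

# Assumption: a runnable id is one token made of letters, digits and . _ : / @ -
# (the real check in the PR may accept or reject a different character set).
_ID_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9._:/@-]*")


def is_runnable_model_id(model: str) -> bool:
    """Structural check only: reject display labels, accept bare ids."""
    if not model or model != model.strip():
        return False
    return _ID_PATTERN.fullmatch(model) is not None


def pick_model(recommended: str, fallback: str) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: return (model_to_use, warning_or_None).

    A non-runnable recommendation is refused and the existing
    --model/default value (passed in as `fallback`) is kept.
    """
    if is_runnable_model_id(recommended):
        return recommended, None
    warning = (
        f"recommendation {recommended!r} looks like a display label, "
        f"not a runnable model id; keeping {fallback!r}"
    )
    return fallback, warning
```

With this sketch, `pick_model("SynthPanel (Gemini Flash Lite)", "default-model")` keeps `"default-model"` and returns a warning, so the problem surfaces before any provider call is made.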
Collapse multi-line calls that fit the project's line length so `ruff format --check` passes in CI. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
df11df8 to 0839f8e
Deploying synthpanel with

| Latest commit: | 0839f8e |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://4e67da3e.synthpanel.pages.dev |
| Branch Preview URL: | https://polecat-garnet-mpiryq6k.synthpanel.pages.dev |
This was referenced May 23, 2026
Summary
SynthBench product/ensemble leaderboard rows can carry a human-readable display label (e.g. `"SynthPanel (Gemini Flash Lite)"`) in their `model` field rather than a runnable provider model id. `--best-model-for` stamped that label straight onto `--model`, deferring the failure to call time as an opaque provider `"not a valid model ID"` error (#519).

Fix
- `is_runnable_model_id()`: new structural check rejecting whitespace / paren display labels while accepting bare openai-compat / local ids.
- `recommend()` now sets a `runnable` flag on `Recommendation`, and only adopts a `config_id`-derived base model when it's recognized (known provider prefix or registered alias), never a hash fragment from a hyphenated `config_id` split.
- `_apply_best_model_for()` refuses a non-runnable recommendation with an actionable message and falls back to the existing `--model`/default instead of stamping the label; `--dry-run` surfaces this upfront.

Test plan
- `--best-model-for` + `--dry-run` flow now exits clean on a display-label rec instead of producing a downstream provider error.

References