--best-model-for should not pass SynthBench display names as provider model IDs #519

@claude-dataviking

Context

After SynthBench published the live /data/leaderboard.json endpoint, SynthPanel 1.5.2 successfully fetches live recommendations. For 'Technology & Digital Life', however, the selected live row produces this dry-run output:

synthbench: best model for globalopinionqa/Technology & Digital Life → SynthPanel (Gemini Flash Lite) · SPS 0.901 · JSD 0.313 · n=100 · cached 0h ago · source=live
synthbench: note — top entry is a product/ensemble config; using underlying base model 'SynthPanel (Gemini Flash Lite)'.
Model: SynthPanel (Gemini Flash Lite)

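The string 'SynthPanel (Gemini Flash Lite)' is a display label, not a runnable model ID. Note that the "underlying base model" message repeats the same label instead of resolving it to a base model.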

Reproduction

SYNTHPANEL_SYNTHBENCH_REFRESH=1 synthpanel panel run \
  --personas developer \
  --instrument general-survey \
  --best-model-for 'Technology & Digital Life' \
  --dry-run

synthpanel --model 'SynthPanel (Gemini Flash Lite)' prompt 'Reply exactly: ok'

The prompt call fails under OpenRouter with:

Error: OpenRouter API error 400: SynthPanel (Gemini Flash Lite) is not a valid model ID

Expected behavior

--best-model-for should only set --model to a runnable provider model ID or a supported SynthPanel ensemble/config spec. If SynthBench only provides a display label, SynthPanel should refuse with an actionable message instead of producing a non-runnable model selection.
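
One possible shape for that check, as a minimal sketch rather than SynthPanel's actual code. The `resolve_model` function, the `KNOWN_ALIASES` table, the `UnrunnableRecommendation` exception, and the "vendor/model" pattern for runnable IDs are all assumptions made for illustration. The alias entry is a placeholder, not a real mapping.

```python
import re

# Hypothetical alias table: display label -> runnable provider model ID.
# The entry below is a placeholder; real mappings would have to come from
# SynthPanel or from the SynthBench leaderboard itself.
KNOWN_ALIASES = {
    "Example Display Name (Demo)": "example-provider/demo-model",
}

# Assumption: a runnable ID looks like "vendor/model" with no spaces or
# parentheses. SynthPanel may accept other forms; adjust as needed.
_MODEL_ID_RE = re.compile(r"^[A-Za-z0-9._-]+/[A-Za-z0-9._:-]+$")


class UnrunnableRecommendation(ValueError):
    """Raised when a recommendation cannot be turned into a runnable model."""


def resolve_model(label: str) -> str:
    """Return a runnable model ID for a leaderboard label, or refuse."""
    if _MODEL_ID_RE.fullmatch(label):
        return label  # already looks like a provider model ID
    if label in KNOWN_ALIASES:
        return KNOWN_ALIASES[label]  # known display label, translate it
    raise UnrunnableRecommendation(
        f"SynthBench recommended {label!r}, which is a display label, "
        "not a runnable model ID. Pass --model explicitly instead."
    )
```

With this sketch, `resolve_model("SynthPanel (Gemini Flash Lite)")` raises instead of handing the label to the provider, so the failure surfaces before any API call is made.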

Acceptance criteria

  • Live SynthBench recommendations are translated/validated before use.
  • Product/ensemble rows are either resolved to a supported runtime config or skipped with a clear explanation.
  • --dry-run makes it obvious whether the selected recommendation is actually runnable.
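
The second and third criteria could be met along these lines. This is a sketch under stated assumptions: the row keys `display_name` and `model_id` are hypothetical (the real /data/leaderboard.json schema is not shown in this issue), and the `[runnable]` / `[not runnable]` suffixes are an invented output format, not existing SynthPanel behavior.

```python
from typing import Optional


def pick_runnable(rows: list[dict]) -> tuple[Optional[dict], list[str]]:
    """Walk ranked rows and return the first one carrying a model ID.

    Assumes each row is a dict with a hypothetical "display_name" key and
    an optional hypothetical "model_id" key. Skipped rows are explained
    in the returned notes.
    """
    notes: list[str] = []
    for row in rows:
        if row.get("model_id"):
            return row, notes
        notes.append(
            f"skipped {row['display_name']!r}: no runnable model ID in row"
        )
    return None, notes


def dry_run_lines(rows: list[dict]) -> list[str]:
    """Build dry-run output that states whether the selection is runnable."""
    chosen, notes = pick_runnable(rows)
    lines = [f"synthbench: note: {note}" for note in notes]
    if chosen is None:
        lines.append("Model: (none) [not runnable]")
    else:
        lines.append(f"Model: {chosen['model_id']} [runnable]")
    return lines
```

Falling through to the next ranked row is one design choice; refusing outright when the top row is not runnable would be the stricter alternative and may be preferable if silently picking a lower-ranked model is considered surprising.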

Related upstream/API issue

DataViking-Tech/SynthBench#297
