Add AANA OpenRouter GlobalOpinionQA result #239

Open
mindbomber wants to merge 1 commit into DataViking-Tech:main from mindbomber:codex/aana-synthbench-globalopinionqa

@mindbomber

Summary

  • Adds an AANA-constrained OpenRouter provider adapter (aana-openrouter).
  • Submits a validated GlobalOpinionQA leaderboard result for aana/openrouter/openai/gpt-oss-20b.
  • Keeps the result artifact in leaderboard-results/ for the standard SynthBench PR validation path.

Benchmark run

  • Dataset: GlobalOpinionQA
  • Questions: 100
  • Samples per question: 3
  • Calls: 300
  • Provider: aana/openrouter/openai/gpt-oss-20b
  • Run date: 2026-05-07

Scores

  • SPS: 0.723663
  • P_dist: 0.611510
  • P_rank: 0.579745
  • P_refuse: 0.979733
  • Mean JSD: 0.388490
  • Median JSD: 0.330163
  • Mean Kendall tau: 0.159490
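The reported scores appear to be internally consistent. The sketch below checks three relationships that are inferred from the numbers themselves and are an assumption here, not taken from SynthBench documentation: P_dist = 1 - mean JSD, P_rank = (1 + mean Kendall tau) / 2, and SPS as the unweighted mean of P_dist, P_rank, and P_refuse.

```python
# Values copied from the Scores list in this PR description.
mean_jsd = 0.388490
mean_tau = 0.159490
p_refuse = 0.979733

# Assumed relationships, inferred from the reported figures only.
p_dist = 1.0 - mean_jsd                    # distribution parity from divergence
p_rank = (1.0 + mean_tau) / 2.0            # map tau from [-1, 1] onto [0, 1]
sps = (p_dist + p_rank + p_refuse) / 3.0   # unweighted mean of the three parts

print(f"P_dist={p_dist:.6f} P_rank={p_rank:.6f} SPS={sps:.6f}")
```

Under these assumptions the recomputed values match the reported P_dist, P_rank, and SPS to six decimal places.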

AANA scope

This is an AANA response-contract wrapper around OpenRouter gpt-oss-20b, not a newly trained base model. The wrapper constrains the model to preserve the survey and persona constraints, choose exactly one listed option, and avoid free-form output. SynthBench ground-truth distributions are used only for scoring.
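To illustrate what a "choose exactly one listed option" contract could look like, here is a minimal sketch. The function name `enforce_single_option` and its behavior are hypothetical and are not taken from `aana_openrouter.py`; the actual adapter may normalize or retry differently.

```python
from typing import Optional, Sequence


def enforce_single_option(raw: str, option_letters: Sequence[str]) -> Optional[str]:
    """Return the chosen letter if `raw` is exactly one listed option, else None.

    Hypothetical helper: a strict gate that rejects any free-form output.
    """
    candidate = raw.strip().upper()
    allowed = {letter.upper() for letter in option_letters}
    if len(candidate) == 1 and candidate in allowed:
        return candidate
    return None


# A bare letter passes; free-form text or an unlisted letter is rejected.
accepted = enforce_single_option(" b\n", ["A", "B", "C"])
rejected_prose = enforce_single_option("I think B", ["A", "B", "C"])
rejected_unlisted = enforce_single_option("D", ["A", "B", "C"])
```

A gate this strict never sees the ground-truth distribution, which matches the statement that those distributions are used only for scoring.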

Validation

  • synthbench validate leaderboard-results/globalopinionqa_aana_openrouter_openai_gpt-oss-20b_20260507_044954.json
  • python -m py_compile src/synthbench/providers/aana_openrouter.py src/synthbench/providers/__init__.py
  • python -m pytest tests/test_providers.py

Note

Tier-3 strict validation flags that raw responses are mostly one-letter outputs. That is expected for this provider because the AANA gate contract intentionally requires a single listed option letter; the standard PR validator passes with no issues.
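For reviewers who want to gauge how dominant one-letter outputs are, a rough heuristic could look like the following. This is an illustrative sketch only: `one_letter_share` is a hypothetical helper and does not reproduce SynthBench's actual Tier-3 check, whose rules are not described in this PR.

```python
from typing import Sequence


def one_letter_share(raw_responses: Sequence[str]) -> float:
    """Fraction of responses that are a single alphabetic character after trimming."""
    if not raw_responses:
        return 0.0
    short = sum(
        1 for r in raw_responses
        if len(r.strip()) == 1 and r.strip().isalpha()
    )
    return short / len(raw_responses)


# Three of these four sample strings are single letters.
share = one_letter_share(["A", "B ", "C, though D is close", "a"])
```

For a provider whose contract requires a single option letter, a share close to 1.0 is the expected outcome rather than a sign of degenerate output.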
