
evals: th config fetch — pin prod env + org, force JWT auth #121

Open
brentrager wants to merge 2 commits into main from evals-th-config-runner

@brentrager

Contributor

Fix the `th config` fetch in the evals runner: pin the production environment and org, and force JWT auth.

🤖 Generated with Claude Code

brentrager and others added 2 commits June 20, 2026 21:57
`scripts/run-evals.sh` fetches the LiteLLM gateway virtual key from @smooai/config
via `th config get liteLLMVirtualKeyAiServer` (env-overridable) and runs the
LLM-as-judge eval suite against the live gateway. This replaces reading the key
out of the opencode auth.json: @smooai/config is the single source of truth, and
the key is fetched at run time and never printed. The script validates that the
key is an `sk-` virtual key (the gateway returns 401 for non-virtual keys) and
fails with a clear message otherwise.
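As a rough illustration of the validation step, here is a minimal bash sketch. The PR does not show the script's internals, so the function name `require_virtual_key` is hypothetical. It enforces only the rule stated above (the value must start with `sk-`), plus a more specific message for an empty value or the literal `"unset"` placeholder that the second commit says the development environment holds. Errors go to stderr and never echo the key.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the sk- virtual-key check; not the PR's actual code.
# The function name require_virtual_key is an illustrative assumption.

# require_virtual_key KEY
# Returns 0 if KEY looks like a LiteLLM virtual key (sk- prefix), else 1.
# Error messages go to stderr and never include the key itself.
require_virtual_key() {
  local key="${1-}"

  # An empty value or the literal "unset" placeholder usually means the fetch
  # hit the wrong config environment (e.g. development instead of production).
  if [[ -z "$key" || "$key" == "unset" ]]; then
    echo "error: gateway key is empty or the \"unset\" placeholder; check which config environment was used" >&2
    return 1
  fi

  # The gateway rejects non-virtual keys with a 401, so fail early instead.
  if [[ "$key" != sk-* ]]; then
    echo "error: gateway key is not an sk- virtual key; the gateway would return 401" >&2
    return 1
  fi
}
```

In a wrapper this would sit right after the fetch, e.g. `require_virtual_key "$key" || exit 1`, so the run stops before any request reaches the gateway.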

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Three subtle env-inheritance bugs the wrapper hit:
- It read the ambient SMOOAI_CONFIG_ENV (your local working env, usually
  `development`, where the key is the literal placeholder "unset") instead of
  production, where the real key lives → now a dedicated SMOOAI_EVAL_GATEWAY_ENV,
  defaulting to production.
- Ambient @smooai/config M2M env vars (API_KEY/CLIENT_*/API_URL) take precedence
  over --org-id and resolve the wrong org → now unset for the fetch, forcing the
  th user JWT plus an explicit --org-id (cwd-independent).
- With no org required, a default org could be resolved silently → now
  SMOOAI_CONFIG_ORG_ID (the infra-secrets org) is required explicitly, with a
  clear error if it is missing.

Verified end-to-end: fetches the prod virtual key via th config and runs the
llm_judge suite green.
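The three env-inheritance fixes can be sketched as a small bash function. This is a hedged illustration resting on assumptions the PR does not confirm: that `th config get <name> --org-id <id>` is a valid invocation and that `th` picks its target environment from SMOOAI_CONFIG_ENV, and that the M2M variables abbreviated as API_KEY/CLIENT_*/API_URL are named SMOOAI_CONFIG_API_KEY, SMOOAI_CONFIG_CLIENT_*, and SMOOAI_CONFIG_API_URL. The function name `fetch_gateway_key` is hypothetical. The credential stripping runs in a subshell, so the caller's environment is left untouched.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the env/org/JWT pinning; not the PR's actual script.
# Assumptions (not confirmed by the PR):
#   - `th config get <name> --org-id <id>` is a valid invocation, and th picks
#     its target environment from SMOOAI_CONFIG_ENV.
#   - The ambient M2M variables are SMOOAI_CONFIG_API_KEY, SMOOAI_CONFIG_API_URL
#     and SMOOAI_CONFIG_CLIENT_*; the PR only abbreviates them.

# fetch_gateway_key: prints the gateway key on stdout (capture it, never log it).
fetch_gateway_key() {
  # Dedicated eval env, defaulting to production; the ambient
  # SMOOAI_CONFIG_ENV (often "development") is deliberately ignored.
  local target_env="${SMOOAI_EVAL_GATEWAY_ENV:-production}"

  # Require the org explicitly instead of letting a default org resolve.
  if [[ -z "${SMOOAI_CONFIG_ORG_ID-}" ]]; then
    echo "error: SMOOAI_CONFIG_ORG_ID must be set to the infra-secrets org id" >&2
    return 1
  fi

  # Subshell: the unsets below only affect this fetch, not the caller.
  (
    # Drop the (assumed) M2M credential variables so th uses the user's JWT.
    # ORG_ID and ENV share the prefix but are not matched by the case below.
    for name in "${!SMOOAI_CONFIG_@}"; do
      case "$name" in
        SMOOAI_CONFIG_API_KEY | SMOOAI_CONFIG_API_URL | SMOOAI_CONFIG_CLIENT_*)
          unset "$name" ;;
      esac
    done

    SMOOAI_CONFIG_ENV="$target_env" th config get liteLLMVirtualKeyAiServer \
      --org-id "$SMOOAI_CONFIG_ORG_ID"
  )
}
```

A caller would capture the result with `key="$(fetch_gateway_key)" || exit 1` and hand `$key` only to the eval runner, pairing it with an `sk-` prefix check before use.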

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
brentrager enabled auto-merge (squash) June 30, 2026 21:26