Experiment: reduce termchart diagram-generation latency + tokens
Measuring the end-to-end cost for a coding agent to generate a termchart diagram, across multiple runners, and testing whether candidate fixes reduce it. Full plan: docs/plans/2026-06-07-latency-token-experimentation-plan.md (branch perf/reduce-latency, PR #143).
Setup (what actually ran)
- Substrate: local podman containers (one long-lived LiteLLM proxy container + one isolated container per cell).
- Measurement spine: every runner is pointed at the same LiteLLM proxy. All runners therefore share one model, tokens and latency are captured uniformly (provider-neutral), and each cell's usage is attributed by slicing the proxy's spend log.
- Backend model (shared, held constant): Gemini 2.5 Flash on Vertex (vertex_ai/gemini-2.5-flash) via ADC.
- Note: Claude on Vertex is not enabled in the project (404 in all regions; it needs the Anthropic Model Garden EULA accepted in the console). Once it is enabled, switching to Claude is a one-line EXPERIMENT_MODEL flip.
- Conditions: baseline = master; c1 = master + the two fixes below.
- Tasks: terminal-er-orders, component-plan-comparison, flow-auth-callgraph (from scripts/experiments/tasks/pilot.jsonl).
- Matrix: 2 runners × 2 conditions × 3 tasks × 2 reps = 24 cells, 100% success.
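The matrix is a full cross product of runners, conditions, tasks, and repetitions. Here is a minimal Python sketch of that enumeration using the names listed above. The `Cell` tuple and `build_matrix` helper are hypothetical illustrations, not the harness's actual types.

```python
from itertools import product
from typing import NamedTuple


class Cell(NamedTuple):
    """One experiment cell (hypothetical type; the real harness may differ)."""
    runner: str
    condition: str
    task: str
    rep: int


RUNNERS = ["claude-code", "opencode"]
CONDITIONS = ["baseline", "c1"]
TASKS = ["terminal-er-orders", "component-plan-comparison", "flow-auth-callgraph"]
REPS = 2


def build_matrix() -> list[Cell]:
    # Every runner x condition x task, repeated REPS times (reps numbered from 1).
    return [
        Cell(runner, condition, task, rep)
        for runner, condition, task, rep in product(
            RUNNERS, CONDITIONS, TASKS, range(1, REPS + 1)
        )
    ]


if __name__ == "__main__":
    print(len(build_matrix()))  # 2 * 2 * 3 * 2 = 24
```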
Each cell: the container builds termchart at the condition's ref, installs the CLI, and runs the runner headless (as the non-root node user). The agent uses termchart to render a diagram, and a RunRecord is emitted with tokens (from the proxy), wall-clock latency, and a success gate (clean exit + ≥1 model call).
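Token counts come from the proxy, not from each runner's own reporting. The sketch below shows one way a cell's record could be assembled. It assumes the proxy's spend log can be exported as rows carrying a start timestamp and prompt/completion token counts, and that a cell's calls are identified by the time window in which it ran. The field names, `SpendEntry`, and `build_run_record` are hypothetical, so LiteLLM's actual spend-log schema should be checked before reusing this.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SpendEntry:
    # Hypothetical shape of one proxy spend-log row (one model call).
    start_time: datetime
    prompt_tokens: int
    completion_tokens: int


@dataclass
class RunRecord:
    # Hypothetical fields mirroring what each cell is described as emitting.
    input_tokens: int
    output_tokens: int
    model_calls: int
    latency_s: float
    success: bool


def build_run_record(
    entries: list[SpendEntry],
    cell_start: datetime,
    cell_end: datetime,
    exit_code: int,
) -> RunRecord:
    # Slice the spend log down to the calls made while this cell was running.
    calls = [e for e in entries if cell_start <= e.start_time <= cell_end]
    return RunRecord(
        input_tokens=sum(e.prompt_tokens for e in calls),
        output_tokens=sum(e.completion_tokens for e in calls),
        model_calls=len(calls),
        latency_s=(cell_end - cell_start).total_seconds(),
        # Success gate: clean exit AND at least one call reached the proxy.
        success=exit_code == 0 and len(calls) >= 1,
    )


if __name__ == "__main__":
    t0 = datetime(2026, 6, 7, 12, 0, 0)
    log = [
        SpendEntry(t0 + timedelta(seconds=1), 8_000, 250),
        SpendEntry(t0 + timedelta(seconds=3), 500, 40),
        SpendEntry(t0 + timedelta(seconds=30), 9_999, 1),  # a later cell's call
    ]
    rec = build_run_record(log, t0, t0 + timedelta(seconds=4), exit_code=0)
    print(rec.input_tokens, rec.model_calls, rec.success)  # 8500 2 True
```

Slicing by time window only works if cells run one at a time. Running cells in parallel against the same proxy would need a per-cell tag on each request instead.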
Headline finding — runner choice dominates cost
| condition | runner | median input tokens | median latency |
|---|---|---|---|
| baseline | claude-code | 29,510 | 15.2s |
| baseline | opencode | 8,522 | 3.5s |
| c1 | claude-code | 38,986 | 14.8s |
| c1 | opencode | 8,742 | 4.4s |
OpenCode uses about 3.5× (baseline) to 4.5× (c1) fewer input tokens than Claude Code for identical diagram tasks, because of its smaller system prompt and far less repo exploration. The runner is a bigger lever than the termchart-side fixes tried so far.
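The runner ratio follows directly from the medians in the table. A small Python check follows. The dict layout is only for illustration, and the values are copied from the table.

```python
# Median input tokens per (condition, runner), copied from the headline table.
MEDIAN_INPUT_TOKENS = {
    ("baseline", "claude-code"): 29_510,
    ("baseline", "opencode"): 8_522,
    ("c1", "claude-code"): 38_986,
    ("c1", "opencode"): 8_742,
}


def runner_token_ratio(condition: str) -> float:
    """Claude Code median input tokens divided by OpenCode's, for one condition."""
    return (
        MEDIAN_INPUT_TOKENS[(condition, "claude-code")]
        / MEDIAN_INPUT_TOKENS[(condition, "opencode")]
    )


for cond in ("baseline", "c1"):
    print(cond, round(runner_token_ratio(cond), 2))
# baseline 3.46
# c1 4.46
```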
Fix effect (c1 vs baseline)
| metric | Δ (c1 vs baseline) | significant? |
|---|---|---|
| median tokens | −1.0% | no (CIs overlap) |
| median latency | +62.8% | no (small-N noise, heavy tails) |
c1 = T1 (minify diagram example JSONs, −45% bytes) + T5 (correct stale AGENTS.md exit code). This null result is expected and honest: T1 only saves tokens when an agent actually reads a large example, and T5 is a one-character correctness fix — neither materially affects these tasks. The pilot's value is a working, reproducible measurement apparatus + baselines.
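The report does not say how the CIs in the significance column were computed. For a median at small N, the percentile bootstrap is one common choice. The following is a minimal sketch, assuming the per-cell values for each condition are available as plain lists. The function names are illustrative and do not come from the harness.

```python
import random
import statistics


def bootstrap_median_ci(
    values: list[float],
    n_resamples: int = 10_000,
    alpha: float = 0.05,
    seed: int = 0,
) -> tuple[float, float]:
    """Percentile-bootstrap (1 - alpha) confidence interval for the median."""
    rng = random.Random(seed)  # fixed seed so reruns give the same interval
    # Resample with replacement, take each resample's median, then sort.
    medians = sorted(
        statistics.median(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return medians[lo_idx], medians[hi_idx]


def cis_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    # Two closed intervals overlap unless one ends before the other starts.
    return a[0] <= b[1] and b[0] <= a[1]
```

Overlapping intervals are a conservative test. Two 95% intervals can overlap slightly even when the difference itself is significant at the 5% level, so bootstrapping the difference of medians directly is the more direct check once more reps are available.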
Runner status
| Runner | Headless | Routed via proxy | Status |
|---|---|---|---|
| Claude Code | `claude -p … --permission-mode bypassPermissions` (runs as non-root `node`) | `ANTHROPIC_BASE_URL` → LiteLLM | ✅ working |
| OpenCode | `opencode run … --model openai/shared-model` | openai provider `baseURL` → LiteLLM | ✅ working |
| AGY (Antigravity) | `agy -p … --dangerously-skip-permissions` | n/a | ⛔ deferred: agy 1.0.6 exists but has no custom base-URL flag, so it can't share the proxy/model or be token-measured uniformly; it also needs `ANTIGRAVITY_API_KEY` |
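Both working runners follow the same pattern: a headless CLI invocation whose model traffic is redirected to the shared proxy. The sketch below assembles the argv and environment for each runner, using only the flags shown in the table. It assumes the elided `…` in each command is the task prompt. The proxy URL (`http://localhost:4000`, LiteLLM's default port) and the `runner_invocation` helper are also assumptions. OpenCode's base URL lives in its provider config rather than in an environment variable, so it appears only as a comment here.

```python
import os

# Assumption: the LiteLLM proxy is reachable at this URL from inside each cell.
PROXY_URL = "http://localhost:4000"


def runner_invocation(runner: str, prompt: str) -> tuple[list[str], dict[str, str]]:
    """Return (argv, env) for a headless run (hypothetical helper)."""
    env = dict(os.environ)
    if runner == "claude-code":
        # Claude Code sends its API traffic to ANTHROPIC_BASE_URL when set.
        env["ANTHROPIC_BASE_URL"] = PROXY_URL
        argv = ["claude", "-p", prompt, "--permission-mode", "bypassPermissions"]
    elif runner == "opencode":
        # The openai provider's baseURL must point at the proxy in OpenCode's
        # own config; "shared-model" is presumably the proxy-side model alias.
        argv = ["opencode", "run", prompt, "--model", "openai/shared-model"]
    else:
        # AGY is deferred: with no base-URL override it cannot share the proxy.
        raise ValueError(f"unsupported runner: {runner}")
    return argv, env
```

The result can be passed straight to `subprocess.run(argv, env=env)`. Keeping proxy routing in one place makes it easy to confirm that every runner is measured against the same backend.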
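Fixes shipped (PRs into master, not merged)
- T1: minify diagram example JSONs (−45% bytes).
- T5: correct the stale AGENTS.md push/status exit code (4, not 3).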
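T1 saves bytes by stripping insignificant whitespace from the diagram example JSONs. The following is a minimal sketch using Python's standard `json` module. The helper names are hypothetical and this is not necessarily the script behind the PR. The exact saving depends on how heavily the originals were indented.

```python
import json
from pathlib import Path


def minify_json_text(text: str) -> str:
    # Parse, then re-serialize with no indentation and no spaces after separators.
    # ensure_ascii=False keeps non-ASCII characters (e.g. box-drawing glyphs) as-is.
    return json.dumps(json.loads(text), separators=(",", ":"), ensure_ascii=False)


def minify_file(path: Path) -> tuple[int, int]:
    """Minify a JSON file in place; return (bytes_before, bytes_after)."""
    before = path.read_text(encoding="utf-8")
    after = minify_json_text(before)
    path.write_text(after, encoding="utf-8")
    return len(before.encode("utf-8")), len(after.encode("utf-8"))
```

Round-tripping through `json` preserves key order and values. Some number spellings may be rewritten, however (for example, `1e3` becomes `1000.0`), so a diff review of the minified files is still worthwhile.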
Conclusions
- The experiment apparatus works end-to-end on local podman with real Vertex Gemini, across two runners.
- The fixes tried so far (T1, T5) are good hygiene but do not move tokens on these tasks.
- The dominant cost driver is the runner's own overhead (system prompt + exploration), with Claude Code ≫ OpenCode.
Next
To get a real token delta, target the always-loaded path:
- T2: slim AGENTS.md / SKILL.md (cut what every session loads).
- T3: tighter routing so the agent reads exactly one detail file + one example.
- Reduce agent repo-exploration (run tasks in a neutral workspace with termchart installed, not inside the termchart source tree).
- Add a matrix-heavy task (RACI/risk) so T1's minification is actually exercised.
- Flip the shared model to Claude once Anthropic Model Garden is enabled, and scale reps for tighter CIs.
Reproduce: RUNNERS=claude-code,opencode CONDITIONS=baseline,c1 TASKS=… REPS=2 scripts/experiments/podman/run_local.sh