feat: per-user token usage as a Grafana table (LGTM, no Langfuse) by tylerpotts · Pull Request #112 · nebari-dev/llm-serving-pack

tylerpotts · 2026-06-25T20:24:16Z

What

Attribute LLM token usage to the API-key owner who made each request, and
surface it as a Grafana table — one row per user, total tokens used in
the last 30 days — built entirely on the LGTM pack (Grafana + Mimir)
already running in the cluster. No Langfuse, no extra datastores, no secrets
in GitOps.

Scope: external (API-key) access path only. The internal JWT path is not
attributed in this version.

How it works

Operator renders apiKeyAuth.forwardClientIDHeader: x-client-id +
sanitize: true on the external SecurityPolicy, so Envoy Gateway
forwards the matched key's clientID (e.g. user-chuck-1) downstream and
strips the raw key. (Requires Envoy Gateway v1.5.1+; present in the pack's
pinned v1.6.2.)
AI Gateway controller.metricsRequestHeaderAttributes: "x-client-id:user.id"
labels the gen_ai_client_token_usage metric with user_id=<clientID>.
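The gen_ai_* metrics are exposed on the extProc sidecar admin port
(:1064 /metrics), not on Envoy's :19001 /stats/prometheus that the default
prometheus.io/scrape annotation targets, so a dedicated LGTM collector scrape
job for :1064 carries them into Mimir. There is no gateway-side OTLP export.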
A chart-managed Grafana dashboard ConfigMap (labeled
grafana_dashboard: "1", auto-loaded by the sidecar) runs
sum by (user_id) (increase(gen_ai_client_token_usage_sum{user_id!=""}[30d]))
in one Table panel.
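
As a rough illustration of the first step, the rendered SecurityPolicy might look like the sketch below. Only forwardClientIDHeader: x-client-id and sanitize: true come from this PR. The resource name, target route, Secret name, and the x-api-key extraction header are hypothetical placeholders, and the operator's actual output may differ.

```yaml
# Sketch only: fields marked "hypothetical" are illustrative, not taken from the PR.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: SecurityPolicy
metadata:
  name: external-api-key-auth          # hypothetical
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: external-llm-route         # hypothetical
  apiKeyAuth:
    credentialRefs:
      - group: ""
        kind: Secret
        name: external-api-keys        # hypothetical; each Secret key is a clientID, e.g. user-chuck-1
    extractFrom:
      - headers:
          - x-api-key                  # hypothetical
    forwardClientIDHeader: x-client-id # from the PR: forwards the matched key's clientID
    sanitize: true                     # from the PR: strips the raw key before forwarding
```

With a policy of this shape, the backend sees x-client-id but never the raw key, which is what lets the AI Gateway map the header to the user_id metric label.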

Changes

operator/.../auth.go — forward clientID header + sanitize (kept; comment updated)
examples/envoy-ai-gateway.yaml — metric-label value (replaces OTLP→Langfuse env)
charts/.../dashboards/per-user-token-usage.json — dashboard JSON
charts/.../templates/token-usage-dashboard.yaml + values.yaml —
ConfigMap gated on observability.dashboard.enabled (namespace monitoring)
docs/install-production.md §14 — rewritten for the metric + dashboard path
Removed the self-hosted Langfuse ArgoCD app and all OTLP/Langfuse wiring
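
For readers unfamiliar with the sidecar-loaded dashboard pattern, a gated ConfigMap template could be sketched roughly as follows. This is not the chart's actual template: the ConfigMap name and the chart-relative dashboards/ path passed to .Files.Get are assumptions, while the observability.dashboard.enabled toggle, the monitoring namespace, and the grafana_dashboard: "1" label come from the PR.

```yaml
# Sketch of a Helm template; metadata.name and the .Files.Get path are assumptions.
{{- if .Values.observability.dashboard.enabled }}
apiVersion: v1
kind: ConfigMap
metadata:
  name: per-user-token-usage-dashboard   # hypothetical
  namespace: monitoring
  labels:
    grafana_dashboard: "1"               # picked up by the Grafana dashboard sidecar
data:
  per-user-token-usage.json: |-
{{ .Files.Get "dashboards/per-user-token-usage.json" | indent 4 }}
{{- end }}
```

Setting the toggle to false makes the template render nothing, which matches the "toggle off removes it" check under Testing.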

Pivot note

This PR originally routed per-user usage to a self-hosted Langfuse via OTLP
traces. Live-cluster testing showed that path is heavy (Postgres + ClickHouse +
Redis + MinIO) and forces the OTLP auth secret into GitOps. The GenAI
token-usage metric + Grafana table delivers the same per-user view on
infrastructure already present, with nothing secret in the repo.

Out of scope (v1)

Per-model / input-output split, time-series & rate panels, alerting, cost
estimation, and internal/JWT-user attribution.

Testing

go build ./... && go test ./internal/controller/reconcilers/ — pass
helm template / helm lint — render clean; ConfigMap embeds valid
dashboard JSON; toggle off removes it
Live smoke (the load-bearing assumption): confirm
gen_ai_client_token_usage_*{user_id="..."} reaches Mimir after authed
requests, then that the Grafana table renders it. Blocked in the dev cluster
by an extProc sidecar-injection install issue (env, not a code defect).

Live-cluster testing showed the v1.6.2 SecurityPolicy CRD already has apiKeyAuth.forwardClientIDHeader/sanitize; they were added in EG v1.5.1. The earlier v1.7.0 bump was unnecessary — revert examples + dev/Makefile to the pack's existing v1.6.2 pin and fix the code/docs claims.

…rom) Live-cluster testing showed ai-gateway-helm v0.5.0 flattens extProc.extraEnvVars into a CLI arg and only honors literal value:; secretKeyRef rendered as nil and no traces exported. Switch the OTLP endpoint/headers to literals (with a security caveat about the Basic header containing the Langfuse secret key), drop the now-unused interface-Secret example, and update docs.

Companion to the prior commit: ai-gateway-helm v0.5.0 only honors literal value: in extProc.extraEnvVars (secretKeyRef -> nil), so the OTLP endpoint and Basic-auth header are literals (with a security caveat). Docs §14 updated; the chart-limitation is documented.

Live testing on tyler-hetzner-dev proved the gen_ai_* metrics are exposed on the extProc sidecar admin port (:1064 /metrics), not on envoy's :19001 /stats/prometheus that the default prometheus.io/scrape annotation targets. A dedicated collector scrape job for :1064 is required. Document the scrape_config and correct the spec; the metricsRequestHeaderAttributes label (user_id) is confirmed working.

Verified on a rebuilt cluster: an unescaped ${1} crashes the OpenTelemetry Collector (env-var expansion), so the documented scrape_config must use $${1}. End-to-end scrape :1064 -> Mimir -> per-user token query confirmed.
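
A dedicated scrape job of the kind described above could be sketched as follows. This is not the scrape_config documented in the PR: the job name, the pod-label selector, and the gateway-name regex are assumptions. What it does illustrate from the commits is the :1064 /metrics target and the $${1} escaping.

```yaml
# Sketch of an OpenTelemetry Collector prometheus receiver; names marked hypothetical are assumptions.
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: ai-gateway-extproc     # hypothetical
          metrics_path: /metrics
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            # Keep only the proxy pods of the target Gateway (label and regex are assumptions).
            - source_labels: [__meta_kubernetes_pod_label_gateway_envoyproxy_io_owning_gateway_name]
              regex: my-gateway            # hypothetical
              action: keep
            # Scrape the extProc admin port rather than the annotated Envoy stats port.
            - source_labels: [__meta_kubernetes_pod_ip]
              regex: (.+)
              target_label: __address__
              # "$$" is the Collector's escape for a literal "$"; a bare ${1} is treated as env-var expansion.
              replacement: $${1}:1064
```

The doubled dollar sign matters only because the Collector expands ${...} in its own config before Prometheus relabeling ever sees it; in a plain Prometheus config the same rule would use ${1}.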

tylerpotts added 16 commits June 25, 2026 15:00

feat(operator): forward API-key clientID header for per-user tracing

02f52cc

chore: require Envoy Gateway v1.7+ for apiKeyAuth client-id forwarding

2370ccd

feat: export AI Gateway GenAI traces to Langfuse with per-user user.id

3be7a2b

docs: add Langfuse OTLP interface Secret template

c4b5a1a

feat: add self-hosted Langfuse ArgoCD app with OIDC-gated UI

8a6eb85

docs: document per-user token usage in Langfuse

04c98b7

fix: provide MinIO root-user secret key for Langfuse S3

03e1ee5

docs(operator): clarify clientID forwarding feeds the usage metric, not a Langfuse span

9ff050e

feat: label AI Gateway token-usage metric with user.id instead of OTLP-to-Langfuse

f0f6a7a

chore: remove self-hosted Langfuse ArgoCD app (replaced by Grafana metric)

5f124fc

feat: add per-user token-usage Grafana dashboard JSON

672f7ce

feat: ship per-user token-usage dashboard as a Grafana ConfigMap

254c396

docs: document per-user token usage via Grafana metric, drop Langfuse runbook

019e19f

tylerpotts changed the title ~~feat: per-user token usage tracking in self-hosted Langfuse~~ feat: per-user token usage as a Grafana table (LGTM, no Langfuse) Jun 26, 2026

tylerpotts added 3 commits June 26, 2026 15:40

docs: note the :1064 scrape gateway-name regex must match the shared gateway

f4bac66

tylerpotts wants to merge 19 commits into main from feat/per-user-token-usage-pipeline
