feat: per-user token usage as a Grafana table (LGTM, no Langfuse) #112
Open
tylerpotts wants to merge 19 commits into
Live-cluster testing showed the v1.6.2 SecurityPolicy CRD already has apiKeyAuth.forwardClientIDHeader/sanitize; they were added in EG v1.5.1. The earlier v1.7.0 bump was unnecessary — revert examples + dev/Makefile to the pack's existing v1.6.2 pin and fix the code/docs claims.
…rom)
Live-cluster testing showed ai-gateway-helm v0.5.0 flattens extProc.extraEnvVars into a CLI arg and only honors literal value: entries; secretKeyRef rendered as nil, so no traces were exported. Switch the OTLP endpoint/headers to literals (with a security caveat about the Basic header containing the Langfuse secret key), drop the now-unused interface-Secret example, and update docs.
Companion to the prior commit: ai-gateway-helm v0.5.0 only honors literal value: in extProc.extraEnvVars (secretKeyRef -> nil), so the OTLP endpoint and Basic-auth header are literals (with a security caveat). Docs §14 updated; the chart limitation is documented.
…ot a Langfuse span
Live testing on tyler-hetzner-dev proved the gen_ai_* metrics are exposed on the extProc sidecar admin port (:1064 /metrics), not on envoy's :19001 /stats/prometheus that the default prometheus.io/scrape annotation targets. A dedicated collector scrape job for :1064 is required. Document the scrape_config and correct the spec; the metricsRequestHeaderAttributes label (user_id) is confirmed working.
Verified on a rebuilt cluster: an unescaped ${1} crashes the OpenTelemetry
Collector (env-var expansion), so the documented scrape_config must use
$${1}. End-to-end scrape :1064 -> Mimir -> per-user token query confirmed.
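The commits above describe a dedicated scrape job that points the collector at the extProc admin port :1064 instead of envoy's :19001. Such jobs usually rewrite __address__ with a relabel regex and a ${1}:1064 replacement; the exact rule in docs §14 is not shown here, so the regex below is an assumption. Prometheus-style relabeling uses Go's RE2 regexp with fully anchored patterns, so the rewrite can be sketched with the standard library:

```go
package main

import (
	"fmt"
	"regexp"
)

// addrRe is a typical relabel regex for this job (an assumption, not the
// PR's exact rule): capture the host and drop any existing port. Relabel
// regexes are fully anchored, hence the explicit ^...$ here.
var addrRe = regexp.MustCompile(`^([^:]+)(?::\d+)?$`)

// retargetToExtProc rewrites a scrape address to the extProc admin port 1064.
// Go's regexp expansion reads ${1} as the first capture group. In the
// collector YAML the same replacement has to be written $${1}: the collector
// treats ${...} as variable expansion, and $$ yields a literal $.
func retargetToExtProc(addr string) (string, bool) {
	if !addrRe.MatchString(addr) {
		return addr, false // no match: the target is left unchanged
	}
	return addrRe.ReplaceAllString(addr, "${1}:1064"), true
}

func main() {
	for _, a := range []string{"10.0.3.7:19001", "10.0.3.7"} {
		out, _ := retargetToExtProc(a)
		fmt.Println(a, "->", out)
	}
}
```

Both example addresses end up on :1064; the pod IP is illustrative.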
What
Attribute LLM token usage to the API-key owner who made each request, and
surface it as a Grafana table — one row per user, total tokens used in
the last 30 days — built entirely on the LGTM pack (Grafana + Mimir)
already running in the cluster. No Langfuse, no extra datastores, no secrets
in GitOps.
Scope: external (API-key) access path only. The internal JWT path is not
attributed in this version.
How it works
Set apiKeyAuth.forwardClientIDHeader: x-client-id and sanitize: true on the external SecurityPolicy, so Envoy Gateway forwards the matched key's clientID (e.g. user-chuck-1) downstream and strips the raw key. (Requires Envoy Gateway v1.5.1+; present in the pack's pinned v1.6.2.)
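Set controller.metricsRequestHeaderAttributes: "x-client-id:user.id", which labels the gen_ai_client_token_usage metric with user_id=<clientID>.
A dedicated LGTM collector scrape job reads gen_ai_* from the extProc sidecar's admin port (:1064 /metrics) into Mimir; the default prometheus.io/scrape annotation targets envoy's :19001 /stats/prometheus, which does not expose these metrics. No gateway-side OTLP export is needed.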
A dashboard ConfigMap (label grafana_dashboard: "1", auto-loaded by Grafana's dashboard sidecar) runs sum by (user_id) (increase(gen_ai_client_token_usage_sum{user_id!=""}[30d])) in one Table panel.
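To make the table query concrete, this Go sketch evaluates a simplified version of it over in-memory samples. It assumes each series has already been fetched for the 30-day window, and it approximates increase() by summing per-step deltas with counter-reset handling but without PromQL's boundary extrapolation, so real Mimir results can differ slightly. The series type and the sample values are illustrative.

```go
package main

import (
	"fmt"
	"sort"
)

// series is one gen_ai_client_token_usage_sum time series: its user_id
// label and its samples in time order over the query window.
type series struct {
	userID  string
	samples []float64
}

// increase approximates PromQL increase() without extrapolation: it sums the
// deltas between consecutive samples, treating any drop as a counter reset
// (the counter restarted from zero, so the new value is the delta).
func increase(samples []float64) float64 {
	total := 0.0
	for i := 1; i < len(samples); i++ {
		d := samples[i] - samples[i-1]
		if d < 0 {
			d = samples[i]
		}
		total += d
	}
	return total
}

// tokensByUser mirrors sum by (user_id) (increase(...{user_id!=""}[30d])):
// unattributed series are dropped, the rest are summed per user.
func tokensByUser(all []series) map[string]float64 {
	out := map[string]float64{}
	for _, s := range all {
		if s.userID == "" {
			continue
		}
		out[s.userID] += increase(s.samples)
	}
	return out
}

func main() {
	// Illustrative data: two series for one user (e.g. different token
	// types, one with a counter reset) and one unattributed series.
	data := []series{
		{"user-chuck-1", []float64{100, 400, 50, 250}},
		{"user-chuck-1", []float64{0, 80}},
		{"", []float64{0, 999}},
	}
	totals := tokensByUser(data)
	users := make([]string, 0, len(totals))
	for u := range totals {
		users = append(users, u)
	}
	sort.Strings(users)
	for _, u := range users {
		fmt.Printf("%s\t%.0f\n", u, totals[u])
	}
}
```

Summing across every series that shares a user_id is what folds input and output tokens into one "total tokens" row; a per-type split is listed as out of scope.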
Changes
operator/.../auth.go: forward the clientID header + sanitize (kept; comment updated)
examples/envoy-ai-gateway.yaml: metric-label value (replaces the OTLP→Langfuse env)
charts/.../dashboards/per-user-token-usage.json: dashboard JSON
charts/.../templates/token-usage-dashboard.yaml + values.yaml: ConfigMap gated on observability.dashboard.enabled (namespace monitoring)
docs/install-production.md §14: rewritten for the metric + dashboard path
Pivot note
This PR originally routed per-user usage to a self-hosted Langfuse via OTLP
traces. Live-cluster testing showed that path is heavy (Postgres + ClickHouse …). The gateway's token-usage metric plus a Grafana table delivers the same per-user view on infrastructure already present, with nothing secret in the repo.
Out of scope (v1)
Per-model / input-output split, time-series & rate panels, alerting, cost
estimation, and internal/JWT-user attribution.
Testing
go build ./... && go test ./internal/controller/reconcilers/: pass
helm template / helm lint: render clean; the ConfigMap embeds valid dashboard JSON; toggling it off removes it
Pending live-cluster check: verify that gen_ai_client_token_usage_*{user_id="..."} reaches Mimir after authed requests, then that the Grafana table renders it. Blocked in the dev cluster by an extProc sidecar-injection install issue (env, not a code defect).
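The helm template checks (valid embedded JSON, toggle off removes the ConfigMap) can be mirrored in a small Go sketch of the rendered object. It models the template's logic rather than running Helm; the ConfigMap name and data key are hypothetical, while the monitoring namespace and the grafana_dashboard: "1" label come from this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dashboardConfigMap mirrors the chart template's logic: render nothing when
// observability.dashboard.enabled is false, otherwise a ConfigMap in the
// monitoring namespace labelled grafana_dashboard: "1" for Grafana's sidecar.
// The object name and data key below are hypothetical.
func dashboardConfigMap(enabled bool, dashboard []byte) (map[string]any, error) {
	if !enabled {
		return nil, nil
	}
	if !json.Valid(dashboard) {
		return nil, fmt.Errorf("dashboard JSON is not valid")
	}
	return map[string]any{
		"apiVersion": "v1",
		"kind":       "ConfigMap",
		"metadata": map[string]any{
			"name":      "per-user-token-usage", // hypothetical
			"namespace": "monitoring",
			"labels":    map[string]string{"grafana_dashboard": "1"},
		},
		"data": map[string]string{
			"per-user-token-usage.json": string(dashboard), // hypothetical key
		},
	}, nil
}

func main() {
	cm, err := dashboardConfigMap(true, []byte(`{"panels":[]}`))
	if err != nil {
		panic(err)
	}
	out, _ := json.MarshalIndent(cm, "", "  ")
	fmt.Println(string(out))
}
```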