feat: per-user token usage as a Grafana table (LGTM, no Langfuse)#112

Open
tylerpotts wants to merge 19 commits into main from feat/per-user-token-usage-pipeline

@tylerpotts tylerpotts commented Jun 25, 2026

Contributor

What

Attribute LLM token usage to the API-key owner who made each request, and
surface it as a Grafana table — one row per user, total tokens used in
the last 30 days — built entirely on the LGTM pack (Grafana + Mimir)
already running in the cluster. No Langfuse, no extra datastores, no secrets
in GitOps.

Scope: external (API-key) access path only. The internal JWT path is not
attributed in this version.

How it works

  1. Operator renders apiKeyAuth.forwardClientIDHeader: x-client-id +
    sanitize: true on the external SecurityPolicy, so Envoy Gateway
    forwards the matched key's clientID (e.g. user-chuck-1) downstream and
    strips the raw key. (Requires Envoy Gateway v1.5.1+; present in the pack's
    pinned v1.6.2.)
  2. AI Gateway controller.metricsRequestHeaderAttributes: "x-client-id:user.id"
    labels the gen_ai_client_token_usage metric with user_id=<clientID>.
  3. The proxy already carries prometheus.io/scrape, so the LGTM collector
    scrapes gen_ai_* from /stats/prometheus into Mimir — no gateway-side
    OTLP export.
  4. A chart-managed Grafana dashboard ConfigMap (labeled
    grafana_dashboard: "1", auto-loaded by the sidecar) runs
    sum by (user_id) (increase(gen_ai_client_token_usage_sum{user_id!=""}[30d]))
    in one Table panel.
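
To make the table query in step 4 concrete, here is a minimal Go sketch of what `sum by (user_id) (increase(...{user_id!=""}[30d]))` computes. It is an illustration under stated assumptions, not code from this PR: the `Sample` type, the `Series` field (a stand-in for the full label set), the sample values, and the user `user-dana-2` are all hypothetical. Real PromQL `increase()` also extrapolates to the window boundaries, which this sketch leaves out.

```go
package main

import (
	"fmt"
	"sort"
)

// Sample is one scrape of gen_ai_client_token_usage_sum.
// Series is a hypothetical stand-in for the full label set of the series.
type Sample struct {
	Series string
	UserID string
	Value  float64 // cumulative token count at scrape time
}

// IncreaseByUser approximates
//
//	sum by (user_id) (increase(metric{user_id!=""}[window]))
//
// for samples that all fall inside the window, listed in scrape order.
// Unlike Prometheus, it does not extrapolate to the window boundaries.
func IncreaseByUser(samples []Sample) map[string]float64 {
	last := map[string]float64{}   // last value seen per series
	totals := map[string]float64{} // summed increase per user_id
	for _, s := range samples {
		if s.UserID == "" {
			continue // the user_id!="" matcher drops unattributed traffic
		}
		prev, seen := last[s.Series]
		last[s.Series] = s.Value
		if !seen {
			totals[s.UserID] += 0 // first sample: register the user, no increase yet
			continue
		}
		delta := s.Value - prev
		if delta < 0 {
			delta = s.Value // counter reset (e.g. pod restart): count up from zero
		}
		totals[s.UserID] += delta
	}
	return totals
}

// demoSamples returns made-up scrapes for two users plus unattributed traffic.
func demoSamples() []Sample {
	return []Sample{
		{"a", "user-chuck-1", 100},
		{"a", "user-chuck-1", 250},
		{"a", "user-chuck-1", 40}, // reset on series a
		{"b", "user-chuck-1", 10},
		{"b", "user-chuck-1", 30},
		{"c", "", 5},
		{"c", "", 500},
		{"d", "user-dana-2", 0},
		{"d", "user-dana-2", 75},
	}
}

func main() {
	totals := IncreaseByUser(demoSamples())
	users := make([]string, 0, len(totals))
	for u := range totals {
		users = append(users, u)
	}
	sort.Strings(users)
	for _, u := range users {
		fmt.Printf("%-14s %.0f\n", u, totals[u]) // one table row per user
	}
}
```

The per-series increase is computed first and only then summed by `user_id`, which is why a user whose traffic spans several series (different models or proxy pods, for example) still ends up as a single row.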

Changes

  • operator/.../auth.go — forward clientID header + sanitize (kept; comment updated)
  • examples/envoy-ai-gateway.yaml — metric-label value (replaces OTLP→Langfuse env)
  • charts/.../dashboards/per-user-token-usage.json — dashboard JSON
  • charts/.../templates/token-usage-dashboard.yaml + values.yaml
    ConfigMap gated on observability.dashboard.enabled (namespace monitoring)
  • docs/install-production.md §14 — rewritten for the metric + dashboard path
  • Removed the self-hosted Langfuse ArgoCD app and all OTLP/Langfuse wiring

Pivot note

This PR originally routed per-user usage to a self-hosted Langfuse via OTLP
traces. Live-cluster testing showed that path is heavy (Postgres + ClickHouse +
Redis + MinIO) and forces the OTLP auth secret into GitOps. The GenAI
token-usage metric + Grafana table delivers the same per-user view on
infrastructure already present, with nothing secret in the repo.

Out of scope (v1)

Per-model / input-output split, time-series & rate panels, alerting, cost
estimation, and internal/JWT-user attribution.

Testing

  • go build ./... && go test ./internal/controller/reconcilers/ — pass
  • helm template / helm lint — render clean; ConfigMap embeds valid
    dashboard JSON; toggle off removes it
  • Live smoke (the load-bearing assumption): confirm
    gen_ai_client_token_usage_*{user_id="..."} reaches Mimir after authed
    requests, then that the Grafana table renders it. Blocked in the dev cluster
    by an extProc sidecar-injection install issue (env, not a code defect).

Live-cluster testing showed the v1.6.2 SecurityPolicy CRD already has
apiKeyAuth.forwardClientIDHeader/sanitize; they were added in EG v1.5.1.
The earlier v1.7.0 bump was unnecessary — revert examples + dev/Makefile
to the pack's existing v1.6.2 pin and fix the code/docs claims.
…rom)

Live-cluster testing showed ai-gateway-helm v0.5.0 flattens
extProc.extraEnvVars into a CLI arg and only honors literal value:;
secretKeyRef rendered as nil and no traces exported. Switch the OTLP
endpoint/headers to literals (with a security caveat about the Basic
header containing the Langfuse secret key), drop the now-unused
interface-Secret example, and update docs.

Companion to the prior commit: ai-gateway-helm v0.5.0 only honors literal
value: in extProc.extraEnvVars (secretKeyRef -> nil), so the OTLP endpoint
and Basic-auth header are literals (with a security caveat). Docs §14
updated; the chart-limitation is documented.

@tylerpotts tylerpotts changed the title from "feat: per-user token usage tracking in self-hosted Langfuse" to "feat: per-user token usage as a Grafana table (LGTM, no Langfuse)" Jun 26, 2026

Live testing on tyler-hetzner-dev proved the gen_ai_* metrics are exposed on
the extProc sidecar admin port (:1064 /metrics), not on envoy's :19001
/stats/prometheus that the default prometheus.io/scrape annotation targets.
A dedicated collector scrape job for :1064 is required. Document the
scrape_config and correct the spec; the metricsRequestHeaderAttributes label
(user_id) is confirmed working.

Verified on a rebuilt cluster: an unescaped ${1} crashes the OpenTelemetry
Collector (env-var expansion), so the documented scrape_config must use
$${1}. End-to-end scrape :1064 -> Mimir -> per-user token query confirmed.
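
The escaping issue can be sketched in Go, whose `regexp` package uses the same `${1}` capture-group syntax as Prometheus relabeling. This is a hedged illustration, not the documented scrape_config: it assumes the relabel rule rewrites the target address to port 1064, and the address `10.0.0.5:19001`, the regex, and both helper functions are hypothetical. `unescapeCollector` mimics only the `$$` to `$` step of the Collector's config expansion.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// unescapeCollector mimics one step of the Collector's config expansion:
// a doubled "$$" becomes a literal "$". The real Collector would also try to
// resolve any bare "${...}" as an environment/provider reference, which is
// why an unescaped "${1}" never reaches the relabel engine intact.
func unescapeCollector(s string) string {
	return strings.ReplaceAll(s, "$$", "$")
}

// relabelAddress applies a Prometheus-style "replace" action: the pattern is
// fully anchored, and on a match the whole value is replaced by the expanded
// replacement string. Non-matching values are returned unchanged.
func relabelAddress(addr, pattern, replacement string) string {
	re := regexp.MustCompile("^(?:" + pattern + ")$")
	if !re.MatchString(addr) {
		return addr
	}
	return re.ReplaceAllString(addr, replacement)
}

func main() {
	// What the YAML would contain (assumed shape): replacement: "$${1}:1064"
	inConfig := "$${1}:1064"

	// What the scrape job receives after the Collector has processed the file.
	afterExpansion := unescapeCollector(inConfig)
	fmt.Println(afterExpansion)

	// Hypothetical discovered target on the Envoy stats port, rewritten to
	// the extProc admin port.
	pattern := `([^:]+)(?::\d+)?`
	fmt.Println(relabelAddress("10.0.0.5:19001", pattern, afterExpansion))
}
```

The point of the sketch is the ordering: the Collector processes `$` sequences before the scrape job ever sees the string, so the capture-group reference has to survive that first pass by being written as `$${1}`.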