feat(dev): align local dev stack with AI Gateway v0.5 + key-manager UI dev mode by dcmcand · Pull Request #115 · nebari-dev/llm-serving-pack

dcmcand · 2026-06-26T08:36:12Z

What and why

Makes the local kind dev path actually run an external-provider PassthroughModel end to end, and lets the key-manager UI run without Keycloak.

Closes #113
Closes #114

#113 - dev stack version mismatch

dev/Makefile installed Envoy Gateway v1.3.0 and Gateway API v1.2.1 alongside Envoy AI Gateway v0.5.0, which needs Envoy Gateway v1.6.x and Gateway API v1.4.0 (compatibility matrix). On v1.3.0 a PassthroughModel reconciled to Ready but the operator's BackendTLSPolicy was never translated into an upstream TLS socket, so Envoy dialed the provider in plaintext and inference returned 503 UC.

Pin the dependency versions at the top of dev/Makefile and move them as a set; bump to Envoy Gateway v1.6.7 and Gateway API v1.4.0.
Install Envoy Gateway with the AI Gateway ext_proc wiring (dev/eg-extension-values.yaml) and bring the AI Gateway up first so the extension server exists.
Apply the PassthroughModel CRD in make setup.
Extend dev/manifests/ so the operator can serve passthrough: RBAC for passthroughmodels, aiservicebackends/backendsecuritypolicies/backends/backendtlspolicies, and the shared-TLS reconciler (certificates, gateways); the passthrough validating webhook; and shared-TLS issuance through the existing local selfsigned-issuer (LLM_CLUSTER_ISSUER_NAME), so no ACME or hand-made cert is needed.
New make targets: create-openrouter-secret, apply-passthrough-model, ui.
Operator: emit the PassthroughModel BackendTLSPolicy as gateway.networking.k8s.io/v1 instead of v1alpha3. Gateway API v1.4.0 graduates the policy to v1 and no longer serves v1alpha3, so on the version-aligned stack the old apiVersion failed to apply and the upstream never got a TLS socket. This raises the effective floor for passthrough to Gateway API v1.4 / Envoy Gateway v1.6, which is what AI Gateway v0.5 already requires.

#114 - key-manager UI dev mode

The UI could not run locally: the gateway enforces OIDC before forwarding, and the key-manager 401s without a JWT.

LLM_DEV_MODE (off by default) makes the auth middleware skip token handling and inject a fixed identity (LLM_DEV_USER, LLM_DEV_GROUPS). The production path is unchanged when unset, and a warning is logged when it is on.
Exposed as keyManager.devMode in the Helm chart (default disabled) and enabled in the dev manifest. make ui port-forwards the Service so the gateway OIDC layer is bypassed too.

One-command UI dev environment (`make run-dev`)

For frontend work on the key-manager UI, make run-dev is the whole setup: drop an OpenRouter key in dev/.env and run it. It idempotently brings up the cluster, operator, dev-mode key-manager, and three passthrough models, then port-forwards the key-manager and starts a hot-reloading UI dev server.

dev/uidev/ is a zero-dependency (stdlib-only) Go dev server: it serves the UI static files from disk, proxies /api/* to the port-forwarded key-manager, and live-reloads the browser on edits. The UI is plain static files, so there is no build step or npm.
dev/manifests/dev-models.yaml gives the UI a populated three-model list.
docs/ui-development.md documents the workflow; dev/.env is gitignored.

Verification

key-manager: go vet clean, full go test ./... passes, including a new table-driven TestAuthMiddlewareDevMode.
Chart lints; renders the dev-mode env when devMode.enabled=true and omits it by default.
Operator: go build, go vet, and the Passthrough reconciler tests pass with the v1 assertion.
Clean end-to-end on a fresh cluster: make teardown && make setup && make build-images && make load-images && make deploy && make create-openrouter-secret && make apply-passthrough-model brings the PassthroughModel to Ready, the operator patches the llm-https listener, and a chat completion through the gateway returns a real OpenRouter response (200). The key-manager logs the dev-mode warning and GET /api/me returns the injected dev identity (200, not 401).

Notes

The chart references a ClusterIssuer/selfsigned-issuer it does not create; the dev path supplies one via cert-manager-config.yaml. Whether the chart should ship a dev issuer is out of scope here.
The internal endpoint (llm-internal.<domain>) still requires a real Keycloak JWT even when access is public, so only the external endpoint is reachable on kind.

…ager UI dev mode

Bump the dev/Makefile dependency stack to versions compatible with the bundled Envoy AI Gateway v0.5.0 (Envoy Gateway v1.6.7, Gateway API v1.4.0) and wire the AI Gateway ext_proc extension into Envoy Gateway at install time. On the previous versions (EG v1.3.0) a PassthroughModel reconciled to Ready but its upstream TLS was never programmed, so provider inference returned 503.

Extend the dev manifests with the PassthroughModel RBAC, validating webhook, and shared-TLS issuance via the local self-signed ClusterIssuer, plus Makefile targets and an example model for the OpenRouter passthrough.

Add an off-by-default dev mode to the key-manager: LLM_DEV_MODE bypasses auth and injects a fixed identity so the UI runs on a local cluster with no Keycloak. Exposed via keyManager.devMode in the Helm chart and enabled in the dev manifest, with a `make ui` port-forward target.

Refs #113, #114

…pstream

Gateway API v1.4.0 (required by the bundled Envoy AI Gateway v0.5) graduates BackendTLSPolicy to v1 and no longer serves v1alpha3, so the operator's hardcoded v1alpha3 failed to apply on a version-aligned stack ("no matches for kind BackendTLSPolicy in gateway.networking.k8s.io/v1alpha3") and the passthrough upstream never got a TLS transport socket. Emit v1, which is the same spec shape.

Refs #113

The key-manager watches PassthroughModels as well as LLMModels, but the dev manifest's llm-key-manager-models ClusterRole only granted llmmodels, so model sync failed ("cannot list passthroughmodels") and passthrough models never appeared in the UI. The role now matches the chart's key-manager role.

Refs #114

… reload

Frontend devs working on the key-manager UI now need only an OpenRouter key in dev/.env and `make run-dev`. The target idempotently brings up the kind cluster, operator, dev-mode key-manager, and three OpenRouter passthrough models, then port-forwards the key-manager and starts a hot-reloading UI dev server.

- dev/uidev: a zero-dependency (stdlib-only) Go dev server that serves the UI static files from disk, proxies /api/* to the port-forwarded key-manager, and live-reloads the browser on file edits. The UI is plain static files, so no build step or npm is involved.
- dev/run-dev.sh + `make run-dev`: orchestrates cluster/deploy/models/port-forward/UI server, loading OPENROUTER_API_KEY from a gitignored dev/.env.
- dev/manifests/dev-models.yaml: three passthrough models so the UI list is populated.
- docs/ui-development.md: frontend-dev guide (setup, editing, dev-mode auth, shipping changes, API table, troubleshooting), linked from getting-started.

Refs #114

jbouder

Reviewed by actually running make run-dev on a fresh kind cluster (Helm 4). Two blockers stopped the bring-up (Gateway API CRD conflict; operator webhook startup race), plus a few smaller things. Details inline.

jbouder · 2026-06-26T17:26:24Z

-	# Envoy AI Gateway
-	helm upgrade -i aieg-crd oci://docker.io/envoyproxy/ai-gateway-crds-helm --version v0.5.0 -n envoy-ai-gateway-system --create-namespace
-	helm upgrade -i aieg oci://docker.io/envoyproxy/ai-gateway-helm --version v0.5.0 -n envoy-ai-gateway-system
+	kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/$(GATEWAY_API_VERSION)/standard-install.yaml


🔴 Blocker on Helm 4: Gateway API CRD ownership conflict.

On Helm 4 (server-side apply is the default), the eg install at L39 aborts because the chart re-applies these same Gateway API CRDs via SSA and collides with this client-side kubectl apply:

Error: failed to install CRD crds/gatewayapi-crds.yaml: conflict occurred while applying object ... conflicts with "kubectl-client-side-apply"

This blocks make setup / make run-dev on a fresh cluster for anyone on Helm 4. Both install the same pinned version, so taking ownership is safe — switch this to server-side apply:

- kubectl apply -f https://.../$(GATEWAY_API_VERSION)/standard-install.yaml
+ kubectl apply --server-side --force-conflicts -f https://.../$(GATEWAY_API_VERSION)/standard-install.yaml

and add --force-conflicts to the eg helm install (see comment on L39).

jbouder · 2026-06-26T17:26:24Z

+	# Envoy Gateway, wired with the AI Gateway ext_proc extension (enableBackend,
+	# extensionManager, backendResources). Without this the per-model routing
+	# layer 404s and passthrough upstreams never get a TLS transport socket.
+	helm upgrade -i eg oci://docker.io/envoyproxy/gateway-helm --version $(ENVOY_GATEWAY_VERSION) -n envoy-gateway-system --create-namespace -f eg-extension-values.yaml


Part of the Helm 4 CRD fix (see L27): add --force-conflicts so the chart's bundled Gateway API CRDs cleanly take ownership instead of erroring.

- helm upgrade -i eg oci://.../gateway-helm --version $(ENVOY_GATEWAY_VERSION) -n envoy-gateway-system --create-namespace -f eg-extension-values.yaml
+ helm upgrade -i eg oci://.../gateway-helm --version $(ENVOY_GATEWAY_VERSION) -n envoy-gateway-system --create-namespace --force-conflicts -f eg-extension-values.yaml

jbouder · 2026-06-26T17:26:24Z

+	./run-dev.sh
+
 setup: ## Create kind cluster and install dependencies
 	kind create cluster --name $(CLUSTER_NAME)


🟡 make setup isn't resumable after a partial failure. kind create cluster errors hard (node(s) already exist for a cluster with the name ...) when the cluster exists, aborting the whole target. Since run-dev.sh only calls make setup when the cluster is absent, a setup that dies midway (e.g. the CRD conflict above) can't be recovered with make setup or make run-dev — you have to make teardown first. Guarding the create makes it idempotent:

@kind get clusters | grep -qx $(CLUSTER_NAME) || kind create cluster --name $(CLUSTER_NAME)

jbouder · 2026-06-26T17:26:24Z

+kubectl -n "$NS" create secret generic openrouter-api-key \
+  --from-literal=apiKey="$OPENROUTER_API_KEY" \
+  --dry-run=client -o yaml | kubectl apply -f - >/dev/null
+kubectl apply -f manifests/dev-models.yaml >/dev/null


🔴 Webhook startup race — this apply fails intermittently and set -euo pipefail kills the run. The operator's validating webhook gates PassthroughModel creates, but it isn't serving yet right after make deploy:

Error from server (InternalError): ... failed calling webhook "vpassthroughmodel-v1alpha1.kb.io": Post "https://llm-operator-webhook-service.../validate-...": dial tcp ...:443: connect: connection refused

Root cause is in operator.yaml (no readiness probe — see that comment); a bounded retry here makes it reliable regardless:

Suggested change

-kubectl apply -f manifests/dev-models.yaml >/dev/null
+# The operator's validating webhook gates PassthroughModel creates, and isn't
+# serving the instant `make deploy`'s rollout returns. Retry until it accepts.
+for attempt in $(seq 1 30); do
+  kubectl apply -f manifests/dev-models.yaml >/dev/null 2>&1 && break
+  if [[ $attempt -eq 30 ]]; then
+    echo "ERROR: operator webhook never became ready" >&2
+    kubectl apply -f manifests/dev-models.yaml >&2 || true
+    exit 1
+  fi
+  echo "==> operator webhook not ready yet, retrying ($attempt)..."
+  sleep 2
+done

jbouder · 2026-06-26T17:26:24Z

+
+# Foreground: exits on Ctrl-C, which triggers cleanup of the port-forward.
+( cd uidev && go run . \
+    -static ../../key-manager/internal/ui/static \


🟢 Nit: /tmp/km-portforward.log is a fixed path. A stale file from a crashed prior run can make the grep -q "Forwarding from" readiness check below pass instantly against old output. mktemp would avoid that.

jbouder · 2026-06-26T17:26:24Z

+              value: "https://keycloak.local/realms/nebari"
            - name: LLM_OIDC_GROUPS_CLAIM
              value: "groups"
            - name: ENABLE_WEBHOOKS


🟡 Root cause of the run-dev.sh webhook race lives here. Webhooks are enabled, and the container exposes containerPort: 9443, but the Deployment declares no readinessProbe — so kubectl rollout status in make deploy returns the moment the process launches, before the webhook binds its port (it waits on the cert-manager cert mount). Anything that applies a webhook-gated CR right after make deploy races the webhook.

Gating readiness on the webhook port makes rollout status mean "webhook ready" for every consumer of make deploy, not just run-dev.sh:

readinessProbe:
  tcpSocket: { port: 9443 }
  initialDelaySeconds: 2
  periodSeconds: 2

(A controller-runtime readyz check wired to the webhook server is even better.)

dcmcand added the labels "area: developer experience 👩🏻‍💻" and "type: bug 🐛" on Jun 26, 2026

dcmcand added 3 commits June 26, 2026 10:46

dcmcand requested a review from jbouder June 26, 2026 11:54

dcmcand mentioned this pull request Jun 26, 2026

fix(operator): scope API keys per-model via per-route authorization (#116) #117

Open

jbouder reviewed Jun 26, 2026

dcmcand wants to merge 4 commits into main from feat/local-dev-passthrough-and-ui-devmode


2 participants
