feat(spawner): trust enterprise CA bundle in singleuser/app pods #102

Merged
tylerpotts merged 10 commits into main from feat/singleuser-ca-bundle-85 on Jun 26, 2026

@tylerpotts

Contributor

Summary

Closes #85.

Makes JupyterHub singleuser and jhub-apps pods trust the enterprise CA so pip install, conda install, git clone, and arbitrary user-driven HTTPS work through a TLS-inspecting proxy with no --trusted-host / ssl_verify: false workarounds.

  • Consumes NIC core's trust-manager–projected nebari-trust-bundle ConfigMap (key ca-certificates.crt) — no cross-repo config plumbing; the Bundle's namespaceSelector: {} already covers JupyterHub's namespace.
  • An init container (merge-ca-bundle) concatenates the singleuser image's system CA bundle with the org CA into an emptyDir; every CA env var (REQUESTS_CA_BUNDLE, SSL_CERT_FILE, NODE_EXTRA_CA_CERTS, CURL_CA_BUNDLE, GIT_SSL_CAINFO) points at the merged file. This verifies both proxy-inspected (org-signed) and genuine public-root endpoints.
  • The nebi-pull init container (nebi pull + pixi install over HTTPS) gets the same CA env vars + merged mount, and _setup_trust_bundle runs before Nebi auto-auth so merge-ca-bundle executes first and the ca-merged volume exists when nebi-pull mounts it.
  • Gated behind custom.trust-bundle-enabled (default false) — existing behavior is unchanged byte-for-byte. ConfigMap name/key are overridable via custom.trust-bundle-configmap / custom.trust-bundle-key. The ConfigMap mount stays optional: true so a cluster without trust-manager (or a spawn racing the projection) degrades to just the system bundle.
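
The wiring described in the bullets above can be sketched roughly as follows. This is a minimal illustration and not the PR's actual `_setup_trust_bundle` code: the function name `trust_bundle_spec`, the `org-ca` volume name, and the `/org-ca` mount point are hypothetical, and the system bundle path is assumed to be the Debian-style location mentioned later in this thread.

```python
# Hypothetical sketch: plain dicts shaped like Kubernetes pod-spec fragments.
CA_ENV_VARS = (
    "REQUESTS_CA_BUNDLE",
    "SSL_CERT_FILE",
    "NODE_EXTRA_CA_CERTS",
    "CURL_CA_BUNDLE",
    "GIT_SSL_CAINFO",
)
SYSTEM_BUNDLE = "/etc/ssl/certs/ca-certificates.crt"  # assumed image path
MERGED_DIR = "/etc/ssl/certs-extra"
MERGED_FILE = MERGED_DIR + "/ca-bundle.crt"
ORG_CA_DIR = "/org-ca"  # hypothetical mount point for the ConfigMap


def trust_bundle_spec(image, configmap="nebari-trust-bundle",
                      key="ca-certificates.crt"):
    """Return the volumes, init container, env vars and mount for the merge."""
    org_ca = f"{ORG_CA_DIR}/{key}"
    # Start from the system bundle; append the org CA only if it was projected,
    # so a missing ConfigMap degrades to the system bundle alone.
    script = (
        f"cat {SYSTEM_BUNDLE} > {MERGED_FILE}; "
        f"if [ -f {org_ca} ]; then cat {org_ca} >> {MERGED_FILE}; fi"
    )
    return {
        "volumes": [
            {"name": "ca-merged", "emptyDir": {}},
            {"name": "org-ca",
             "configMap": {"name": configmap, "optional": True}},
        ],
        "init_container": {
            "name": "merge-ca-bundle",
            "image": image,
            "imagePullPolicy": "IfNotPresent",
            "command": ["sh", "-c", script],
            "volumeMounts": [
                {"name": "ca-merged", "mountPath": MERGED_DIR},
                {"name": "org-ca", "mountPath": ORG_CA_DIR, "readOnly": True},
            ],
        },
        # Every CA env var points at the same merged file.
        "env": {name: MERGED_FILE for name in CA_ENV_VARS},
        "volume_mount": {"name": "ca-merged", "mountPath": MERGED_DIR,
                         "readOnly": True},
    }
```

The same `env` and `volume_mount` fragments would be reused for any other container that needs the merged bundle, such as nebi-pull.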

Design + plan: docs/superpowers/specs/2026-06-03-singleuser-ca-bundle-design.md, docs/superpowers/plans/2026-06-03-singleuser-ca-bundle.md.

Test plan

  • pytest tests/unit/test_spawner_ca_bundle.py — 9 passing (setup, custom configmap/key, append-without-clobber, toggle reflection, orchestrator on/off, nebi-pull CA on/off, init-container ordering).
  • pytest tests/unit/test_spawner_storage.py tests/unit/test_nss_wrapper_shared_dir.py — no regression from the orchestrator reorder.
  • Manual (on a cluster with NIC trust-manager + inspecting proxy): pip install requests, conda install, git clone an HTTPS repo, and a Nebi-workspace spawn (pixi install) — all with no flags.

Wire _setup_trust_bundle into the _pre_spawn_hook orchestrator (step 1b,
after Nebi auto-auth) gated on _trust_bundle_enabled. Add TDD tests that
verify the flag reflects config and that the orchestrator skips or applies
the CA merge accordingly.

The nebi-pull init container runs `nebi pull` + `pixi install` over HTTPS,
which behind a TLS-inspecting proxy needs the merged org CA too. Inject the
five CA env vars and the ca-merged mount into nebi-pull when the trust bundle
is enabled, and run _setup_trust_bundle before Nebi auto-auth so merge-ca-bundle
executes first and the ca-merged volume exists when nebi-pull mounts it. Also
pins merge-ca-bundle's imagePullPolicy to IfNotPresent.
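
Kubernetes runs init containers sequentially in the order they are listed, which is why the position of merge-ca-bundle relative to nebi-pull matters. An ordering check in the spirit of the "init-container ordering" unit test might look like the sketch below; the helper name and the dict-shaped containers are assumptions for illustration, not the test that ships in this PR.

```python
def init_container_order_ok(init_containers,
                            first="merge-ca-bundle", later="nebi-pull"):
    """True if `first` is listed before `later`, or `later` is absent."""
    names = [container["name"] for container in init_containers]
    if later not in names:
        return True  # nothing depends on the merged bundle
    return first in names and names.index(first) < names.index(later)
```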

tylerpotts commented Jun 3, 2026

Contributor Author

Manual verification on a cloud (Hetzner) cluster — real hub-spawned pod

Verified end-to-end on a nebari-infrastructure-core cluster on Hetzner (amd64), deployed with an org CA in trust_bundle and certificate.type: lets-encrypt. NIC's cert-manager trust-manager projects that CA as the nebari-trust-bundle ConfigMap (key ca-certificates.crt) into every namespace — the exact producer this PR consumes. Unlike a local kind run, the singleuser pod is spawned through the hub (amd64, so no install-nebi/replica-pod workarounds).

Setup

  • Deployed this branch's chart as an ArgoCD Application with jupyterhub.custom.trust-bundle-enabled: true into a Nebari-managed jupyterhub namespace (which carries the projected nebari-trust-bundle ConfigMap).
  • Stood up an in-cluster HTTPS probe (ca-test) — nginx serving a leaf signed by the org CA. That CA is present only in the merged bundle (not the image's system store), so it cleanly isolates the org-CA trust path without a full re-signing proxy.
  • Spawned a singleuser server through the hub and ran the checks inside it.

What I verified

  • The merge-ca-bundle init container ran and produced /etc/ssl/certs-extra/ca-bundle.crt (system bundle + org CA — 147 certs, including the org root), and all five CA env vars (REQUESTS_CA_BUNDLE, SSL_CERT_FILE, NODE_EXTRA_CA_CERTS, CURL_CA_BUNDLE, GIT_SSL_CAINFO) point at it.
  • With/without-bundle discriminator in the spawned pod: each tool must succeed with the merged bundle and fail with the system bundle alone (SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt), isolating the org CA as the thing enabling trust.

| Tool | Merged bundle (default env) | System bundle only |
| --- | --- | --- |
| pixi search -c https://ca-test… | past TLS (fails only on repodata decode) | invalid peer certificate: UnknownIssuer |
| git ls-remote https://ca-test… | no SSL error (is this a git repository?) | server certificate verification failed |
| python -c "urllib.request.urlopen(...)" | trust-ok | CERTIFICATE_VERIFY_FAILED (unable to get local issuer certificate) |

All three pass: trust succeeds only when the merged bundle (org CA) is in play.
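
The first bullet (certificate count and env vars) could also be checked from Python inside the pod. The sketch below is illustrative only; the helper names are hypothetical, and the merged path is the one reported above.

```python
import os

CA_ENV_VARS = (
    "REQUESTS_CA_BUNDLE",
    "SSL_CERT_FILE",
    "NODE_EXTRA_CA_CERTS",
    "CURL_CA_BUNDLE",
    "GIT_SSL_CAINFO",
)
MERGED_FILE = "/etc/ssl/certs-extra/ca-bundle.crt"
BEGIN_MARKER = "-----BEGIN CERTIFICATE-----"


def count_certs(pem_text):
    """Count PEM certificate blocks, similar to `grep -c BEGIN` on the bundle."""
    return pem_text.count(BEGIN_MARKER)


def vars_not_pointing_at(path, environ=None):
    """Return the CA env vars that are unset or point somewhere else."""
    environ = os.environ if environ is None else environ
    return [name for name in CA_ENV_VARS if environ.get(name) != path]
```

Inside the pod, `count_certs(open(MERGED_FILE).read())` gives the bundle size and `vars_not_pointing_at(MERGED_FILE)` should return an empty list.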

Note on the toolchain

The data-science-pack jupyterlab image is pixi-based — it ships neither pip nor conda (pixi replaces both). So I verified pixi, git, and stdlib Python rather than the literal pip/conda named in #85. pixi (Rust/reqwest) does honor SSL_CERT_FILE, so it trusts the org CA via the merged bundle; stdlib Python picks it up automatically (ssl.get_default_verify_paths().cafile resolves to the merged file via SSL_CERT_FILE).
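
The stdlib behavior mentioned above can be demonstrated in isolation. The helper below is a small sketch (its name is made up for illustration): it sets the environment variable that OpenSSL consults for the default CA file, normally `SSL_CERT_FILE`, and reports what `ssl.get_default_verify_paths().cafile` resolves to.

```python
import os
import ssl


def default_cafile_with(path):
    """Return the default cafile Python's ssl module resolves when the
    OpenSSL CA-file env var (normally SSL_CERT_FILE) is set to `path`."""
    env_name = ssl.get_default_verify_paths().openssl_cafile_env
    previous = os.environ.get(env_name)
    os.environ[env_name] = path
    try:
        # cafile is None when the file named by the env var does not exist.
        return ssl.get_default_verify_paths().cafile
    finally:
        # Restore the caller's environment.
        if previous is None:
            del os.environ[env_name]
        else:
            os.environ[env_name] = previous
```

In the spawned pod the variable is already set by the spawner, so calling `ssl.get_default_verify_paths().cafile` directly is enough.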


Reproduction steps (cloud / Hetzner)

Replace <domain> with your DNS-resolvable NIC domain throughout.

0. Prerequisites

  • A NIC cluster on an amd64 cloud (Hetzner here) deployed with:
    • an org CA in trust_bundle (trust-manager then projects nebari-trust-bundle into every namespace),
    • certificate.type: lets-encrypt with a DNS-resolvable domain — so the browser trusts the hub and the hub's back-channel OAuth to Keycloak verifies (a self-signed/existing cert breaks login),
    • the standard foundational apps (nebari-operator, Keycloak, Envoy gateway, longhorn).
  • openssl, kubectl, and the GitOps repo ArgoCD watches.

1. Create the org CA (fed to NIC's trust_bundle)

NIC distributes the cert only — its trust_bundle validation rejects any PRIVATE KEY block, so keep the key separate.

openssl genrsa -out /tmp/test-org-root-ca.key 4096
openssl req -x509 -new -nodes -key /tmp/test-org-root-ca.key -sha256 -days 3650 \
  -subj "/O=Test Org/CN=Test Org Root CA" -out /tmp/test-ca.pem
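
To check a file before handing it to trust_bundle, a pre-flight test along these lines can help. This is a hypothetical sketch of the kind of validation described above, not NIC's actual implementation; the function name and error messages are invented for illustration.

```python
import re

# Matches PEM block headers such as "-----BEGIN CERTIFICATE-----".
PEM_BEGIN = re.compile(r"-----BEGIN ([A-Z0-9 ]+)-----")


def validate_trust_bundle(pem_text):
    """Return the number of certificates; reject private keys and empty input."""
    labels = PEM_BEGIN.findall(pem_text)
    if any("PRIVATE KEY" in label for label in labels):
        raise ValueError("trust bundle must not contain a PRIVATE KEY block")
    count = labels.count("CERTIFICATE")
    if count == 0:
        raise ValueError("trust bundle contains no CERTIFICATE block")
    return count
```

Run against the files from this step, /tmp/test-ca.pem should pass and /tmp/test-org-root-ca.key should be rejected.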

2. NIC config (relevant bits)

domain: <domain>
certificate:
  type: lets-encrypt
  acme:
    email: you@example.com
trust_bundle:
  path: /tmp/test-ca.pem        # the org CA from step 1
sharedStorage:
  enabled: true
  storageClass: longhorn        # RWX-capable

3. Deploy this branch's chart (ArgoCD Application)

Commit to your GitOps apps/ dir. The values below bake in every fix the verification surfaced:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: data-science-pack
  namespace: argocd
  finalizers: [resources-finalizer.argocd.argoproj.io]
spec:
  project: default
  source:
    repoURL: https://github.com/nebari-dev/nebari-data-science-pack.git
    targetRevision: feat/singleuser-ca-bundle-85
    path: .
    helm:
      releaseName: data-science-pack
      values: |
        keycloak:
          hostname: keycloak.<domain>
        nebariapp:
          enabled: true
          hostname: jupyter.<domain>          # REQUIRED when enabled, or the chart render fails
        nebi:
          image:
            tag: ""                            # skip the install-nebi init container
        sharedStorage:
          storageClass: longhorn               # the chart's default NFS PVC uses a
          nfsServer:                            # no-provisioner SC that won't bind
            enabled: false
        jupyterhub:
          custom:
            trust-bundle-enabled: true          # <-- the flag under test
            external-url: jupyter.<domain>
          singleuser:
            networkPolicy:
              egressAllowRules:
                privateIPs: true                # so the pod can reach the in-cluster ca-test probe
  destination:
    server: https://kubernetes.default.svc
    namespace: jupyterhub
  syncPolicy:
    managedNamespaceMetadata:
      labels:
        nebari.dev/managed: "true"              # opt the ns into Nebari management
    automated: { prune: true, selfHeal: true }
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true
      - SkipDryRunOnMissingResource=true

Gotchas, each of which blocks the deploy (learned the hard way):

  • Namespace opt-in — without nebari.dev/managed=true the operator errors NamespaceNotOptedIn → no route, no OIDC client → 404 / hub crash. managedNamespaceMetadata only labels a namespace ArgoCD creates fresh, so use a new namespace.
  • nebariapp.hostname required — with nebariapp.enabled: true and no nebariapp.hostname/keycloak.hostname, the chart render fails (Set keycloak.hostname … or nebariapp.hostname).
  • Shared storage — the chart's transitional in-cluster NFS uses a no-provisioner PVC that stays Pending; point it at longhorn RWX.
  • Singleuser egress — z2jh blocks private-IP egress by default; allow it so the pod can reach the in-cluster ca-test probe (a real TLS-inspecting proxy re-signs public endpoints, which are already allowed).

Confirm health:

kubectl get nebariapp -n jupyterhub           # Ready / ReconcileSuccess
kubectl get pods -n jupyterhub                # hub + proxy Running
kubectl get configmap nebari-trust-bundle -n jupyterhub -o jsonpath='{.data.ca-certificates\.crt}' | head

4. Stand up the ca-test probe (org-CA-signed) in jupyterhub

openssl genrsa -out /tmp/ca-test.key 2048
openssl req -new -key /tmp/ca-test.key -subj "/CN=ca-test" -out /tmp/ca-test.csr
cat > /tmp/ca-test-ext.cnf <<'EOF'
subjectAltName = DNS:ca-test, DNS:ca-test.jupyterhub.svc, DNS:ca-test.jupyterhub.svc.cluster.local
extendedKeyUsage = serverAuth
EOF
openssl x509 -req -in /tmp/ca-test.csr -CA /tmp/test-ca.pem -CAkey /tmp/test-org-root-ca.key \
  -CAcreateserial -days 825 -sha256 -extfile /tmp/ca-test-ext.cnf -out /tmp/ca-test.crt

kubectl create secret tls ca-test-tls -n jupyterhub --cert=/tmp/ca-test.crt --key=/tmp/ca-test.key
kubectl apply -n jupyterhub -f ca-test.yaml    # nginx serving :443 with the leaf; Service `ca-test`

ca-test.yaml:

apiVersion: v1
kind: ConfigMap
metadata: { name: ca-test-nginx }
data:
  default.conf: |
    server {
      listen 443 ssl;
      server_name ca-test.jupyterhub.svc.cluster.local;
      ssl_certificate     /etc/nginx/tls/tls.crt;
      ssl_certificate_key /etc/nginx/tls/tls.key;
      location / { return 200 "ca-test ok\n"; }
    }
---
apiVersion: apps/v1
kind: Deployment
metadata: { name: ca-test }
spec:
  replicas: 1
  selector: { matchLabels: { app: ca-test } }
  template:
    metadata: { labels: { app: ca-test } }
    spec:
      containers:
        - name: nginx
          image: nginx:1.27
          ports: [{ containerPort: 443 }]
          volumeMounts:
            - { name: tls,  mountPath: /etc/nginx/tls, readOnly: true }
            - { name: conf, mountPath: /etc/nginx/conf.d }
      volumes:
        - { name: tls,  secret: { secretName: ca-test-tls } }
        - { name: conf, configMap: { name: ca-test-nginx } }
---
apiVersion: v1
kind: Service
metadata: { name: ca-test }
spec:
  selector: { app: ca-test }
  ports: [{ port: 443, targetPort: 443 }]

5. Log in and spawn a server

Browse https://jupyter.<domain>/, log in via Keycloak (works because the gateway cert is Let's Encrypt), and start a server.

Headless alternative (used to drive this verification): mint a hub API token and POST /hub/api/users/<user>/server, then kubectl exec into the jupyter-<user> pod.
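
A sketch of the headless spawn request using only the standard library is shown below. The endpoint is JupyterHub's REST API; the helper name, the example hostname, and the token value are placeholders, and minting the token is left out.

```python
import urllib.parse
import urllib.request


def build_spawn_request(hub_url, user, token):
    """Build a POST to /hub/api/users/<user>/server authenticated by API token."""
    path = "/hub/api/users/" + urllib.parse.quote(user, safe="") + "/server"
    return urllib.request.Request(
        hub_url.rstrip("/") + path,
        method="POST",
        headers={"Authorization": f"token {token}"},
    )


# Usage (placeholders, not executed here):
# request = build_spawn_request("https://jupyter.<domain>", "<user>", "<token>")
# with urllib.request.urlopen(request) as response:
#     print(response.status)  # JupyterHub answers 201 (started) or 202 (pending)
```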

6. Verify the merge + run the discriminator

P=jupyter-<user>
kubectl exec -n jupyterhub $P -c notebook -- grep -c BEGIN /etc/ssl/certs-extra/ca-bundle.crt
kubectl exec -n jupyterhub $P -c notebook -- printenv \
  REQUESTS_CA_BUNDLE SSL_CERT_FILE NODE_EXTRA_CA_CERTS CURL_CA_BUNDLE GIT_SSL_CAINFO

URL=https://ca-test.jupyterhub.svc.cluster.local
SYS=/etc/ssl/certs/ca-certificates.crt

# python — merged: trust-ok | system-only: CERTIFICATE_VERIFY_FAILED
kubectl exec -n jupyterhub $P -c notebook -- python -c "import urllib.request; urllib.request.urlopen('$URL'); print('trust-ok')"
kubectl exec -n jupyterhub $P -c notebook -- env SSL_CERT_FILE=$SYS python -c "import urllib.request; urllib.request.urlopen('$URL'); print('trust-ok')"

# git — merged: no SSL error | system-only: server certificate verification failed
kubectl exec -n jupyterhub $P -c notebook -- sh -c "git ls-remote $URL"
kubectl exec -n jupyterhub $P -c notebook -- sh -c "GIT_SSL_CAINFO=$SYS git ls-remote $URL"

# pixi — merged: past TLS | system-only: invalid peer certificate: UnknownIssuer
kubectl exec -n jupyterhub $P -c notebook -- sh -c "pixi search -c $URL dummy"
kubectl exec -n jupyterhub $P -c notebook -- sh -c "SSL_CERT_FILE=$SYS pixi search -c $URL dummy"

Trust succeeds only when the merged bundle (org CA) is in play.
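
The Python row of the discriminator can also be written as a small reusable probe. This is a sketch under stated assumptions: the function names are hypothetical, and the labels simply mirror the outcomes reported above.

```python
import ssl
import urllib.error
import urllib.request


def classify(exc):
    """Map an exception raised by urlopen to a short label."""
    # urlopen wraps socket and TLS failures in URLError; unwrap it first.
    inner = exc.reason if isinstance(exc, urllib.error.URLError) else exc
    if isinstance(inner, ssl.SSLCertVerificationError):
        return "CERTIFICATE_VERIFY_FAILED"
    return f"other-error: {inner!r}"


def probe(url, cafile=None, timeout=10):
    """Return 'trust-ok' if the TLS handshake verifies, else an error label.

    cafile=None uses the default trust store (which honors SSL_CERT_FILE);
    passing a path trusts only the certificates in that file.
    """
    context = ssl.create_default_context(cafile=cafile)
    try:
        with urllib.request.urlopen(url, context=context, timeout=timeout):
            return "trust-ok"
    except urllib.error.HTTPError:
        return "trust-ok"  # the handshake succeeded; HTTP status is irrelevant
    except (urllib.error.URLError, ssl.SSLError) as exc:
        return classify(exc)
```

Inside the pod, `probe(URL)` corresponds to the merged-bundle case and `probe(URL, cafile="/etc/ssl/certs/ca-certificates.crt")` to the system-only case.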

@dcmcand self-requested a review June 22, 2026 14:47
…ndle-85

# Conflicts:
#	config/jupyterhub/01-spawner.py

@dcmcand left a comment

Contributor

Thanks for this @tylerpotts

@tylerpotts merged commit beb6bfb into main Jun 26, 2026
26 of 27 checks passed
@tylerpotts deleted the feat/singleuser-ca-bundle-85 branch June 26, 2026 14:06
Development

Successfully merging this pull request may close these issues.

JupyterHub spawner integration for enterprise CA bundle

3 participants