containers: switch environmentd and clusterd to distroless base image #36099
jasonhernandez wants to merge 1 commit
I have tried that too, it's going to be difficult: #35631
Distroless images run as nonroot (UID 65534) instead of root. Add version-gating so orchestratord sets the correct runAsUser/runAsGroup based on the Materialize version, avoiding UID mismatches during rolling upgrades from Debian-based to distroless images.

Gate versions (verified against release history, 2026-06):

- balancerd: V26_18_0. Its ci/Dockerfile switched to distroless-prod-base in v26.18.0 (prod-base in v26.17.x). The original V26_19_0 was off by one and would have forced UID 999 onto v26.18.x balancerd pods that actually run as 65534.
- environmentd/clusterd: V26_28_0, matching the release that ships their distroless migration (#36099). The original V26_20_0 predated the actual landing by ~8 releases (main is now 26.28-dev) and would have applied UID 65534 to v26.20-v26.27 images that still run as UID 999.

NOTE: the env/clusterd gate assumes #36099 lands in the 26.28 cycle. If it slips, bump V26_28_0 to the actual release.

The three distroless PRs (#36099 image, #36100 SIGTERM, #36101 this) must ship in the same release.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Warning: OUTDATED, superseded by the coordination note below. This PR now ships a static

🔗 Distroless migration: coordination note

This PR is one of three that must ship together in the same release:

Merge order: #36100 must merge before or with this PR; otherwise distroless clusterd runs as PID 1 with no SIGTERM handler and won't shut down gracefully (k8s waits, then SIGKILLs).

Release coupling: #36101 hard-codes

Rebased onto current
Replace Debian-based prod images with distroless. Remove shell entrypoint scripts (no shell in distroless). Add libeatmydata to distroless-prod-base for CI compatibility. Update clusterd mzcompose service to drop shell-dependent options.

Distroless ships no init, so add a tini-static mzbuild image (static tini built from source, mirroring openssh-static) and copy it into distroless-prod-base. tini stays PID 1 to forward signals and reap zombies, notably the ssh subprocesses spawned by the SSH tunnel feature. Leaf images wrap their binary in ENTRYPOINT ["/usr/bin/tini", "--", ...].

The deleted environmentd entrypoint slept forever on graceful exit so a fenced-out generation would not be restarted. Distroless has no shell to do this, and a Kubernetes StatefulSet (restartPolicy: Always) would crash-loop a process that exits. Replace it with an in-process idle: when fenced out and supervised directly by Kubernetes, environmentd idles until the orchestrator deletes the StatefulSet (SIGTERM, then SIGKILL) instead of calling exit!(0). The all-in-one image keeps its entrypoint and exit-0 behavior.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
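The fenced-out decision described in this commit message can be sketched roughly as below. This is an illustration only, not the code in this PR: `OrchestratorKind::Kubernetes` and `idle_when_fenced_out` are names taken from the PR description, while `OrchestratorKind::Process`, `Config::for_orchestrator`, `FencedOutAction`, `fenced_out_action`, `idle_forever`, and `handle_fenced_out` are hypothetical names introduced for the example.

```rust
use std::thread;

/// How environmentd is supervised. `Kubernetes` is named in the PR description;
/// `Process` is a hypothetical stand-in for any other orchestrator.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum OrchestratorKind {
    Kubernetes,
    Process,
}

/// Hypothetical subset of the environmentd configuration.
#[derive(Clone, Copy, Debug)]
pub struct Config {
    /// When true, a fenced-out environmentd idles instead of exiting.
    pub idle_when_fenced_out: bool,
}

impl Config {
    /// Assumption: the flag is set only when Kubernetes supervises the process directly.
    pub fn for_orchestrator(kind: OrchestratorKind) -> Self {
        Config {
            idle_when_fenced_out: kind == OrchestratorKind::Kubernetes,
        }
    }
}

/// What a fenced-out process should do next.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum FencedOutAction {
    /// Stay alive and do nothing until the pod is deleted.
    Idle,
    /// Exit with the given status code (the pre-existing behavior).
    Exit(i32),
}

/// Pure decision function, kept separate so it can be tested without exiting.
pub fn fenced_out_action(config: &Config) -> FencedOutAction {
    if config.idle_when_fenced_out {
        FencedOutAction::Idle
    } else {
        FencedOutAction::Exit(0)
    }
}

/// Blocks the calling thread forever. `park` may wake spuriously, hence the loop.
pub fn idle_forever() -> ! {
    loop {
        thread::park();
    }
}

/// Applies the decision: idle in-process, or exit with the chosen status.
pub fn handle_fenced_out(config: &Config) -> ! {
    match fenced_out_action(config) {
        FencedOutAction::Idle => idle_forever(),
        FencedOutAction::Exit(code) => std::process::exit(code),
    }
}
```

Keeping the decision in a pure function separates "what to do" from "do it", so the Kubernetes and non-Kubernetes paths can both be checked in a unit test without blocking or terminating the test process.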
🔗 Distroless migration: coordination note (updated)

This PR now bundles a static

ENTRYPOINT ["/usr/bin/tini", "--", "/usr/local/bin/clusterd"]

That changes the coordination picture from the original note:
Merge order: #36101 must land before or with this PR. The gate is version-conditional, so it's a no-op until distroless images of the gated version actually exist, which makes it safe to land first. The reverse is not safe: if this PR ships distroless (nonroot

Version gate bumped:
main has moved to 26.32.0-dev and 26.28.0 already shipped as a Debian image. Leaving the gate at 26.28 would apply the nonroot 65534 securityContext to the still-Debian 26.28-26.31 env/clusterd images, which expect uid/gid 999. Bump the gate so only the genuinely distroless images get the nonroot context.

The gate must equal the release the distroless env/clusterd images (#36099) first ship in. Set to 26.32 on the assumption this lands in the 26.32 cycle. Re-confirm against the actual release cut before merge.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
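The version gating discussed in these commit messages can be sketched as follows. This is a minimal illustration under stated assumptions, not orchestratord's actual code: the `Version`, `Component`, and `RunAs` types and the `distroless_gate` and `run_as` functions are hypothetical, `V26_32_0` is named by analogy with the `V26_18_0` constant mentioned above, and the sketch ignores pre-release tags such as `-dev`. The gate values (26.18.0 for balancerd, 26.32.0 for environmentd/clusterd) and the IDs (999 and 65534) come from the commit messages; the 26.32 gate is itself stated there as an assumption to re-confirm before merge.

```rust
/// A release version. The derived `Ord` compares major, then minor, then patch,
/// because derived ordering on structs is lexicographic in field order.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Version {
    pub major: u64,
    pub minor: u64,
    pub patch: u64,
}

impl Version {
    pub const fn new(major: u64, minor: u64, patch: u64) -> Self {
        Version { major, minor, patch }
    }
}

/// First balancerd release built on the distroless base (per the commit message).
pub const V26_18_0: Version = Version::new(26, 18, 0);
/// Assumed first distroless environmentd/clusterd release; bump if the PR slips.
pub const V26_32_0: Version = Version::new(26, 32, 0);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Component {
    Balancerd,
    Environmentd,
    Clusterd,
}

/// The runAsUser/runAsGroup pair to put in the pod securityContext.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct RunAs {
    pub run_as_user: i64,
    pub run_as_group: i64,
}

/// uid/gid expected by the Debian-based images.
pub const DEBIAN_ID: i64 = 999;
/// uid/gid of the distroless nonroot user.
pub const DISTROLESS_NONROOT_ID: i64 = 65534;

/// The first version of each component that ships as a distroless image.
pub fn distroless_gate(component: Component) -> Version {
    match component {
        Component::Balancerd => V26_18_0,
        Component::Environmentd | Component::Clusterd => V26_32_0,
    }
}

/// Picks the IDs matching the image that `version` of `component` actually runs.
pub fn run_as(component: Component, version: Version) -> RunAs {
    let id = if version >= distroless_gate(component) {
        DISTROLESS_NONROOT_ID
    } else {
        DEBIAN_ID
    };
    RunAs { run_as_user: id, run_as_group: id }
}
```

With these gates, `run_as(Component::Environmentd, Version::new(26, 28, 0))` yields 999, which is the case the bump from 26.28 to 26.32 is meant to protect: a release that already shipped as a Debian image must not receive the nonroot context.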
Summary
Replace Debian-based prod images with distroless for environmentd and clusterd.
Follow-on to #37227 (static AWS-LC ssh in the prod base images): now that the
statically-linked `ssh` client exists as the `openssh-static` mzimage,
environmentd and clusterd can move to a distroless base, which ships no apt and
no `/usr/bin/ssh`.

Part of the distroless migration, split from #35859.
Changes
- `distroless-prod-base`: base image off `gcr.io/distroless/cc-debian13:nonroot`.
  Carries `libeatmydata` (parity with `ubuntu-base`, no-op unless `LD_PRELOAD` is
  set) and a static `tini` for all distroless images to inherit.
- `environmentd`/`clusterd` Dockerfiles: switch from `prod-base` to
  `distroless-prod-base`, copy the static `ssh` from `openssh-static`, and drop
  the shell `entrypoint.sh` scripts (no shell in distroless). Env-var defaults
  the entrypoint used to set are passed explicitly by the orchestrator.
- `tini-static` (new mzimage): statically-linked `tini` built from source
  (v0.19.0), mirroring `openssh-static`. Distroless ships no init, so without
  this each leaf binary would run as PID 1 and have to forward signals and reap
  zombies itself. The zombie case matters here because the SSH tunnel feature
  spawns `ssh` subprocesses. Leaf images wrap their binary in
  `ENTRYPOINT ["/usr/bin/tini", "--", ...]`. tini is the same init we already
  run as PID 1 in every prod image today (previously apt-installed in
  `prod-base`); this just sources it as a static binary.

Fenced-out behavior without a shell entrypoint
The deleted environmentd entrypoint slept forever on graceful exit so that a
fenced-out generation would not be restarted (see commit `2977153da1`, #28983).
A fenced-out environmentd calls `exit!(0)` in preflight to signal "do not keep
running me". Distroless has no shell to translate that into a sleep, and a
Kubernetes StatefulSet (`restartPolicy: Always`, the only option) would
crash-loop a process that exits, restarting it straight back into the
fenced-out state until the orchestrator deletes the StatefulSet.
Fix: gate on `Config::idle_when_fenced_out` (set from
`OrchestratorKind::Kubernetes`). When supervised directly by Kubernetes, a
fenced-out environmentd idles in-process until pod deletion terminates it
(SIGTERM, then SIGKILL) instead of exiting. The all-in-one `materialized` image
keeps its shell entrypoint and `exit!(0)` behavior unchanged.

Notes
- `ssh` is copied per-image (only environmentd and clusterd open tunnels), while
  `tini` lives in the shared base (every distroless image needs an init).
- `tini-static` is kept separate from `openssh-static` on purpose: the openssh
  build (AWS-LC + OpenSSH from source) is expensive and the tini build is
  trivial, so sharing one mzimage would couple a cheap build's cache to an
  expensive one for no shipping benefit (both are build-only `FROM scratch`
  stages).
Alternative considered: prebuilt tini binary instead of building from source
`tini-static` currently compiles tini from source (cmake + build toolchain),
mirroring `openssh-static`. An alternative is to download the official
`tini-static-${TARGETARCH}` release binary and verify it in a small builder
stage. tini publishes GPG-signed (key `595E85A6…`) and SHA256-checksummed
static binaries, so a pinned checksum is arguably a stronger supply-chain
guarantee than trusting a mutable git tag, and it drops the compile step
entirely.
Trade-off: building from source is hermetic (no build-time dependency on GitHub
releases) and consistent with `openssh-static`, which must build from source
since there is no official static-AWS-LC ssh release. The prebuilt path is
simpler and faster but adds a network fetch at build time. Kept as
build-from-source for now; easy to switch if reviewers prefer the
verified-binary route.
Follow-up
Once this lands, the static `ssh` can be removed from `prod-base` entirely: its
remaining consumers (`mz`, `jobs`, and the test images) don't open SSH tunnels,
so OpenSSH/OpenSSL drops out of those images and their scanner noise with it.
Test plan