containers: switch environmentd and clusterd to distroless base image #36099

Draft
jasonhernandez wants to merge 1 commit into main from jason/distroless-dockerfiles

jasonhernandez (Contributor) commented Apr 15, 2026

Summary

Replace Debian-based prod images with distroless for environmentd and clusterd.
Follow-on to #37227 (static AWS-LC ssh in the prod base images): now that the
statically-linked ssh client exists as the openssh-static mzimage,
environmentd and clusterd can move to a distroless base, which ships no apt and
no /usr/bin/ssh.

Part of the distroless migration, split from #35859.

Changes

  • distroless-prod-base: base image built on gcr.io/distroless/cc-debian13:nonroot.
    Now carries libeatmydata (parity with ubuntu-base, no-op unless LD_PRELOAD is
    set) and a static tini for all distroless images to inherit.
  • environmentd / clusterd Dockerfiles: switch from prod-base to
    distroless-prod-base, copy the static ssh from openssh-static, and drop
    the shell entrypoint.sh scripts (no shell in distroless). The env-var
    defaults that the entrypoint used to set are now passed explicitly by the
    orchestrator.
  • tini-static (new mzimage): statically-linked tini built from source
    (v0.19.0), mirroring openssh-static. Distroless ships no init, so without
    this each leaf binary would run as PID 1 and have to forward signals and reap
    zombies itself. The zombie case matters here because the SSH tunnel feature
    spawns ssh subprocesses. Leaf images wrap their binary in
    ENTRYPOINT ["/usr/bin/tini", "--", ...]. tini is the same init we already
    run as PID 1 in every prod image today (previously apt-installed in
    prod-base); this just sources it as a static binary.
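To make the PID 1 argument concrete, here is a toy Rust model of a process table. It involves no real processes or syscalls, and every name in it is hypothetical. It encodes two kernel behaviors: an exited process stays a zombie until its parent waits on it, and orphaned children are re-parented to PID 1. The scenario assumes, purely for illustration, that an ssh subprocess leaves behind a short-lived child of its own.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum State {
    Running,
    Zombie,
}

/// Toy process table: pid -> (parent pid, state). No real processes are involved.
struct ProcTable {
    procs: BTreeMap<u32, (u32, State)>,
}

impl ProcTable {
    fn new() -> Self {
        let mut procs = BTreeMap::new();
        procs.insert(1, (0, State::Running)); // PID 1 always exists
        ProcTable { procs }
    }

    fn spawn(&mut self, pid: u32, ppid: u32) {
        self.procs.insert(pid, (ppid, State::Running));
    }

    /// `pid` exits: it stays a zombie until its parent waits on it, and its
    /// children are re-parented to PID 1, mirroring kernel behavior.
    fn exit(&mut self, pid: u32) {
        if let Some(entry) = self.procs.get_mut(&pid) {
            entry.1 = State::Zombie;
        }
        for (ppid, _) in self.procs.values_mut() {
            if *ppid == pid {
                *ppid = 1;
            }
        }
    }

    /// A parent reaps one child it holds a handle to (like `Child::wait`).
    fn wait_child(&mut self, parent: u32, child: u32) {
        if self.procs.get(&child) == Some(&(parent, State::Zombie)) {
            self.procs.remove(&child);
        }
    }

    /// An init reaps every zombie re-parented to it (like a `waitpid(-1, ..)` loop).
    fn reap_orphans(&mut self) {
        self.procs
            .retain(|_, (ppid, state)| !(*ppid == 1 && *state == State::Zombie));
    }

    fn zombies(&self) -> usize {
        self.procs.values().filter(|(_, s)| *s == State::Zombie).count()
    }
}

/// environmentd spawns an ssh tunnel process, which (hypothetically) spawns a
/// short-lived helper of its own. Returns how many zombies are left over.
fn ssh_scenario(with_tini: bool) -> usize {
    let mut t = ProcTable::new();
    // With tini, environmentd is PID 2; without it, environmentd *is* PID 1.
    let envd = if with_tini {
        t.spawn(2, 1);
        2
    } else {
        1
    };
    t.spawn(3, envd); // ssh subprocess
    t.spawn(4, 3); // hypothetical helper spawned by ssh
    t.exit(3); // ssh exits first; pid 4 is re-parented to PID 1
    t.wait_child(envd, 3); // environmentd reaps the ssh child it knows about
    t.exit(4); // the orphan exits and becomes a zombie of PID 1
    if with_tini {
        t.reap_orphans(); // tini reaps it; environmentd never knew pid 4 existed
    }
    t.zombies()
}
```

In this model, `ssh_scenario(false)` leaves one zombie, because environmentd running as PID 1 only waits on the ssh child it holds a handle to. `ssh_scenario(true)` leaves none, because the init reaps whatever orphan lands on it. The model omits signal forwarding, which is the other half of tini's job.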

Fenced-out behavior without a shell entrypoint

The deleted environmentd entrypoint slept forever on graceful exit so that a
fenced-out generation would not be restarted (see commit 2977153da1, #28983).
A fenced-out environmentd calls exit!(0) in preflight to signal "do not keep
running me". Distroless has no shell to translate that into a sleep, and a
Kubernetes StatefulSet (restartPolicy: Always, the only option) would
crash-loop a process that exits — restarting it straight back into the
fenced-out state until the orchestrator deletes the StatefulSet.

Fix: gate on Config::idle_when_fenced_out (set when the orchestrator is
OrchestratorKind::Kubernetes). When supervised directly by Kubernetes, a
fenced-out environmentd idles in-process until pod deletion terminates it
(SIGTERM, then SIGKILL) instead of exiting. The all-in-one materialized image
keeps its shell entrypoint and exit!(0) behavior unchanged.
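A minimal sketch of the gating, using simplified stand-ins for `Config` and `OrchestratorKind`. It assumes, for illustration, that the all-in-one image runs under a non-Kubernetes orchestrator kind, modeled as a hypothetical `Process` variant; `FencedOutAction` and `idle_forever` are also hypothetical and are not the actual environmentd code.

```rust
/// Hypothetical, simplified stand-ins; the real environmentd types differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum OrchestratorKind {
    Kubernetes,
    Process,
}

struct Config {
    idle_when_fenced_out: bool,
}

impl Config {
    fn new(orchestrator: OrchestratorKind) -> Self {
        // Only a process supervised directly by a Kubernetes StatefulSet needs to idle.
        Config {
            idle_when_fenced_out: orchestrator == OrchestratorKind::Kubernetes,
        }
    }
}

#[derive(Debug, PartialEq, Eq)]
enum FencedOutAction {
    /// Exit with this status and let a supervising shell entrypoint decide what happens next.
    Exit(i32),
    /// Stay up doing nothing until pod deletion sends SIGTERM (then SIGKILL).
    IdleUntilTerminated,
}

fn on_fenced_out(config: &Config) -> FencedOutAction {
    if config.idle_when_fenced_out {
        FencedOutAction::IdleUntilTerminated
    } else {
        FencedOutAction::Exit(0)
    }
}

/// Idle without burning CPU; `park` may wake spuriously, hence the loop.
#[allow(dead_code)]
fn idle_forever() -> ! {
    loop {
        std::thread::park();
    }
}
```

The idle path never returns: termination still comes from outside, via the pod's SIGTERM and then SIGKILL, so the StatefulSet never sees an exit it would restart.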

Notes

  • ssh is copied per-image (only environmentd and clusterd open tunnels), while
    tini lives in the shared base (every distroless image needs an init).
  • tini-static is kept separate from openssh-static on purpose: the openssh
    build (AWS-LC + OpenSSH from source) is expensive and the tini build is
    trivial, so sharing one mzimage would couple a cheap build's cache to an
    expensive one for no shipping benefit (both are build-only FROM scratch
    stages).

Alternative considered: prebuilt tini binary instead of building from source

tini-static currently compiles tini from source (cmake + build toolchain),
mirroring openssh-static. An alternative is to download the official
tini-static-${TARGETARCH} release binary and verify it in a small builder
stage. tini publishes GPG-signed (key 595E85A6…) and SHA256-checksummed
static binaries, so a pinned checksum is arguably a stronger supply-chain
guarantee than trusting a mutable git tag, and it drops the compile step
entirely.

Trade-off: building from source is hermetic (no build-time dependency on GitHub
releases) and consistent with openssh-static, which must build from source
since there is no official static-AWS-LC ssh release. The prebuilt path is
simpler and faster but adds a network fetch at build time. Kept as build-from-
source for now; easy to switch if reviewers prefer the verified-binary route.
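For the verified-binary route, the check itself is small: hash the downloaded file and compare it against the pinned digest. Below is a self-contained Rust sketch, assuming the pin is a hex SHA-256 string. `verify_pinned` is a hypothetical helper, the compact SHA-256 is written out only to avoid external crates, and GPG signature verification is not covered. No real tini digest is reproduced here; the usage relies on standard SHA-256 test vectors.

```rust
/// SHA-256 round constants (FIPS 180-4, section 4.2.2).
const K: [u32; 64] = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
];

fn sha256(data: &[u8]) -> [u8; 32] {
    let mut h: [u32; 8] = [
        0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19,
    ];
    // Padding: 0x80, zeros up to 56 mod 64, then the bit length as a big-endian u64.
    let mut msg = data.to_vec();
    let bit_len = (data.len() as u64).wrapping_mul(8);
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&bit_len.to_be_bytes());

    for chunk in msg.chunks(64) {
        // Message schedule.
        let mut w = [0u32; 64];
        for i in 0..16 {
            w[i] = u32::from_be_bytes([chunk[4 * i], chunk[4 * i + 1], chunk[4 * i + 2], chunk[4 * i + 3]]);
        }
        for i in 16..64 {
            let s0 = w[i - 15].rotate_right(7) ^ w[i - 15].rotate_right(18) ^ (w[i - 15] >> 3);
            let s1 = w[i - 2].rotate_right(17) ^ w[i - 2].rotate_right(19) ^ (w[i - 2] >> 10);
            w[i] = w[i - 16].wrapping_add(s0).wrapping_add(w[i - 7]).wrapping_add(s1);
        }
        // Compression rounds.
        let [mut a, mut b, mut c, mut d, mut e, mut f, mut g, mut hh] = h;
        for i in 0..64 {
            let s1 = e.rotate_right(6) ^ e.rotate_right(11) ^ e.rotate_right(25);
            let ch = (e & f) ^ (!e & g);
            let t1 = hh.wrapping_add(s1).wrapping_add(ch).wrapping_add(K[i]).wrapping_add(w[i]);
            let s0 = a.rotate_right(2) ^ a.rotate_right(13) ^ a.rotate_right(22);
            let maj = (a & b) ^ (a & c) ^ (b & c);
            let t2 = s0.wrapping_add(maj);
            hh = g;
            g = f;
            f = e;
            e = d.wrapping_add(t1);
            d = c;
            c = b;
            b = a;
            a = t1.wrapping_add(t2);
        }
        for (x, v) in h.iter_mut().zip([a, b, c, d, e, f, g, hh]) {
            *x = x.wrapping_add(v);
        }
    }
    let mut out = [0u8; 32];
    for (i, x) in h.iter().enumerate() {
        out[4 * i..4 * i + 4].copy_from_slice(&x.to_be_bytes());
    }
    out
}

fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Hypothetical helper: accept the downloaded binary only if its digest matches the pin.
fn verify_pinned(binary: &[u8], pinned_hex: &str) -> Result<(), String> {
    let actual = to_hex(&sha256(binary));
    if actual.eq_ignore_ascii_case(pinned_hex.trim()) {
        Ok(())
    } else {
        Err(format!("checksum mismatch: pinned {}, got {}", pinned_hex, actual))
    }
}
```

The pin lives in the repository, so changing what gets shipped requires a reviewed diff; a moved git tag or a swapped release asset would surface as a mismatch rather than a silently different binary.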

Follow-up

Once this lands, the static ssh can be removed from prod-base entirely: its
remaining consumers (mz, jobs, and the test images) don't open SSH tunnels,
so OpenSSH/OpenSSL, along with the vulnerability-scanner noise it generates,
drops out of those images.

Test plan

  • environmentd container starts and serves traffic
  • clusterd container starts and connects to environmentd
  • CI mzcompose tests pass with new images
  • SSH tunnel sources/sinks work against the distroless images (ssh present, tini reaps ssh subprocesses)
  • 0dt rollout: fenced-out old generation idles (no CrashLoopBackOff) and is torn down cleanly

github-actions (Contributor) commented

Thanks for opening this PR! Here are a few tips to help make the review process smooth for everyone.

PR title guidelines

  • Use imperative mood: "Fix X" not "Fixed X" or "Fixes X"
  • Be specific: "Fix panic in catalog sync when controller restarts" not "Fix bug" or "Update catalog code"
  • Prefix with area if helpful: compute: , storage: , adapter: , sql:

Pre-merge checklist

  • The PR title is descriptive and will make sense in the git log.
  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).

jasonhernandez changed the base branch from main to jason/sec-236-static-openssh-fips April 15, 2026 05:38
def- (Contributor) commented Apr 15, 2026

I have tried that too; it's going to be difficult: #35631

Base automatically changed from jason/sec-236-static-openssh-fips to main May 30, 2026 01:05
jasonhernandez force-pushed the jason/distroless-dockerfiles branch from eab3117 to ec0f2c3 June 2, 2026 16:43
jasonhernandez added a commit that referenced this pull request Jun 2, 2026
Distroless images run as nonroot (UID 65534) instead of root. Add
version-gating so orchestratord sets the correct runAsUser/runAsGroup
based on the Materialize version, avoiding UID mismatches during
rolling upgrades from Debian-based to distroless images.

Gate versions (verified against release history, 2026-06):
- balancerd: V26_18_0. Its ci/Dockerfile switched to distroless-prod-base
  in v26.18.0 (prod-base in v26.17.x). The original V26_19_0 was off by
  one and would have forced UID 999 onto v26.18.x balancerd pods that
  actually run as 65534.
- environmentd/clusterd: V26_28_0, matching the release that ships their
  distroless migration (#36099). The original V26_20_0 predated the actual
  landing by ~8 releases (main is now 26.28-dev) and would have applied
  UID 65534 to v26.20-v26.27 images that still run as UID 999.

NOTE: the env/clusterd gate assumes #36099 lands in the 26.28 cycle. If it
slips, bump V26_28_0 to the actual release. The three distroless PRs
(#36099 image, #36100 SIGTERM, #36101 this) must ship in the same release.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
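A minimal sketch of the version gate this commit describes. It assumes a plain `(major, minor, patch)` version type that ignores pre-release suffixes such as `-dev`, and uses hypothetical `Component`, `distroless_gate`, and `run_as` names; the gate constants and UIDs (999 for Debian, 65534 for distroless) are the ones stated in the commit message, and the real orchestratord code will differ.

```rust
/// Hypothetical stand-in for a release version; ignores pre-release suffixes like "-dev".
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Version {
    major: u64,
    minor: u64,
    patch: u64,
}

const fn v(major: u64, minor: u64, patch: u64) -> Version {
    Version { major, minor, patch }
}

// Gate versions and UIDs as stated in the commit message.
const V26_18_0: Version = v(26, 18, 0);
const V26_28_0: Version = v(26, 28, 0);
const DEBIAN_UID: u32 = 999;
const DISTROLESS_NONROOT_UID: u32 = 65534;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Component {
    Balancerd,
    Environmentd,
    Clusterd,
}

/// First release in which `component` ships a distroless image.
fn distroless_gate(component: Component) -> Version {
    match component {
        Component::Balancerd => V26_18_0,
        Component::Environmentd | Component::Clusterd => V26_28_0,
    }
}

/// (runAsUser, runAsGroup) for a pod running `version` of `component`.
fn run_as(component: Component, version: Version) -> (u32, u32) {
    let id = if version >= distroless_gate(component) {
        DISTROLESS_NONROOT_UID
    } else {
        DEBIAN_UID
    };
    (id, id)
}
```

The derived ordering compares major, then minor, then patch, so 26.27.9 stays on 999 while 26.28.0 flips to 65534. Pinning each gate to exactly the first distroless release is what avoids the off-by-one cases described above.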
jasonhernandez (Contributor, Author) commented Jun 2, 2026

Warning

OUTDATED — superseded by the coordination note below. This PR now ships a static tini as PID 1, so the #36100 dependency described here no longer applies. Kept for history.

🔗 Distroless migration — coordination note

This PR is one of three that must ship together in the same release:

Merge order: #36100 must merge before or with this PR — otherwise distroless clusterd runs as PID 1 with no SIGTERM handler and won't shut down gracefully (k8s waits, then SIGKILLs).

Release coupling: #36101 hard-codes V26_28_0 as the version at which env/clusterd images become distroless. That must equal the release this PR actually lands in. If this slips past the 26.28 cycle, bump that constant in #36101.

Rebased onto current main; the MZFROM openssh-static / COPY --from=openssh /output/ssh wiring was verified by a local docker build (statically-linked OpenSSH_10.3p1).

Replace Debian-based prod images with distroless. Remove shell
entrypoint scripts (no shell in distroless). Add libeatmydata to
distroless-prod-base for CI compatibility. Update clusterd mzcompose
service to drop shell-dependent options.

Distroless ships no init, so add a tini-static mzbuild image (static
tini built from source, mirroring openssh-static) and copy it into
distroless-prod-base. tini stays PID 1 to forward signals and reap
zombies, notably the ssh subprocesses spawned by the SSH tunnel
feature. Leaf images wrap their binary in ENTRYPOINT ["/usr/bin/tini",
"--", ...].

The deleted environmentd entrypoint slept forever on graceful exit so a
fenced-out generation would not be restarted. Distroless has no shell to
do this, and a Kubernetes StatefulSet (restartPolicy: Always) would
crash-loop a process that exits. Replace it with an in-process idle:
when fenced out and supervised directly by Kubernetes, environmentd
idles until the orchestrator deletes the StatefulSet (SIGTERM, then
SIGKILL) instead of calling exit!(0). The all-in-one image keeps its
entrypoint and exit-0 behavior.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
jasonhernandez force-pushed the jason/distroless-dockerfiles branch from be3af0c to 2dfe22f June 30, 2026 04:34
jasonhernandez (Contributor, Author) commented

🔗 Distroless migration — coordination note (updated)

This PR now bundles a static tini (tini-static mzimage) into distroless-prod-base, so every distroless leaf image runs tini as PID 1:

ENTRYPOINT ["/usr/bin/tini", "--", "/usr/local/bin/clusterd"]

That changes the coordination picture from the original note:

Merge order: #36101 must land before or with this PR. The gate is version-conditional, so it's a no-op until distroless images of the gated version actually exist — which makes it safe to land first. The reverse is not safe: if this PR ships distroless (nonroot 65534) images while orchestratord still lacks the corrected gate, orchestratord applies the wrong UID/GID to those pods.

Version gate bumped: V26_28_0 → V26_32_0. main has moved to 26.32.0-dev, and 26.28.0 has already shipped (as a Debian image). Leaving the gate at 26.28 would wrongly apply the nonroot 65534 securityContext to the still-Debian 26.28-26.31 env/clusterd images, which expect 999. The gate must equal the release these distroless images first ship in. It is set to 26.32.0 on the assumption that this lands in the 26.32 cycle; re-confirm against the actual release cut before merge, and bump again if it slips.
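The rule that the gate must equal the first distroless release can be checked mechanically. The sketch below is a hypothetical helper over 26.x minor versions only: it lists every release where the UID orchestratord would apply disagrees with the UID the image actually runs as.

```rust
use std::ops::RangeInclusive;

/// Hypothetical check over 26.x minor releases: return every release where the
/// UID orchestratord applies (nonroot from `gate_minor` on) disagrees with the
/// UID the image actually runs as (nonroot from `first_distroless_minor` on).
fn mismatched_minors(
    gate_minor: u64,
    first_distroless_minor: u64,
    releases: RangeInclusive<u64>,
) -> Vec<u64> {
    releases
        .filter(|&m| (m >= gate_minor) != (m >= first_distroless_minor))
        .collect()
}
```

With the gate at 28 and distroless shipping in 32, `mismatched_minors(28, 32, 26..=33)` returns `[28, 29, 30, 31]`, the 26.28-26.31 window; with the gate at 32 the list is empty. The same check reproduces the earlier cases from the gating commit: a balancerd gate of 19 against a first distroless release of 18 flags 26.18, and an env/clusterd gate of 20 against 28 flags 26.20 through 26.27.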

jasonhernandez added a commit that referenced this pull request Jun 30, 2026
Distroless images run as nonroot (UID 65534) instead of root. Add
version-gating so orchestratord sets the correct runAsUser/runAsGroup
based on the Materialize version, avoiding UID mismatches during
rolling upgrades from Debian-based to distroless images.

Gate versions (verified against release history, 2026-06):
- balancerd: V26_18_0. Its ci/Dockerfile switched to distroless-prod-base
  in v26.18.0 (prod-base in v26.17.x). The original V26_19_0 was off by
  one and would have forced UID 999 onto v26.18.x balancerd pods that
  actually run as 65534.
- environmentd/clusterd: V26_28_0, matching the release that ships their
  distroless migration (#36099). The original V26_20_0 predated the actual
  landing by ~8 releases (main is now 26.28-dev) and would have applied
  UID 65534 to v26.20-v26.27 images that still run as UID 999.

NOTE: the env/clusterd gate assumes #36099 lands in the 26.28 cycle. If it
slips, bump V26_28_0 to the actual release. The three distroless PRs
(#36099 image, #36100 SIGTERM, #36101 this) must ship in the same release.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
jasonhernandez added a commit that referenced this pull request Jun 30, 2026
main has moved to 26.32.0-dev and 26.28.0 already shipped as a Debian
image. Leaving the gate at 26.28 would apply the nonroot 65534
securityContext to the still-Debian 26.28-26.31 env/clusterd images,
which expect uid/gid 999. Bump the gate so only the genuinely distroless
images get the nonroot context.

The gate must equal the release the distroless env/clusterd images
(#36099) first ship in. Set to 26.32 on the assumption this lands in the
26.32 cycle. Re-confirm against the actual release cut before merge.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>