feat(build): harden and shrink dockerfile#644

Open
Pinguladora wants to merge 4 commits into openchoreo:main from Pinguladora:feat/harden-and-shrink-dockerfile

@Pinguladora Pinguladora commented Jun 19, 2026

Purpose

The current container image (at packages/backend/Dockerfile) produces a ~1.6 GB runtime image that ships the full node:22-bookworm-slim userland (apt, dpkg, g++, libsqlite3-dev, etc.) in the final image even though the runtime does not need it. This comes with quite a few downsides:

  • Larger attack surface than necessary (shell, package manager, compiler in the runtime image).
  • ~1 GB of Debian-side CVE noise on every image scan that has nothing to do with the application.
  • Slower image pulls in production clusters.

There is no tracked issue. This PR was opened directly from a downstream POC, where Trivy reported ~200 HIGH+CRITICAL findings per Backstage pod, most of them in the runtime stage's build toolchain rather than in node_modules.

Goals

  • Cut runtime image size roughly in half (from ~1.6 GB to ~855 MB in my tests)
  • Remove build toolchain (g++, python3, build-essential, libsqlite3-dev, apt, /bin/sh) from the runtime stage
  • Reference base images by fully qualified registry path and pin them by digest (e.g. docker.io/library/node:22-bookworm-slim@sha256...)
  • Ship only production node_modules in the runtime image
  • Fix the Yarn Berry cache mount path, for faster builds
  • Stay drop-in compatible with the existing OpenChoreo Helm chart's args: ["node", "packages/backend", ...] so consumers don't need a chart bump to take this image.
  • Don't change behavior for standalone docker run use.

Approach

Three-stage build with a distroless final stage:

  1. build: installs system build deps, runs yarn install --immutable, compiles TypeScript, builds the backend, and extracts the bundle/skeleton tarballs into /tmp/.
  2. deps: a separate stage that produces a production-only node_modules via yarn workspaces focus --all --production on a clean skeleton. After the install, it prunes commonly bundled but unnecessary files matching node_modules/**/{test,docs,examples,.md,.map,.tgz,...}. LICENSE and *.d.ts are deliberately preserved (Backstage's config-loader reads per-plugin config.d.ts at runtime).
  3. final: runs on gcr.io/distroless/nodejs22-debian13:nonroot. No shell, no apt, no toolchain, runs as UID 65532. Copies only the bundle, production node_modules, package.json, and configs/templates/catalog-entities.
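
As a rough illustration of the three stages above, the layout could look like the sketch below. It is not the PR's actual Dockerfile. The stage names and base images come from the description; the apt package list, the yarn tsc / yarn build:backend script names, the dist/skeleton.tar.gz and dist/bundle.tar.gz paths, and the app-config.yaml file name follow the stock Backstage backend layout and are assumptions here. Pruning, cache mounts, digest pins and the ENTRYPOINT handling are left out for brevity.

```dockerfile
# Sketch only: everything not named in the PR description is an assumption.
ARG BUILD_IMAGE=docker.io/library/node:22-bookworm-slim
ARG RUNTIME_IMAGE=gcr.io/distroless/nodejs22-debian13:nonroot

# 1. build: toolchain, full install, compile, then unpack the tarballs.
FROM ${BUILD_IMAGE} AS build
WORKDIR /app
RUN apt-get update \
 && apt-get install -y --no-install-recommends python3 g++ build-essential libsqlite3-dev \
 && rm -rf /var/lib/apt/lists/*
COPY . .
RUN yarn install --immutable
# Assumed script names from the stock Backstage template.
RUN yarn tsc && yarn build:backend
RUN mkdir -p /tmp/skeleton /tmp/bundle \
 && tar -xzf packages/backend/dist/skeleton.tar.gz -C /tmp/skeleton \
 && tar -xzf packages/backend/dist/bundle.tar.gz -C /tmp/bundle

# 2. deps: production-only node_modules from the clean skeleton.
# Native modules may still need the build toolchain in this stage.
FROM ${BUILD_IMAGE} AS deps
WORKDIR /app
COPY --from=build /app/.yarn ./.yarn
COPY --from=build /app/.yarnrc.yml /app/package.json /app/yarn.lock ./
COPY --from=build /tmp/skeleton/ ./
RUN yarn workspaces focus --all --production

# 3. final: distroless runtime, no shell, no package manager.
FROM ${RUNTIME_IMAGE} AS final
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY --from=build /tmp/bundle/ ./
# Assumed file names; the PR copies package.json plus its configs.
COPY --from=build /app/package.json /app/app-config.yaml ./
```

Only the build and deps stages ever contain apt, a compiler or a shell; the final stage receives files through COPY --from and never runs a RUN instruction, which is what makes a shell-less base usable.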

Distroless's default ENTRYPOINT is ["/nodejs/bin/node"], but the OpenChoreo Helm chart passes args: ["node", "packages/backend", ...]. To keep that working without requiring a sibling chart change (which ideally should be done regardless, as the chart should be agnostic to image behaviour), the final stage sets ENTRYPOINT [] + ENV PATH=/nodejs/bin:$PATH so the chart's literal node resolves via PATH. CMD ["node", "packages/backend"] preserves standalone docker run behavior.
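
The entrypoint handling described above amounts to three instructions. A minimal sketch, assuming the final stage is named final and works from /app (both assumptions; the three instructions themselves are the ones named in the description):

```dockerfile
FROM gcr.io/distroless/nodejs22-debian13:nonroot AS final
WORKDIR /app

# Clear the distroless default ENTRYPOINT ["/nodejs/bin/node"], so the first
# element of the chart's args ("node") is executed as the command instead of
# being passed to node as a script path.
ENTRYPOINT []

# Put the distroless node binary directory on PATH so a bare "node" resolves.
ENV PATH=/nodejs/bin:$PATH

# Default command for standalone runs; the chart's args replace it.
CMD ["node", "packages/backend"]
```

With an empty ENTRYPOINT, the container runtime executes whatever ends up as the command directly, so both the chart's args and the default CMD start node with packages/backend as the first argument.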

Base images are pinned by digest through ARG, so both the build base and the runtime base are reproducible. The images are the official Node image and Google's distroless image, to align with the rest of OpenChoreo, but they could be swapped for Chainguard, Docker Hardened Images, or any other undistro / distroless image and should work without much trouble.
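
A sketch of what digest pinning through ARG can look like. The BUILD_IMAGE and RUNTIME_IMAGE argument names are the ones the PR introduces; `<build-digest>` and `<runtime-digest>` are placeholders, not real digests, and have to be replaced before this builds.

```dockerfile
# Global ARGs declared before the first FROM can be used in FROM lines.
# <build-digest> and <runtime-digest> are placeholders, not real values.
ARG BUILD_IMAGE=docker.io/library/node:22-bookworm-slim@sha256:<build-digest>
ARG RUNTIME_IMAGE=gcr.io/distroless/nodejs22-debian13:nonroot@sha256:<runtime-digest>

FROM ${BUILD_IMAGE} AS build
# build and deps instructions go here

FROM ${RUNTIME_IMAGE} AS final
# runtime instructions go here
```

Because the bases are plain build arguments, swapping in another hardened image should only need a --build-arg RUNTIME_IMAGE=... override at build time, provided the replacement exposes node in a compatible way.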

User stories

  • As a platform operator, I want a smaller Backstage image so pulls and rolling updates are faster.
  • As a platform operator, I want the OpenChoreo Backstage image to be drop-in compatible with the existing chart.
  • As a security engineer, I want the runtime image to not ship apt / a compiler / a shell so attack surface is lower and my vulnerability scans focus on the application rather than the OS toolchain.

Release note

packages/backend/Dockerfile: shrink from ~1.6 GB to ~855 MB, hardened by switching to a distroless image

Documentation

N/A. Only the Dockerfile and .dockerignore change; the documentation is self-contained in their comments.

Training

N/A

Certification

N/A

Marketing

N/A

Automation tests

N/A

Security checks

Samples

N/A

Related PRs

N/A

Migrations (if applicable)

N/A

Test environment

  • OS / runtime: Linux x86_64 6.18.7-WSL2-STABLE+
  • Node.js: 22.23.0
  • Container runtime: Podman 5.8 + BuildKit-compatible builder
  • Kubernetes: AKS 1.35.4 + Cilium 1.18.9 (Azure-managed)
  • OpenChoreo: v1.1.1
  • Backstage: main head + this PR's Dockerfile.
  • Database: PostgreSQL 18.3 (CloudNativePG operator)
  • IdP: Microsoft Entra ID
  • Browsers: Firefox 140 on Linux

Learning

  • Backstage's yarn workspaces focus --all --production is sensitive to BuildKit cache mount paths. Mounting /home/node/.yarn (yarn berry's global cache root) leaks install-state between stages and short-circuits the focus install. Keeping the upstream mount path /app/.yarn/cache (or the inherited /home/node/.cache/yarn) avoids this.

  • The supposedly "conservative" find ... -prune patterns that node-prune applies aggressively collide with several packages that need generically named directories at runtime (yaml/dist/doc/, @backstage/plugin-app-backend/dist/lib/assets/). The final prune list intentionally excludes doc, docs, assets, images, media, website, *.ts, *.tsx, tsconfig.json, LICENSE and *.d.ts files. The *.d.ts files in particular are read by Backstage's config-loader at startup to validate app-config.yaml.

  • References used:
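
The cache-mount and pruning learnings above can be sketched together as a deps stage. This is an illustration under assumptions, not the PR's exact instructions: the pattern list below is a deliberately short subset (test, examples, *.map, *.tgz, *.md) and the stage is assumed to already contain the skeleton, yarn.lock, .yarnrc.yml and .yarn/.

```dockerfile
FROM docker.io/library/node:22-bookworm-slim AS deps
WORKDIR /app
# Assumed: skeleton, yarn.lock, .yarnrc.yml and .yarn/ were copied in above.

# Mount only the package cache directory. Mounting /home/node/.yarn instead
# leaked install state between stages and short-circuited the focus install.
RUN --mount=type=cache,target=/app/.yarn/cache,sharing=locked \
    yarn workspaces focus --all --production

# Remove test and example directories, then source maps, tarballs and
# markdown. Nothing here matches doc, docs, assets or *.d.ts, and license
# files are skipped explicitly even when they end in .md.
RUN find node_modules -type d \( -name test -o -name examples \) \
      -prune -exec rm -rf {} + \
 && find node_modules -type f \( -name '*.map' -o -name '*.tgz' -o -name '*.md' \) \
      ! -iname 'license*' -delete
```

Matching by explicit name rather than by broad globs keeps the prune list reviewable: any directory or extension a package turns out to need at runtime can be dropped from the list without touching the rest.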

Summary by CodeRabbit

  • Chores
    • Updated Docker configuration for optimized container builds.
    • Refined backend Dockerfile to streamline image size and deployment efficiency.

github-actions Bot commented Jun 19, 2026

Changeset detected — the following file(s) will be released with this PR:

.changeset/old-lemons-speak.md

coderabbitai Bot commented Jun 19, 2026

Walkthrough

The .dockerignore gains two Yarn Berry exclusions. The packages/backend/Dockerfile is rewritten from a 3-stage build into a parameterized 3-stage pipeline: a build stage compiles TypeScript and extracts skeleton/bundle tarballs, a deps stage installs and prunes production-only node_modules, and a distroless final stage assembles the minimal runtime image without the examples directory.

Changes

Distroless Backend Container Build

| Layer / File(s) | Summary |
|---|---|
| Build args, .dockerignore, and build stage / .dockerignore, packages/backend/Dockerfile | Adds BUILD_IMAGE/RUNTIME_IMAGE build args; excludes .yarn/cache and .yarn/install-state.gz from Docker context; defines the build stage that installs apt dependencies, runs yarn install --immutable, compiles TypeScript, and extracts dist/skeleton and dist/bundle tarballs. |
| Production deps stage with node_modules pruning / packages/backend/Dockerfile | Defines the deps stage: copies Yarn state and skeleton metadata from build, runs yarn workspaces focus --all --production, then removes tests, source maps, type declarations, docs, and unused binaries from node_modules via find pruning commands. |
| Distroless runtime final stage / packages/backend/Dockerfile | Switches base image to distroless nonroot Node, copies only runtime artifacts (omitting examples), sets NODE_ENV=production and NODE_OPTIONS=--no-node-snapshot, clears the distroless default ENTRYPOINT, adjusts PATH, and retains CMD ["node", "packages/backend"]. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 Hippity-hop, the Dockerfile's new,
Distroless and lean, no extras to chew!
Yarn Berry cache? We'll ignore that stuff.
Three crisp little stages, sleek and tough.
The bundle is packed, the skeleton's bare —
A nonroot rabbit runs without a care! 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description check | ✅ Passed | The PR description comprehensively addresses the template with detailed Purpose, Goals, Approach, User stories, Release note, and Test environment sections, clearly documenting the rationale and implementation. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Title check | ✅ Passed | The title accurately summarizes the main changes: hardening the Dockerfile through distroless images and container size reduction. |

Pinguladora and others added 4 commits June 20, 2026 00:36
swap to distroless runtime image
fix yarn berry cache path
tree-shake production node_modules

Signed-off-by: Pinguladora <50406923+Pinguladora@users.noreply.github.com>
pruning was too aggressive, breaking the image on boot
explicit Docker Hub path so it cannot be confused or typosquatted in any way

Signed-off-by: Pinguladora <50406923+Pinguladora@users.noreply.github.com>
Helm chart already passes node as the first argument, so adjust to it, at least temporarily

Signed-off-by: Pinguladora <50406923+Pinguladora@users.noreply.github.com>
Signed-off-by: Alvaro <212876443+aechegoyan@users.noreply.github.com>
@Pinguladora Pinguladora force-pushed the feat/harden-and-shrink-dockerfile branch from 2dc1070 to 8e03d00 Compare June 19, 2026 22:38
@Pinguladora Pinguladora changed the title from "Feat/harden and shrink dockerfile" to "feat(build): harden and shrink dockerfile" Jun 20, 2026
@Pinguladora (Author)

Forgot to add: it would be worth considering a dev / quick-start variant image, through a Dockerfile.dev or similar, so that the sqlite3 bits can be pruned here too, which should remove another ~30 MB.

@LakshanSS LakshanSS (Contributor) left a comment

Thank you for your contribution @Pinguladora!

codecov Bot commented Jun 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

3 participants