feat(release): publish multi-arch docker images (V2-298)#64

Open
jacderida wants to merge 3 commits into master from chore-docker_ci

Conversation

@jacderida
Contributor

@jacderida jacderida commented May 21, 2026

Summary

  • On tagged vX.Y.Z releases, publish withautonomi/indelible + withautonomi/antd to Docker Hub and GHCR for linux/amd64 + linux/arm64, with :vX.Y.Z and :latest. antd images carry the resolved ant-sdk release tag.
  • Extends workflow_dispatch with a version input that exercises the docker publish path from a branch without cutting a real release: binary build + GitHub release jobs are skipped, :latest is left alone, and the antd image is namespaced under the dry-run version.
  • :latest is gated on prerelease == 'false', so v1.0.0-rc1 does not overwrite stable :latest (for either indelible or antd).
  • README compose Quick Start drops the "image not yet available" caveat.
  • CI change (separate commit 9db423e): un-gates the existing docker job in ci.yml to run on PRs as well as pushes. Adjacent rather than strictly part of V2-298, but a logical follow-on now that this PR exercises the same image build paths — easier to catch Dockerfile/cross-compile breakage at PR time than post-merge. Comment block updated to match the new behaviour. If the per-PR cost turns out to be objectionable in practice, this commit can be reverted independently without affecting the V2-298 work.
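
As a rough illustration of the dry-run behaviour summarised above, the sketch below models the two decisions in plain shell: which version tag the antd image gets, and whether the binary build and GitHub release jobs run. The function names `antd_tag` and `runs_release_jobs` are hypothetical and are not taken from release.yml, where the real logic lives in workflow expressions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose the version tag for the antd image.
#   $1 = event name ("push" for a tag push, "workflow_dispatch" for a dry-run)
#   $2 = resolved ant-sdk release tag (e.g. v0.7.1)
#   $3 = dry-run version from the dispatch input (e.g. 0.0.0-dryrun-1)
antd_tag() {
  local event="$1" antd_version="$2" dryrun_version="$3"
  if [ "$event" = "push" ]; then
    # Real release: antd carries its own resolved ant-sdk tag.
    printf '%s\n' "$antd_version"
  else
    # Dry-run: namespace under the dry-run version so the real tag stays reserved.
    printf '%s\n' "$dryrun_version"
  fi
}

# Hypothetical helper: succeeds only when the binary build and
# GitHub release jobs should run (tag pushes, not dispatches).
runs_release_jobs() {
  [ "$1" = "push" ]
}
```

For example, `antd_tag workflow_dispatch v0.7.1 0.0.0-dryrun-1` prints the dry-run version rather than the real antd tag.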

Test plan (what was verified before opening this PR)

  • Dispatched Release from this branch with version=0.0.0-dryrun-1 (run 26228474819). Completed in 3m42s.
  • build and release jobs correctly skipped under workflow_dispatch; only release-meta, resolve-antd-version, and docker ran.
  • Docker Hub + GHCR logins succeeded; both build-push steps pushed all four image refs.
  • docker manifest inspect showed linux/amd64 + linux/arm64 present on all four image refs.
  • Unauthenticated docker pull of withautonomi/indelible:0.0.0-dryrun-1 and withautonomi/antd:v0.7.1 worked from both Docker Hub and GHCR. Digests matched across registries (same image content).
  • :latest was confirmed not touched in either registry during the dry-run.
  • Dry-run artifacts cleaned from both registries afterwards.
  • Follow-up review-driven dispatch with version=0.0.0-dryrun-2 to validate the antd-namespacing condition and REPLACE-ME fail-fast (linked in the comment thread).
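
The multi-arch check in the test plan can be scripted along these lines. This is a minimal sketch, not the commands actually run for this PR: `has_arches` is a hypothetical helper that does a crude text match on the JSON printed by `docker manifest inspect` rather than a full JSON parse.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads manifest-list JSON on stdin and succeeds only
# if every architecture named in the arguments appears in it.
# This is a text match on the "architecture" fields, not a real JSON parser.
has_arches() {
  local json arch
  json="$(cat)"
  for arch in "$@"; do
    printf '%s' "$json" \
      | grep -Eq "\"architecture\":[[:space:]]*\"${arch}\"" || return 1
  done
}

# Usage against a published ref (requires docker and network access):
#   docker manifest inspect withautonomi/indelible:<tag> | has_arches amd64 arm64
```

Reading the JSON from stdin keeps the check independent of which registry or tool produced the manifest list.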

Follow-ups

  • New Linear ticket V2-350: ant-sdk publishes its own multi-arch antd container image from its own release workflow.
  • New Linear ticket V2-351: cleanup on indelible — drop deploy/antd/Dockerfile + second publish step once V2-350 ships.

🤖 Generated with Claude Code

jacderida and others added 2 commits May 21, 2026 14:37
…2-298)

Tagged `vX.Y.Z` pushes now publish `withautonomi/indelible` and
`withautonomi/antd` to Docker Hub and GHCR for `linux/amd64` +
`linux/arm64`, both as `:vX.Y.Z` and `:latest`. antd images use the
resolved ant-sdk release tag as their version.

Also extends `workflow_dispatch` with a `version` input so the docker
publish path can be exercised from a branch without cutting a real
release tag. In dry-run mode the binary `build` and GitHub `release`
jobs are skipped, `:latest` is not touched, and the antd image is
namespaced under the dry-run version rather than its real antd tag.

README's compose Quick Start no longer carries the "image not yet
available" caveat.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously gated to push events only, with a comment claiming the ~5min
cost wasn't worth running per PR. In practice the buildx GHA cache makes
warm runs much cheaper, and catching Dockerfile/cross-compile breakage
on the PR is more valuable than catching it post-merge on master. Also
removes the outdated mention of V2-298 from the comment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Nic-dorman
Contributor

Nic-dorman commented May 21, 2026

Review from a read-through of the diff, full pre/post release.yml, the CI change, and docker-compose.yml.

Significant — worth addressing before merge

1. :latest gets overwritten by pre-release tags. release-meta already computes a prerelease flag (true when the version contains -), but the docker job ignores it. Push v1.0.0-rc1 and you'll happily overwrite withautonomi/indelible:latest (and antd:latest) with a release candidate. Since release-meta.outputs.prerelease already exists, the fix is just an extra clause on the four :latest tag lines:

${{ github.event_name == 'push' && needs.release-meta.outputs.prerelease == 'false' && 'ghcr.io/withautonomi/indelible:latest' || '' }}

You could also gate the whole job on prerelease == 'false' for tag pushes, but per-tag is cheaper.

2. README change races the first real release. The README diff drops the "image not yet available" caveat, but until the first v* tag actually fires release.yml, no :latest will exist in either registry (the dry-run artifacts were cleaned up). A user who follows the Quick Start in that window hits a pull failure. Either:

  • Land this PR but cut a real v* release immediately after merge, or
  • Keep the caveat (re-worded) until the first real tag lands, and drop it in a follow-up.

3. CI workflow change is unexplained in the description. .github/workflows/ci.yml un-gates the docker smoke-test job (if: github.event_name == 'push' removed) and rewrites the comment explaining why it was master-only. That re-adds ~5min to every PR run. The previous comment was explicit ("we don't need it on every PR; use make docker locally"), and the description here is entirely about release.yml. Either explain the reversal in the PR body, or split it into its own PR.
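
To make point 1 concrete, here is a small shell model of the proposed gating. It assumes, as described above, that a version counts as a pre-release when it contains a `-`. The helper names `is_prerelease` and `image_tags` are illustrative only; the actual fix is the extra clause in the workflow tag expressions.

```shell
#!/usr/bin/env bash
# Mirrors the rule described in the review: a version containing "-"
# is treated as a pre-release.
is_prerelease() {
  case "$1" in
    *-*) echo true ;;
    *)   echo false ;;
  esac
}

# Hypothetical helper: print the tags to publish for one image, one per line.
#   $1 = event name, $2 = image name, $3 = version
image_tags() {
  local event="$1" image="$2" version="$3"
  # The versioned tag is always published.
  printf '%s:%s\n' "$image" "$version"
  # :latest only moves on a tag push of a stable (non-pre-release) version.
  if [ "$event" = "push" ] && [ "$(is_prerelease "$version")" = "false" ]; then
    printf '%s:latest\n' "$image"
  fi
}
```

With this rule, `image_tags push withautonomi/indelible v1.0.0-rc1` emits only the `:v1.0.0-rc1` tag, while `v1.0.0` also emits `:latest`.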

Smaller things

4. The antd dry-run conditional isn't actually CI-verified. You flagged this in the description, which is great — but it's a fix on a path that publishes to a public registry. The cost of one more dispatch (3m42s last time) is much lower than the cost of pushing withautonomi/antd:v0.7.1 accidentally because a condition flipped. Worth a second dispatch to close it.

5. Default dispatch version 0.0.0-dryrun is a footgun. If two people (or one person twice) dispatch without bumping -N, the second run silently overwrites the first registry tag. Either drop the default and mark required: true, or change it to something obviously broken (e.g. "REPLACE-ME") so the build-push step fails fast on bad input.

6. ${{ inputs.version }} interpolated into the heredoc. Standard GHA-injection pattern. The blast radius is small (only users with workflow_dispatch perms can set it), but the lint-clean form is env: INPUT_VERSION: ${{ inputs.version }} and then version="${INPUT_VERSION}" in the script. Cheap to fix, removes a future reviewer flag.

7. Comment cleanup — the linux/amd64 — arm64 blocked on V2-275 note is gone, which is consistent with the dry-run showing arm64 manifests on antd. Just confirm V2-275 is actually resolved (or that the ant-sdk arm64 antd binary is now reliably published) — the comment was there for a real reason previously.
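
Points 5 and 6 combine naturally into one step. The sketch below assumes the workflow step binds the input with `env: INPUT_VERSION: ${{ inputs.version }}`, so the value reaches the script as data rather than as script text. `resolve_version` is a hypothetical name, and the error message wording is illustrative.

```shell
#!/usr/bin/env bash
# Assumes the workflow step sets:  env: INPUT_VERSION: ${{ inputs.version }}
# The script only ever sees the value through the environment variable.
resolve_version() {
  local version="${INPUT_VERSION:-}"
  if [ -z "$version" ] || [ "$version" = "REPLACE-ME" ]; then
    # "::error::" is the GitHub Actions workflow command for an error
    # annotation; stderr is used so command substitution does not capture it.
    echo "::error::version input must be set to a real dry-run version" >&2
    return 1
  fi
  # The value is printed verbatim; it is never evaluated as shell code.
  printf '%s\n' "$version"
}
```

A caller would use something like `version="$(resolve_version)" || exit 1`, so a forgotten default stops the job before any image is built or pushed.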

Things I liked

  • Test plan is unusually concrete (run ID, manifest inspect, cross-registry digest match, :latest confirmed untouched, artifacts cleaned up). That's the right bar for a workflow that publishes public images.
  • Namespacing the dry-run antd image under the indelible dry-run version rather than under the resolved antd tag is the right call — keeps withautonomi/antd:v0.7.1 reserved for real ant-sdk releases.
  • The follow-up tickets in the description (ant-sdk owning its own image, deleting deploy/antd/Dockerfile after) are sensible scope-bounding.

The pre-release :latest overwrite is the only one I'd block on; the rest are fix-on-merge or follow-up material.

Addresses three review comments on PR #64:

* `:latest` (4 image refs) is now gated on `prerelease == 'false'` in
  addition to event_name being a tag push. Pushing `v1.0.0-rc1` no
  longer overwrites stable `:latest` for either indelible or antd.

* `workflow_dispatch` `version` input is now `required: true` with a
  sentinel default of `REPLACE-ME`, and `release-meta` fails fast if the
  sentinel survives to runtime. This prevents accidental dry-runs from
  silently overwriting a prior `:0.0.0-dryrun` tag.

* `inputs.version` is no longer interpolated directly into the shell
  heredoc — it's bound via `env: INPUT_VERSION:` and dereferenced as a
  shell variable instead, removing the standard GHA-injection footgun.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jacderida
Contributor Author

Thanks @Nic-dorman — addressed all seven points. Summary of the response, with commit links and a fresh dispatch ID for the items that have CI evidence.

1. :latest overwritten by pre-release tags — fixed in 540a178. All four :latest lines now also gate on needs.release-meta.outputs.prerelease == 'false'. Pushing v1.0.0-rc1 will publish :v1.0.0-rc1 only; stable :latest is untouched.

2. README races the first real release — flagged to @chriso83 for a call between (a) keep the caveat in a re-worded form and drop it in a follow-up, or (b) cut a real release tag immediately after merge. I won't merge until they pick one. If (b), the README change stays as written; the window is minutes.

3. CI workflow change unexplained in description — PR description updated to call out the 9db423e commit and the reasoning. Kept it as its own commit on the branch so it can be reverted independently if the per-PR cost turns out to be noisy.

4. antd dry-run conditional not CI-verified — re-dispatched as run 26241831423 with version=0.0.0-dryrun-2. Confirmed:

  • withautonomi/antd:0.0.0-dryrun-2 (Hub + GHCR) published with both arches.
  • withautonomi/antd:v0.7.1 is not present (no pollution of antd release tag namespace).
  • withautonomi/{indelible,antd}:latest both absent on both registries (still gated correctly).
  • Dispatch artifacts cleaned from GHCR; Hub cleanup queued for @chriso83.

5. Default dispatch 0.0.0-dryrun is a footgun — fixed in 540a178. version is now required: true with sentinel default REPLACE-ME. release-meta fails fast (`echo "::error::"` + exit 1) if REPLACE-ME survives to runtime. Anyone dispatching without thinking gets a one-second job failure rather than a silent overwrite.

6. ${{ inputs.version }} interpolated into heredoc: fixed in 540a178. Bound through env: INPUT_VERSION: and dereferenced as a shell variable. Same pattern as the antd_version resolution in resolve-antd-version (which was already doing the safe thing; I should probably retrofit that step too, but it's already validated against a controlled allow-list via gh release list, so lower priority).

7. V2-275 comment — V2-275's antd-linux-arm64 artifact is reliably published in ant-sdk v0.7.1 (commit 0bbae14 added it). The original V2-275 ticket is stale but the binary is real and pulled successfully in both dispatches. Will note this on the ticket separately.

Remaining open item: point 2 is waiting on a decision from @chriso83.

@jacderida
Contributor Author

Re point 2 (README race) — leaving as-is. The window between this PR merging and the first real v* tag landing is expected to be very short, and the documented docker compose up --build fallback on the next line still works during that window. Not worth churning the README twice for a few-minutes gap.
