
feat(ci): migrate build-operator to unified Dockerfile + tag format #1180

Merged
thepagent merged 4 commits into main from feat/unified-build-operator on Jun 23, 2026

@chaodu-agent (Collaborator) commented on Jun 23, 2026

Summary

Migrate the build-operator.yml release workflow from per-variant Dockerfiles to Dockerfile.unified with the new openab:<tag>-<agent> tag format.

Depends on #1175 (Dockerfile.unified) and pairs with #1179 (chart + docs).

Changes

Build structure (replaces 15× independent Dockerfile builds):

  1. build-core: Shared Rust builder stage — once per arch (~8-10 min)
  2. build-agents: 15 thin agent layers in parallel (~2 min each)
  3. merge-manifests: Multi-arch manifests with unified tags
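The three stages form a small dependency graph. As a sketch only, assuming resolve-tag (described in the review below) feeds the other jobs and that the `needs:` edges are as written here (they are illustrative, not copied from the workflow), Python's `graphlib` can derive a valid run order:

```python
from graphlib import TopologicalSorter

# Hypothetical `needs:` edges between the release jobs. Job names come from
# the PR; the exact edges are an assumption for illustration.
JOB_NEEDS = {
    "resolve-tag": set(),
    "build-core": {"resolve-tag"},
    "build-agents": {"resolve-tag", "build-core"},
    "merge-manifests": {"resolve-tag", "build-agents"},
}


def job_order(needs):
    """Return one valid execution order that respects every dependency."""
    return list(TopologicalSorter(needs).static_order())


print(job_order(JOB_NEEDS))
```

Within build-agents the matrix legs are independent of each other, which is what lets the thin layers run in parallel once the shared builder exists.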

Tag format migration:

| Use case | Old | New |
|---|---|---|
| Pre-release | `openab-codex:0.9.0-beta.1` | `openab:0.9.0-beta.1-codex` |
| Beta channel | `openab-claude:beta` | `openab:beta-claude` |
| Stable | `openab-gemini:0.9.0` | `openab:0.9.0-gemini` |
| Default (kiro) | `openab:beta` | `openab:beta` (unchanged) |
| SHA traceability | `openab-codex:<sha>` | `openab:<sha>-codex` |
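The renaming rule in the table is mechanical. A minimal sketch, assuming old references always look like `[<registry>/]openab-<agent>:<tag>`; `migrate_ref` is a hypothetical helper, not part of the workflow:

```python
import re

# Old scheme: [<registry>/]openab-<agent>:<tag>
# New scheme: [<registry>/]openab:<tag>-<agent>
OLD_REF = re.compile(r"^(?P<repo>.*/)?openab-(?P<agent>[a-z0-9-]+):(?P<tag>[\w.-]+)$")


def migrate_ref(ref: str) -> str:
    m = OLD_REF.match(ref)
    if m is None:
        # No "openab-<agent>" repository, e.g. the bare kiro default openab:beta,
        # which the table lists as unchanged.
        return ref
    repo = m.group("repo") or ""
    return f"{repo}openab:{m.group('tag')}-{m.group('agent')}"


print(migrate_ref("ghcr.io/openabdev/openab-codex:0.9.0-beta.1"))
```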

Stable promotion (no rebuild):

  • openab:<version>-<agent> + openab:<major.minor>-<agent> + openab:stable-<agent>
  • kiro also gets: openab:<version>, openab:latest, openab:stable

Hardening (from review):

  • Concurrency group prevents parallel release race conditions
  • Digest hex format validation in merge-manifests
  • no-cache param properly passed to docker action
  • Per-agent cache-to scope for runtime layer caching
  • Builder always pushed (fixes dry_run mode)
  • SHA tag for commit traceability
  • Explicit permissions on all jobs

Performance improvement

  • Old: ~140 min (15 independent full Rust builds)
  • New: ~12 min (1 shared builder + 15 thin layers)
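A back-of-the-envelope check of these numbers, using only the estimates quoted in this PR and assuming (a) the old ~140 min is roughly the sum of 15 full Rust builds and (b) enough runners exist for every agent leg to run at once, so each stage costs as much as its slowest leg:

```python
OLD_TOTAL_MIN = 140
VARIANTS = 15
# Each old variant compiled Rust on its own: about 9.3 min per build.
per_variant_old = OLD_TOTAL_MIN / VARIANTS

builder_min = (8, 10)   # build-core, once per arch (arches in parallel)
agent_layer_min = 2     # each thin agent layer, all legs in parallel

# Wall clock is builder time plus one thin layer (merge-manifests not counted).
new_wall_clock = tuple(b + agent_layer_min for b in builder_min)

# Speedup against the 12 min upper estimate.
speedup_vs_upper = OLD_TOTAL_MIN / new_wall_clock[1]

print(per_variant_old, new_wall_clock, speedup_vs_upper)
```

The per-variant figure (~9.3 min) lines up with the quoted ~8-10 min builder time, which is consistent with the old pipeline paying for one full Rust compile per variant.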

Dependencies

What's NOT in this PR

  • Add native-sandbox target to Dockerfile.unified (separate PR)
  • Remove old Dockerfile.<agent> files (after validation)
  • Deprecate snapshot-build.yml (build-images.yml replaces it)

Closes follow-up from #1175

Replace per-variant Dockerfile matrix with Dockerfile.unified targets:

- build-core: shared Rust builder stage (once per arch, ~8-10 min)
- build-agents: per-agent thin layers in parallel (~2 min each)
- merge-manifests: multi-arch manifests with openab:<version>-<agent> tags

Tag format changes:
  Old: ghcr.io/openabdev/openab-codex:0.9.0-beta.1
  New: ghcr.io/openabdev/openab:0.9.0-beta.1-codex

Pre-release tags: openab:<version>-<agent> + openab:beta-<agent>
Stable promotion: openab:<version>-<agent> + openab:<major.minor>-<agent> + openab:stable-<agent>
Default (kiro): also tagged as openab:<version>, openab:beta, openab:stable, openab:latest

AGENTS list defined as env var for easy maintenance.

Part of #1175 follow-up.

@chaodu-agent (Collaborator, Author)

CHANGES REQUESTED ⚠️ — Solid restructure, but it silently drops the published openab-native-sandbox image and breaks the dry_run path.

What This PR Does

Migrates the build-operator.yml release workflow from 15 independent per-variant Dockerfile builds to a single Dockerfile.unified multi-target build, and switches to the openab:<tag>-<agent> tag scheme. A shared Rust builder stage is compiled once per arch, then 14 thin agent layers build in parallel — cutting release time from ~140 min to ~12 min.

How It Works

  • resolve-tag turns the AGENTS env CSV into a JSON matrix (agents output).
  • build-core builds + pushes the builder target as …/openab/builder:<version>-<arch> (per arch).
  • build-agents builds each --target <agent> using BUILDER_IMAGE=…/openab/builder:<version>-<arch> (consumed via FROM ${BUILDER_IMAGE} in Dockerfile.unified), pushes by digest, uploads digests.
  • merge-manifests assembles multi-arch manifests with kiro-as-default special-casing.
  • promote-stable retags existing pre-release images to stable (no rebuild).
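The first two bullets amount to string plumbing. A sketch under stated assumptions: `agents_matrix` and `builder_image` are hypothetical helper names, `AGENTS` is the comma-separated list described above, and the registry prefix is passed in because the review elides it. The resulting JSON string is the kind of value a matrix would consume via `fromJSON(...)`:

```python
import json


def agents_matrix(agents_csv: str) -> str:
    """Turn an AGENTS value like "kiro,codex,claude" into a JSON array string."""
    agents = [a.strip() for a in agents_csv.split(",") if a.strip()]
    if len(set(agents)) != len(agents):
        raise ValueError("duplicate agent in AGENTS")
    return json.dumps(agents)


def builder_image(registry: str, version: str, arch: str) -> str:
    # Mirrors the .../openab/builder:<version>-<arch> pattern from the review.
    return f"{registry}/openab/builder:{version}-{arch}"


print(agents_matrix("kiro, codex, claude"))
```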

Findings

| # | Severity | Finding | Location |
|---|---|---|---|
| 1 | 🟡 | `openab-native-sandbox` image is dropped from the release pipeline without mention | `build-operator.yml` (AGENTS) |
| 2 | 🟡 | dry_run path is broken: build-agents `FROM` resolves a builder image that is never pushed in dry-run | `build-operator.yml`: build-core push gate |
| 3 | 🟡 | builder images accumulate in the public registry with no cleanup; `cache-from: type=registry` on it is effectively a no-op | `build-operator.yml`: build-agents |
| 4 | 🟢 | New "Validate digest count == 2" guard catches partial-arch publishes | merge-manifests |
| 5 | 🟢 | Single AGENTS env var as matrix source of truth; ~140 min → ~12 min | |
| 6 | 🟢 | kiro default tags (`openab:beta`, `openab:<version>`) preserved for backward compat | merge-manifests |

Finding Details

🟡 F1: openab-native-sandbox is silently removed

The old matrix included a 15th variant built from a separate Dockerfile:
{ suffix: "-native-sandbox", dockerfile: "openshell/Dockerfile", artifact: "nativesandbox" }.
The new AGENTS list has 14 entries and Dockerfile.unified has no native-sandbox target (native-sandbox is built from openshell/Dockerfile, which is outside the unified migration). As a result, the release pipeline will stop publishing or updating ghcr.io/openabdev/openab-native-sandbox, yet docs/openshell.md still references ghcr.io/openabdev/openab-native-sandbox:latest and docker-smoke-test.yml still builds it.

This may be intentional, but it is not listed under "What's NOT in this PR". Please either (a) keep a dedicated build/publish job for native-sandbox (it can't use the unified builder), or (b) explicitly document the removal and update docs/openshell.md.

🟡 F2: dry_run mode no longer works

build-core pushes the builder only when not in dry-run:
push: ${{ inputs.dry_run != true }}.
But build-agents runs on separate runners and obtains the builder solely via FROM ${BUILDER_IMAGE}, where BUILDER_IMAGE=ghcr.io/.../builder:<version>-<arch>, which is a registry pull. In dry_run that image is never pushed, so build-agents cannot resolve the builder stage and fails. The dry_run input is still advertised ("Dry run (build only, no push)"). Options: in dry-run, build the builder and the agent in one job (loading the builder locally); push the builder to a throwaway tag even in dry-run; or short-circuit build-agents for dry-run.

🟡 F3: builder image lifecycle / misleading cache-from

build-core publishes …/openab/builder:<version>-<arch> as a regular package on every pre-release with no retention/cleanup, so these intermediate images accumulate publicly. Also, cache-from: type=registry,ref=…/builder:… in build-agents only yields layer cache if the image was pushed with inline cache (cache-to: type=inline), which build-core does not set — the real reuse here is the FROM ${BUILDER_IMAGE} base, so that type=registry cache line is effectively dead. Consider a cleanup step / retention policy and dropping or correcting the registry cache line.
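One possible shape for the suggested retention policy, purely as a sketch: it assumes builder tags follow `<version>-<arch>` with `amd64`/`arm64` arches, and that the caller supplies release versions already ordered oldest to newest. `prune_candidates` is hypothetical and only selects tags; actually deleting them (for example through the GitHub Packages REST API) is out of scope here.

```python
import re

# Assumed builder tag shape: <version>-<arch>, arch limited to amd64/arm64.
TAG = re.compile(r"^(?P<version>.+)-(?P<arch>amd64|arm64)$")


def prune_candidates(tags, ordered_versions, keep=3):
    """Return builder tags whose version is not among the newest `keep`.

    ordered_versions: release versions sorted oldest -> newest.
    Tags that do not look like <version>-<arch> are left alone.
    """
    keep_set = set(ordered_versions[-keep:]) if keep > 0 else set()
    stale = []
    for t in tags:
        m = TAG.match(t)
        if m and m.group("version") not in keep_set:
            stale.append(t)
    return stale


print(prune_candidates(["0.8.0-amd64", "0.9.0-amd64"], ["0.8.0", "0.9.0"], keep=1))
```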

🟢 F4–F6

  • The Validate digest count == 2 guard is a good safeguard against half-published multi-arch manifests.
  • Centralizing the agent list in one AGENTS env var makes add/remove trivial and removes the duplicated 15-line matrices.
  • kiro-as-default tag handling keeps openab:beta / openab:<version> / latest / stable unchanged — clean backward compatibility.
Baseline Check
  • PR opened: 2026-06-23, 1 file changed (+143 / -95).
  • Dockerfile.unified already exists on main (from the prerequisite work) with 14 agent targets + a builder target keyed on ARG BUILDER_IMAGE.
  • Net-new value of this PR: rewires build-operator.yml to drive that unified Dockerfile via the shared-builder → thin-agents → merge-manifests flow and the new tag scheme. Dropping native-sandbox is a net-new behavior change not present in the prior workflow.
Minor notes (non-blocking)
  • Renaming jobs (build-image → build-core / build-agents) and matrix legs changes the generated check names. If any branch-protection or release gating references the old names, update them. (Low impact: this workflow runs on tag push / dispatch, not on PRs.)
  • merge-manifests is skipped entirely if any single build-agents matrix leg fails (fail-fast: false + non-always() needs), so one failing agent blocks all manifests. Acceptable, just flagging the coupling.
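To make the coupling concrete, here is a simplified model (not GitHub's actual implementation) of how `needs:` gating treats a matrix job: `fail-fast: false` only keeps the other legs from being cancelled, any failed leg still makes the aggregate result a failure, and a dependent job without an `always()` condition is then skipped. The function names are hypothetical:

```python
def matrix_result(leg_results):
    """Aggregate result of a matrix job from its per-leg results (simplified)."""
    if any(r == "failure" for r in leg_results):
        return "failure"
    if any(r == "cancelled" for r in leg_results):
        return "cancelled"
    return "success"


def merge_manifests_runs(leg_results, if_always=False):
    """Whether a job that `needs` the matrix job would run."""
    return if_always or matrix_result(leg_results) == "success"


print(merge_manifests_runs(["success"] * 13 + ["failure"]))
```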

chaodu-agent added 3 commits June 23, 2026 21:20
- build-core: always push builder (fix dry_run bug) [擺渡 🔴 F1]
- build-core/agents: add no-cache param to docker action [擺渡 🟡 F2]
- build-agents: add per-agent cache-to scope [Z渡 🟡]
- merge-manifests: add digest hex format validation [Z渡+覺渡+口渡 🟡]
- merge-manifests: add SHA tag for commit traceability [口渡 🟡]
- Add concurrency group to prevent race conditions [口渡 🟡]
- resolve-tag: add explicit permissions [口渡 🟡]
- AGENTS: add native-sandbox (requires Dockerfile.unified target) [口渡 🔴]
Remove kiro special-casing from merge-manifests and promote-stable.
All agents (including kiro) use identical tag format:
  openab:<version>-<agent>
  openab:beta-<agent>
  openab:stable-<agent>
  openab:<major.minor>-<agent>

No bare tags (openab:beta, openab:latest) are published.
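The uniform scheme can be expressed as one function per agent. A sketch, assuming `<major.minor>` is taken from the first two dot-separated components of the version and folding in the `<sha>-<agent>` pre-release tag that the final review lists; `release_tags` is a hypothetical name:

```python
def release_tags(agent, version, sha=None, stable=False, image="openab"):
    """Tags for one agent under the uniform <tag>-<agent> scheme.

    Pre-release: <version>-<agent>, beta-<agent>, <sha>-<agent>
    Stable:      <version>-<agent>, <major.minor>-<agent>, stable-<agent>
    No bare tags (openab:beta, openab:latest) are produced.
    """
    if stable:
        major, minor = version.split(".")[:2]
        suffixes = [version, f"{major}.{minor}", "stable"]
    else:
        if not sha:
            raise ValueError("pre-release tags need a commit SHA")
        suffixes = [version, "beta", sha]
    return [f"{image}:{s}-{agent}" for s in suffixes]


print(release_tags("gemini", "0.9.0", stable=True))
```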

@chaodu-agent (Collaborator, Author)

法師團隊 (Dharma Master team) Review: LGTM ✅

5/5 reviewers approve (X渡 did not respond)

| Reviewer | Angle | Verdict |
|---|---|---|
| 普渡 (Claude) | Correctness | ✅ LGTM |
| 覺渡 (Gemini) | Docs/UX | ✅ LGTM |
| 擺渡 (Codex) | Architecture | ✅ LGTM |
| 口渡 (Copilot) | Security/CI | ✅ LGTM |
| Z渡 (Codex) | Tests/Perf | ✅ LGTM (with documented caveat) |

Summary of changes (post-review fixes included)

Architecture: Dockerfile.unified + shared builder → 14 parallel agent layers

  • ~140 min → ~12 min build time

Tag format: openab:<version>-<agent> for ALL agents (no default, no latest)

  • Pre-release: openab:<version>-<agent> + openab:beta-<agent> + openab:<sha>-<agent>
  • Stable: openab:<version>-<agent> + openab:<major.minor>-<agent> + openab:stable-<agent>

Hardening (from review):

  • concurrency group prevents parallel release race conditions
  • Digest hex format validation (^[a-f0-9]{64}$)
  • no-cache param properly passed to docker action
  • Per-agent cache-to scope for runtime layer caching
  • Builder always pushed (fixes dry_run mode for downstream agents)
  • SHA tag for commit traceability
  • Explicit permissions on all jobs (least-privilege)
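The two manifest guards (digest hex format, plus the digest count == 2 check from the findings table) could look like this. A sketch assuming each arch uploads its digest as bare hex without the `sha256:` prefix; it uses `re.fullmatch` so a trailing newline cannot slip past the `$` anchor. `validate_digests` is a hypothetical name:

```python
import re

DIGEST_HEX = re.compile(r"^[a-f0-9]{64}$")  # pattern quoted in the review


def validate_digests(digests, expected_arches=2):
    """Check per-arch digests before they are combined into one manifest list
    (for example with `docker buildx imagetools create`)."""
    if len(digests) != expected_arches:
        raise ValueError(f"expected {expected_arches} digests, got {len(digests)}")
    for d in digests:
        # fullmatch rejects "<64 hex>\n", which a plain match with `$` accepts.
        if DIGEST_HEX.fullmatch(d) is None:
            raise ValueError(f"malformed digest: {d!r}")
    return [f"sha256:{d}" for d in digests]


print(validate_digests(["a" * 64, "b" * 64]))
```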

Documented caveat: dry_run suppresses final agent digest/manifest/chart publishing, but still pushes the internal builder image because downstream agent builds consume it.
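The caveat reduces to a small truth table. As an illustrative sketch (the keys are hypothetical labels, not workflow step names), the post-review push behavior is:

```python
def push_plan(dry_run: bool) -> dict:
    """What gets pushed after the review fixes.

    The builder is always pushed because build-agents pulls it via
    FROM ${BUILDER_IMAGE}; everything user-facing is gated on dry_run.
    """
    return {
        "builder": True,
        "agent_digests": not dry_run,
        "manifests": not dry_run,
        "chart": not dry_run,
    }


print(push_plan(dry_run=True))
```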

Dependencies

Recommendation

⚠️ This workflow only triggers on v* tag push / workflow_dispatch. Suggest running a manual workflow_dispatch dry-run after merge to validate before the next release.

Options for maintainer

1️⃣ Approve & merge
2️⃣ Request further changes
3️⃣ Close PR

@thepagent merged commit a7ee8e7 into main on Jun 23, 2026
2 checks passed