From 0030292de198f1aa8079b393e007d843767efabc Mon Sep 17 00:00:00 2001 From: Chisanan232 Date: Mon, 1 Jun 2026 21:21:34 +0800 Subject: [PATCH] =?UTF-8?q?=F0=9F=90=9B=20(ci):=20Retry=20gh=20release=20d?= =?UTF-8?q?ownload=20on=20cross-repo=20race=20(release=20not=20found)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit release-node.yml fires on the same tag-push event as agent-assembly's Release workflow, in parallel. agent-assembly takes ~10 min to build + publish the binaries that release-node depends on, so the first gh release download attempt consistently fails with: release not found (observed on v0.0.1-alpha.2 and v0.0.1-alpha.3 dry-runs; would have failed every prior alpha too if not blocked at earlier steps). Fix: wrap gh release download in a retry loop. Up to 20 attempts with 60-second sleeps = 20 min ceiling waiting for the Release. Distinguish 'release not found' (race condition → retry) from other errors (auth failure, rate limit, etc. → fail-fast). Verified locally: * retry-on-race: 3-attempt simulation with first 2 returning 'release not found' → third succeeds → loop exits cleanly * fail-fast on other error: 'HTTP 403 rate limit' → grep miss on 'release not found' → exit 1 without retry Trade-off vs alternatives considered: - workflow_run cross-repo trigger: NOT possible; workflow_run only fires for same-repo workflows - repository_dispatch from agent-assembly: cleaner architecturally but needs PAT + extra cross-repo wiring. Defer until needed. - Manual workflow_dispatch only: loses automation entirely. Retry-with-backoff is the smallest change that solves the race. Migrate to repository_dispatch later if polling cost becomes real. Tracked: AAASM-2328 --- .github/workflows/release-node.yml | 30 ++++++++++++++++++++++++++---- 1 file changed, 26 insertions(+), 4 deletions(-) diff --git a/.github/workflows/release-node.yml b/.github/workflows/release-node.yml index 28286a6..d688bb3 100644 --- a/.github/workflows/release-node.yml +++ b/.github/workflows/release-node.yml @@ -77,10 +77,32 @@ jobs: run: | set -euo pipefail mkdir -p bin-staging - gh release download "$RELEASE_TAG" \ - --repo AI-agent-assembly/agent-assembly \ - --pattern "aasm-*.tar.gz" \ - --dir bin-staging + # Cross-repo race: release-node fires on the same tag-push event as + # agent-assembly's Release workflow, in parallel. agent-assembly takes + # ~10 min to build + publish the binaries we depend on. Retry up to + # 20 times × 60s = 20 min ceiling waiting for the Release object. + # AAASM-2328. + MAX_ATTEMPTS=20 + for attempt in $(seq 1 $MAX_ATTEMPTS); do + if gh release download "$RELEASE_TAG" \ + --repo AI-agent-assembly/agent-assembly \ + --pattern "aasm-*.tar.gz" \ + --dir bin-staging 2>&1 | tee /tmp/gh-rd.log; then + echo "✓ Downloaded on attempt ${attempt}/${MAX_ATTEMPTS}" + break + fi + # Distinguish 'release not found' (race — retry) from other errors (fail-fast) + if ! grep -q 'release not found' /tmp/gh-rd.log; then + echo "::error::gh release download failed with non-race error — aborting retry" + exit 1 + fi + if [ "$attempt" -eq "$MAX_ATTEMPTS" ]; then + echo "::error::Release v${RELEASE_TAG} never appeared after $((MAX_ATTEMPTS * 60))s — agent-assembly Release pipeline likely failed" + exit 1 + fi + echo "Attempt ${attempt}/${MAX_ATTEMPTS}: release not yet published, sleeping 60s..." + sleep 60 + done ls -la bin-staging/ - name: Stage runtime binaries into runtime-* sub-packages