Retry postinstall downloads on connect failure#3599
Open
gavande1 wants to merge 9 commits into
Open
Conversation
Fixes STU-1749. The postinstall download script relied on undici's default 10s connect timeout and gave up after a single attempt, so a transient slow TLS handshake to wordpress.org (or any of the other artifact hosts) failed the entire install. Wrap the fetch in a fetchWithRetry helper that uses an undici Agent with connect.timeout=15000 and retries up to 3 times with exponential backoff.
5007219 to
0db6c18
Compare
Contributor
There was a problem hiding this comment.
Pull request overview
This PR hardens the postinstall WordPress/server-files download script by introducing an undici-backed fetchWithRetry wrapper with a longer connect timeout, reducing failures caused by slow TCP/TLS handshakes during artifact downloads.
Changes:
- Add an undici
Agentwith a 15s connect timeout and afetchWithRetryhelper with exponential backoff retries. - Route artifact downloads in
downloadFilethroughfetchWithRetryinstead of a singlefetch.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…dupe logs - Don't retry 4xx responses other than 429 — those are non-transient and only waste time on retries. - Type lastError as Error and normalize non-Error throws so the final re-throw preserves a real stack trace. - Drop the in-helper console.error on the final attempt; downloadFiles already logs the thrown error once before exiting.
Extract a generic withRetry<T> helper backed by a NonRetriableError class so any network call in this script can opt into the same retry behavior. Refactor fetchWithRetry to use it and wrap fetchLatestGithubRelease in withRetry so the GitHub API lookup also benefits from the 15s connect timeout and 3-attempt backoff instead of its previous 5s total timeout with no retries.
…b.com:Automattic/studio into stu-1749-add-retry-to-postinstall-wp-download
Both download-wp-server-files.ts and download-language-packs.ts had their own fetchWithRetry helpers with overlapping behavior. Extract withRetry<T>, NonRetriableError, throwForHttpStatus, and a shared undici Agent (15s connect timeout) into scripts/lib/with-retry.ts and have both scripts import from there. Also bring download-language-packs in line with the improvements already in download-wp-server-files: fail fast on 4xx (other than 429) and use the shared 15s connect timeout instead of undici's 10s default.
…b.com:Automattic/studio into stu-1749-add-retry-to-postinstall-wp-download
The retry loop (3 attempts with backoff) provides 30s+ of total connect budget, which is enough to absorb the transient TLS handshake failures reported in STU-1749 without bumping the per-attempt cap. Keep the dispatcher and constant declared explicitly so the value can be tuned in one place if needed later.
Collaborator
📊 Performance Test ResultsComparing 6b9f9b0 vs trunk app-size
site-editor
site-startup
Results are median values from multiple test runs. Legend: 🟢 Improvement (faster) | 🔴 Regression (slower) | ⚪ No change (<50ms diff) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Related issues
How AI was used in this PR
AI assisted with diagnosing the failure mode (undici's default 10s
connect.timeoutfiring at the TCP/TLS handshake towordpress.org, never reaching the body phase, and the script giving up after a single attempt) and drafted the implementation. I reviewed each change manually and ran lint + typecheck locally.Proposed Changes
New shared lib —
scripts/lib/with-retry.ts:withRetry< T >( name, fn, options? )— generic retry helper with exponential backoff (1s → 2s) and up to 3 attempts.NonRetriableError— throw this from insidefnto opt an error out of retry.throwForHttpStatus( context, status, statusText? )— maps response status to retriable vs non-retriable: 5xx and 429 retry; other 4xx (404, 401, 403, ...) throwNonRetriableErrorso we fail fast.sharedDispatcher— undiciAgentwithconnect: { timeout: 10_000 }(matches Node's undici default; declared centrally so it can be tuned in one place if needed).Per-attempt total connect budget is roughly
3 × 10s + (1s + 2s)≈ 33s, which absorbs transient TLS handshake failures without bumping the per-attempt cap.Adopted by
scripts/download-wp-server-files.ts:fetchWithRetryartifact downloader (WordPress, SQLite integration, WP-CLI, sqlite-command, phpMyAdmin) now wraps the fetch inwithRetry.fetchLatestGithubReleaseis also wrapped inwithRetry— the GitHub API call that resolves thesqlite-commandURL no longer uses a 5s total timeout with no retries.Adopted by
scripts/download-language-packs.ts(sibling postinstall script that had its own near-identical retry helper):fetchWithRetry,MAX_RETRIES,INITIAL_RETRY_DELAY_MS.withRetryindividually, so JSON parsing and body streaming both benefit from retry.Other notes:
Attempt N/M failed ... Retrying in Xms); the script entry points already log the final thrown error beforeprocess.exit(1), so failures surface exactly once.undiciis imported but kept as a transitive dependency (already pulled in by@octokit/core,hono,mime-db, and others) rather than declared inpackage.json, to keep this change minimal. Happy to pin it explicitly in a follow-up if reviewers prefer.Testing Instructions
rm -rf wp-files && npx tsx ./scripts/download-wp-server-files.ts && npx tsx ./scripts/download-language-packs.tsshould succeed and repopulatewp-files/.wordpress.orgat a blackholed IP to force connect-phase failures:[wordpress] Attempt N/3 failed: ... Retrying in Xms...lines followed by the final thrown error and a non-zero exit. The same blackhole trick onapi.wordpress.orgexercises the language-packs path.Request failed with status code: 404— no retries.api.github.comshould produce three[github:automattic/wp-cli-sqlite-command] Attempt N/3 failed ...lines before failing.npx eslint scripts/— clean.npx tsc -p scripts/tsconfig.json --noEmit— clean.Pre-merge Checklist