perf(vrt): parallelize visual regression tests across N browser pages #45
Merged
The VRT runner was fully sequential: one Playwright page running every test back-to-back in a single browser-side for-loop. Adds a --workers / -w flag (default 1, preserves old behavior) that spawns N parallel pages sharing one Chromium instance, each handling a round-robin shard of the test list.

Browser side (examples/index.ts): runAutomation accepts a ?shard=i/N URL param, sorts test paths deterministically, and skips tests outside its slice (index % N !== i). Sorting keys off the path so shard membership is stable across pages and machines.

Node side (visual-regression/src/index.ts): the per-page setup (snapshot/doneTests exposed functions, navigation, donePromise) is extracted into a runWorker function that runs once per shard. Promise.all gates the final summary + browser close. The aggregate snapshot counters stay as module globals; JS is single-threaded, so shared ++ is safe across pages. Per-test snapshot indices are kept in a per-page map (each test only runs in one shard, so no cross-page state).

Local 4-worker run on a ~3 minute baseline cuts wall time to ~2 minutes (~1.5x), with CPU utilization rising from ~27% to ~63%. Text-rendering tests show some additional variance under parallel load, flagged for follow-up if it shows up in CI; the existing -w 1 default keeps current behavior bit-for-bit.

The docker-ci mode relays --workers through so `pnpm test:visual:update -i -w 4` works end-to-end inside the container.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
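The sharding rule and the Promise.all gate described in this commit message can be sketched together. This is a minimal simulation, not the code from the PR: `parseShard`, `selectShard`, `runAll`, and `snapshotsTaken` are hypothetical names, `runWorker` here is a stub, and real Playwright pages are replaced by plain async functions so the sketch runs on its own.

```typescript
// Sketch only: round-robin sharding plus a Promise.all gate.
// Real pages are replaced by async stubs; all names are illustrative.

interface Shard {
  index: number; // i in "?shard=i/N"
  total: number; // N in "?shard=i/N"
}

// Read "shard=i/N" from a query string; no param means one shard with everything.
function parseShard(search: string): Shard {
  const match = /[?&]shard=(\d+)\/(\d+)(?:&|$)/.exec(search);
  if (match === null) return { index: 0, total: 1 };
  const index = Number(match[1]);
  const total = Number(match[2]);
  if (total < 1 || index >= total) {
    throw new Error(`invalid shard spec: ${search}`);
  }
  return { index, total };
}

// Sort by path (default code-unit order, independent of locale) so every page
// sees the same sequence, then keep only entries where index % N === i.
function selectShard(paths: string[], shard: Shard): string[] {
  return [...paths]
    .sort()
    .filter((_, idx) => idx % shard.total === shard.index);
}

// Module-level counter shared by all simulated workers.
let snapshotsTaken = 0;

// Stand-in for one page: walks its shard and bumps the shared counter.
async function runWorker(paths: string[], shard: Shard): Promise<string[]> {
  const mine = selectShard(paths, shard);
  for (const _path of mine) {
    await Promise.resolve(); // yield, as a real page round-trip would
    snapshotsTaken++; // callbacks run one at a time on the event loop
  }
  return mine;
}

// Start one worker per shard; the caller's summary waits for all of them.
function runAll(paths: string[], workers: number): Promise<string[][]> {
  const shards = Array.from({ length: workers }, (_, i) =>
    runWorker(paths, parseShard(`?shard=${i}/${workers}`)),
  );
  return Promise.all(shards);
}
```

Because the shards are disjoint and together cover the sorted list, each test runs on exactly one page, which is what lets the per-test state stay per-page.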
Public-repo GitHub-hosted runners are 4-core / 16 GB, comfortably above the per-page memory and CPU budget for parallel Chromium SwiftShader contexts. Opting CI into -w 4 should shorten the VRT step (the local 4-worker run measured ~1.5x); if the text-rendering variance flagged in the PR description shows up, dial down to -w 2, which should still beat the single-worker baseline.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
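Both commit messages rely on the -w flag, but the page does not show how the runner reads it. The following is a sketch of one way to parse it, assuming a space-separated value and a default of 1; `parseWorkers` is a hypothetical helper, not a function from this repository.

```typescript
// Hypothetical helper: read "--workers N" or "-w N" from argv, defaulting to 1.
function parseWorkers(argv: string[]): number {
  let workers = 1; // the default keeps the sequential behavior
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg !== "--workers" && arg !== "-w") {
      continue; // ignore flags this sketch does not know about
    }
    const value = argv[i + 1]; // undefined when the flag is the last argument
    i++; // skip the consumed value
    const parsed = Number(value);
    if (!Number.isInteger(parsed) || parsed < 1) {
      throw new Error(`--workers expects a positive integer, got: ${value}`);
    }
    workers = parsed;
  }
  return workers;
}
```

Rejecting zero, negative, and non-integer values up front avoids a run that launches no pages and reports a clean summary.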
Summary

Adds a `--workers`/`-w` flag (default `1`, preserves current behavior) that runs the visual regression test suite across N parallel Playwright pages sharing one Chromium instance. Each page handles a round-robin shard of the test list.

Locally, on my machine, a ~3 minute single-worker baseline drops to ~2 minutes with `-w 4` (~1.5×). CI runners with more cores should scale a bit further before hitting GPU-process contention.

Design
Browser side (examples/index.ts): `runAutomation` accepts a `?shard=i/N` URL param, sorts the test paths deterministically, and skips tests outside its slice (`index % N !== i`). Sorting keys off the path so shard membership is stable across pages and machines; round-robin balances roughly evenly even when test durations vary.

Node side (visual-regression/src/index.ts): the per-page setup (snapshot/doneTests exposed functions, navigation, donePromise) is extracted into a `runWorker` function that runs once per shard. `Promise.all` gates the final summary + `browser.close()`. The aggregate snapshot counters stay as module globals; JS is single-threaded, so shared `++` across pages is race-free. Per-test snapshot indices are kept in a per-page map; each test only runs in one shard, so there is no cross-page state.

The `--ci` docker mode relays `--workers` through, so `pnpm test:visual:update -i -w 4` works end-to-end inside the container.

What I deliberately did NOT do
`@playwright/test`. Would give us workers + retries + traces for free, but is a substantial rewrite of how tests declare themselves and how `settings.snapshot()` is wired up. Not worth it for the gain here.

Known caveat: text-rendering variance
On my Mac, a 4-worker run produced 2 extra failures in text-related tests (text-line-height, text-max-lines, text-scaling, text-vertical-align) on top of the 3 pre-existing failures (which are Mac-vs-Linux snapshot diffs unrelated to this change). The text suite uses canvas/SDF rendering that may have timing-dependent variance under parallel load. Possible causes:
I haven't seen these in the Docker/Linux runner; worth watching when this lands. If it's a real flake source, the mitigation is either to add a brief font-ready barrier to those tests or to keep them on shard 0 only. Keeping `-w 1` as the default means current CI green-runs aren't affected unless someone opts in.

Test plan
- `pnpm build` (renderer + examples + visual-regression): clean
- `pnpm test`: 193/193 pass
- `-w 1`: passes (3 pre-existing Mac/Linux diffs)
- `-w 4`: runs to completion, ~1.5× faster
- `-w 4` and doesn't regress text tests: needs CI to verify

🤖 Generated with Claude Code
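One of the two mitigations named under the known caveat, keeping the text tests on shard 0 only, could be expressed as a small change to the round-robin rule. This is an untested sketch of that idea, not something the PR implements: `assignShards` and the `isPinned` predicate are hypothetical, and matching on a `text-` prefix is an assumption based on the test names listed above.

```typescript
// Hypothetical variant of the round-robin rule: tests matching `isPinned`
// always land on shard 0; all other tests are dealt round-robin across N shards.
function assignShards(
  paths: string[],
  total: number,
  isPinned: (path: string) => boolean,
): string[][] {
  const shards: string[][] = Array.from({ length: total }, () => []);
  let next = 0; // round-robin cursor, advanced only by unpinned tests
  for (const path of [...paths].sort()) {
    if (isPinned(path)) {
      shards[0].push(path);
    } else {
      shards[next % total].push(path);
      next++;
    }
  }
  return shards;
}

// Example: pin anything whose path starts with "text-" (assumed naming).
const example = assignShards(
  ["alpha", "text-scaling", "beta", "text-max-lines", "gamma"],
  2,
  (path) => path.startsWith("text-"),
);
```

Pinning makes shard 0 heavier than the others, and the remaining shards still load the machine while the text tests run, so whether this actually removes the variance would need to be measured.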