perf(vrt): parallelize visual regression tests across N browser pages #45

Merged
chiefcll merged 2 commits into main from vrt-parallel-workers
May 26, 2026

@chiefcll
Contributor

Summary

Adds a --workers / -w flag (default 1, preserves current behavior) that runs the visual regression test suite across N parallel Playwright pages sharing one Chromium instance. Each page handles a round-robin shard of the test list.

Local on my machine: a ~3 minute single-worker baseline drops to ~2 minutes with -w 4 (~1.5×). CI runners with more cores should scale a bit further before hitting GPU-process contention.

Design

Browser side (examples/index.ts): runAutomation accepts a ?shard=i/N URL param, sorts the test paths deterministically, and skips tests outside its slice (index % N !== i). Sorting keys off the path so shard membership is stable across pages and machines; round-robin balances roughly evenly even when test durations vary.
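
The shard selection described above can be sketched roughly as follows. This is a minimal illustration, not the code from examples/index.ts: the names `Shard`, `parseShard`, and `selectShard` are hypothetical, as is the choice to treat a missing param as a single shard.

```typescript
// Hypothetical helpers; the PR's actual implementation is not shown on this page.
interface Shard {
  index: number; // i in ?shard=i/N
  total: number; // N in ?shard=i/N
}

// Parse an "i/N" string; a missing param is assumed to mean "run everything".
function parseShard(param: string | null): Shard {
  if (param === null) return { index: 0, total: 1 };
  const match = /^(\d+)\/(\d+)$/.exec(param);
  if (match === null) throw new Error(`Invalid shard "${param}", expected i/N`);
  const index = Number(match[1]);
  const total = Number(match[2]);
  if (total < 1 || index >= total) {
    throw new Error(`Shard index ${index} out of range for ${total} shards`);
  }
  return { index, total };
}

// Sort by path (code-unit order, so locale-independent), then keep every
// N-th entry starting at the shard index: the round-robin slice.
function selectShard(paths: readonly string[], shard: Shard): string[] {
  const sorted = [...paths].sort();
  return sorted.filter((_, i) => i % shard.total === shard.index);
}

const tests = ["text-scaling", "alpha", "clipping", "text-max-lines", "rotation"];
const shard0 = selectShard(tests, parseShard("0/2"));
const shard1 = selectShard(tests, parseShard("1/2"));
console.log(shard0); // [ 'alpha', 'rotation', 'text-scaling' ]
console.log(shard1); // [ 'clipping', 'text-max-lines' ]
```

In a page, the param would typically come from `new URLSearchParams(location.search).get("shard")`. Because every page sorts the same list the same way, the slices are disjoint and together cover the full list.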

Node side (visual-regression/src/index.ts): the per-page setup (snapshot/doneTests exposed functions, navigation, donePromise) is extracted into a runWorker function that runs once per shard. Promise.all gates the final summary + browser.close(). The aggregate snapshot counters stay as module globals: every page's exposed-function callback runs on the same single-threaded Node event loop, so a shared ++ across pages is race-free. Per-test snapshot indices are kept in a per-page map; each test runs in only one shard, so there is no cross-page state.
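
A simplified sketch of that shape, with the Playwright page replaced by a timer so it runs standalone. Only the `runWorker` name comes from the description above; `runAll`, `snapshotsTaken`, and the two-snapshots-per-test loop are assumptions made for illustration.

```typescript
// Module-level aggregate shared by all workers. Safe to increment without
// locking because every callback runs on the one Node event loop.
let snapshotsTaken = 0;

// Stand-in for one Playwright page working through its shard.
async function runWorker(tests: readonly string[]): Promise<number> {
  // Per-page map: a test's snapshot index never leaves its own worker.
  const snapshotIndexByTest = new Map<string, number>();
  for (const test of tests) {
    for (let n = 0; n < 2; n++) {
      // Simulates waiting for the page to render and call snapshot().
      await new Promise<void>((resolve) => setTimeout(resolve, 1));
      snapshotIndexByTest.set(test, (snapshotIndexByTest.get(test) ?? 0) + 1);
      snapshotsTaken++;
    }
  }
  return snapshotIndexByTest.size; // number of tests this worker ran
}

// Promise.all resolves only after every shard finishes, so the summary
// (and, in the real runner, browser.close()) happens exactly once.
async function runAll(shards: readonly (readonly string[])[]): Promise<number> {
  const perShard = await Promise.all(shards.map((tests) => runWorker(tests)));
  return perShard.reduce((sum, count) => sum + count, 0);
}

const finished = runAll([["alpha", "rotation"], ["clipping"]]);
finished.then((testsRun) => {
  console.log(`${testsRun} tests, ${snapshotsTaken} snapshots`);
});
```

The workers interleave at each `await`, but no two callbacks ever execute at the same instant, which is why the plain counter stays consistent.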

The --ci docker mode relays --workers through, so pnpm test:visual:update -i -w 4 works end-to-end inside the container.
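
For illustration, flag handling and relaying along these lines would produce that behavior. The function names `parseWorkers` and `relayArgs` are hypothetical, and the hand-rolled parsing is an assumption; the runner may well use an argument-parsing library instead.

```typescript
// Hypothetical sketch: read --workers / -w from an argv-style array,
// defaulting to 1 so existing invocations behave as before.
function parseWorkers(argv: readonly string[]): number {
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg !== "--workers" && arg !== "-w") continue;
    const raw = argv[i + 1];
    const n = Number(raw);
    if (!Number.isInteger(n) || n < 1) {
      throw new Error(`--workers expects a positive integer, got "${raw}"`);
    }
    return n;
  }
  return 1;
}

// Arguments to append to the command re-run inside the container.
// Omitted when the value is the default, so the inner command is unchanged.
function relayArgs(workers: number): string[] {
  return workers > 1 ? ["--workers", String(workers)] : [];
}

const workers = parseWorkers(["-i", "-w", "4"]);
console.log(workers);            // 4
console.log(relayArgs(workers)); // [ '--workers', '4' ]
console.log(parseWorkers([]));   // 1
```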

What I deliberately did NOT do

  • Multiple browser instances. N pages in one browser share the Chromium GPU process, which is the right tradeoff for this WebGL-heavy suite: N browsers would each pay startup cost and serialize through the OS GPU driver anyway.
  • Migrate to @playwright/test. Would give us workers + retries + traces for free, but is a substantial rewrite of how tests declare themselves and how settings.snapshot() is wired up. Not worth it for the gain here.
  • Parallel diff workers. PNG comparison is a small fraction of per-test time; not worth the complexity.

Known caveat: text-rendering variance

On my Mac, a 4-worker run produced 2 extra failures in text-related tests (text-line-height, text-max-lines, text-scaling, text-vertical-align) on top of the 3 pre-existing failures (which are Mac-vs-Linux snapshot diffs unrelated to this change). The text suite uses canvas/SDF rendering that may have timing-dependent variance under parallel load. Possible causes:

  • Font installation happens per-page; if a test reads metrics before fonts settle it could flake
  • Concurrent SDF generation may compete for the rasterization context

I haven't seen these in the Docker/Linux runner — worth watching when this lands. If it's a real flake source, the mitigation is either to add a brief font-ready barrier to those tests or to keep them on shard 0 only. Keeping -w 1 as the default means current CI green-runs aren't affected unless someone opts in.
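
The "shard 0 only" mitigation could look something like the sketch below. It is an assumption about one way to do it, not something this PR implements: `assignShards` and the `text-` prefix test are hypothetical.

```typescript
// Hypothetical sketch: pinned tests always land on shard 0; everything else
// is dealt round-robin across all shards, using a counter that skips pinned
// entries so the unpinned tests stay evenly spread.
function assignShards(
  sortedPaths: readonly string[],
  total: number,
  isPinned: (path: string) => boolean,
): Map<string, number> {
  const assignment = new Map<string, number>();
  let next = 0;
  for (const path of sortedPaths) {
    if (isPinned(path)) {
      assignment.set(path, 0);
    } else {
      assignment.set(path, next % total);
      next++;
    }
  }
  return assignment;
}

const sortedTests = ["alpha", "clipping", "rotation", "text-line-height", "text-scaling"];
const plan = assignShards(sortedTests, 2, (path) => path.startsWith("text-"));
console.log(plan.get("clipping"));         // 1
console.log(plan.get("text-line-height")); // 0
console.log(plan.get("text-scaling"));     // 0
```

Pinning only stops the text tests from running against each other; tests on the other shards still run alongside them, so this would not help if the variance comes from general GPU or CPU load. It also makes shard 0 the longest-running shard.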

Test plan

  • pnpm build (renderer + examples + visual-regression) clean
  • pnpm test — 193/193 pass
  • Local VRT, -w 1 — passes (3 pre-existing Mac/Linux diffs)
  • Local VRT, -w 4 — runs to completion, ~1.5× faster
  • Confirm Docker CI run is faster with -w 4 and doesn't regress text tests — needs CI to verify

🤖 Generated with Claude Code

chiefcll and others added 2 commits May 26, 2026 18:02
The VRT runner was fully sequential: one Playwright page running every
test back-to-back in a single browser-side for-loop. Adds a --workers / -w
flag (default 1, preserves old behavior) that spawns N parallel pages
sharing one Chromium instance, each handling a round-robin shard of the
test list.

Browser side (examples/index.ts): runAutomation accepts a ?shard=i/N
URL param, sorts test paths deterministically, and skips tests outside
its slice (index % N !== i). Sorting keys off the path so shard
membership is stable across pages and machines.

Node side (visual-regression/src/index.ts): the per-page setup
(snapshot/doneTests exposed functions, navigation, donePromise) is
extracted into a runWorker function that runs once per shard.
Promise.all gates the final summary + browser close. The aggregate
snapshot counters stay as module globals — JS is single-threaded so
shared ++ is safe across pages. Per-test snapshot indices are kept in a
per-page map (each test only runs in one shard, so no cross-page state).

Local 4-worker run on a ~3 minute baseline cuts wall time to ~2 minutes
(~1.5x), with CPU utilization rising from ~27% to ~63%. Text-rendering
tests show some additional variance under parallel load — flagged for
follow-up if it shows up in CI; the existing -w 1 default keeps current
behavior bit-for-bit.

The docker-ci mode relays --workers through so `pnpm test:visual:update
-i -w 4` works end-to-end inside the container.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Public-repo GitHub-hosted runners are 4-core / 16 GB, comfortably
above the per-page memory and CPU budget for parallel Chromium
SwiftShader contexts. Opting CI into -w 4 should shorten the VRT step
(the local 4-worker run measured ~1.5x); if the text-rendering variance
flagged in the PR description shows up, dial down to -w 2.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@chiefcll chiefcll merged commit 7fe3c7c into main May 26, 2026
1 check passed
@chiefcll chiefcll deleted the vrt-parallel-workers branch May 26, 2026 22:21