perf(vrt): parallelize visual regression tests across N browser pages by chiefcll · Pull Request #45 · solid-tv/renderer

chiefcll · 2026-05-26T22:03:12Z

Summary

Adds a --workers / -w flag (default 1, preserves current behavior) that runs the visual regression test suite across N parallel Playwright pages sharing one Chromium instance. Each page handles a round-robin shard of the test list.

Local on my machine: a ~3 minute single-worker baseline drops to ~2 minutes with -w 4 (~1.5×). CI runners with more cores should scale a bit further before hitting GPU-process contention.

Design

Browser side (examples/index.ts): runAutomation accepts a ?shard=i/N URL param, sorts the test paths deterministically, and skips tests outside its slice (index % N !== i). Sorting keys off the path so shard membership is stable across pages and machines. Round-robin assignment keeps shard sizes within one test of each other, which balances the load roughly evenly even when test durations vary.
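The shard selection described above can be sketched roughly as follows. This is a minimal illustration, not the code from examples/index.ts: the names parseShard, selectShard, and Shard are hypothetical, and falling back to a single shard on a missing or malformed parameter is an assumption.

```typescript
// Hypothetical names throughout; the real runAutomation code is not shown in this PR.
interface Shard {
  index: number; // i in ?shard=i/N
  total: number; // N in ?shard=i/N
}

// Parse the raw value of the `shard` query parameter, e.g. "1/4".
// Assumption: missing or malformed values fall back to one shard that runs everything.
function parseShard(raw: string | null): Shard {
  const match = raw === null ? null : /^(\d+)\/(\d+)$/.exec(raw);
  if (match === null) return { index: 0, total: 1 };
  const index = Number(match[1]);
  const total = Number(match[2]);
  if (total < 1 || index >= total) return { index: 0, total: 1 };
  return { index, total };
}

// Sort a copy of the paths so every page sees the same order,
// then keep only the tests whose sorted position falls in this shard.
function selectShard(testPaths: readonly string[], shard: Shard): string[] {
  // The default sort compares UTF-16 code units, so it does not depend on locale.
  const sorted = [...testPaths].sort();
  return sorted.filter((_, i) => i % shard.total === shard.index);
}
```

In a page, the raw value would typically come from something like new URLSearchParams(location.search).get("shard"). Because each page sorts the same list the same way, no coordination between pages is needed to agree on who runs what.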

Node side (visual-regression/src/index.ts): the per-page setup (snapshot/doneTests exposed functions, navigation, donePromise) is extracted into a runWorker function that runs once per shard. Promise.all gates the final summary + browser.close(). The aggregate snapshot counters stay as module globals: every page's exposed-function callbacks run on the same single-threaded Node event loop, so a shared ++ across pages is race-free. Per-test snapshot indices are kept in a per-page map; each test runs in exactly one shard, so there is no cross-page state.
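To illustrate why the shared counter is safe, here is a small simulation of the worker structure that does not use Playwright. The runWorker below is a hypothetical stand-in that replays a list of snapshot requests; the counter name, the result shape, and the choice to count snapshot indices from 1 are assumptions made for the sketch.

```typescript
// Module-level aggregate counter shared by all workers (mirrors the module globals above).
let snapshotsTaken = 0;

interface WorkerResult {
  shard: number;
  perTestSnapshotIndex: Map<string, number>;
}

// Hypothetical stand-in for runWorker: the real one drives a Playwright page,
// this one only replays snapshot requests with an async yield before each.
async function runWorker(
  shard: number,
  snapshotRequests: readonly string[],
): Promise<WorkerResult> {
  const perTestSnapshotIndex = new Map<string, number>(); // per-page state
  for (const testName of snapshotRequests) {
    await Promise.resolve(); // yield so that workers interleave on the event loop
    const next = (perTestSnapshotIndex.get(testName) ?? 0) + 1;
    perTestSnapshotIndex.set(testName, next);
    // No await between the read and the write, so no other callback can run in between.
    snapshotsTaken++;
  }
  return { shard, perTestSnapshotIndex };
}

// Promise.all resolves only after every worker has finished,
// which is the point where a summary and browser cleanup could safely run.
async function runAllWorkers(
  shards: readonly (readonly string[])[],
): Promise<WorkerResult[]> {
  return Promise.all(shards.map((requests, i) => runWorker(i, requests)));
}
```

The increment would only become unsafe if the read and the write were separated by an await, or if the counters were shared across worker threads rather than across pages driven from one Node process.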

The --ci docker mode relays --workers through, so pnpm test:visual:update -i -w 4 works end-to-end inside the container.
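As a rough sketch of the flag handling and relay, assuming a hand-rolled parser (the PR does not show how options are actually parsed or how the container command is built). parseWorkers and relayWorkers are hypothetical names, and the containerArgs list stands in for whatever the docker mode really passes.

```typescript
// Hypothetical parser for --workers / -w; the last occurrence wins.
function parseWorkers(argv: readonly string[]): number {
  let raw: string | undefined;
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i] ?? "";
    if (arg === "--workers" || arg === "-w") {
      raw = argv[i + 1] ?? ""; // empty string if the value is missing
    } else if (arg.startsWith("--workers=")) {
      raw = arg.slice("--workers=".length);
    }
  }
  if (raw === undefined) return 1; // default keeps the sequential behavior
  if (!/^\d+$/.test(raw) || Number(raw) < 1) {
    throw new Error(`invalid --workers value: "${raw}"`);
  }
  return Number(raw);
}

// Hypothetical relay step: append the resolved worker count to the
// argument list that will be used for the run inside the container.
function relayWorkers(containerArgs: readonly string[], workers: number): string[] {
  return [...containerArgs, "--workers", String(workers)];
}
```

Resolving the count on the host and forwarding a normalized --workers value means the in-container run does not need to care whether the user typed -w 4 or --workers=4.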

What I deliberately did NOT do

Multiple browser instances. N pages in one browser share the Chromium GPU process, which is the right tradeoff for this WebGL-heavy suite: N browsers would each pay startup cost and serialize through the OS GPU driver anyway.
Migrate to @playwright/test. Would give us workers + retries + traces for free, but is a substantial rewrite of how tests declare themselves and how settings.snapshot() is wired up. Not worth it for the gain here.
Parallel diff workers. PNG comparison is a small fraction of per-test time; not worth the complexity.

Known caveat: text-rendering variance

On my Mac, a 4-worker run produced 2 extra failures among the text-related tests (text-line-height, text-max-lines, text-scaling, text-vertical-align) on top of the 3 pre-existing failures (which are Mac-vs-Linux snapshot diffs unrelated to this change). The text suite uses canvas/SDF rendering that may have timing-dependent variance under parallel load. Possible causes:

Font installation happens per-page; if a test reads metrics before fonts settle it could flake
Concurrent SDF generation may compete for the rasterization context

I haven't seen these in the Docker/Linux runner — worth watching when this lands. If it's a real flake source, the mitigation is either to add a brief font-ready barrier to those tests or to keep them on shard 0 only. Keeping -w 1 as the default means current CI green-runs aren't affected unless someone opts in.
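The "keep them on shard 0 only" mitigation could look something like the sketch below. It is not part of this PR: assignShards and the isPinned predicate are hypothetical, and matching on a text- prefix is only an assumption based on the test names listed above.

```typescript
// Hypothetical variant of the shard assignment: pinned tests always go to
// shard 0, and the remaining tests are distributed round-robin.
function assignShards(
  testPaths: readonly string[],
  total: number,
  isPinned: (path: string) => boolean,
): string[][] {
  const shards: string[][] = Array.from({ length: total }, () => []);
  const sorted = [...testPaths].sort(); // same deterministic order on every page
  let next = 0; // counts only unpinned tests, so they stay evenly spread
  for (const path of sorted) {
    const target = isPinned(path) ? 0 : next++ % total;
    const bucket = shards[target];
    if (bucket !== undefined) bucket.push(path);
  }
  return shards;
}
```

Page i would then run only shards[i]. The tradeoff is that shard 0 carries all of the pinned tests in addition to its round-robin share, so it may become the slowest shard and limit the overall speedup.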

Test plan

pnpm build (renderer + examples + visual-regression) clean
pnpm test — 193/193 pass
Local VRT, -w 1 — passes (3 pre-existing Mac/Linux diffs)
Local VRT, -w 4 — runs to completion, ~1.5× faster
Confirm Docker CI run is faster with -w 4 and doesn't regress text tests — needs CI to verify

🤖 Generated with Claude Code

The VRT runner was fully sequential: one Playwright page running every test back-to-back in a single browser-side for-loop. Adds a --workers / -w flag (default 1, preserves old behavior) that spawns N parallel pages sharing one Chromium instance, each handling a round-robin shard of the test list.

Browser side (examples/index.ts): runAutomation accepts a ?shard=i/N URL param, sorts test paths deterministically, and skips tests outside its slice (index % N !== i). Sorting keys off the path so shard membership is stable across pages and machines.

Node side (visual-regression/src/index.ts): the per-page setup (snapshot/doneTests exposed functions, navigation, donePromise) is extracted into a runWorker function that runs once per shard. Promise.all gates the final summary + browser close. The aggregate snapshot counters stay as module globals; JS is single-threaded so shared ++ is safe across pages. Per-test snapshot indices are kept in a per-page map (each test only runs in one shard, so no cross-page state).

Local 4-worker run on a ~3 minute baseline cuts wall time to ~2 minutes (~1.5x), with CPU utilization rising from ~27% to ~63%. Text-rendering tests show some additional variance under parallel load, flagged for follow-up if it shows up in CI; the existing -w 1 default keeps current behavior bit-for-bit.

The docker-ci mode relays --workers through so `pnpm test:visual:update -i -w 4` works end-to-end inside the container.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Public-repo GitHub-hosted runners are 4-core / 16 GB, comfortably above the per-page memory and CPU budget for parallel Chromium SwiftShader contexts. Opting CI into -w 4 should shorten the VRT step; if the text-rendering variance flagged in the PR description shows up, dial down to -w 2 (still expected to be faster than the single-worker baseline).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

chiefcll and others added 2 commits May 26, 2026 18:02

chiefcll merged commit 7fe3c7c into main May 26, 2026
1 check passed

chiefcll deleted the vrt-parallel-workers branch May 26, 2026 22:21

chiefcll merged 2 commits into main from vrt-parallel-workers
