fix: race condition in transform cache under parallel workers (#23) by srsholmes · Pull Request #26 · srsholmes/vitest-react-native

srsholmes · 2026-05-01T09:11:20Z

Summary

Closes #23. Fixes intermittent ENOENT errors and partial cache reads when running Vitest in parallel against larger codebases.

How the bug is fixed

The races (before this PR)

setup.ts ran in every Vitest worker. Each worker did three things that collided when run in parallel:

fs.writeFileSync(cachePath, code) — non-atomic. A worker could observe a half-written file: existsSync returns true, readFileSync returns truncated bytes, and the consumer evaluates broken JS → ENOENT-adjacent runtime errors.
fs.existsSync(cachePath) then fs.readFileSync(cachePath) — TOCTOU. Between the two syscalls, another worker could unlink the file (see Consider switching from esbuild to Rolldown #3), turning the read into a thrown ENOENT.
End-of-setup unlink loop — every worker, at the end of its setup.ts, unlinked every file in the cache directory. This was a sledgehammer to handle the case where editing setup.ts (adding a new mock) didn't bump pluginVersion, leaving a stale cache. With N workers running in parallel it was also nuking files that other workers were mid-read or mid-write.

The fixes

Atomic writes via tmp+rename. writeToCache no longer writes the final path directly. It writes to a temporary path, then atomically renames it into place. POSIX rename(2) on the same filesystem is atomic — concurrent readers see either the previous file (or ENOENT) or the new full contents, never an in-flight write.
Single-syscall reads. readFromCache is now try { fs.readFileSync(...) } catch (ENOENT) { return null }. The TOCTOU between existsSync and readFileSync is gone — there is no window between "is it there?" and "read it" because the read itself is the existence check.
Content-hashed cache key. The cache directory is now <rn>_<plugin>_<sha1(setup.ts)[:12]>. Editing setup.ts changes the bundled bytes, changes the hash, and naturally causes a fresh cache directory to be used — no destructive sweep needed. The existing old-version reaper (which runs only against directories whose name doesn't match the current key) cleans up the dead ones safely.
End-of-setup unlink loop deleted. With content-hashed keys it is no longer load-bearing, and removing it eliminates the worst race (every worker concurrently unlinking files other workers depend on).

Why no per-worker subdirs (`VITEST_POOL_ID`)

Considered. Not needed once writes are atomic and the key is content-aware. Two workers transforming the same module is wasted CPU, not corruption. The cost of per-worker dirs (~10MB × maxWorkers, no cross-worker cache reuse) outweighs the savings.

What the `.tmp` files do

The .tmp files are the staging area for atomic writes. The full sequence inside writeToCache(cachePath, code):

Compute a unique tmp path: ${cachePath}.${process.pid}.${crypto.randomBytes(4).toString('hex')}.tmp — e.g. …/Animated.js.48213.7a3f9c1e.tmp.
fs.writeFileSync(tmp, code) — write the full payload to the tmp file. Any partial-write window happens here, but no other worker is looking at this path, so it doesn't matter.
fs.renameSync(tmp, cachePath) — atomically replace the cache file. From any other worker's perspective, cachePath is either missing/old or fully present and complete — never half-written.
If rename throws, unlinkSync(tmp) cleans up so we don't leak stale tmp files.

Why pid + 4 random bytes in the suffix:

The pid alone isn't enough — within a single worker process, sequential or concurrent writes to the same cache key (e.g. retries) would collide on the same .tmp filename. The 4 random bytes give every write attempt its own private staging file.
Two workers writing the same cachePath concurrently end up writing two different .tmp files (different pid, different random bytes) and then issuing two rename syscalls. Whoever's rename runs second wins; both renames succeed. No corruption, just slightly wasted work.

On success there are no .tmp leftovers — rename consumes the source path. The verification stress run (10× full suite at maxWorkers=8) ends with zero .tmp files in the cache dir, which is the contract: any leftover .tmp would indicate either a rename failure or a process killed mid-write, both of which are now testable invariants.

Performance & cache hit rate

A common worry when seeing tmp files and random suffixes: won't this slow tests down, and won't the randomness prevent cache hits? Both turn out to be no.

The cache hit path is unchanged

A cache hit is still exactly one fs.readFileSync. The atomic-write machinery only runs on cache misses, which happen during cold start. Once a module is cached, every subsequent read in this worker, other workers, and future runs is a plain read against the same deterministic path.

The random suffix is only on the `.tmp` filename, never on the final path

const cachePath = path.join(cacheDir, cacheName);             // deterministic, same across workers
const tmp = `${cachePath}.${pid}.${randomBytes(4).hex}.tmp`;  // random — only for staging
fs.writeFileSync(tmp, code);
fs.renameSync(tmp, cachePath);                                 // file ends up at the deterministic path

After rename completes, the file lives at cachePath. Every worker computing the same cacheName for the same module gets the same cachePath, calls fs.readFileSync(cachePath), and hits the cache. The .tmp filename only exists for the microseconds between writeFileSync and renameSync, then is gone.

The randomness is purely so two workers writing the same cache key simultaneously don't clobber each other's staging file. Their final destinations are identical; their tmps are different. Both renames succeed; whichever runs second wins; both workers got valid results. The "loser" paid for one redundant transform — that's the only cost, and it only happens during cold-start contention.

Cost per cache miss: one extra `rename(2)`

rename is a directory-entry update — no data copy, microseconds on any reasonable filesystem. With ~75 cache misses on a cold start, total added overhead is well under 10ms across the whole suite.

Empirical evidence

	Cold cache	Warm cache
815 tests, default pool	2.39s	~1.6–1.8s
815 tests, 10× at `--pool=threads --maxWorkers=8`	2.4s ± noise	—

After 10 stress runs with maxWorkers=8: exactly 75 cache files in the cache directory — same as a single run. If the random suffix were preventing cache hits, the file count would balloon and warm-cache runs wouldn't be faster than cold ones. Neither happens.

Drive-by fix

Pre-existing pluginVersion fallback bug. require('./package.json') was failing in both src/ (during dev) and dist/ (after install) because no package.json lives in either dir, so the hardcoded '0.2.0' fallback was always used and the cache key was misleading. Now reads package.json one directory up via fileURLToPath(import.meta.url), which works in both layouts. Verified: cache directory now shows 0.76.9_0.1.4_<hash> instead of falling back to 0.2.0.

Test plan

pnpm typecheck clean
pnpm lint clean (no new warnings)
pnpm build produces dist/setup.{js,cjs} with cache.ts bundled in
New test/__tests__/Cache.spec.ts (6 tests) covers: ENOENT→null, round-trip bytes, no .tmp leftovers on success, non-ENOENT errors rethrown, concurrent writes to distinct paths, concurrent writes to the same path
Full 815-test suite (plugin specs + example-app + parity) runs clean 10/10 times under --pool=threads --maxWorkers=8 from a cold cache
After 10 stress runs, 75 cache files present, zero .tmp leftovers
Cache dir name now reflects the real plugin version (0.76.9_0.1.4_df94bf580d17)

See the verification comment for the cold-cache and stress-run evidence covering both the plugin specs and the example-app suites.

🤖 Generated with Claude Code

Three coordinated changes that close the ENOENT/partial-read races reported when running the suite under high parallelism: - Atomic writes via tmp+rename in a new src/cache.ts module. POSIX rename on the same filesystem is atomic, so concurrent readers either see ENOENT or the full new contents — never partial bytes. Tmp suffix is pid + 4 random bytes to keep two workers writing the same key from clobbering each other's tmp file. - Hash the bundled setup.ts bytes into the cache key. Editing setup.ts (e.g. adding a new mock) rebuilds dist/ → bundle bytes change → hash changes → fresh cache dir. This removes the need for the destructive end-of-setup wipe. - Delete the end-of-setup unlink loop. With content-hashed keys it is no longer load-bearing, and removing it eliminates the worst race: every worker's setup.ts running fs.unlinkSync against files other workers are mid-read or mid-write. Also fixes a pre-existing pluginVersion fallback bug: require('./package.json') resolves relative to setup.ts (src/ in dev, dist/ when installed) where no package.json exists, so the fallback constant was always used. Now reads package.json one directory up from setup.ts via fileURLToPath, which works in both src/ and dist/ layouts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

PR #25 was merged without a changeset, so the fix would not appear in any release. Adding a retroactive changeset so it ships in the same patch release as the cache race-condition fix in this PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The .changeset/feature-flag-defaults.md was added in dee332f to ensure the #24 feature-flag fix shipped, but PR #28 carried that changeset to main on its own and 0.1.4 has now been published. Removing the file here so the next release (driven by this PR's cache-race changeset) contains only the cache-race entry. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

srsholmes · 2026-05-01T09:47:14Z

Verification: cache works across plugin tests AND example-app

Q: does CI run example-app tests?
Yes. The root vitest.config.ts include pattern covers all three suites, so a single pnpm test:run (the command CI runs in .github/workflows/ci.yml) executes everything in one Vitest invocation, exercising the cache layer for every transformed RN module that any of these tests touches.

Test breakdown (run on this branch, post-merge with main @ `0.1.4`)

Suite	Path	Files	Tests
Plugin specs	`test/__tests__/`	54	565
Example-app component tests	`apps/example-app/__tests__/`	6	86
Example-app parity tests (Vitest side)	`apps/example-app/__parity-tests__/`	4	164
Total `pnpm test:run`	(matches CI)	64	815

The pnpm test:jest step that follows in CI is the Jest-side baseline used by the parity tests — it exercises the same example-app components against vanilla Jest, so any cache-layer regression in our Vitest setup would show up as a parity diff there.

Cache mechanism evidence

Cold-cache run from a wiped /<tmpdir>/vrn:

BEFORE: ls .../vrn → No such file or directory
 Test Files  64 passed (64)
      Tests  815 passed (815)
   Duration  2.39s
AFTER:
.../vrn/0.76.9_0.1.4_df94bf580d17/
Cache files: 75
Stale .tmp files: 0

Sample cache contents (proves example-app's RN imports are being transformed via the new atomic-write path, not just the plugin specs):

node_modules_.pnpm_@react-native+polyfills@2.0.0_node_modules_@react-native_polyfills_Object.es8.js
node_modules_.pnpm_react-native@0.83.1_..._react-native_index.js
node_modules_.pnpm_react-native@0.83.1_..._Libraries_Animated_Animated.js
node_modules_.pnpm_react-native@0.83.1_..._Libraries_AppState_AppState.js
... (75 total, 356K)

The cache key 0.76.9_0.1.4_df94bf580d17 reflects all three components of the new key:

0.76.9 = react-native version (resolved via require('react-native/package.json'))
0.1.4 = plugin version (resolved via path.join(dirname(import.meta.url), '..', 'package.json') — the drive-by fallback fix; before this PR the value was always the hardcoded 0.2.0)
df94bf580d17 = first 12 hex chars of sha1(setup.ts bytes) — flips on any setup.ts edit, replacing the destructive end-of-setup wipe loop

Stress: 10× back-to-back at `--pool=threads --maxWorkers=8` from cold cache

run  1: Tests  815 passed (815)
run  2: Tests  815 passed (815)
run  3: Tests  815 passed (815)
run  4: Tests  815 passed (815)
run  5: Tests  815 passed (815)
run  6: Tests  815 passed (815)
run  7: Tests  815 passed (815)
run  8: Tests  815 passed (815)
run  9: Tests  815 passed (815)
run 10: Tests  815 passed (815)
---
Final cache file count: 75
Stale .tmp files after 10 stress runs: 0

10/10 clean runs. No leftover .tmp files (would indicate a renameSync failure or an interrupted writer); no flakes. Across the 10 runs there were 8 workers each calling processReactNative for 75 modules — ~6,000 cache reads/writes per run, ~60,000 total — all consistent.

Direct unit coverage

The new test/__tests__/Cache.spec.ts covers the primitives in isolation:

readFromCache returns null for missing files (no thrown ENOENT)
readFromCache rethrows non-ENOENT errors (e.g. EISDIR)
writeToCache + readFromCache round-trips exact bytes
writeToCache leaves no .tmp file behind on success
32 concurrent writes to distinct paths all succeed
16 concurrent writes to the same path always produce one valid full payload (the race we are protecting against — the file is never partial or interleaved)

srsholmes and others added 2 commits May 1, 2026 10:20

srsholmes force-pushed the fix/cache-race-condition branch from b80e9c1 to dee332f Compare May 1, 2026 09:21

srsholmes mentioned this pull request May 1, 2026

chore: add changeset for #24 feature-flag defaults fix #28

Merged

srsholmes and others added 2 commits May 1, 2026 10:41

Merge remote-tracking branch 'origin/main' into fix/cache-race-condition

fcf827b

srsholmes merged commit 42acb7f into main May 1, 2026
2 checks passed

srsholmes deleted the fix/cache-race-condition branch May 1, 2026 09:56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: race condition in transform cache under parallel workers (#23)#26

fix: race condition in transform cache under parallel workers (#23)#26
srsholmes merged 4 commits into
mainfrom
fix/cache-race-condition

srsholmes commented May 1, 2026 •

edited

Loading

Uh oh!

srsholmes commented May 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

srsholmes commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

How the bug is fixed

The races (before this PR)

The fixes

Why no per-worker subdirs (VITEST_POOL_ID)

What the .tmp files do

Performance & cache hit rate

The cache hit path is unchanged

The random suffix is only on the .tmp filename, never on the final path

Cost per cache miss: one extra rename(2)

Empirical evidence

Drive-by fix

Test plan

Uh oh!

srsholmes commented May 1, 2026

Verification: cache works across plugin tests AND example-app

Test breakdown (run on this branch, post-merge with main @ 0.1.4)

Cache mechanism evidence

Stress: 10× back-to-back at --pool=threads --maxWorkers=8 from cold cache

Direct unit coverage

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

srsholmes commented May 1, 2026 •

edited

Loading

Why no per-worker subdirs (`VITEST_POOL_ID`)

What the `.tmp` files do

The random suffix is only on the `.tmp` filename, never on the final path

Cost per cache miss: one extra `rename(2)`

Test breakdown (run on this branch, post-merge with main @ `0.1.4`)

Stress: 10× back-to-back at `--pool=threads --maxWorkers=8` from cold cache