fix(interp): isolate process-global built-in state between Test262 realms #966
Merged
…alms

The interpreted Test262 baseline was non-deterministic: consecutive isolated runs disagreed (a just-written baseline showed drift on immediate re-run), and tests like Array/prototype/*/call-with-boolean.js flipped Pass<->Fail between runs. This made the interpreted conformance gate unpinnable (a #964 follow-up; the issue noted the interpreted baseline as "stale/drifted and flaky").

Root cause: several built-ins are process-global `static` singletons carrying guest-WRITABLE, guest-OBSERVABLE state that is never reset when a new Interpreter is created:

- SharpTSMath.Instance._extras (Math.x = v)
- SharpTSNumberPrototype.Instance._extras (Number.prototype.x = v)
- SharpTSBooleanPrototype.Instance._extras (Boolean.prototype.length/[i] = v)
- SharpTSStringPrototype.Instance._extras (String.prototype.x = v)
- SharpTSSymbol global Symbol.for registry

Each Test262 test gets a fresh `new Interpreter(...)`, but these statics are shared, so an earlier test's mutation leaks into later tests in the same worker process. Combined with the worker pool's nondeterministic test->worker ordering, order-dependent tests flip run-to-run. (RegExp.prototype was already moved per-realm onto the Interpreter for exactly this reason, issue #101; these were the remaining vectors.)

Fix (additive, low-risk): add ClearExtras()/ClearGlobalRegistry() to the leaking singletons and a central Runtime/RealmState.ResetMutableBuiltInState(). The Test262 worker runs tests serially, so Test262Runner.RunOne calls it before each test, giving every script a pristine realm. Nothing outside Test262Runner invokes it, so normal CLI execution and the main test suite are unchanged by construction. The principled long-term fix is to make these prototypes per-realm like RegExp.prototype; that touches the hot primitive-dispatch paths and is left as a documented follow-up in RealmState's summary.

Validation: the update->verify cycle that reliably failed before now passes with 0 drift across three consecutive isolated runs. The refreshed baseline recovers 51 genuine improvements (26 Fail->Pass, 25 RuntimeError->Pass) and the previously flaky clusters (call-with-boolean, Number.prototype.toString, defineProperty-with-Math) are now stable. 6 remaining Pass->Fail are deterministic real drift (5 JSON/stringify, 1 defineProperty) recorded honestly for separate follow-up.
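The guest-visible side of the leak can be reproduced in any spec-compliant JavaScript engine, independent of SharpTS. The sketch below models two Test262 scripts sharing one realm: a "writer" that mutates `Boolean.prototype` and a "reader" that makes the kind of check the `call-with-boolean` tests are assumed to make. The function name `callWithBooleanCheck` and the value `"leaked"` are illustrative and do not come from the PR.

```javascript
// Models two Test262 scripts that end up sharing one realm.
// Names below are illustrative, not taken from SharpTS or Test262.

// "Reader" test: ToObject(true) yields a Boolean wrapper with no own
// `length`, so map looks it up on Boolean.prototype. In a pristine realm
// that lookup is undefined, the length is 0, and the result is [].
function callWithBooleanCheck() {
  return Array.prototype.map.call(true, (v) => v);
}

// Pristine realm: the reader sees an empty result.
const before = callWithBooleanCheck();

// "Writer" test: an unrelated script mutates Boolean.prototype.
Boolean.prototype.length = 1;
Boolean.prototype[0] = "leaked";

// Same realm, after the writer ran: the reader now sees one element.
const after = callWithBooleanCheck();

// Undo the mutation, which is what a per-test reset has to achieve.
delete Boolean.prototype.length;
delete Boolean.prototype[0];
const restored = callWithBooleanCheck();

console.log(before.length, after.length, after[0], restored.length);
```

Whether the reader passes depends only on whether the writer ran first in the same realm, which is the order dependence the commit message describes.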
Summary
Follow-up to #964. Makes the interpreted Test262 baseline deterministic and refreshes it.
The interpreted baseline was non-deterministic: two consecutive isolated runs disagreed (a just-written baseline showed drift on immediate re-run), and tests like `Array/prototype/*/call-with-boolean.js` flipped `Pass`⇄`Fail` between runs. That made the interpreted conformance gate unpinnable, which is exactly the "stale/drifted and flaky" interpreted baseline #964 called out as a separate concern.

Root cause
Each Test262 test runs in a fresh `new Interpreter(...)`, but several built-ins are process-global `static` singletons carrying guest-writable, guest-observable state that is never reset across interpreters:

- `SharpTSMath.Instance._extras` (`Math.x = v`)
- `SharpTSNumberPrototype.Instance._extras` (`Number.prototype.x = v`)
- `SharpTSBooleanPrototype.Instance._extras` (`Boolean.prototype.length/[i] = v`)
- `SharpTSStringPrototype.Instance._extras` (`String.prototype.x = v`)
- `SharpTSSymbol` `Symbol.for` registry (`Symbol.for(k)`)

An earlier test's mutation leaks into later tests in the same worker process; combined with the worker pool's nondeterministic test→worker ordering, order-dependent tests flip run-to-run. (`RegExp.prototype` was already moved per-realm onto the `Interpreter` for this exact reason, issue #101; these were the remaining vectors.)

Two independent read-only investigations converged on this, with the leak paths reproduced against the built `SharpTS.dll` (e.g. `Array.prototype.map.call(true, …)` reads `Boolean.prototype.length` from the shared `_extras`; `Number.prototype.toString = fn` shadows the spec method via extras-first `GetMember`).

Fix (additive, low-risk)
- Add `ClearExtras()` to the four prototype/`Math` singletons and `ClearGlobalRegistry()` to `SharpTSSymbol`.
- `Runtime/RealmState.ResetMutableBuiltInState()` centralizes the reset and documents the hazard + the principled per-realm follow-up.
- `Test262Runner.RunOne` calls it before each test. The worker runs tests serially, so every script gets a pristine realm.
- Nothing outside `Test262Runner` invokes it → normal CLI execution and the main test suite are unchanged by construction.

Validation
The update→verify cycle that reliably failed before now passes with 0 drift across three consecutive isolated runs.
The refreshed baseline:
- recovers 51 genuine improvements (26 `Fail`→`Pass`, 25 `RuntimeError`→`Pass`),
- the previously flaky clusters (`call-with-boolean`, `Number.prototype.toString`, defineProperty-with-`Math`) are now stable,
- 6 remaining `Pass`→`Fail` are deterministic real drift (5× `JSON/stringify/{*-circular, value-bigint}`, 1× `Object/defineProperty/15.2.3.6-3-253.js`), recorded honestly, candidates for separate follow-up (the JSON ones especially).

Relationship to #965
Independent of #965 (that fixes the test-host crash so Test262 is runnable; this makes the interpreted results deterministic). Both touch `Test262Runner.cs` in different regions, so the merge is trivial either way.
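For reviewers who want the shape of the fix at a glance, the reset-before-each-test pattern described under Fix can be modeled in a few lines. This is a hypothetical JavaScript model, not SharpTS source: the real code is C#, and every class, function, and field name below (`BooleanPrototypeModel`, `SymbolRegistryModel`, `InterpreterModel`, `runOne`, and so on) is an assumption that only mirrors the names in this PR.

```javascript
// Hypothetical model of the hazard and the reset; not SharpTS source.

// Process-global singleton whose `extras` map is guest-writable.
class BooleanPrototypeModel {
  static instance = new BooleanPrototypeModel();
  extras = new Map();
  clearExtras() { this.extras.clear(); }
}

// Global registry in the style of Symbol.for, shared by all interpreters.
class SymbolRegistryModel {
  static registry = new Map();
  static clearGlobalRegistry() { SymbolRegistryModel.registry.clear(); }
}

// Central reset, analogous in role to RealmState.ResetMutableBuiltInState().
function resetMutableBuiltInState() {
  BooleanPrototypeModel.instance.clearExtras();
  SymbolRegistryModel.clearGlobalRegistry();
}

// A fresh interpreter per test still reads and writes the shared singleton.
class InterpreterModel {
  setBooleanProtoMember(key, value) {
    BooleanPrototypeModel.instance.extras.set(key, value);
  }
  getBooleanProtoMember(key) {
    return BooleanPrototypeModel.instance.extras.get(key);
  }
}

// Serial runner: optionally reset shared state before each test.
function runOne(test, { reset }) {
  if (reset) resetMutableBuiltInState();
  return test(new InterpreterModel());
}

const writer = (interp) => interp.setBooleanProtoMember("length", 1);
const reader = (interp) => interp.getBooleanProtoMember("length");

// Without the reset, the reader observes the writer's mutation.
runOne(writer, { reset: false });
const leaked = runOne(reader, { reset: false });

// With the reset, the reader sees pristine state regardless of order.
runOne(writer, { reset: true });
const isolated = runOne(reader, { reset: true });

console.log(leaked, isolated);
```

The reset is only safe because the runner is serial: with concurrent tests in one process, clearing shared state mid-test would corrupt a neighbour, which is one reason the per-realm design is named as the principled follow-up.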