fix(test262): stop in-process collectible-ALC churn crashing the testhost (#964) #965
Merged
…host (#964)

The unfiltered `dotnet test SharpTS.Test262` crashed the test host with a fatal CLR error (`Internal CLR error. (0x80131506)`, COR_E_EXECUTIONENGINE) before the compiled differ could run, so compiled-mode Test262 conformance was unmeasurable via the convenience run. `--filter ~CompiledBaseline` / `~InterpretedBaseline` completed, which made it look combined-run-only.

Root cause: the only in-process emitted-IL execution in the test host is SmokeTest, whose ~20 diagnostic facts run compiled tests through a collectible AssemblyLoadContext (load -> JIT -> Invoke -> Unload) plus a periodic forced GC, all under Server + Concurrent GC. That collectible-ALC teardown churn intermittently hits the fatal engine fault while JIT-compiling a collectible assembly's entry point (dump faulting frame: `[PrestubMethodFrame] $Program.Main()`). Proven by `--filter ~SmokeTest` crashing on its own with no baselines present, so it is selection-based, not concurrency-based. Both baselines are immune: they run through the subprocess worker pool, which uses the non-collectible `Assembly.Load(byte[])` path and never Unloads.

This is a known .NET runtime fragility around collectible ALC unload under JIT + concurrent GC (cf. dotnet/runtime #34838, #40547, #45285), not a SharpTS logic bug: the same emitted IL runs cleanly in the workers and standalone.

Fix: make `Test262Runner.UseNonCollectibleLoad` a per-instance immutable property (was a racy mutable process-global static) set via a new optional ctor parameter. SmokeTest opts all its in-process runners into the non-collectible path (its curated lists are small, so the per-process leak is negligible); the worker passes the flag via the ctor (behavior-identical). `Diagnostic_CompiledPhaseBreakdown` is rewritten to build a per-mode runner and skipped, since its "collectible" arm intentionally drives the crash path. The in-process baseline fallback (worker-not-found, ~11k tests) deliberately stays collectible to avoid the issue #109 OOM, now documented.

Validation:
- `--filter ~SmokeTest`: was crash @ ~1m25s -> 20 pass / 4 skip, clean.
- Full unfiltered run: was crash @ ~1m46s (differ never ran) -> completes in 3m17s, no 0x80131506, compiled differ runs to the end (11384/11384, 0 regressions / 8 new passes).

The remaining baseline drift (interpreted 15 regressions/49 new passes, compiled 8 new passes) is pre-existing #882-family staleness, out of scope.
Summary
Fixes #964. The unfiltered `dotnet test SharpTS.Test262` crashed the test host with a fatal CLR error, `Internal CLR error. (0x80131506)` (`COR_E_EXECUTIONENGINE`), before the compiled differ ran, so compiled-mode Test262 conformance couldn't be measured via the convenience run. `--filter ~CompiledBaseline` / `~InterpretedBaseline` completed, which made it look "combined-run-only."
Root cause
The crash has nothing to do with the compiled `CompiledBaseline` differ, #882, or codegen (#963). Both baselines run through the subprocess worker pool, which uses the non-collectible `Assembly.Load(byte[])` path and never `Unload()`s, so a guest fault there is contained in a child process.

The only in-process emitted-IL execution in the test host is `SmokeTest`: its ~20 diagnostic facts each run compiled tests through a collectible `AssemblyLoadContext` (load → JIT → Invoke → `Unload()`), plus a periodic forced `GC.Collect()`, all under `ServerGarbageCollection` + `ConcurrentGarbageCollection`. That collectible-ALC teardown churn intermittently hits the fatal engine fault while JIT-compiling a collectible assembly's entry point.

Proven by dump + isolation:

- `--filter ~SmokeTest` alone (no baselines present) reproduces the identical crash at ~1m25s → the crash is selection-based, not concurrency-based.
- Dump faulting frame: `[PrestubMethodFrame] $Program.Main()` (JIT of a collectible assembly) on the `InvokeCompiledMain` `Task.Run` worker.
- `~CompiledBaseline` "completes" only because it excludes `SmokeTest`.

This is a known .NET runtime fragility around collectible-ALC unload under JIT + concurrent GC (cf. dotnet/runtime#34838, #40547, #45285; MS Learn unloadability docs note unload is "cooperative," the JIT can hold hidden refs, and R2R is ignored → everything JITs). It is not a SharpTS logic bug: the same emitted IL runs cleanly in the workers and in standalone DLLs.
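The two load paths contrasted above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the SharpTS runner: `LoadPathSketch`, `RunCollectible`, `RunNonCollectible`, `InvokeStatic`, and `StandInEntry` are hypothetical names invented for the example. Only `AssemblyLoadContext`, `LoadFromStream`, `Unload()`, and `Assembly.Load(byte[])` are real .NET APIs.

```csharp
using System;
using System.IO;
using System.Reflection;
using System.Runtime.Loader;

// Hypothetical sketch; names are illustrative and not taken from SharpTS.
public static class LoadPathSketch
{
    // Stand-in for a compiled test's entry point, so the sketch can be
    // exercised against its own assembly image.
    public static int StandInEntry() => 42;

    // Collectible path: load -> JIT -> Invoke -> Unload, repeated per test.
    public static object? RunCollectible(byte[] image, string typeName, string methodName)
    {
        var alc = new AssemblyLoadContext("test-alc", isCollectible: true);
        try
        {
            using var stream = new MemoryStream(image);
            Assembly asm = alc.LoadFromStream(stream);
            return InvokeStatic(asm, typeName, methodName);
        }
        finally
        {
            // Unload() only requests teardown; the context is reclaimed later,
            // once the GC finds nothing referencing it.
            alc.Unload();
        }
    }

    // Non-collectible path: the assembly stays loaded for the process lifetime,
    // so there is no teardown to race with the JIT (at the cost of a leak).
    public static object? RunNonCollectible(byte[] image, string typeName, string methodName)
    {
        Assembly asm = Assembly.Load(image);
        return InvokeStatic(asm, typeName, methodName);
    }

    // Looks up a parameterless static method by name and invokes it.
    private static object? InvokeStatic(Assembly asm, string typeName, string methodName)
    {
        Type type = asm.GetType(typeName, throwOnError: true)!;
        MethodInfo method =
            type.GetMethod(methodName, BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Static)
            ?? throw new MissingMethodException(typeName, methodName);
        return method.Invoke(null, null);
    }
}
```

To try it, one could read the sketch's own assembly with `File.ReadAllBytes(typeof(LoadPathSketch).Assembly.Location)` and pass `typeof(LoadPathSketch).FullName` and `"StandInEntry"` to either method. The only difference between the two paths is whether an `Unload()` follows each invocation, and that step is what the subprocess workers never perform.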
Fix
- `Test262Runner.cs`: `UseNonCollectibleLoad` changed from a racy mutable process-global static to an immutable per-instance property, set via a new optional ctor parameter (`useNonCollectibleLoad = false`). Per-instance so concurrently-running xUnit collections can't clobber each other's mode.
- `SmokeTest.cs`: all in-process runners opt into `useNonCollectibleLoad: true` (curated lists are small → per-process leak negligible). `Diagnostic_CompiledPhaseBreakdown` rewritten to build a per-mode runner and skipped, since its "collectible" arm intentionally drives the crash path.
- `Worker/Program.cs`: passes the flag via the ctor (behavior-identical, since workers were already non-collectible).
- `Test262Tests.cs`: documents that the in-process baseline fallback (worker-not-found, ~11k tests) deliberately stays collectible to avoid the #109 28 GB OOM (compile-mode regen accumulating dynamic assemblies).

Validation
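| Run | Before | After |
| --- | --- | --- |
| `--filter ~SmokeTest` | crash @ ~1m25s | 20 pass / 4 skip, clean |
| Full unfiltered `dotnet test` | crash @ ~1m46s (differ never ran) | completes in 3m17s, no `0x80131506`, compiled differ runs `11384/11384` (0 regressions, 8 new passes) |

The remaining `Failed: 2` in the full run are the pre-existing baseline-drift assertions (interpreted: 15 regressions / 49 new passes, attributed to timeout flakiness; compiled: 8 new passes, the same 8 reported in #882's comment). That is separate #882-family staleness and is out of scope for this fix.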
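
The per-instance flag described under Fix might look roughly like the sketch below. It is an assumption-laden illustration, not the actual `Test262Runner`: the class name `RunnerSketch` and the `DescribeLoadPath` helper are hypothetical. Only the property name `UseNonCollectibleLoad` and the optional parameter `useNonCollectibleLoad = false` come from the description.

```csharp
// Hypothetical sketch of the per-instance flag; not the actual Test262Runner.
public sealed class RunnerSketch
{
    // Get-only property: assigned once in the constructor, so one runner's mode
    // cannot be changed by another runner executing concurrently (unlike a
    // mutable static shared by the whole process).
    public bool UseNonCollectibleLoad { get; }

    // The optional parameter defaults to the collectible path, so existing call
    // sites keep their behavior unless they explicitly opt in.
    public RunnerSketch(bool useNonCollectibleLoad = false)
    {
        UseNonCollectibleLoad = useNonCollectibleLoad;
    }

    // Illustrative helper (an assumption) showing how the flag would pick a path.
    public string DescribeLoadPath() =>
        UseNonCollectibleLoad ? "Assembly.Load(byte[])" : "collectible AssemblyLoadContext";
}
```

Under this shape, a SmokeTest-style caller would construct `new RunnerSketch(useNonCollectibleLoad: true)`, while the in-process baseline fallback would use the default `new RunnerSketch()` and stay collectible.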