perf(compile): native-int modulo for counter % literal (#928) #963
Emit `i % k` (loop-counter dividend, integer-literal divisor) as a native int64 `rem` + `conv.r8` instead of an FP `fmod` on two doubles. Split-isolation benchmarks for #928 showed the `fmod` is the dominant per-iteration cost in numeric write kernels (it ~doubles them); double multiply is already free and the int-vs-float element representation is not the bottleneck.

Sound within the existing int-counter gate: C# `long %` and JS `%` are both truncated (sign of dividend), so the result is bit-identical to the double computation for every |dividend| <= 2^53. Only `%` goes native -- `/` stays double (JS `/` is true division). Divisor 0 falls through to the double path (NaN), so no DivideByZeroException.

Perf @1m (min ms): typed-arrays Float64 4.65->2.55 (1.82x, now beats Node ~2.2x); int32-kernel 5.27->3.77 (1.40x); non-modulo kernels unchanged.

- ForLoopAnalyzer.IntegerModuloEnabled (kill-switch SHARPTS_INT_MOD=0)
- ILEmitter.Operators: TryEmitIntegerCounterModulo + TryEmitIntegerCounterValueI8
- ModuloParityTests: 8 cases x both modes, Node-ground-truthed

Interop-safe: operates on counter values, not the byte[] backing; pure BCL opcodes, ILVerify-clean, no SharpTS.dll reference in standalone output.

Full xUnit suite 14192/0 green with the opt default-on.
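The soundness claim above (both remainders truncate toward zero, and a zero divisor declines the fast path) can be spot-checked with a small model. This is only a sketch: `truncRem`, `counterModulo`, and `countMismatches` are hypothetical names, the int64 `rem` is emulated with doubles rather than real IL, and only modest counter values are exercised.

```typescript
// Illustrative model only: these helper names are hypothetical, not SharpTS APIs.

// Truncated integer remainder (quotient rounded toward zero), standing in for
// the int64 `rem` + `conv.r8` sequence. Exact for the modest counters used here.
function truncRem(dividend: number, divisor: number): number {
  const q = dividend / divisor;
  const truncated = q < 0 ? Math.ceil(q) : Math.floor(q);
  return dividend - truncated * divisor;
}

// Fast path for non-zero integer divisors; everything else (including 0)
// falls through to the ordinary double `%`, which yields NaN for a 0 divisor.
function counterModulo(i: number, k: number): number {
  const fastPath = isFinite(k) && Math.floor(k) === k && k !== 0;
  return fastPath ? truncRem(i, k) : i % k;
}

// Walk the counter from `from` toward `to` (exclusive) and count the values
// whose fast-path result differs from JS `%`.
// Uses `!==`, so this check does not distinguish -0 from +0.
function countMismatches(from: number, to: number, k: number): number {
  const step = from <= to ? 1 : -1;
  let mismatches = 0;
  for (let i = from; i !== to; i += step) {
    if (counterModulo(i, k) !== i % k) mismatches++;
  }
  return mismatches;
}

console.log(countMismatches(0, 1000, 7));     // ascending counter
console.log(countMismatches(1000, -1000, 7)); // descending counter, crosses zero
console.log(counterModulo(5, 0));             // divisor 0 takes the double path
```

Under these assumptions the two mismatch counts are 0 and the last call yields NaN, mirroring the fallback described in the commit message.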
The pre-existing Test262 compiled-host crash referenced above (Test262 gives no usable signal for this change) is now tracked as #964. It reproduces with this PR's optimization disabled and with a freshly-built worker, so it is unrelated to this change.
Update: a clean compiled Test262 signal is now available (via the #964 workaround of running the …). Zero compiled-mode regressions from this change. The 8 new passes (Array.join/toString/with, Number.toFixed, String.split) and the single soft change (String.indexOf bigint) are all unrelated to modulo; they are pre-existing main drift. This supersedes the "Test262 gives no signal" note in the PR description: compiled conformance is confirmed neutral, as expected for a pure representation optimization.
What & why (#928)
Native-int modulo fast path for the compiled IL backend: `counter % integerLiteral` is emitted as a native int64 `rem` + `conv.r8` instead of an FP `fmod` on two doubles.

This came out of re-investigating #928. Split-isolation benchmarks overturned the issue's premise; the int32 gap is not element representation:
- … `ToInt32`/`conv.r8` are negligible), and SharpTS's Float64 stencil-read is already at parity with Node (Node's int32 speed is just its Smi lane on integer data).
- The `i % k` double `fmod` nearly doubles any kernel containing it (`store64` 16.4 → `mod64` 32.6 ms @1m; double multiply adds ~0).
- `i` is already an int64 loop-counter slot and `k` is an integer literal, so `i % k` as int64 `rem` is sound within the exact gate the int-counter already accepts (C# `long %` ≡ JS `%`, both truncated, sign-of-dividend, for every `|i| ≤ 2^53`).

Performance (A/B, only the compile-time gate differs; min ms @1m)
[Benchmark table not preserved; surviving cells: `s += i*3-(i%7)`, `%`, (store/mul/accumulate).]

Correctness & soundness
- … `i±k` dividends, `%1`, descending counters, modulo embedded in a mixed-double kernel.
- `SharpTS.Tests/SharedTests/ModuloParityTests.cs`: 8 cases × both modes (16 tests), every expected value ground-truthed against Node; includes the divisor-0 → NaN fallback (declined by the fast path, no `DivideByZeroException`) and the non-counter-dividend fallback.
- … (`ILVerificationTests`).
- … `main`: the compiled host hits a fatal CLR error (0x80131506) that aborts the run before the compiled differ executes, and the interpreted baseline is stale/flaky (timeouts). This reproduces identically with the opt disabled (`SHARPTS_INT_MOD=0`): neither run reaches the `[Compiled]` differ, and the interpreted soft-change count even varies between two runs of the byte-identical interpreter (42 vs 19), which shows harness nondeterminism rather than a regression from this change (which is compiled-only and cannot affect the interpreter). The subset also doesn't include the multiplicative/modulus operator folders, and Test262's modulus tests use constant operands that this loop-counter fast path never matches. So Test262 gives no signal here either way; the gate for this change is the dual-mode xUnit suite + parity tests + the 4-way differential above. The pre-existing crash is unrelated infra debt worth a separate issue.

Scope / design
- Only `%` goes native. Multiply stays double (measured free); division `/` must stay double (JS `/` is true division, not integer division).
- Taken only when the left is a loop-counter dividend (`i`, `i±k`, `k+i`) and the right is a non-zero integer literal; everything else falls through to the existing double path.
- `SHARPTS_INT_MOD` (default-on kill-switch, mirroring `SHARPTS_INT_LOOP_COUNTER`).

Interop
Safe by construction: the change operates on counter values, never the `byte[]` typed-array backing, so `ArrayBuffer`/`DataView`/`Atomics` and the `$Runtime`/`$Array` host types are untouched. Emitted IL is pure BCL opcodes (`ldloc`/`ldc.i8`/`rem`/`conv.r8`), so standalone DLLs gain no SharpTS.dll reference.
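The gate described under Scope / design can be pictured as a predicate over a tiny expression tree. The sketch below is an assumption-laden illustration, not SharpTS code: the `Expr` type, `lit`, `bin`, `isCounterDividend`, and `takesIntegerModuloPath` are all hypothetical names, and the real analysis in `ForLoopAnalyzer`/`ILEmitter.Operators` may differ.

```typescript
// Hypothetical mini-AST; SharpTS's real node types are not shown in this PR.
type Literal = { kind: "literal"; value: number };
type Op = "+" | "-" | "*" | "/" | "%";
type Expr =
  | { kind: "counter" }                                   // the int64 loop-counter slot `i`
  | Literal                                               // numeric literal
  | { kind: "binary"; op: Op; left: Expr; right: Expr }
  | { kind: "other" };                                    // anything else

const counter: Expr = { kind: "counter" };
function lit(value: number): Expr { return { kind: "literal", value }; }
function bin(op: Op, left: Expr, right: Expr): Expr {
  return { kind: "binary", op, left, right };
}

function isIntegerLiteral(e: Expr): e is Literal {
  return e.kind === "literal" && isFinite(e.value) && Math.floor(e.value) === e.value;
}

// Dividend shapes named in the PR description: i, i±k, k+i.
function isCounterDividend(e: Expr): boolean {
  if (e.kind === "counter") return true;
  if (e.kind !== "binary") return false;
  const counterOnLeft = e.left.kind === "counter" && isIntegerLiteral(e.right);
  const counterOnRight = isIntegerLiteral(e.left) && e.right.kind === "counter";
  if (e.op === "+") return counterOnLeft || counterOnRight; // i+k, k+i
  if (e.op === "-") return counterOnLeft;                   // i-k only
  return false;
}

// True when the expression would take the native-int path in this model;
// false means "fall through to the existing double path".
function takesIntegerModuloPath(e: Expr, enabled: boolean = true): boolean {
  if (!enabled) return false;                  // models the SHARPTS_INT_MOD=0 kill-switch
  if (e.kind !== "binary" || e.op !== "%") return false; // `/` and `*` stay double
  const divisor = e.right;
  return isCounterDividend(e.left) && isIntegerLiteral(divisor) && divisor.value !== 0;
}

console.log(takesIntegerModuloPath(bin("%", counter, lit(7))));  // counter % literal
console.log(takesIntegerModuloPath(bin("%", counter, lit(0))));  // zero divisor declines
console.log(takesIntegerModuloPath(bin("%", lit(5), lit(7))));   // constant operands decline
```

In this model the three calls print `true`, `false`, `false`; the last case corresponds to the constant-operand modulus tests that the PR says the fast path never matches.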