
perf(compile): native-int modulo for counter % literal (#928) #963

Merged: nickna merged 1 commit into main from wrk/issue-928-int-modulo on Jun 27, 2026

nickna (Owner) commented Jun 27, 2026

What & why (#928)

Native-int modulo fast path for the compiled IL backend: counter % integerLiteral is emitted as a native int64 rem + conv.r8 instead of an FP fmod on two doubles.

This came out of re-investigating #928. Split-isolation benchmarks overturned the issue's premise — the int32 gap is not element representation:

  • SharpTS runs int and float typed arrays at the same speed (per-element ToInt32/conv.r8 are negligible), and SharpTS's Float64 stencil-read is already at parity with Node (Node's int32 speed is just its Smi lane on integer data).
  • The dominant per-iteration cost in numeric write kernels is the i % k double fmod — it nearly doubles any kernel containing it (store64 16.4 → mod64 32.6ms @1m; double multiply adds ~0).

i is already an int64 loop-counter slot and k is an integer literal, so i % k as int64 rem is sound within the exact gate the int-counter already accepts (C# long % ≡ JS % — both truncated, sign-of-dividend — for every |i| ≤ 2^53).
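
The equivalence claimed above can be checked outside the compiler. The sketch below is not SharpTS code: it models the native path with BigInt, whose `%` also truncates toward zero and takes the sign of the dividend, and compares it with the ordinary double `%`. The function names and the sample values are assumptions chosen for illustration.

```typescript
// Model of the native path: truncated integer remainder, then widen to double.
// BigInt `%` truncates toward zero (sign of dividend), like C# `long %`.
function nativeIntMod(i: number, k: number): number {
  return Number(BigInt(i) % BigInt(k));
}

// Reference path: the ordinary JS remainder on two doubles.
function doubleMod(i: number, k: number): number {
  return i % k;
}

// Returns true when both paths agree for every dividend/divisor pair.
// Note: `===` treats -0 and +0 as equal, and BigInt has no negative zero,
// so this sketch does not model the sign of a zero result.
function pathsAgree(dividends: number[], divisors: number[]): boolean {
  return dividends.every((i) =>
    divisors.every((k) => nativeIntMod(i, k) === doubleMod(i, k)));
}

// Illustrative sample: small values, negatives, and the 2^53 boundary.
const sampleDividends = [0, 1, 6, 7, 8, -1, -7, -8, 2 ** 53, -(2 ** 53)];
const sampleDivisors = [1, 3, 7, -3, -7]; // zero is excluded, as in the gate
console.log(pathsAgree(sampleDividends, sampleDivisors));
```

Zero divisors are left out on purpose: `BigInt(i) % 0n` throws a RangeError, which is the analogue of the DivideByZeroException the fast path avoids by declining a zero divisor.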

Performance (A/B, only the compile-time gate differs; min ms @1m)

| Workload | off | on | speedup |
| --- | --- | --- | --- |
| typed-arrays (Float64) | 4.65 | 2.55 | 1.82×; now beats Node (5.64) ~2.2×, ~1.2× off Bun |
| int32-kernel | 5.27 | 3.77 | 1.40× |
| bare s += i*3-(i%7) | 29.3 | 14.8 | 1.99× |
| kernels with no % (store/mul/accumulate) | unchanged | unchanged | 1.0× |
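
For readers who want to see the shape being measured, here is a minimal sketch of the bare kernel from the table together with a min-of-N timer. The harness, the function names, and reading "@1m" as one million iterations are assumptions, not the benchmark code used for the numbers above.

```typescript
// The bare kernel: one multiply, one modulo and one accumulate per iteration.
// `i` is the integer loop counter and 7 is an integer literal, which is the
// dividend/divisor shape the fast path targets.
function bareKernel(n: number): number {
  let s = 0;
  for (let i = 0; i < n; i++) {
    s += i * 3 - (i % 7);
  }
  return s;
}

// Hypothetical harness: report the minimum wall time over several runs,
// mirroring the "min ms" convention of the table.
function minMs(run: () => void, repeats: number): number {
  let best = Infinity;
  for (let r = 0; r < repeats; r++) {
    const t0 = Date.now();
    run();
    best = Math.min(best, Date.now() - t0);
  }
  return best;
}

console.log(bareKernel(14)); // 3*(0+...+13) - 2*(0+...+6) = 273 - 42 = 231
console.log(minMs(() => { bareKernel(1e6); }, 5), "ms");
```

Taking the minimum rather than the mean discards runs disturbed by JIT warm-up or scheduling noise, which is why it is a common choice for A/B comparisons of this kind.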

Correctness & soundness

  • 4-way byte-identical (compiled-on / compiled-off / interpreter / Node) across an edge battery: negative dividend, negative divisor, i±k dividends, %1, descending counters, modulo embedded in a mixed-double kernel.
  • New SharpTS.Tests/SharedTests/ModuloParityTests.cs — 8 cases × both modes (16 tests), every expected value ground-truthed against Node; includes the divisor-0 → NaN fallback (declined by the fast path, no DivideByZeroException) and the non-counter-dividend fallback.
  • Full suite 14192/0 green with the opt default-on. ILVerify-clean (covered by ILVerificationTests).
  • Test262 (interp + compiled): the default subset was already red on main before this change. The compiled host hits a fatal CLR error (0x80131506) that aborts the run before the compiled differ executes, and the interpreted baseline is stale/flaky (timeouts). This reproduces identically with the opt disabled (SHARPTS_INT_MOD=0): neither run reaches the [Compiled] differ, and the interpreted soft-change count varies between two runs of the byte-identical interpreter (42 vs 19). That shows harness nondeterminism, not a regression from this change, which is compiled-only and cannot affect the interpreter. The subset also doesn't include the multiplicative/modulus operator folders, and Test262's modulus tests use constant operands that this loop-counter fast path never matches. So Test262 gives no signal here either way; the gate for this change is the dual-mode xUnit suite + parity tests + the 4-way differential above. The pre-existing crash is unrelated infra debt worth a separate issue.

Scope / design

  • Only % goes native. Multiply stays double (measured free); division / must stay double (JS / is true division, not integer division).
  • Fires only when the left is a recognized integer-counter expression (i, i±k, k+i) and the right is a non-zero integer literal; everything else falls through to the existing double path.
  • Gated by SHARPTS_INT_MOD (default-on kill-switch, mirroring SHARPTS_INT_LOOP_COUNTER).
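
The gate described in this list can be modeled as a small predicate. This is a sketch under stated assumptions, not the C# code in ForLoopAnalyzer or ILEmitter: the mini-AST, the function names, and the exact parsing of the kill-switch (only the value `0` disables it) are hypothetical.

```typescript
// Hypothetical mini-AST; the compiler's real node types are not shown in this PR.
type Expr =
  | { kind: "counter" }                    // the loop counter i
  | { kind: "intLiteral"; value: number }  // an integer literal k
  | { kind: "binary"; op: "+" | "-" | "*" | "/" | "%"; left: Expr; right: Expr }
  | { kind: "other" };                     // anything the gate does not recognize

type Env = Record<string, string | undefined>;

// Assumed kill-switch parsing: default-on, disabled only by SHARPTS_INT_MOD=0.
function intModEnabled(env: Env): boolean {
  return env["SHARPTS_INT_MOD"] !== "0";
}

// Recognized counter shapes from the description: i, i+k, i-k, k+i.
function isCounterExpr(e: Expr): boolean {
  if (e.kind === "counter") return true;
  if (e.kind !== "binary") return false;
  const { op, left, right } = e;
  if ((op === "+" || op === "-") && left.kind === "counter" && right.kind === "intLiteral") {
    return true;
  }
  return op === "+" && left.kind === "intLiteral" && right.kind === "counter";
}

// True when the native int64 path would be taken; every other expression
// falls through to the existing double path.
function takesFastPath(e: Expr, env: Env): boolean {
  return intModEnabled(env)
    && e.kind === "binary"
    && e.op === "%"
    && isCounterExpr(e.left)
    && e.right.kind === "intLiteral"
    && Number.isInteger(e.right.value)
    && e.right.value !== 0; // divisor 0 stays on the double path and yields NaN
}

const counter: Expr = { kind: "counter" };
const lit = (value: number): Expr => ({ kind: "intLiteral", value });
const mod = (left: Expr, right: Expr): Expr => ({ kind: "binary", op: "%", left, right });

console.log(takesFastPath(mod(counter, lit(7)), {}));                         // i % 7
console.log(takesFastPath(mod(counter, lit(0)), {}));                         // i % 0
console.log(takesFastPath(mod(counter, lit(7)), { SHARPTS_INT_MOD: "0" }));   // switched off
```

Keeping the decision a pure yes/no test means a declined expression changes nothing: it is compiled exactly as before, which is what makes the kill-switch a safe A/B lever.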

Interop

Safe by construction — the change operates on counter values, never the byte[] typed-array backing, so ArrayBuffer/DataView/Atomics and the $Runtime/$Array host types are untouched. Emitted IL is pure BCL opcodes (ldloc/ldc.i8/rem/conv.r8), so standalone DLLs gain no SharpTS.dll reference.

Emit `i % k` (loop-counter dividend, integer-literal divisor) as a native
int64 `rem` + `conv.r8` instead of an FP `fmod` on two doubles. Split-isolation
benchmarks for #928 showed the `fmod` is the dominant per-iteration cost in
numeric write kernels (it ~doubles them); double multiply is already free and
the int-vs-float element representation is not the bottleneck.

Sound within the existing int-counter gate: C# `long %` and JS `%` are both
truncated (sign of dividend), so the result is bit-identical to the double
computation for every |dividend| <= 2^53. Only `%` goes native -- `/` stays
double (JS `/` is true division). Divisor 0 falls through to the double path
(NaN), so no DivideByZeroException.

Perf @1m (min ms): typed-arrays Float64 4.65->2.55 (1.82x, now beats Node ~2.2x);
int32-kernel 5.27->3.77 (1.40x); non-modulo kernels unchanged.

- ForLoopAnalyzer.IntegerModuloEnabled (kill-switch SHARPTS_INT_MOD=0)
- ILEmitter.Operators: TryEmitIntegerCounterModulo + TryEmitIntegerCounterValueI8
- ModuloParityTests: 8 cases x both modes, Node-ground-truthed

Interop-safe: operates on counter values, not the byte[] backing; pure BCL
opcodes, ILVerify-clean, no SharpTS.dll reference in standalone output.

Full xUnit suite 14192/0 green with the opt default-on.

nickna (Owner, Author) commented Jun 27, 2026

The pre-existing Test262 compiled-host crash referenced above (Test262 gives no usable signal for this change) is now tracked as #964 — it reproduces with this PR's optimization disabled and with a freshly-built worker, so it is unrelated to this change.

nickna (Owner, Author) commented Jun 27, 2026

Update — clean compiled Test262 signal now available (via the #964 workaround of running the CompiledBaseline filter in isolation, since the unfiltered run crashes the host for unrelated reasons):

[Compiled] baseline drift: 0 regressions, 8 new passes, 0 new/removed entries  (16m44s, 11384 tests)

Zero compiled-mode regressions from this change. The 8 new passes (Array.join/toString/with, Number.toFixed, String.split) and the single soft change (String.indexOf bigint) are all unrelated to modulo — pre-existing main drift. This supersedes the "Test262 gives no signal" note in the PR description: compiled conformance is confirmed neutral, as expected for a pure representation optimization.

nickna merged commit 9a50249 into main on Jun 27, 2026
2 checks passed