Add a full f64 "double window" to the Stack VM #19
Merged
f64 was second-class in the stack VM: values rode the integer TOS window as bit patterns, paying a GPR<->FP crossing on every op, with zero fusion. The biquad_f64 benchmark exposed this: it ran ~2.9x slower than the f32 biquad (45 generic int-window ops per sample vs a handful of fused float-window superinstructions).

Give f64 its own parallel `double` register window (d0..d3 + dfsp spill pointer), the exact analogue of the f32 float window:

- Thread dfsp + d0..d3 through the preserve_none handler signature; the two FP windows use all 8 FP arg registers (v0..v7 / xmm0..xmm7).
- A full `*D`-suffix StackOp family (arithmetic, all six comparisons, conversions, memory load/store, math intrinsics, print) with C handlers, bridge wiring, and stack-depth deltas. Every Type::Float64 codegen arm now targets the d-window, including call/return bridging (DToBitsD / BitsToDD) and window-aware DropD.
- Mirrored fused superinstructions: get_get_dmul_sum* (the whole biquad FMA chain collapses to one op), get_set*D move chains, and get_f64const_dgt_jiz. An F32ConstF + F32ToF64D -> F64ConstD const-fold exposes `<lit> as f64` constants to the compare-branch fusion.
- double_stack_delta + an assert_d_window_balanced corpus sweep mirror the f-window correctness checks.

Result (benchmark/run.sh, Stack VM): biquad f64 ~0.36s -> ~0.16s, now ~1.2x the f32 biquad (the residual gap is f64's 2x state bandwidth). No regression on the f32/int benchmarks.

Fixes a latent correctness bug too: f64 `>` / `>=` / `!=` previously fell through to integer bit-pattern comparisons.

Verified: cargo test --workspace green (golden tests across all four backends), new tests/cases/f64_window.lyte covers arithmetic, comparisons, FMA fusion, f64-across-calls, deep chains, conversions and math; biquad_f64 output identical on jit/vm/stack; both window-balance assertions pass over the full corpus.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
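The const-fold and the compare-branch fusion described in this commit compose naturally in a single peephole pass. The sketch below is hypothetical. Apart from `F32ConstF`, `F32ToF64D` and `F64ConstD`, the op names (`GetD`, `GtD`, `JumpIfZero`, `GetF64ConstDGtJiz`) and the label-based jump targets are illustrative stand-ins, not the PR's actual `StackOp` definitions. It shows why folding matters. A comparison such as `x > 0.5 as f64` would plausibly lower to `GetD, F32ConstF, F32ToF64D, GtD, JumpIfZero`. That sequence only matches a four-op fusion pattern once the widening pair has collapsed into a single `F64ConstD`.

```rust
/// Hypothetical subset of stack ops. Only F32ConstF, F32ToF64D and F64ConstD
/// are named in the PR; the rest are illustrative stand-ins.
#[derive(Debug, Clone, PartialEq)]
enum Op {
    F32ConstF(f32),
    F32ToF64D,
    F64ConstD(f64),
    GetD(u32),       // push local slot `n` onto the d-window
    GtD,             // pop two doubles, push (a > b) to the int window
    JumpIfZero(u32), // pop an int, branch to a label if it is zero
    // Fused form: if !(local[slot] > k) goto target
    GetF64ConstDGtJiz { slot: u32, k: f64, target: u32 },
}

/// One left-to-right peephole pass that rewrites the tail of `out` as each
/// op arrives, so an earlier fold can enable a later fusion.
fn peephole(ops: &[Op]) -> Vec<Op> {
    let mut out: Vec<Op> = Vec::with_capacity(ops.len());
    for op in ops {
        match op {
            // Const-fold: F32ConstF(x), F32ToF64D => F64ConstD(x as f64).
            // Widening the f32 value (rather than re-parsing the literal)
            // keeps exactly what F32ToF64D would have produced at run time.
            Op::F32ToF64D => {
                if let Some(Op::F32ConstF(x)) = out.last() {
                    let folded = Op::F64ConstD(*x as f64);
                    out.pop();
                    out.push(folded);
                } else {
                    out.push(op.clone());
                }
            }
            // Compare-branch fusion: GetD, F64ConstD, GtD, JumpIfZero => one op.
            Op::JumpIfZero(target) => {
                if let [.., Op::GetD(slot), Op::F64ConstD(k), Op::GtD] = out.as_slice() {
                    let fused = Op::GetF64ConstDGtJiz { slot: *slot, k: *k, target: *target };
                    out.truncate(out.len() - 3);
                    out.push(fused);
                } else {
                    out.push(op.clone());
                }
            }
            _ => out.push(op.clone()),
        }
    }
    out
}
```

Running the pass over `GetD(0), F32ConstF(0.5), F32ToF64D, GtD, JumpIfZero(7)` yields the single op `GetF64ConstDGtJiz { slot: 0, k: 0.5, target: 7 }`. Jump targets are modelled as labels, so deleting ops does not invalidate them. A pass over index-based targets would also have to remap them.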
The differential fuzz target compared jit/vm/asm/llvm but never the stack backend (the Clang-built C interpreter), so the stack VM, and its f32 single window and f64 double window in particular, had no differential coverage.

- Add a `stack` arm to run_backend(), gated on has_stack_interp, that compiles via compile_stack() and runs through stack_interp_bridge::run (the same path cli uses for --backend stack), and compare it against the VM in the fuzz body.
- Add fuzz/build.rs (mirroring cli/build.rs) plus the cc build-dep so has_stack_interp is actually defined for the fuzz crate.
- Fix capture_stdout: the stack interpreter prints via C stdio (printf), which buffers independently of Rust's stdout. Flush all C streams (fflush(NULL)) before restoring fd 1, or the captured output is lost and the stack backend appears to print nothing.
- Extend the program generator to emit f32 and f64 computations (with optional a*b+c helpers exercising float arg/return-window bridging), printing results as i32 so output stays comparable across backends despite Rust-vs-C float formatting differences. Bare float literals are f32, so f32 uses bare literals and only f64 gets an `as f64` cast.

Ran cargo fuzz run differential for several minutes across i32/f32/f64 programs with no divergences.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
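The `capture_stdout` fix comes down to flushing both buffering layers before fd 1 is restored. The sketch below is a Unix-only approximation, not the fuzz crate's actual code. It hand-declares the few POSIX/C functions it needs rather than assuming any particular crate. It also assumes the captured output fits in the pipe buffer, since nothing drains the pipe while the closure runs.

```rust
use std::fs::File;
use std::io::{Read, Write};
use std::os::raw::{c_char, c_int, c_void};
use std::os::unix::io::FromRawFd;

// Hand-declared libc functions so the sketch needs no external crate.
unsafe extern "C" {
    fn pipe(fds: *mut c_int) -> c_int;
    fn dup(fd: c_int) -> c_int;
    fn dup2(old_fd: c_int, new_fd: c_int) -> c_int;
    fn close(fd: c_int) -> c_int;
    fn fflush(stream: *mut c_void) -> c_int; // stream is really a FILE*
    fn printf(format: *const c_char, ...) -> c_int;
}

/// Runs `f` with fd 1 redirected into a pipe and returns what it wrote,
/// including output produced through C stdio (e.g. an interpreter's printf).
/// Assumes the output fits in the pipe buffer: nothing drains it while `f` runs.
fn capture_stdout<F: FnOnce()>(f: F) -> String {
    let mut fds: [c_int; 2] = [0; 2];
    let saved = unsafe {
        assert_eq!(pipe(fds.as_mut_ptr()), 0, "pipe failed");
        // Flush pending output first so it is not captured by mistake.
        std::io::stdout().flush().unwrap();
        fflush(std::ptr::null_mut());
        let saved = dup(1);
        assert!(saved >= 0, "dup failed");
        assert!(dup2(fds[1], 1) >= 0, "dup2 failed");
        close(fds[1]); // fd 1 is now the pipe's only write end
        saved
    };

    f();

    unsafe {
        // Rust's stdout and C's stdio buffer independently; flush both
        // before pointing fd 1 back, or C's buffered bytes miss the pipe.
        std::io::stdout().flush().unwrap();
        fflush(std::ptr::null_mut()); // fflush(NULL) flushes every C output stream
        dup2(saved, 1); // closes the last write end, so the reader sees EOF
        close(saved);
        let mut captured = String::new();
        File::from_raw_fd(fds[0]).read_to_string(&mut captured).unwrap();
        captured
    }
}
```

Without the `fflush(NULL)` call, a `printf` that ran while fd 1 pointed at the pipe can still be sitting in C's stdio buffer when fd 1 is restored. It is then written to the real stdout later, and the capture comes back empty.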
What & why
f64 was second-class in the stack VM: values rode the integer TOS window as bit patterns, paying a GPR↔FP crossing on every op, with zero fusion. The `biquad_f64` benchmark exposed this: it ran ~2.9× slower than the f32 biquad (≈45 generic int-window ops per sample vs a handful of fused float-window superinstructions). This was a documented "deferred" decision (`docs/FP_CODEGEN_PLAN.md` §6.2, Option B).

This PR gives f64 its own parallel `double` register window (`d0..d3` + a `dfsp` spill pointer), the exact analogue of the existing f32 float window.

What changed
- Thread `dfsp` + `d0..d3` through the `preserve_none` handler signature; the two FP windows together use all 8 FP arg registers (`v0..v7` / `xmm0..xmm7`), verified to stay register-resident on this target.
- A full `*D` op family: arithmetic, all six comparisons, conversions, memory load/store, the 24 math intrinsics, and print, with C handlers, bridge wiring, and stack-depth deltas. Every `Type::Float64` codegen arm now targets the d-window, including call/return bridging (`DToBitsD` / `BitsToDD`) and window-aware `DropD`.
- Mirrored fused superinstructions: `get_get_dmul_sum*` (the whole biquad FMA chain → one op), `get_set*D` move chains, and `get_f64const_dgt_jiz`. An `F32ConstF + F32ToF64D → F64ConstD` const-fold exposes `<lit> as f64` constants to the compare-branch fusion.
- `double_stack_delta` + an `assert_d_window_balanced` corpus sweep mirror the existing f-window correctness checks.
- Docs updated (`Stack_VM.md`, `FP_CODEGEN_PLAN.md` §6.2).

Results (`benchmark/run.sh`, 3-run avg, Stack VM)

f64 went from ~2.9× slower than f32 to ~1.2× (biquad f64: ~0.36 s → ~0.16 s), a 2.3× speedup. The residual gap is f64's 2× state bandwidth, not dispatch/crossing overhead. No regression on the f32/int benchmarks.
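The `double_stack_delta` / `assert_d_window_balanced` pair can be pictured as a per-op depth table plus a linear scan. This is a minimal sketch under stated assumptions:

- The op subset is illustrative.
- The deltas assumed for `DToBitsD` and `BitsToDD` (moving one value out of or into the d-window) and for comparisons (result goes to the int window) are guesses, not the PR's table.
- `check_d_window` is a hypothetical helper.
- It only handles straight-line code. A real corpus sweep would also have to reconcile depths at branch targets.

```rust
/// Hypothetical subset of `*D` ops; the deltas below are assumptions.
#[derive(Debug, Clone, Copy)]
enum Op {
    F64ConstD, // push a literal onto the d-window
    GetD,      // push a local
    SetD,      // pop into a local
    MulD,      // pop two, push one
    AddD,      // pop two, push one
    GtD,       // pop two doubles; the boolean goes to the int window
    DropD,     // discard the d-window top
    DToBitsD,  // move the top double to the int window as raw bits
    BitsToDD,  // move raw bits from the int window onto the d-window
}

/// Net change in d-window depth for one op.
fn double_stack_delta(op: Op) -> i32 {
    match op {
        Op::F64ConstD | Op::GetD | Op::BitsToDD => 1,
        Op::SetD | Op::MulD | Op::AddD | Op::DropD | Op::DToBitsD => -1,
        Op::GtD => -2,
    }
}

/// Straight-line check: depth never goes negative and ends at zero.
fn check_d_window(ops: &[Op]) -> Result<(), String> {
    let mut depth = 0i32;
    for (i, &op) in ops.iter().enumerate() {
        depth += double_stack_delta(op);
        if depth < 0 {
            return Err(format!("d-window underflow at op {} ({:?})", i, op));
        }
    }
    if depth != 0 {
        return Err(format!("{} double(s) left on the d-window", depth));
    }
    Ok(())
}

/// Panicking wrapper in the spirit of the PR's corpus sweep.
fn assert_d_window_balanced(name: &str, ops: &[Op]) {
    if let Err(msg) = check_d_window(ops) {
        panic!("{}: {}", name, msg);
    }
}
```

Take `GetD, GetD, MulD, GetD, AddD, SetD`, an `a*b + c` stored back to a local. The depth walks 1, 2, 1, 2, 1, 0 and the check passes. A lone `F64ConstD` fails because one double is left behind.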
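Comparing f64 values through their raw bit patterns is easy to get wrong, and the failure is simple to reproduce in isolation. Below, `gt_by_bits`, `ge_by_bits` and `ne_by_bits` are hypothetical helpers modelling an integer bit-pattern comparison of f64 operands. They assume the raw 64-bit patterns are compared as signed integers; the PR does not say which integer comparison the old path used, and an unsigned one is wrong for negatives as well. Rust's native `f64` operators supply the IEEE 754 answers for contrast.

```rust
/// Model of comparing f64 operands by their raw IEEE 754 bit patterns as i64.
/// (Signed comparison is an assumption; unsigned would mis-order negatives too.)
fn gt_by_bits(a: f64, b: f64) -> bool {
    (a.to_bits() as i64) > (b.to_bits() as i64)
}

fn ge_by_bits(a: f64, b: f64) -> bool {
    (a.to_bits() as i64) >= (b.to_bits() as i64)
}

fn ne_by_bits(a: f64, b: f64) -> bool {
    a.to_bits() != b.to_bits()
}
```

Here `-1.0 > -2.0` is true, but `gt_by_bits(-1.0, -2.0)` is false. `0.0 != -0.0` is false in IEEE terms, yet their bit patterns differ. `NaN != NaN` is true, yet identical NaN bits compare equal. For non-negative, non-NaN values the bit patterns order the same way as the values, which is how such a bug can stay latent.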
Also fixes a latent correctness bug: f64 `>`, `>=`, and `!=` previously fell through to integer bit-pattern comparisons (wrong for negatives / NaN / -0.0).

Testing
- `cargo test --workspace` green: golden tests across all four backends (jit/vm/asm/stack).
- New `tests/cases/f64_window.lyte` covers arithmetic, all comparisons, FMA fusion, f64-across-calls (arg/return bridging), deep expression chains (window spill), conversions, math, and struct store/load.
- `biquad_f64` output identical on jit/vm/stack.

Reviewer notes
`src/stack_vm.rs` (the Rust reference VM) is intentionally untouched: it's unreachable from any real path (the `stack` backend uses the C interpreter) and its own unit tests don't use window ops.

🤖 Generated with Claude Code