feat(cuda): R4 DEEP composition + FRI commit phase on GPU #648

Open
ColoCarletti wants to merge 57 commits into main from feat/cuda-pr4

@ColoCarletti
Collaborator

Summary

Extends the GPU-resident proving pipeline through Round 4. R4 DEEP composition and the full FRI commit phase (fold + per-layer Keccak leaves + pair-hash Merkle
tree) now run device-side; only the per-layer roots are copied device-to-host (D2H) for the transcript. The R2 composition-parts LDE moves to a _keep variant, so
its de-interleaved device buffer is retained on Round2 and reused by R4 DEEP without a second host-to-device (H2D) upload. Also lands a Blelloch chunk-scan
parallel batch-inverse kernel as infrastructure for future GPU-side denominator inversion (not yet wired).
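
For readers unfamiliar with batch inversion, here is a minimal sequential sketch of the underlying technique (Montgomery's trick) over the Goldilocks field. It is not the kernel from this PR: the function names are hypothetical, inputs are assumed to be nonzero and already reduced, and the scalar `u128` arithmetic is for illustration only. A scan-based kernel such as the Blelloch chunk-scan mentioned above would, as I understand it, parallelize the prefix-product pass that is written here as a plain loop.

```rust
/// Goldilocks prime, p = 2^64 - 2^32 + 1.
const P: u64 = 0xFFFF_FFFF_0000_0001;

/// Modular multiplication via a 128-bit intermediate (illustrative, not optimized).
fn mul(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

/// Square-and-multiply exponentiation.
fn pow(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1u64;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul(acc, base);
        }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}

/// Single inversion by Fermat's little theorem: a^(p-2).
fn inv(a: u64) -> u64 {
    pow(a, P - 2)
}

/// Inverts every element with one field inversion and about 3n multiplications.
/// Assumes all inputs are nonzero and less than P.
fn batch_inverse(xs: &[u64]) -> Vec<u64> {
    // prefix[i] = xs[0] * ... * xs[i-1] (exclusive prefix product).
    let mut prefix = Vec::with_capacity(xs.len());
    let mut acc = 1u64;
    for &x in xs {
        prefix.push(acc);
        acc = mul(acc, x);
    }
    // inv_acc starts as 1 / (xs[0] * ... * xs[n-1]).
    let mut inv_acc = inv(acc);
    let mut out = vec![0u64; xs.len()];
    for i in (0..xs.len()).rev() {
        // inv_acc == 1 / (xs[0] * ... * xs[i]) at this point.
        out[i] = mul(inv_acc, prefix[i]);
        inv_acc = mul(inv_acc, xs[i]);
    }
    out
}
```

Calling `batch_inverse(&[2, 3, 5])` returns a vector whose entries multiply with the corresponding inputs to 1 modulo P.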

Changes

  • crypto/math-cuda/kernels/{inverse,deep,fri}.cu: new kernels.
  • crypto/math-cuda/src/{inverse,deep,fri}.rs: host orchestrators, including FriCommitState (ping-pong eval buffers, in-place inv_twiddles squaring, per-layer
    fused fold + leaves + tree).
  • crypto/stark/src/gpu_lde.rs: new dispatches try_evaluate_parts_on_lde_gpu_keep, try_deep_composition_gpu, try_fri_commit_gpu. New counters:
    gpu_deep_calls, gpu_fri_calls.
  • crypto/stark/src/prover.rs: Round2.gpu_composition_parts holds the R2 keep handle; the R4 DEEP fast path inside compute_deep_composition_poly_evaluations
    consumes the R1 main/aux + R2 parts handles when available.
  • crypto/stark/src/fri/mod.rs: commit_phase_from_evaluations routes through try_fri_commit_gpu when cuda is enabled.
  • Tests: parity for batch invert (n in {2..2^20}), DEEP, FRI per-layer tree (log_num_leaves in {1..18}); cuda_path_integration asserts the two new counters fire
    end-to-end.
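
To illustrate what "ping-pong eval buffers" and "in-place inv_twiddles squaring" can look like, here is a CPU-side sketch of a FRI fold loop over Goldilocks. Everything in it is an assumption made for illustration: natural (not bit-reversed) evaluation order, the fold convention g(x^2) = f_even(x^2) + zeta * f_odd(x^2), the function names, and challenges passed in up front rather than sampled from the transcript after each layer root. It omits the Keccak leaves and Merkle tree entirely and is not the FriCommitState code from this PR.

```rust
/// Goldilocks prime, p = 2^64 - 2^32 + 1.
const P: u64 = 0xFFFF_FFFF_0000_0001;

fn add(a: u64, b: u64) -> u64 {
    ((a as u128 + b as u128) % P as u128) as u64
}
fn sub(a: u64, b: u64) -> u64 {
    add(a, P - b)
}
fn mul(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}
fn pow(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1u64;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul(acc, base);
        }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}
fn inv(a: u64) -> u64 {
    pow(a, P - 2)
}

/// Folds one layer. `evals[i]` is f(x_i) with x_i = h * w^i, so f(-x_i) sits at
/// index i + n/2. `inv_twiddles[i]` is 1 / x_i for i in 0..n/2.
fn fold_layer(evals: &[u64], inv_twiddles: &[u64], zeta: u64, out: &mut Vec<u64>) {
    let half = evals.len() / 2;
    let inv_two = inv(2);
    out.clear();
    for i in 0..half {
        let (lo, hi) = (evals[i], evals[i + half]);
        let even = mul(add(lo, hi), inv_two); // f_even(x_i^2)
        let odd = mul(mul(sub(lo, hi), inv_two), inv_twiddles[i]); // f_odd(x_i^2)
        out.push(add(even, mul(zeta, odd)));
    }
}

/// Runs the fold loop, returning the evaluations of every folded layer.
fn fri_commit(mut evals: Vec<u64>, mut inv_twiddles: Vec<u64>, zetas: &[u64]) -> Vec<Vec<u64>> {
    let mut scratch = Vec::with_capacity(evals.len() / 2);
    let mut layers = Vec::new();
    for &zeta in zetas {
        fold_layer(&evals, &inv_twiddles, zeta, &mut scratch);
        // Ping-pong: the freshly written buffer becomes the next layer's input.
        std::mem::swap(&mut evals, &mut scratch);
        // The next domain is {x_i^2}, so its inverse twiddles are the squares
        // of the first half of the current ones; update them in place.
        inv_twiddles.truncate(evals.len() / 2);
        for t in inv_twiddles.iter_mut() {
            *t = mul(*t, *t);
        }
        layers.push(evals.clone());
    }
    layers
}
```

In this sketch only two evaluation buffers are ever live, and the twiddle table shrinks by half per layer without being recomputed, which is the property the two phrases in the list above describe.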

Fallback

Every dispatch is gated by TypeId checks (Goldilocks + ext3) and the LDE-size threshold. Below threshold or on any cudarc error, the dispatch returns None and
the existing CPU implementation runs unchanged. Exception: mid-FRI-loop cudarc failure panics, because the transcript is already advanced and a CPU restart would
re-sample zeta_0 against mutated state.
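
The gating described above can be sketched roughly as follows. The marker types, the threshold value, the `GpuError` type, and the function names are all hypothetical stand-ins, not the actual items in gpu_lde.rs; the sketch only shows the shape of a dispatch that returns None on a type mismatch, a below-threshold size, or a backend error, so that the caller falls through to the CPU path.

```rust
use std::any::TypeId;

// Hypothetical marker types standing in for the real field types.
struct Goldilocks;
struct GoldilocksExt3;
struct OtherField;

// Hypothetical threshold; the real value is not stated in this PR description.
const GPU_LDE_THRESHOLD: usize = 1 << 12;

// Stand-in for a cudarc error.
#[derive(Debug)]
struct GpuError;

/// Returns Some(result) only when the field types match, the LDE is large
/// enough, and the GPU closure succeeds; otherwise returns None.
fn try_gpu<F: 'static, E: 'static, G>(lde_size: usize, gpu: G) -> Option<u64>
where
    G: FnOnce() -> Result<u64, GpuError>,
{
    if TypeId::of::<F>() != TypeId::of::<Goldilocks>()
        || TypeId::of::<E>() != TypeId::of::<GoldilocksExt3>()
    {
        return None;
    }
    if lde_size < GPU_LDE_THRESHOLD {
        return None;
    }
    // Any backend error degrades to None so the CPU path can take over.
    gpu().ok()
}

/// Caller side: use the GPU result if present, else run the CPU closure.
fn dispatch<F: 'static, E: 'static, G, C>(lde_size: usize, gpu: G, cpu: C) -> u64
where
    G: FnOnce() -> Result<u64, GpuError>,
    C: FnOnce() -> u64,
{
    try_gpu::<F, E, G>(lde_size, gpu).unwrap_or_else(cpu)
}
```

This pattern is only safe while a failed GPU attempt leaves no observable side effects. That is why the mid-FRI-loop case is treated differently: once the transcript has advanced, silently returning None would no longer give the CPU path a clean starting state.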

Test plan

  • cargo test -p math-cuda --release --tests (GPU host) — 67 tests
  • cargo test -p stark --release --features cuda — 128 tests
  • cargo test -p stark --release (no cuda) — 128 tests
  • cargo test -p lambda-vm-prover --release (no cuda) — 384 tests
  • cargo test -p lambda-vm-prover --release --features cuda --test cuda_path_integration -- --ignored — all 6 counters fire end-to-end
  • cargo clippy --workspace --all-targets --features cuda -- -D warnings — clean
  • cargo clippy --workspace --all-targets -- -D warnings — clean
  • cargo fmt --all --check — clean

ColoCarletti and others added 30 commits May 6, 2026 15:12
Co-authored-by: Gabriel Bosio <38794644+gabrielbosio@users.noreply.github.com>
ColoCarletti and others added 27 commits May 29, 2026 11:18
# Conflicts:
#	crypto/math-cuda/build.rs
#	crypto/math-cuda/src/device.rs
#	crypto/math-cuda/src/lib.rs
#	crypto/stark/src/gpu_lde.rs
#	crypto/stark/src/prover.rs
#	prover/tests/cuda_path_integration.rs
@gabrielbosio added the gpu label (Related to GPU/CUDA development) on Jun 4, 2026
Labels

gpu Related to GPU/CUDA development

3 participants