Skip to content

perf(stark/verifier): four micro-opts targeting the recursion guest#626

Draft
diegokingston wants to merge 1 commit into
mainfrom
perf/verifier-guest-microopts
Draft

perf(stark/verifier): four micro-opts targeting the recursion guest#626
diegokingston wants to merge 1 commit into
mainfrom
perf/verifier-guest-microopts

Conversation

@diegokingston
Copy link
Copy Markdown
Collaborator

Follow-up to #601's preprocessed-commitment caching: targets the same recursion-guest cost surface but inside crypto/stark/src/verifier.rs. Four independent, semantics-preserving changes that cut allocations and wasted Merkle work when the verifier runs as a RISC-V program.

Changes

  1. step_2_verify_claimed_composition_polynomial — drop the HashMap<usize, FieldElement<Field>> step memo for a small Vec<(usize, FE)> scanned linearly. Boundary constraints number in the low tens; in RISC-V the HashMap's hash function + heap allocator outweigh any O(B²) saving on a tiny B.

  2. reconstruct_deep_composition_poly_evaluations_for_all_queries — hoist lde_base, lde_base_sym, and denoms_scratch out of the per-query loop and .clear() each iteration instead of fresh Vec::new(). At ~64-128 queries this removes hundreds of small allocations per verify. reconstruct_deep_composition_poly_evaluation now takes the scratch buffer as a &mut Vec<FieldElement<FieldExtension>> parameter.

  3. verify — drop proof.clone() in the single-table wrapper. Added a multi_verify_proofs slice-taking variant; multi_verify is now a one-line wrapper preserving the existing &MultiProof API for all current callers (~50 across stark tests and prover). verify now calls the slice variant via core::slice::from_ref(proof).

  4. verify_trace_openings and verify_query_and_sym_openings — replace the result & openings_ok always-evaluate fold with explicit return false on each opening failure, cutting wasted Merkle work on tampered proofs. The FRI loop additionally drops the pre-allocated evaluation_point_vec in favour of stepping evaluation_point_squared in-place (one fewer Vec alloc per query) and adds an explicit length-alignment guard so the loop hits its i == last_layer_idx return for any well-formed proof.

Test plan

  • cargo build -p stark — clean.
  • make lint — clean across all three clippy configs.
  • cargo test -p stark — 126/126 passing (covers single-table verify, multi-table multi_verify, and all FRI / DEEP / boundary paths via the example AIRs).
  • cargo test -p lambda-vm-prover --lib — 273 pass / 77 fail; same counts as main (verified). All 77 failures are pre-existing on the base, unrelated to constraints (UnknownSyscall(5) etc.).

Notes

  • Pure micro-optimizations: no Fiat-Shamir surface, no proof format change, no API removal. multi_verify and MultiProof are unchanged from the caller's perspective.
  • Allocation count reduction is what's visible in benchmark instrumentation; cycle savings inside the recursion guest scale with num_queries × num_tables.
  • Items 5-7 from the analysis (refactor verifier into modules, lock down public surface, drop the Clone requirement on the transcript) are larger and out of scope for this PR.

All four changes preserve verifier semantics — `cargo test -p stark`
passes 126/126; `cargo test -p lambda-vm-prover --lib` keeps the
baseline 273 pass / 77 pre-existing failures (unrelated to constraints,
`UnknownSyscall(5)` on `main`); `make lint` is clean.

1) step_2_verify_claimed_composition_polynomial: drop the
   HashMap<usize, FE> step memo in favour of a small Vec scanned
   linearly. Boundary constraints number in the low tens; in RISC-V the
   HashMap's hash function + heap allocator outweigh O(B^2) scan savings.

2) reconstruct_deep_composition_poly_evaluations_for_all_queries: hoist
   lde_base / lde_base_sym / denoms_scratch out of the per-query loop
   and `.clear()` each iteration instead of fresh `Vec::new()`. At
   ~64-128 queries this removes hundreds of small allocations per
   verify. `reconstruct_deep_composition_poly_evaluation` now takes the
   scratch buffer as a `&mut Vec<FE<E>>` parameter.

3) verify: drop `proof.clone()` for the single-table wrapper. Added a
   `multi_verify_proofs` slice-taking variant; `multi_verify` is now a
   one-line wrapper preserving the existing `&MultiProof` API for all
   current callers (~50 across stark tests and prover).

4) verify_trace_openings and verify_query_and_sym_openings: replace
   `result & openings_ok` (always-evaluate fold) with early `return
   false` on each opening failure. Cuts wasted Merkle work on tampered
   proofs. The FRI loop additionally drops the pre-allocated
   `evaluation_point_vec` in favour of stepping `evaluation_point_squared`
   in-place — one fewer Vec alloc per query. A length-alignment guard
   makes the loop hit its `i == last_layer_idx` return for any
   well-formed proof.
@diegokingston diegokingston marked this pull request as draft May 26, 2026 17:03
@github-actions
Copy link
Copy Markdown

Codex Code Review

No issues found in the PR diff.

I checked the changed verifier logic for soundness, panic paths on malformed proofs, and meaningful performance regressions. The FRI length guard and short-circuiting preserve the boolean behavior while rejecting malformed decommitment lengths earlier.

Verification:

  • git diff --check passed.
  • cargo check -p stark could not run because rustup tried to write under read-only /home/runner/.rustup/tmp, so I could not use it as a compile signal.

// Unreachable: the length guard above ensures the loop iterates at
// least once (we passed the is_empty check) and hits the
// `i == last_layer_idx` return.
unreachable!("loop must hit the last_layer_idx return")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Low: unreachable!() in a verifier panics instead of rejecting

A verifier should never panic on any malformed input — a panic is a DoS vector. The invariant is sound given the guards (all three slices are equal-length and non-empty, so the loop always hits i == last_layer_idx), but return false is strictly safer for defense in depth.

Suggested change
unreachable!("loop must hit the last_layer_idx return")
return false; // all openings passed but loop fell through (should not happen)

@claude
Copy link
Copy Markdown

claude Bot commented May 26, 2026

Code Review: perf(stark/verifier): four micro-opts targeting the recursion guest

Overview

Four semantics-preserving performance changes to verifier.rs, all targeting allocation/cycle cost when the verifier runs as a RISC-V recursion guest. The math is preserved, the API surface is unchanged for existing callers, and all 126 stark tests pass.


Security

Positive — new FRI length guard (lines 514-518)
The old zip()-based fold silently truncated if layers_auth_paths or layers_evaluations_sym had fewer elements than fri_layers_merkle_roots. A malformed proof could exploit this to skip layer verifications. The explicit length check closes that path — this is the most important correctness improvement in the PR.

Low — unreachable!() in verifier context (line 561)
Already flagged as an inline comment. A verifier should never panic on any malformed input; return false is the safe default. The invariant is logically sound (the guards guarantee the loop hits i == last_layer_idx), but a panic on an unexpected code path is a DoS vector. Prefer return false.


Correctness

FRI evaluation_point_squared stepping — verified correct
The original code pre-collected evaluation_point_vec[i] = ep_inv^(2^(i+1)) via successors(...).take(n). The new code starts evaluation_point_squared = ep_inv.square() and squares at the end of each non-final iteration, reproducing the same sequence. The early return on i == last_layer_idx correctly skips the final unnecessary squaring.

denoms_scratch.reserve(n) after .clear()
After .clear() the length is 0, so reserve(n) ensures capacity >= n total. Correct. On subsequent queries the buffer already has sufficient capacity and no allocation occurs.

verify_trace_openings short-circuit
Semantically equivalent to the original result &= ... accumulation: all three checks (main, precomputed, aux) are still required. The early return only skips remaining checks when one has already failed, which is sound for a verifier.


Minor

O(B²) step_to_point Vec — Acknowledged in the PR. Fine for typical AIRs (B in the low tens). No action needed.

multi_verify_proofs on the trait — Adding a default-implemented method to a public trait is non-breaking for existing implementors, but it does expand the trait's API surface. Worth keeping in mind if downstream crates seal or mock IsStarkVerifier.


Summary

No critical or high issues. The one item worth addressing before merge is the unreachable!()return false change (Low, see inline comment). The FRI length guard is a genuine security improvement. The allocation-reduction changes are mechanically correct and well-motivated for the recursion guest context.

diegokingston added a commit that referenced this pull request May 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant