perf(stark/verifier): four micro-opts targeting the recursion guest by diegokingston · Pull Request #626 · yetanotherco/lambda_vm

diegokingston · 2026-05-26T17:02:05Z

Follow-up to #601's preprocessed-commitment caching: targets the same recursion-guest cost surface but inside crypto/stark/src/verifier.rs. Four independent, semantics-preserving changes that cut allocations and wasted Merkle work when the verifier runs as a RISC-V program.

Changes

step_2_verify_claimed_composition_polynomial — drop the HashMap<usize, FieldElement<Field>> step memo for a small Vec<(usize, FE)> scanned linearly. Boundary constraints number in the low tens; in RISC-V the HashMap's hash function + heap allocator outweigh any O(B²) saving on a tiny B.
reconstruct_deep_composition_poly_evaluations_for_all_queries — hoist lde_base, lde_base_sym, and denoms_scratch out of the per-query loop and .clear() each iteration instead of fresh Vec::new(). At ~64-128 queries this removes hundreds of small allocations per verify. reconstruct_deep_composition_poly_evaluation now takes the scratch buffer as a &mut Vec<FieldElement<FieldExtension>> parameter.
verify — drop proof.clone() in the single-table wrapper. Added a multi_verify_proofs slice-taking variant; multi_verify is now a one-line wrapper preserving the existing &MultiProof API for all current callers (~50 across stark tests and prover). verify now calls the slice variant via core::slice::from_ref(proof).
verify_trace_openings and verify_query_and_sym_openings — replace the result & openings_ok always-evaluate fold with explicit return false on each opening failure, cutting wasted Merkle work on tampered proofs. The FRI loop additionally drops the pre-allocated evaluation_point_vec in favour of stepping evaluation_point_squared in-place (one fewer Vec alloc per query) and adds an explicit length-alignment guard so the loop hits its i == last_layer_idx return for any well-formed proof.

Test plan

cargo build -p stark — clean.
make lint — clean across all three clippy configs.
cargo test -p stark — 126/126 passing (covers single-table verify, multi-table multi_verify, and all FRI / DEEP / boundary paths via the example AIRs).
cargo test -p lambda-vm-prover --lib — 273 pass / 77 fail; same counts as main (verified). All 77 failures are pre-existing on the base, unrelated to constraints (UnknownSyscall(5) etc.).

Notes

Pure micro-optimizations: no Fiat-Shamir surface, no proof format change, no API removal. multi_verify and MultiProof are unchanged from the caller's perspective.
Allocation count reduction is what's visible in benchmark instrumentation; cycle savings inside the recursion guest scale with num_queries × num_tables.
Items 5-7 from the analysis (refactor verifier into modules, lock down public surface, drop the Clone requirement on the transcript) are larger and out of scope for this PR.

All four changes preserve verifier semantics — `cargo test -p stark` passes 126/126; `cargo test -p lambda-vm-prover --lib` keeps the baseline 273 pass / 77 pre-existing failures (unrelated to constraints, `UnknownSyscall(5)` on `main`); `make lint` is clean. 1) step_2_verify_claimed_composition_polynomial: drop the HashMap<usize, FE> step memo in favour of a small Vec scanned linearly. Boundary constraints number in the low tens; in RISC-V the HashMap's hash function + heap allocator outweigh O(B^2) scan savings. 2) reconstruct_deep_composition_poly_evaluations_for_all_queries: hoist lde_base / lde_base_sym / denoms_scratch out of the per-query loop and `.clear()` each iteration instead of fresh `Vec::new()`. At ~64-128 queries this removes hundreds of small allocations per verify. `reconstruct_deep_composition_poly_evaluation` now takes the scratch buffer as a `&mut Vec<FE<E>>` parameter. 3) verify: drop `proof.clone()` for the single-table wrapper. Added a `multi_verify_proofs` slice-taking variant; `multi_verify` is now a one-line wrapper preserving the existing `&MultiProof` API for all current callers (~50 across stark tests and prover). 4) verify_trace_openings and verify_query_and_sym_openings: replace `result & openings_ok` (always-evaluate fold) with early `return false` on each opening failure. Cuts wasted Merkle work on tampered proofs. The FRI loop additionally drops the pre-allocated `evaluation_point_vec` in favour of stepping `evaluation_point_squared` in-place — one fewer Vec alloc per query. A length-alignment guard makes the loop hit its `i == last_layer_idx` return for any well-formed proof.

github-actions · 2026-05-26T17:04:07Z

Codex Code Review

No issues found in the PR diff.

I checked the changed verifier logic for soundness, panic paths on malformed proofs, and meaningful performance regressions. The FRI length guard and short-circuiting preserve the boolean behavior while rejecting malformed decommitment lengths earlier.

Verification:

git diff --check passed.
cargo check -p stark could not run because rustup tried to write under read-only /home/runner/.rustup/tmp, so I could not use it as a compile signal.

claude · 2026-05-26T17:06:05Z

+        // Unreachable: the length guard above ensures the loop iterates at
+        // least once (we passed the is_empty check) and hits the
+        // `i == last_layer_idx` return.
+        unreachable!("loop must hit the last_layer_idx return")


Low: unreachable!() in a verifier panics instead of rejecting

A verifier should never panic on any malformed input — a panic is a DoS vector. The invariant is sound given the guards (all three slices are equal-length and non-empty, so the loop always hits i == last_layer_idx), but return false is strictly safer for defense in depth.

Suggested change

unreachable!("loop must hit the last_layer_idx return")

return false; // all openings passed but loop fell through (should not happen)

claude · 2026-05-26T17:06:33Z

Code Review: perf(stark/verifier): four micro-opts targeting the recursion guest

Overview

Four semantics-preserving performance changes to verifier.rs, all targeting allocation/cycle cost when the verifier runs as a RISC-V recursion guest. The math is preserved, the API surface is unchanged for existing callers, and all 126 stark tests pass.

Security

Positive — new FRI length guard (lines 514-518)
The old zip()-based fold silently truncated if layers_auth_paths or layers_evaluations_sym had fewer elements than fri_layers_merkle_roots. A malformed proof could exploit this to skip layer verifications. The explicit length check closes that path — this is the most important correctness improvement in the PR.

Low — unreachable!() in verifier context (line 561)
Already flagged as an inline comment. A verifier should never panic on any malformed input; return false is the safe default. The invariant is logically sound (the guards guarantee the loop hits i == last_layer_idx), but a panic on an unexpected code path is a DoS vector. Prefer return false.

Correctness

FRI evaluation_point_squared stepping — verified correct
The original code pre-collected evaluation_point_vec[i] = ep_inv^(2^(i+1)) via successors(...).take(n). The new code starts evaluation_point_squared = ep_inv.square() and squares at the end of each non-final iteration, reproducing the same sequence. The early return on i == last_layer_idx correctly skips the final unnecessary squaring.

denoms_scratch.reserve(n) after .clear()
After .clear() the length is 0, so reserve(n) ensures capacity >= n total. Correct. On subsequent queries the buffer already has sufficient capacity and no allocation occurs.

verify_trace_openings short-circuit
Semantically equivalent to the original result &= ... accumulation: all three checks (main, precomputed, aux) are still required. The early return only skips remaining checks when one has already failed, which is sound for a verifier.

Minor

O(B²) step_to_point Vec — Acknowledged in the PR. Fine for typical AIRs (B in the low tens). No action needed.

multi_verify_proofs on the trait — Adding a default-implemented method to a public trait is non-breaking for existing implementors, but it does expand the trait's API surface. Worth keeping in mind if downstream crates seal or mock IsStarkVerifier.

Summary

No critical or high issues. The one item worth addressing before merge is the unreachable!() → return false change (Low, see inline comment). The FRI length guard is a genuine security improvement. The allocation-reduction changes are mechanically correct and well-motivated for the recursion guest context.

…erify, opening short-circuit)

diegokingston marked this pull request as draft May 26, 2026 17:03

claude Bot reviewed May 26, 2026

View reviewed changes

diegokingston added a commit that referenced this pull request May 29, 2026

perf(verifier): fold in #626 micro-opts (Vec step-memo, slice multi_v…

d4de863

…erify, opening short-circuit)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(stark/verifier): four micro-opts targeting the recursion guest#626

perf(stark/verifier): four micro-opts targeting the recursion guest#626
diegokingston wants to merge 1 commit into
mainfrom
perf/verifier-guest-microopts

diegokingston commented May 26, 2026

Uh oh!

github-actions Bot commented May 26, 2026

Uh oh!

claude Bot May 26, 2026

Uh oh!

claude Bot commented May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	unreachable!("loop must hit the last_layer_idx return")
	return false; // all openings passed but loop fell through (should not happen)

Conversation

diegokingston commented May 26, 2026

Changes

Test plan

Notes

Uh oh!

github-actions Bot commented May 26, 2026

Codex Code Review

Uh oh!

claude Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot commented May 26, 2026

Code Review: perf(stark/verifier): four micro-opts targeting the recursion guest

Overview

Security

Correctness

Minor

Summary

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant