Skip to content

Perf/verifier recursion opts#636

Draft
diegokingston wants to merge 8 commits into
mainfrom
perf/verifier-recursion-opts
Draft

Perf/verifier recursion opts#636
diegokingston wants to merge 8 commits into
mainfrom
perf/verifier-recursion-opts

Conversation

@diegokingston
Copy link
Copy Markdown
Collaborator

No description provided.

@github-actions
Copy link
Copy Markdown

Codex Code Review

No issues found in the PR diff.

I reviewed the changed verifier logic for security, correctness, significant performance regressions, and unnecessary complexity, and didn’t find an actionable concern limited to these changes.

// it is the number of composition poly parts the proof advertises (the
// same array the consumer validates against). Hoist `z^N` once.
let number_of_parts = proof.composition_poly_parts_ood_evaluation.len();
let z_pow = challenges.z.pow(number_of_parts);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Low – silent semantic change for malformed proofs

number_of_parts is now taken from proof.composition_poly_parts_ood_evaluation.len() (the OOD header), whereas the old code took it from lde_composition_poly_parts_evaluation.len() (the LDE opening evaluations slice).

For valid proofs these are always equal, so there is no regression. For a malformed proof where the opening carries fewer parts than the header (M < N), the old code silently computed z^M (wrong denominator) and then hit None on the subsequent gammas.get(j)?; the new code computes the correct z^N from the start. This is actually a fix, but it is an implicit behavioral change worth calling out explicitly.

Consider adding a validation check so the caller fails fast and the intent is clear:

Suggested change
let z_pow = challenges.z.pow(number_of_parts);
let number_of_parts = proof.composition_poly_parts_ood_evaluation.len();
let z_pow = challenges.z.pow(number_of_parts);
// All openings must carry the same part count as the OOD header; validated
// per-entry inside reconstruct_deep_composition_poly_evaluation via the
// gammas/ood bounds checks there.

Comment on lines +1155 to +1157
// (it is stored owned in the returned `Challenges`). Clone exactly once
// here, where ownership is actually required — this removes the
// per-table clone that previously lived at the `multi_verify` call site.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Low – comment overstates the allocation saving

The clone is not eliminated — it moves from the call site (lookup_challenges.clone()) into verify_rounds_2_to_4 (rap_challenges.to_vec()). Total allocations per-table is unchanged.

The real wins from this PR are:

  1. logup_alpha_powers computed once globally instead of once per table.
  2. A single inplace_batch_inverse across all query denominators instead of one per query.

The comment should reflect that, so future readers don't expect a measurable per-table allocation reduction from this line change alone.

@claude
Copy link
Copy Markdown

claude Bot commented May 29, 2026

Review: Perf/verifier recursion opts

Summary

Two orthogonal performance improvements to the STARK verifier:

  1. Single global inplace_batch_inverse — denominators for all query/symmetric-point pairs are collected in one flat buffer and inverted in a single batch instead of one batch per query. This removes num_queries batch-inversion calls.

  2. Global LogUp alpha powerscompute_alpha_powers is called once with the maximum bus element count across all AIRs, and each table slices the prefix it needs, instead of calling compute_alpha_powers once per table.

The two-pass refactoring (collect denominators → batch invert → reconstruct) is sound. Ordering of deep_poly_evaluations and deep_poly_evaluations_sym is preserved correctly (non-sym and sym for iota[i] land at index i in their respective Vecs, matching what step_3_verify_fri expects).


Issues

Low — z_pow basis change (see inline comment on L576)
number_of_parts is now read from proof.composition_poly_parts_ood_evaluation.len() (OOD header) instead of the opening evaluation slice's length. For valid proofs these are always equal. For malformed proofs with fewer opening parts than the header (M < N) the old code used z^M (wrong); the new code uses z^N (correct). Net effect: the new code is more correct, but the implicit behavioral change should be documented.

Low — misleading clone comment (see inline comment on L1155-1157)
The total number of Vec allocations for rap_challenges per table is unchanged; to_vec() inside verify_rounds_2_to_4 replaces the clone() at the call site. The comment should attribute the real wins (global alpha powers + single batch inverse) rather than imply the clone was eliminated.


No issues found with

  • Batch-inverse failure propagation (ok()? correctly rejects any zero denominator from a malformed proof, matching prior semantics).
  • DeepEntry lifetime validity (all borrows come from the immutably-borrowed proof, so they coexist safely).
  • all_denoms buffer indexing (e * stride is safe because pass 1 fills exactly num_entries * stride elements with num_entries = num_queries * 2).
  • LogUp prefix slicing (the .min(logup_alpha_powers.len()) guard is purely defensive and correct).
  • Single-table verify path (routes through multi_verify, so all improvements apply there too).

@diegokingston
Copy link
Copy Markdown
Collaborator Author

/bench 5

@github-actions
Copy link
Copy Markdown

Benchmark — fib_iterative_8M (median of 5)

Table parallelism: auto (cores / 3)

Metric main PR Δ
Peak heap 51689 MB 51854 MB +165 MB (+0.3%) ⚪
Prove time 25.230s 25.021s -0.209s (-0.8%) ⚪

✅ No significant change.

✅ Low variance (time: 3.5%, heap: 1.1%)

Commit: 2a4cb75 · Baseline: cached · Runner: self-hosted bench

Toggles the keccak-precompile [patch] in an external recursion-guest
Cargo.toml, runs the verify bench both ways, prints the cycle comparison.
Parameterized by GUEST_DIR + VERIFY_BENCH_CMD (the guest harness lives
outside this repo).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant