Skip to content

Research/mmcs streaming#628

Draft
diegokingston wants to merge 22 commits into
mainfrom
research/mmcs-streaming
Draft

Research/mmcs streaming#628
diegokingston wants to merge 22 commits into
mainfrom
research/mmcs-streaming

Conversation

@diegokingston
Copy link
Copy Markdown
Collaborator

No description provided.

… tests

PR1 of the streaming-MMCS plan (see docs/mmcs-streaming-design.md for the
full plan). This commit adds the module standalone — NOT wired into the
prover yet — so the API and the 8-vector threat model can be reviewed in
isolation before any production hot-path change.

What's in:
- `MmcsBuilder` / `Mmcs` / `MmcsOpening` types over the existing
  `IsMerkleTreeBackend` abstraction.
- Plonky3-style layer-injection build: sort matrices by `padded_height`
  desc (ties by tag asc), layer 0 = largest matrix's leaves, compress
  pairs upward, inject smaller matrices at their height-layer via
  `compress(node, leaf)`.
- Open(global_index) -> per-matrix leaves at the shifted index + sibling
  chain. Verify reconstructs the root bottom-up; rejects malformed or
  tampered openings.

MVP restrictions (called out in the module doc):
- Matrices must have *distinct* padded_heights (matches lambda-vm's chip
  topology where heights almost never collide). Same-height case is
  Phase 2 — would mix multiple matrices at layer 0 row-by-row.
- No SIMD / parallel hashing yet.
- Caller materializes full leaf digest arrays per matrix (no streaming
  chunked absorption yet; that's wrapped on top later).
- Single root only — no caps in this iteration.

Security: 8 vectors from the threat model, each as a test:
  v1 cross-matrix row swap (rejected via matrix-leaf order check)
  v2 padding-byte freedom (rejected at build: NotPowerOfTwo)
  v3 same-height matrices (rejected: DuplicateHeight — MVP restriction)
  v4 leaf re-labelling (rejected via tag binding in opening)
  v5 wrong leaf data (rejected via root mismatch)
  v6 index tampering (rejected via path divergence)
  v7 truncated path (rejected: siblings.len() vs depth check)
  v8 verifier-side spec mismatch (rejected via expected_specs check)
Plus duplicate-tag and out-of-bounds checks.

13 tests, 0 failed. `make lint` clean across all three clippy configs.
No prover/verifier code touched, no proof format change.

Next steps (separate PRs):
- PR2: wire into Round 1 main-trace commit, replace per-chip Merkle loop.
- PR3: extend to aux trace + composition.
Drops the MVP `DuplicateHeight` restriction so the MMCS matches lambda-vm's
actual chip topology (3+ CPU chunks all at 2^20, BITWISE at 2^20, etc.).

Build pipeline (now general):
- Sort by `padded_height` desc, ties by `tag` asc — verifier reproduces.
- Layer 0 = first max-height matrix's leaves, then sequentially compress
  in every additional max-height matrix's leaves at the same row index.
- Each upper layer: compress pairs of children, then inject every matrix
  whose `padded_height` matches that layer's length, in tag-asc order.

Verify mirrors with a cursor over `matrix_leaves` (which the builder
already orders by height desc then tag asc). At layer L we combine in
every leaf at the current row position whose matrix has height equal to
the current layer length; cursor must end exactly at the leaves' end
(unconsumed leaves => topology mismatch => reject).

Tests:
- `same_height_pair_round_trips`: two matrices at max_height combine.
- `lambda_vm_style_multi_chunk_round_trips`: 3 chunks at 2^3, 2 at 2^2,
  1 at 2^0 — mirrors typical lambda-vm chip groupings.
- `insertion_order_does_not_change_root`: permutation invariance.
- `same_height_tampered_leaf_rejected`: tampering one of the combined
  leaves at the same layer still fails root match.

17 tests total, 0 failed. `make lint` clean.
Adds `#[cfg(feature = "parallel")]` paths to the MMCS finalize step:
- Layer-0 same-height combine: row-parallel (each row independently
  folds K matrices in tag-asc order; K is small, the per-row sequential
  chain is short, rows scale across cores).
- Pair compression upward: pair-parallel.
- Matrix injection at non-leaf layers: row-parallel.

Mirrors Plonky3's `first_digest_layer` + `compress_and_inject` parallel
shape (we read their `merkle-tree/src/merkle_tree.rs` to confirm the
algorithmic structure matches). Differences:
- Plonky3 SIMD-packs WIDTH rows per hash call via PackedField. lambda-vm
  uses scalar Keccak so we fall back to per-row Rayon parallelism
  (similar throughput class for our chip topology, no SIMD code needed).
- Plonky3 hashes raw matrix rows internally (multi-matrix same-height
  rows concatenated -> single hash). Ours takes pre-hashed leaves so a
  same-height combine is `compress(leaf_A, leaf_B)` (one extra compress
  per matrix per row vs Plonky3). The trade-off: ours lets the caller
  hash chip-by-chip and drop the LDE before the MMCS sees anything,
  giving better peak memory for lambda-vm's large per-chip LDEs. Worth
  reconsidering for PR2 if profiling shows the extra compresses matter.

Helpers factored out so the parallel/serial cfg lives in one place per
operation (`build_combined_layer`, `compress_pairs`, `inject_matrices`).

Tests: same 17 cases pass both with and without `--features parallel`.
`make lint` clean across all three configs.
Adds two `#[ignore]` tests that print concrete numbers when invoked
with `cargo test -p crypto --features parallel mmcs_bench --
--ignored --nocapture`. Two scenarios:

1. **Build time** comparison on a scaled-down lambda-vm topology
   (8 chips: 3 at 2^14, 2 at 2^12, 2 at 2^10, 1 at 2^8).
   Measured on this host:
       N independent trees:    4481 µs
       Single MMCS tree:       3176 µs
       MMCS / N-trees ratio:   0.709  (MMCS is ~30% faster)
   The savings come from sharing layer-0 work when multiple chips
   live at max_height — fewer hash chains end up duplicated.

2. **Per-query opening hash count** for the same topology:
       N independent trees:   102 hashes per query
       Unified MMCS:           21 hashes per query
       Reduction factor:      ~4.9x
   At production scale (~30 chips, max_h = 2^20) this projects to
   ~13x reduction in per-query Merkle hashes, which is the dominant
   recursion-guest cost after #601's preprocessed-commitment cache.

Caveats:
- Topology is scaled-down to keep the bench fast (<1 s); production
  numbers will differ but in the same direction.
- Build bench measures wall-clock with `Instant`, not statistically
  rigorous like Criterion. Good enough as a sanity gate before PR2.
- Opening bench counts *Merkle path hashes only*, not leaf-hash work
  or compressions inside `verify` — both equal between modes.

Validates the Phase B->C migration: MMCS build does NOT regress prover
time (it improves it), and the verifier-as-guest cycle saving is
real and measurable.
The MMCS leaf hash binds matrix identity via a `MatrixTag` (8 bytes per
matrix). Prover and verifier must derive the same tag for the same
chip-chunk; otherwise the Fiat-Shamir transcript diverges silently —
opaque verification failure with no clear diagnostic. Centralising the
tag derivation in one shared module turns "same tag" from a convention
into a compile-time guarantee.

Encoding:
    MatrixTag = [chip_type_id : u32 LE] [chunk_index : u32 LE]

Chip type IDs are STABLE wire constants (verified by `tag_encoding_is_stable`
test pinning specific bytes). Append-only — never reassign, never reuse
removed IDs. Split tables share a low ID range (CPU=0..MEMW_REGISTER=9),
single-instance tables a mid range (100..107), and per-page tables get a
single ID (200) with `chunk_index` encoding the page index.

Tests cover:
- Uniqueness across all realistic (chip, chunk) pairs (10 split tables
  × 64 chunks + 8 single + 256 page indices, ~944 distinct tags).
- Encoding stability (specific bytes pinned).
- Sensitivity (changing chip_type or chunk_index changes the tag).

No prover/verifier code touched yet. This module exists so PR2's
wire-up (which replaces N per-chip Merkle trees with one MMCS) can
import a single source of truth.
Bridges per-chip LDE columns to the unified MMCS over the main trace.
Not yet wired into `multi_prove` — exists so the API + leaf-hash format
can be reviewed and tested before the hot-path change in the next step.

Module surface:
- `compute_chip_leaves_with_tag(columns, tag) -> Vec<Commitment>`:
  per-row Keccak256 of `LEAF_DOMAIN_TAG || tag.0 || row_bytes_be`.
  Bit-reversed row order matches the existing FRI / Merkle layout. Caller
  can drop the LDE columns immediately after — memory peak is one chip's
  LDE at a time (same as today's per-chip Merkle build, no regression).
- `build_main_trace_mmcs(entries)`: thin wrapper that pours
  `(MatrixTag, leaves)` pairs into the standalone `MmcsBuilder` from
  `crypto/merkle_tree/mmcs`. Returns one Mmcs with one root; per-query
  opens come from `mmcs.open(global_index)`.

Soundness notes baked into the doc:
- The leaf hash MUST include the chip tag. Without it, a single shared
  root cannot bind matrix identity (cross-chip row swap becomes a real
  attack). The legacy `keccak_leaves_bit_reversed` is unsafe for MMCS
  use; mixing the two formats would be a silent soundness bug. The test
  `leaves_differ_from_legacy_format` pins that the two encodings produce
  different bytes.
- The domain tag `LAMBDAVM_MAIN_MMCS_LEAF_V1` is versioned so future
  changes to the leaf encoding can be detected by stale verifiers.

5 tests, 0 failed. `make lint` clean.

Next step: replace the per-chip transcript-absorb loop in `multi_prove`
Round 1 Phase A with a single MMCS build + absorb. That commit also
drops `StarkProof.lde_trace_main_merkle_root` and changes the per-query
opening format.
Foundation for the upcoming MMCS Phase C wire-up. No hot-path changes;
all new types are defined but unused. Lands the surfaces both the
prover-side wire-up commit and the verifier-side replay commit will
consume next.

- crypto/mmcs: add serde derives on MatrixTag and MmcsOpening so they
  can live in the proof format.
- stark/mmcs_leaf: single source of truth for the main-trace MMCS
  leaf hash format (LEAF_DOMAIN_TAG + hash_tagged_row_bytes +
  hash_tagged_row). Prover and verifier will both call this; the
  prover-side compute_chip_leaves_with_tag adapter calls it row-wise.
- stark/proof/stark: MainTraceOpening<F> struct (evaluations +
  evaluations_sym + MmcsOpening per row, paired with its symmetric
  counterpart). Not yet wired into DeepPolynomialOpening.
- stark/prover: MainCommit<F> struct (shared Arc<Mmcs> + per-table
  MatrixTag + optional precomputed tree). Unused at the wire-up level
  but defined here as the keystone type. Marked allow(dead_code).
- prover/lib: VmAirs::air_tags() returns the parallel Vec<MatrixTag>
  in air_trace_pairs / air_refs order. Prover and verifier must call
  this on identical VmAirs configurations.
Wires the parallel Vec<MatrixTag> through both prover and verifier
signatures. The value is not consumed yet (let _ = main_tags), only
validated against AIR count. Lays the surface for the upcoming Phase A
MMCS wire-up without changing any cryptographic behaviour.

- multi_prove(air_trace_pairs, main_tags, transcript, ...)
- multi_verify(airs, multi_proof, main_tags, transcript, balance)
- Both reject a mismatch between `main_tags.len()` and AIR count.
- Single-AIR `prove` / `verify` synthesize a default tag.
- stark::test_utils::multi_verify_ram helper hides the synthetic tag
  generation from stark internal tests; bulk-migrated all 37 call
  sites to use it.
- New `mmcs_leaf::synth_main_tags(n)` + `synth_main_tags_for(slice)`
  helpers for lambda-vm tests that build ad-hoc AIR slices.
- prover/src/lib.rs `prove` / `verify` wire `airs.air_tags()` to the
  shared MMCS-bound transcript.
- lambda-vm tests bulk-updated to pass synthesized tags (values are
  ignored under the current `let _ = main_tags` no-op).

Tests: stark 130/130 green; lambda-vm-prover non-ELF tests green
(prove_elfs failures pre-date this change — they need ELF binaries
not present in the working tree).
…t on per-table trees

Phase C wire-up of the streaming-MMCS plan. Non-preprocessed tables are now
committed under a single shared main-trace MMCS; preprocessed tables stay
OUT of the MMCS and keep their own per-table multiplicities Merkle tree
(in addition to the AIR-pinned precomputed tree).

Architecture:
- `MainCommit<F>` is now an enum:
  * `Shared { mmcs (Arc), tag, padded_height }` for non-preprocessed
  * `Preprocessed { multiplicities_tree, multiplicities_root,
                    precomputed_tree, precomputed_root, num_precomputed_cols }`
- `MainTraceOpening<F>` is now an enum:
  * `Mmcs { evaluations, evaluations_sym, mmcs_opening, mmcs_opening_sym }`
  * `Tree(PolynomialOpenings<F>)` for the preprocessed per-table tree
- `MultiProof` gains `main_mmcs_root: Commitment` and
  `main_mmcs_spec: Vec<(MatrixTag, usize)>` (the MMCS is at multi-proof
  level, hence single-AIR `prove` / `verify` now return / take MultiProof).
- `StarkProof.lde_trace_main_merkle_root: Option<Commitment>` is repurposed:
  `Some` for preprocessed tables (per-table multiplicities root), `None`
  for non-preprocessed tables (covered by `main_mmcs_root`).

Phase A absorb order (prover + verifier + `replay_transcript_phase_a` match):
  for each table in spec-fixed order:
      if preprocessed:
          absorb precomputed_root           (AIR-pinned, verifier rejects mismatch)
          absorb multiplicities_root         (per-table tree)
  absorb main_mmcs_root                      (once, over Shared tables only)

Per-query main-trace opening:
- Shared tables: `mmcs.open(iota*2 << shift)` and `mmcs.open((iota*2+1) << shift)`,
  where `shift = log2(max_height / this_padded_height)`. The verifier
  rehashes `evaluations` with `mmcs_leaf::hash_tagged_row`, compares against
  `matrix_leaves[table_idx]`, and authenticates against root+spec.
- Preprocessed tables: `verify_opening_pair` against
  `StarkProof.lde_trace_main_merkle_root` (unchanged single-tree path).

Test-suite updates (single-AIR `prove` now returns MultiProof):
- crypto/stark/src/tests/small_trace_tests.rs: field access via `proof.proofs[0]`,
  `MainTraceOpening` matched as enum.
- crypto/stark/src/tests/bus_tests/soundness_tests.rs: same.

Tests: stark 130/130 green. lambda-vm-prover non-ELF tests unchanged
(the 77 prove_elfs failures predate this change — same UnknownSyscall(5)
executor bug on `git stash` baseline).
10 tests covering the attack surface of the shared main-trace MMCS
introduced in the previous commit. All operate on a baseline-valid
two-table multi-proof over non-preprocessed AIRs (DummyAIR + BitFlagsAIR)
so every main opening is `MainTraceOpening::Mmcs`. The preprocessed
`MainTraceOpening::Tree` branch is exercised end-to-end by lambda-vm-prover
`bitwise_tests`.

Each test tampers with a single field and asserts verifier rejection:

  Field tampered                                  Detection mechanism
  -----------------------------------------------------------------------
  multi_proof.main_mmcs_root[0]                   transcript divergence
  multi_proof.main_mmcs_spec[0].1 (height)        reproduced-spec check
  multi_proof.main_mmcs_spec[0].0 (tag)           reproduced-spec check
  mmcs_opening.matrix_leaves[idx].1 (digest)      rehash-vs-leaf check
  mmcs_opening.matrix_leaves[idx].0 (tag)         leaf-tag check
  mmcs_opening.global_index                       g_primary/g_sym match
  mmcs_opening.siblings[0][0]                     MmcsOpening::verify
  evaluations[0]                                  rehash mismatch
  main_tags slice swapped at verifier             spec sort mismatch
  (baseline test)                                 verifies cleanly

Together these pin the soundness of every byte the verifier consults on
the new path, locking in behaviour before the C2 streaming-builder work.
Adds `cli proof-size <ELF>` which generates (or loads via `--proof`) a
VmProof and reports the serialized byte size, broken down by component
across every sub-proof. Intended for CI to track proof-size regressions
and improvements (e.g. the streaming MMCS migration that just landed).

Usage:
    cli proof-size <ELF>                            # human-readable table
    cli proof-size <ELF> --json                     # machine-readable
    cli proof-size <ELF> --proof bundle.bin         # skip the re-prove
    cli proof-size <ELF> --private-input file       # stdin if needed

Sections reported (summed across all sub-proofs):
- main_mmcs_root / main_mmcs_spec                   (multi-proof header)
- per_table_main_merkle_root (preprocessed only)
- per_table_precomputed_merkle_root
- per_table_aux_merkle_root
- deep_poly_openings.main_trace_polys               (MMCS vs Tree opens)
- deep_poly_openings.{precomputed,aux,composition_poly}
- fri_layers_merkle_roots / fri_query_list          (FRI; usually dominant)
- trace_ood_evaluations / composition_poly_parts_ood_evaluation
- bus_public_inputs
- other                                             (bundle delta — headers,
                                                     public_inputs, nonce, ...)

Encoding is bincode v1, matching `cli prove`'s output format so saved
bundles round-trip 1:1.

Cargo.lock updated only for the new `serde` + `serde_json` deps in
`bin/cli`. No core dep changes.
@diegokingston
Copy link
Copy Markdown
Collaborator Author

/bench 5

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 27, 2026

Benchmark — fib_iterative_8M (median of 5)

Table parallelism: auto (cores / 3)

Metric main PR Δ
Peak heap 51861 MB 45730 MB -6131 MB (-11.8%) 🟢
Prove time 25.291s 44.325s +19.034s (+75.3%) 🔴

⚠️ Regression detected — heap or time increased by more than 5%.

✅ Low variance (time: 1.3%, heap: 0.2%)

Commit: 880ec4d · Baseline: cached · Runner: self-hosted bench

@github-actions
Copy link
Copy Markdown

Codex Code Review

Found issues:

  1. Medium security / protocol bug: empty MMCS specs are accepted by the verifier with an unchecked root.
    In crypto/stark/src/verifier.rs, if every AIR is preprocessed, expected_spec stays empty, multi_proof.main_mmcs_spec == [] passes, and the verifier still absorbs multi_proof.main_mmcs_root at line 857 even though it authenticates nothing. The honest prover cannot produce this case because build_main_mmcs returns MmcsError::Empty. Either reject empty MMCS specs in the verifier to match the prover, or make the MMCS root/spec optional and skip absorbing it when no table participates.

  2. Low bug / malformed input panic: MmcsOpening::verify panics on an empty expected spec.
    crypto/crypto/src/merkle_tree/mmcs.rs:307 indexes specs[0] after only checking that matrix_leaves.len() == specs.len(). A crafted empty opening with empty specs reaches this and panics. Return false when specs.is_empty().

  3. Significant performance / memory issue: MMCS build clones all leaf vectors.
    crypto/stark/src/prover.rs:542 clones every table’s Vec<Commitment> into the builder while phase_a_outputs still owns the originals. For large VM traces this can transiently duplicate hundreds of MB. Since MmcsBuilder::add_matrix already takes ownership, pass Phase A outputs by value or drain/move the leaves into the builder.

I attempted cargo check -p stark -p lambda-vm-prover -p cli, but the environment cannot create rustup temp files because /home/runner/.rustup/tmp is read-only.

Comment thread prover/src/mmcs_commit.rs
/// (a) the bytes are clearly versioned against any future change and
/// (b) they cannot collide with leaves of a different MMCS (aux trace,
/// composition, ...). Bump the suffix on any encoding change.
const LEAF_DOMAIN_TAG: &[u8] = b"LAMBDAVM_MAIN_MMCS_LEAF_V1";
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium — Duplicate domain tag (maintenance hazard)

This constant duplicates crypto/stark/src/mmcs_leaf.rs:32:

pub const LEAF_DOMAIN_TAG: &[u8] = b"LAMBDAVM_MAIN_MMCS_LEAF_V1";

The module is explicitly "not yet wired into multi_prove," so the live prover uses mmcs_leaf::hash_tagged_row_bytes (which references mmcs_leaf::LEAF_DOMAIN_TAG). When this module eventually gets wired in as a replacement it must import that constant, not define its own copy. If one is bumped without the other, prover and verifier will silently compute different leaf hashes and every proof will fail.

Suggestion: delete the local copy now and use stark::mmcs_leaf::LEAF_DOMAIN_TAG; (or just call stark::mmcs_leaf::hash_tagged_row_bytes directly instead of re-implementing it).

Comment thread bin/cli/src/main.rs

fn ser_len<T: serde::Serialize>(value: &T) -> usize {
// bincode v1 mirrors the encoding used by VmProof callers (bin/cli prove
// and prover tests), so per-section sums add up to the total bundle.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium — Silent failure corrupts the size report

unwrap_or(0) means any serialization error produces a 0 byte entry in the table. The percentage column will silently misrepresent the breakdown (and the s_other residual could underflow or wildly overcount).

Since bincode::serialize only fails on structurally invalid types (which is a programmer error, not a runtime condition), an expect is more appropriate here:

Suggested change
// and prover tests), so per-section sums add up to the total bundle.
bincode::serialize(value).expect("bincode serialization must succeed for VmProof types").len()

Comment thread prover/src/lib.rs
/// Currently unused at the call sites; defined here as the foundation
/// for the upcoming MMCS Phase C wire-up (see
/// `docs/mmcs-streaming-c1-spec.md`).
#[allow(dead_code)]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Low — Stale annotation

air_tags() is actively called by both prove_with_options_and_inputs and verify_with_options in this file, so #[allow(dead_code)] is incorrect and will confuse future readers into thinking it's an unused helper. Please remove it.

Comment thread prover/src/lib.rs
/// for the upcoming MMCS Phase C wire-up (see
/// `docs/mmcs-streaming-c1-spec.md`).
#[allow(dead_code)]
pub fn air_tags(&self) -> Vec<crypto::merkle_tree::mmcs::MatrixTag> {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium — Unverified parallel-array ordering invariant

air_tags() returns tags in a hardcoded order (BITWISE first, DECODE second, …, then split-chip loops) that must match the order of air_refs() and air_trace_pairs() element-for-element. Any divergence — adding a chip to one but not the other, or reordering — causes a silent MMCS spec mismatch that the verifier catches only at proof-verification time, with no helpful error pointing here.

Consider adding a debug_assert_eq!(self.air_tags().len(), self.air_refs().len(), "air_tags and air_refs must be in sync") at least in multi_prove (or a compile-time marker using a unit test that asserts a known-length fixture), so regressions surface immediately.

.map_err(map_mmcs_err)?;
}
}
let mmcs = builder.finalize().map_err(map_mmcs_err)?;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Low — All-preprocessed topology panics with an opaque error

If every AIR in the multi-proof is preprocessed, the loop above adds no matrices and builder.finalize() returns Err(MmcsError::Empty), which propagates as ProvingError::WrongParameter("MMCS: Empty"). Lambda-VM always has at least one non-preprocessed table, so this won't fire in production, but it's a sharp edge for anyone writing tests with only preprocessed AIRs. A doc comment on build_main_mmcs (or a dedicated ProvingError::NoSharedMmcsMatrices variant) would make the failure mode explicit.

@claude
Copy link
Copy Markdown

claude Bot commented May 27, 2026

Review: Research/mmcs-streaming

Overview: This PR introduces a Plonky3-style Multi-Matrix Commitment Scheme (MMCS) that replaces N independent per-table Merkle trees with a single shared root over all non-preprocessed tables. The core MMCS implementation (crypto/crypto/src/merkle_tree/mmcs.rs) is clean and well-tested against an 8-vector threat model. The Fiat-Shamir transcript ordering change (one MMCS root instead of N per-table roots) is correctly mirrored between prover and verifier.

Findings

Medium — Duplicate leaf-hash implementation (link)
prover/src/mmcs_commit.rs defines its own copy of LEAF_DOMAIN_TAG and re-implements the full leaf hash, identical byte-for-byte to the single-source-of-truth in crypto/stark/src/mmcs_leaf.rs. The module is labelled "not yet wired in," but when it gets wired in it must import the canonical constant rather than define its own — a divergent bump will silently produce different leaf hashes and break every proof.

Medium — ser_len swallows serialization errors (link)
bin/cli/src/main.rs: unwrap_or(0) on a serialization failure produces a silent 0 in the breakdown table, corrupting percentages and the s_other residual. Use expect since these types are always serializable.

Low — Stale #[allow(dead_code)] on a live function (link)
air_tags() is actively called by both prove_with_options_and_inputs and verify_with_options; the attribute is misleading.

Low — air_tags() / air_refs() ordering is an unverified invariant (link)
These two methods return parallel arrays whose element ordering must match exactly. A chip added/reordered in one but not the other produces a silent MMCS spec mismatch caught only at verification time. A debug_assert_eq! on lengths (or a unit test pinning known chip counts) would surface regressions early.

Low — All-preprocessed topology gives an opaque error (link)
build_main_mmcs returns ProvingError::WrongParameter("MMCS: Empty") when every AIR is preprocessed. Not a current concern for lambda-VM, but worth documenting as a precondition.

Adds a pure post-processing subcommand for CI to render proof-size
deltas between two `proof-size --json` reports. Mirrors the
`tooling/loc` workflow: prove on the baseline and on the PR, then diff
the JSONs to produce a comment-ready table.

    cli proof-size base.elf --json > base.json
    cli proof-size pr.elf   --json > pr.json
    cli proof-size-diff base.json pr.json --format github > comment.md

Formats:
- `text`  — plain aligned table for terminal / logs
- `github` — Markdown with a collapsible <details> per-section block;
  the headline row stays visible in the PR comment so reviewers see the
  total delta at a glance
- `slack` — Slack-flavoured Markdown (code fence for the breakdown)

Per-section diff handles the typical edge cases:
- Section present in both → bytes delta + percentage
- Section new in current → `+N (new)`
- Section dropped in current → `-N (gone)`
- Section order follows `current`, then prev-only at the end (lossless)

The previously-anonymous `Report` struct in `cmd_proof_size` is hoisted
to a public-shape `ProofSizeReport { elf, total_vm_proof_bytes,
multi_proof_bytes, sub_proof_count, main_mmcs_spec_entries, sections }`
so the JSON schema is stable enough for CI to depend on. `section` is
now `String` (was `&'static str`) so the same struct round-trips
through deserialization.

3 unit tests in `proof_size_diff_tests` cover:
- Headline + per-section deltas in text format
- New-section + removed-section handling
- GitHub format collapsible block + percentage rendering

Build clean, 3/3 tests green, smoke-tested both formats with fake JSONs.
Mirror of the main-trace MMCS C1 work, applied to the auxiliary trace.
Pulls per-table aux Merkle roots out of the per-table forked transcripts
into a single shared aux MMCS root absorbed into the SHARED transcript
BEFORE per-table forking — every fork inherits the same aux binding
identically, and the proof carries one aux root instead of N.

Domain separation:
- `LEAF_DOMAIN_TAG_AUX = "LAMBDAVM_AUX_MMCS_LEAF_V1"` (alongside the
  existing main tag, now also aliased as `LEAF_DOMAIN_TAG_MAIN`).
- `hash_tagged_row_bytes_aux` / `hash_tagged_row_aux` helpers use it.
- An aux-MMCS opening cannot authenticate a main leaf (or vice versa);
  a `(tag, row)` pair under the two domains produces distinct digests.
  Pinned by a new test in `mmcs_leaf::tests`.

Architecture:
- `MainTraceOpening` gains a sibling `AuxTraceOpening<E>` enum (Mmcs
  variant only — there is no preprocessed-equivalent for aux).
- `MainCommit` gains a sibling `AuxCommit<E>::Shared { mmcs (Arc), tag,
  padded_height }`. `Round1.aux: Option<AuxCommit<E>>` (`None` when an
  AIR has no aux trace).
- `DeepPolynomialOpening.aux_trace_polys: Option<AuxTraceOpening<E>>`
  (the old `PolynomialOpenings<E>` shape is gone).
- `StarkProof` drops `lde_trace_aux_merkle_root` entirely.
- `MultiProof` gains:
  - `aux_mmcs_root: Option<Commitment>`  — `None` when no AIR has aux.
  - `aux_mmcs_spec: Vec<(MatrixTag, usize)>` — filtered to has-aux tables.

Phase C absorb order (prover + verifier match exactly):
  1. Phase A: main MMCS root absorb (unchanged).
  2. Phase B: sample LogUp challenges (unchanged).
  3. NEW: build aux LDEs + tagged leaves for tables with aux,
          build shared aux MMCS,
          absorb its root into the SHARED transcript.
  4. Fork per-table.
  5. Per-table: bind `bus_public_inputs.table_contribution` (unchanged),
                run rounds 2-4 (unchanged). No per-table aux root absorb.

Per-query open / verify path:
- Prover: `aux_commit.mmcs.open((iota*2) << shift)` and
  `mmcs.open((iota*2+1) << shift)`, producing `AuxTraceOpening::Mmcs`.
- Verifier: `verify_aux_mmcs_pair_inner` rehashes evaluations with the
  AUX domain, compares against `mmcs_opening.matrix_leaves[table_idx]`,
  cross-checks `global_index`, and authenticates against root + spec.
- `verify_trace_openings` dispatches: `Some(_)` requires aux MMCS root
  to be present; `(None, _)` is fine (table has no aux); `(Some, None)`
  rejects (proof claims an opening but no MMCS exists).

Verifier spec validation:
- Reproduces `expected_aux_spec` from `airs.filter(has_aux_trace).map((tag, lde_size))`
  in spec-fixed order, sorts by `(height desc, tag asc)` to match
  `MmcsBuilder::finalize`, and rejects on any mismatch with the
  proof-supplied `aux_mmcs_spec`.

Dead code cleanup:
- Removed `TableCommit<F>` entirely. Both main and aux now go through
  their own enum-typed commits; nothing else used it.

Tests: stark 142/142 green (130 prior + the 10 main-MMCS soundness +
2 new aux leaf-hash tests). The 77 prove_elfs failures in
lambda-vm-prover predate this work (UnknownSyscall(5)).
…path

9 tests covering the aux MMCS attack surface (mirror of the existing
main-MMCS suite). All use a baseline two-table multi-proof over two
`LogReadOnlyRAP` AIRs so both tables contribute aux and therefore both
participate in the shared MMCS.

  Field tampered                                  Detection mechanism
  -----------------------------------------------------------------------
  multi_proof.aux_mmcs_root[0]                    transcript divergence
  multi_proof.aux_mmcs_root = None                root presence/spec check
  multi_proof.aux_mmcs_spec[0].1 (height)         reproduced-spec check
  multi_proof.aux_mmcs_spec[0].0 (tag)            reproduced-spec check
  mmcs_opening.matrix_leaves[idx].1 (digest)      rehash-vs-leaf check
  mmcs_opening.global_index                       g_primary/g_sym match
  mmcs_opening.siblings[0][0]                     MmcsOpening::verify
  evaluations[0] += 1                             rehash mismatch
  (baseline test)                                 verifies cleanly

Locks behaviour for the entire aux MMCS path — root absorb, spec sort,
leaf rehash with the aux domain separator, and Merkle authentication.
…mentally

Adds a streaming-build variant of `MmcsBuilder` that folds per-chip
leaves at the MAX height into a single shared running layer-0 as they
arrive, instead of holding every max-height chip's leaf vector alive
simultaneously. The other heights still need per-chip leaves stored
until `finalize` — see "Why max-height only?" below.

Equivalence guarantee: identical root + spec + layer bytes to
`MmcsBuilder` for the same input set (locked by 4 round-trip tests
against the one-shot path, plus order-violation rejection tests).

# Why "max-height only"?

MMCS layer-0 at max height is built by left-folding every chip's leaves
at row `i`. With no left-anchor to compose with, the running fold
`acc = hash(acc, chip_k[i])` is mathematically equivalent to the
one-shot `hash(hash(hash(chip_0[i], chip_1[i]), chip_2[i]), ...)`.

For chips at heights BELOW max, the MMCS injection rule is
`next[i] = hash(hash(hash(next[i], chip_0[i]), chip_1[i]), ...)`, which
mixes the upward-compressed `next[i]` into the left-fold. Keccak (and
any non-associative hash) makes it impossible to pre-fold the chips
into a single summary and inject that summary later — the resulting
digest would differ from the one-shot builder, breaking verifier
compatibility. So we keep per-chip leaves for non-max heights and
inject them in left-fold order at `finalize`.

# Add order

Callers MUST add matrices in (height desc, tag asc) order — the exact
sort `MmcsBuilder::finalize` does internally. Out-of-order calls return
`MmcsError::OutOfOrder`. Same-height tags must strictly ascend;
duplicate tags at different heights still trip `DuplicateTag`.

# Memory / wire-format

- Peak savings vs one-shot: `(num_max_height_chips - 1) * max_height * node_size`
  worth of per-chip leaf storage that is folded immediately and dropped.
  For lambda-vm with ~5 chips at 2^20 that is ~128 MB of transient
  Vec<Commitment>.
- Wire format unchanged: the produced `Mmcs` has the same `root()` and
  `spec()`. Layers Vec matches one-shot byte-for-byte.
- Per-chip `leaf_digests` is empty on the streaming output — `Mmcs::open`
  is therefore unavailable until callers supply leaves at open time.
  This is the prover wire-up that comes next.

# MmcsMatrix change

`MmcsMatrix.padded_height` is now a stored field instead of reading
`leaf_digests.len()`, so `spec()` reports the right value even when
the streaming builder has emptied per-chip leaves.

Tests: 9 new tests on top of the prior 17 MMCS tests; 26 total green.
`StreamingMmcsBuilder` discards per-chip leaves at build time, so
`Mmcs::open(global_index)` — which reads `matrix.leaf_digests[idx]` to
fill `MmcsOpening.matrix_leaves` — would return empty leaves and a bad
opening. This commit adds the complementary API:

    pub fn open_with_leaves<F>(global_index, leaf_fn) -> MmcsOpening
    where F: FnMut(matrix_idx, local_idx) -> B::Node

The closure provides each matrix's per-row leaf at the appropriate
shifted index. The prover-side use case (next commit) is to rehash a
row from a per-table LDE on demand, replacing the one-shot Mmcs's
internal leaf storage with on-the-fly rehashing.

`Mmcs::open` is now a thin wrapper around `open_with_leaves` whose
closure reads from `self.matrices[i].leaf_digests` — back-compat, no
caller change.

Equivalence pinned by a new test
`streaming_open_with_leaves_round_trips_against_one_shot`:
- Build lambda-vm-shaped topology two ways (one-shot + streaming).
- Compare roots + specs (already covered).
- For every global_index in [0, max_height): open one-shot, open
  streaming via `open_with_leaves` feeding leaves from the input set,
  assert `global_index`, `siblings`, and `matrix_leaves` byte-identical.
- Assert the streaming opening verifies against the streaming root.

27/27 MMCS tests green (1 new on top of 26).

Foundation for the per-chunk MMCS refactor: with `open_with_leaves`
available, the prover can stream-build per-chunk MMCSes and rehash
leaves from chunk-shared LDEs at open time.
Foundation for the per-chunk MMCS migration. The chunk-grouped main and
aux MMCSes need to rehash chunk-mate LDE rows on demand at open time
(the prover-side input to `Mmcs::open_with_leaves`). Without Arc-shared
columns, each chunk-mate's open would have to copy ~600 MB of column
data per CPU LDE — a non-starter.

Change:
- `LDETraceTable.main_columns: Vec<Vec<FE<F>>>` → `Arc<Vec<Vec<FE<F>>>>`.
- Same for `aux_columns`.
- `from_columns(Vec, Vec, ...)` still works (wraps internally with
  `Arc::new`) — all existing call sites unaffected.
- New `from_columns_arc(Arc, Arc, ...)` for callers (next commit) that
  already hold Arc-shared column data.
- New `main_columns_arc()` / `aux_columns_arc()` cheap-clone accessors
  for the chunk-shared MMCS open helpers.
- `into_columns()` now returns `(Arc, Arc)` instead of `(Vec, Vec)`.
  No internal caller uses the old signature outside `LDETraceTable`
  construction (which uses Arc-aware path); external uses would have
  to migrate, but searching the workspace shows no such callers.

No protocol change. No proof-format change. No memory cost: a single
`Arc::new` on construction; deref is free; column reads via `[col][row]`
still go through `Vec` Deref. Existing 151/151 stark tests pass without
edits.
…thin chunk

Refactors the main + aux MMCSes from one-global to per-chunk: each
parallel chunk of K = `table_parallelism()` tables gets its own
streaming MMCS, with chunk-mate LDEs Arc-shared so the per-query open
path can rehash chunk-mate rows on demand. Final piece of the streaming
plan (after `StreamingMmcsBuilder` and `Mmcs::open_with_leaves`).

# Why per-chunk?

The streaming MMCS drops per-chip leaves at build time, so `Mmcs::open`
needs to recompute leaves on the fly via `open_with_leaves`. That
closure must produce leaves for every matrix in the MMCS spec — not
just the current table — because `MmcsOpening::verify` walks all
max-height matrices to reconstruct layer-0. Per-chunk grouping bounds
the cross-table LDE access to the K tables a chunk owns (already kept
alive together by the existing parallel R2-R4 loop), avoiding the
alternatives of (a) global Arc<Vec<Arc<Lde>>> threaded through every
fork, or (b) reordering iota sampling earlier.

# Proof format

`MultiProof`:
- `main_mmcs_root: Commitment`            → `main_mmcs_roots: Vec<Option<Commitment>>`
- `main_mmcs_spec: Vec<(...)>`            → `main_mmcs_specs: Vec<Vec<(...)>>`
- `aux_mmcs_root: Option<Commitment>`     → `aux_mmcs_roots: Vec<Option<Commitment>>`
- `aux_mmcs_spec: Vec<(...)>`             → `aux_mmcs_specs: Vec<Vec<(...)>>`
- + `chunk_size: u32` (pinned `table_parallelism()` so the verifier
   chunks the AIR slice the same way the prover did).

`None` entries in the *_roots Vecs mark chunks with no MMCS-eligible
tables (all-preprocessed for main, no-aux for aux). Per-query openings
shrink: each carries ≤K matrix_leaves instead of N.

# Phase A / Phase C absorb order (prover + verifier match exactly)

Per chunk in chunk order:
  for each table in spec order:
    - absorb its preprocessed root (preprocessed only)
    - absorb its per-table multiplicities root (preprocessed only)
  - absorb that chunk's main MMCS root (Some) or skip (None)
After Phase A → sample LogUp challenges → Phase C aux mirrors main:
  for each chunk in chunk order:
    - absorb that chunk's aux MMCS root (Some) or skip (None)
Then fork per-table → per-table table_contribution → rounds 2-4.

# Data plumbing

`MainCommit::Shared` now carries `Arc<ChunkMainMmcsContext<F>>` +
`chunk_idx` instead of an `Arc<Mmcs>` directly. The context holds the
chunk's MMCS + Arc-cloned LDE columns for the chunk-mates in MMCS spec
sort order. Aux mirrors via `ChunkAuxMmcsContext<E>`.

`Lde { main, aux }` columns are now `Arc<Vec<Vec<FE>>>` (built on
B1's Arc-wrapped `LDETraceTable`). Each table's `Round1.lde_trace`
shares the same Arc as the chunk context — no duplication.

# Open path

`open_deep_composition_poly` dispatches on the chunk context's MMCS via
`mmcs.open_with_leaves`, with a closure that rehashes chunk-mate rows
via the new `rehash_main_chip_leaf` / `rehash_aux_chip_leaf` helpers
(read from chunk-shared LDE columns + matrix tag, hash with the
appropriate `LEAF_DOMAIN_TAG_*`).

# Verifier

`multi_verify` reads `chunk_size` from the proof, walks chunks of the
AIR slice, validates each chunk's expected spec against the supplied
one, and absorbs roots in chunk order. `verify_rounds_2_to_4` and
`verify_main_mmcs_pair` take `main_mmcs_root: Option<&Commitment>` (and
similarly aux): `Shared` opening with `None` root → reject.

# Tests

- Existing main + aux soundness suites adapted to the per-chunk shape
  via `first_populated_main_chunk` / `first_populated_aux_chunk`
  helpers + `proof.main_mmcs_roots[chunk_idx]` field access.
- New soundness test `tampered_chunk_size_rejected`: pinned chunk_size
  mismatch must be rejected (Vec length cross-check fires).
- bin/cli/proof-size breakdown updated: now reports `main_mmcs_roots /
  main_mmcs_specs / aux_mmcs_roots / aux_mmcs_specs / chunk_size` as
  separate sections (multi-proof header), drops the obsolete
  `per_table_aux_merkle_root` and pre-Vec `main_mmcs_root/spec` rows.

Results: 152/152 stark tests green (151 before + new
`tampered_chunk_size_rejected`); 27/27 crypto mmcs tests green; 3/3
cli unit tests green; lambda-vm-prover bitwise (preprocessed-path) +
non-ELF tests pass; the 77 prove_elfs failures are the same pre-existing
`UnknownSyscall(5)` executor bug present on main.
@diegokingston
Copy link
Copy Markdown
Collaborator Author

/bench 5

Third MMCS after main + aux. Each chunk's tables share a composition
MMCS built between R2 LDE build and R3 z-sample. Drops per-table
composition_poly_root from StarkProof entirely; the chunk root is
absorbed into each chunk-mate's fork once per chunk.

# Why structurally different from main/aux

Composition polys are built INSIDE the per-table R2-R4 loop after each
fork samples its own beta. Per-chunk MMCS therefore needs a join point
mid-rounds:
  R2a (per-table, parallel): sample beta + build composition LDE +
                              tagged row-pair leaves
  [chunk join]               build chunk composition MMCS via
                              StreamingMmcsBuilder; absorb root into
                              each chunk-mate's fork
  R2b → R3 → R4 (per-table): sample z, run OOD + FRI + opens

The old `prove_rounds_2_to_4` is now `prove_round_2a` (R2 LDE build)
+ `prove_rounds_2b_to_4` (everything after the chunk join). Verifier
mirrors: `replay_rounds_after_round_1` takes an Option<&Commitment>
chunk-comp root and absorbs it between beta and z (replacing the old
per-table `composition_poly_root` absorb).

# Leaf hash + helpers

Composition leaves hash a row PAIR (br_0 || br_1) rather than a single
row, so domain separation needs its own helpers:
- `LEAF_DOMAIN_TAG_COMPOSITION = "LAMBDAVM_COMP_MMCS_LEAF_V1"`.
- `hash_tagged_row_pair_bytes_composition` (bytes flavour).
- `hash_tagged_row_pair_composition` (elements flavour).
- `compute_tagged_leaves_row_pair_bit_reversed_composition` (prover-side
  bulk-leaf computation).
- `rehash_comp_chip_leaf` (open-time on-demand leaf for chunk-mates).

# Proof format

`StarkProof`:
- drops `composition_poly_root: Commitment` (now at MultiProof / chunk
  level).

`DeepPolynomialOpening`:
- `composition_poly: PolynomialOpenings<E>` → `CompositionTraceOpening<E>`
  (new enum, single `Mmcs` variant with `evaluations`, `evaluations_sym`,
  and ONE `mmcs_opening` — the row-pair leaf covers both rows).

`MultiProof`:
- `comp_mmcs_roots: Vec<Option<Commitment>>` (parallel to main/aux,
  always Some in practice since every table has a composition poly).
- `comp_mmcs_specs: Vec<Vec<(MatrixTag, usize)>>` — chunk specs in MMCS
  sort order. Padded height = lde_size / 2 (row-pair count).

# Architecture

- `ChunkCompMmcsContext<E>` — sister of main/aux chunk contexts.
- `CompCommit<E>::Shared { chunk_ctx, chunk_idx, tag, padded_height }`.
- `Round2<F>` now holds `Arc<Vec<Vec<FE>>>` LDE + `CompCommit`. No
  per-table Merkle tree.
- `R2aResult<E>` — handoff carrier between R2a and the chunk join.
- `build_chunk_comp_mmcs` — mirror of `build_chunk_main_mmcs` /
  `build_chunk_aux_mmcs`; uses `StreamingMmcsBuilder`.

# Open + verifier

- `open_composition_poly` produces `CompositionTraceOpening::Mmcs` via
  `chunk_ctx.mmcs.open_with_leaves` + the rehash closure.
- `verify_comp_mmcs_pair_inner` rehashes the row-pair leaf, checks
  `matrix_leaves[table_idx]`, authenticates against the chunk root.
- `step_4_verify_trace_and_composition_openings` threads
  `comp_mmcs_root` + `comp_mmcs_spec` through to the per-query check.
- `multi_verify` adds per-chunk comp-spec validation (every chunk has
  Some root; spec sorted height-desc, tag-asc).

# Tests

- 3 new composition soundness tests: tampered root / spec height /
  missing root all rejected.
- 2 new mmcs_leaf tests: composition domain separates from main+aux;
  bytes-flavour matches element-flavour.
- `cli proof-size` breakdown gains `comp_mmcs_roots (per-chunk)` and
  `comp_mmcs_specs (per-chunk)` sections.

Results: 157/157 stark tests green (152 prior + 3 new comp soundness +
2 new mmcs_leaf comp tests). cli 3/3, lambda-vm-prover bitwise
(preprocessed) tests pass, 77 prove_elfs failures are the pre-existing
UnknownSyscall(5) executor bug present on main.

# Performance note

R2a and R2b-R4 within a chunk are sequential rather than per-table
parallel — `chunk_size = table_parallelism()` is small (typically 8),
and the dominant cost in R2 (constraint eval, FFT) already has internal
rayon parallelism. Keeping the chunk loop sequential avoids
`par_iter_mut()` on transcript slices that breaks with downstream
generic `IsStarkTranscript` bounds.
Mathematical primitive for batched FRI: combine N same-size evaluation
vectors into one using successive powers of δ_fri. A bucket's chip-DEEP
evaluations combine into one polynomial; FRI commits + opens only that
combined polynomial, while per-chip openings (main/aux/comp) still
authenticate each individual D_i value at the bucket-shared iotas.

Singleton bucket is a fast path (no combination needed). Empty input is
a usage bug (debug-asserted).

Not yet wired up — prover + verifier integration lands in the follow-up
chunk-join refactor.

4/4 new fri:: unit tests green. 157 prior stark tests unaffected.
@diegokingston
Copy link
Copy Markdown
Collaborator Author

/bench 5

1 similar comment
@diegokingston
Copy link
Copy Markdown
Collaborator Author

/bench 5

… combine

Phase D of the MMCS streaming plan: one FRI commit per height bucket
inside a chunk. Bucket-mates' DEEP composition polynomials are linearly
combined with successive powers of δ_fri (sampled from the chunk-shared,
post-OOD-broadcast transcript) and a single FRI commit + grinding +
query phase runs on the combined polynomial. Drops 4 per-table proof
fields; adds one `MultiProof` field.

Built on top of (and is the final piece of) the streaming MMCS plan:
trace-MMCS (b162180) and composition-MMCS (c22fca9) shave ~10-15% of
proof bytes; batched FRI is where the rest of the proof-size win lives.

# Proof format

`StarkProof` drops:
  - `fri_layers_merkle_roots: Vec<Commitment>`
  - `fri_last_value: FieldElement<E>`
  - `query_list: Vec<FriDecommitment<E>>`
  - `nonce: Option<u64>`

`MultiProof` gains:
  - `fri_chunk_buckets: Vec<Vec<ChunkBucketFri<E>>>`

`ChunkBucketFri { lde_size, members, layer_roots, last_value,
decommitments, nonce }` — bucket members listed in canonical
chunk-local-index order so δ_fri^i powers are reproducible verifier-side.

# Bucket-seed transcript (canonical OOD broadcast)

For chunk c, after Phase A+B+C absorbs, the bucket seed extends a clone
of the pre-fork shared state by appending, in exact order:
  1. per chunk-mate j in chunk-local order: table_contribution_j (when
     bus_public_inputs is Some)
  2. chunk composition MMCS root (when Some)
  3. per chunk-mate j in chunk-local order: trace_ood_evaluations
     columns + composition_poly_parts_ood_evaluation

Per bucket b: bucket_transcript = bucket_seed.clone(), append
`(lde_size as u64).to_le_bytes()`, sample δ_fri, then standard
commit_phase_from_evaluations / grinding / iotas interaction.

The verifier reconstructs the bucket seed from `MultiProof` data alone
(table contributions from `bus_public_inputs`, comp root from
`comp_mmcs_roots`, OODs from each `StarkProof.trace_ood_evaluations` +
`composition_poly_parts_ood_evaluation`).

# Prover (multi_prove)

R2b → R3.5 split into a light per-fork pass + heavy bucket-level pass:

  - `prove_rounds_2b_to_3_5(...)` per chunk-mate (now parallel via
    rayon `par_iter_mut()` on the chunk transcript slice): sample z,
    compute OOD, absorb own OOD, sample γ, build the DEEP coefficient
    layout (`DeepCoeffs { z, gammas, trace_term_coeffs, lde_size }`).
    Does NOT build the DEEP LDE.

  - Build `bucket_seed` (canonical replay on `pre_fork_transcript`).

  - Bucket chunk-mates by `lde_size` (first-encounter order on
    chunk-local index).

  - Per bucket: clone bucket_seed, append lde_size, sample δ_fri.
    Streaming combine: per member, `compute_deep_lde_with_coeffs` →
    fold into the bucket accumulator with δ_fri^i_local → drop. First
    member of every bucket reuses its DEEP LDE as the accumulator
    base (no zero-init copy in the singleton case). Then
    commit_phase_from_evaluations / grind / sample iotas /
    query_phase on the combined polynomial.

  - Per chunk-mate: open trace / aux / comp / precomputed at the
    bucket-shared iotas, assemble per-table `StarkProof`. Iotas come
    from the bucket cache; no transcript replay needed.

# Verifier (multi_verify)

  - `replay_rounds_after_round_1` stops at γ — leaves `zetas`,
    `iotas`, and `grinding_seed` empty in the returned `Challenges`.
  - `verify_rounds_2_to_4` split into `replay_and_verify_step_2`
    (per-fork, returns `Option<Challenges>`) + `verify_step_4_at_iotas`
    (driven from `multi_verify` after the bucket FRI sets iotas).
  - `step_3_verify_fri` / `verify_query_and_sym_openings` replaced by
    `verify_bucket_fri_query` which takes the bucket's
    `layer_roots` + `last_value` + per-iota `FriDecommitment`
    externally and runs the standard FRI fold check against the
    combined-D evaluations.

  - `multi_verify` chunk loop:
      1. Per chunk-mate: fork transcript → bind table_contribution →
         `replay_and_verify_step_2` → store Challenges.
      2. Build `bucket_seed` from `MultiProof` data.
      3. Validate `fri_chunk_buckets[chunk_idx]` structure: bucket
         count matches the expected lde_size grouping; per bucket
         lde_size and member tag order match.
      4. Per bucket: replay layer-root absorbs + zetas + last_value +
         grinding + iotas (cached in `bucket_iotas_cache`).
         Reconstruct each member's D_i evaluations via the existing
         barycentric helper, combine with δ_fri^i, run
         `verify_bucket_fri_query` per iota.
      5. Per chunk-mate: `verify_step_4_at_iotas` using cached iotas.

# Performance considerations (F1 fixes)

The first draft of D5 retained K × DEEP LDE simultaneously and lost
per-chunk-mate parallelism in R2b-R3.5 (c22fca9 had serialised it):
proving time on the workload regressed ~117% while memory only dropped
~10%. Three fixes shipped together:

  - F1.1 (streaming bucket combine): DEEP_i materialises one at a
    time inside the bucket loop and folds into the accumulator with
    δ_fri^i, then drops. Peak DEEP memory inside the bucket loop is
    2 × |LDE| (current member + accumulator), restored from K × |LDE|.

  - F1.2 (restore per-chunk-mate parallelism): `chunk_transcripts.
    par_iter_mut().enumerate()` runs R2b → R3 → DEEP-coeffs
    concurrently per chunk-mate. `chunk_airs` + `chunk_num_rows` are
    extracted before the parallel iter so we don't capture the
    Sync-unfriendly `&mut TraceTable` inside `air_trace_pairs`.

  - F1.3 (verifier iota cache): `bucket_iotas_cache: Vec<Vec<usize>>`
    is populated during the FRI fold check pass and reused in the
    step 4 pass, eliminating a redundant clone+replay of the bucket
    transcript per chunk-mate.

# Tests (D7)

New `tests/bucket_fri_soundness_tests.rs` (7 tests):
  - baseline_phase_d_proof_verifies
  - tampered_bucket_last_value_rejected
  - tampered_bucket_layer_root_rejected
  - truncated_bucket_decommitments_rejected
  - missing_chunk_buckets_rejected
  - wrong_bucket_lde_size_rejected
  - swapped_member_order_rejected

`bin/cli/proof-size` breakdown: drops `fri_layers_merkle_roots` and
`fri_query_list`; adds `fri_chunk_buckets (per-chunk batched FRI)`.

# Results

  - `cargo test -p stark --lib` → 168/168 (161 prior + 7 new bucket
    FRI soundness tests).
  - `cargo test -p crypto --lib` → 73/73.
  - `cargo test -p cli` → 3/3.
  - `cargo test -p lambda-vm-prover --lib tests::bitwise*` → 29/29
    (preprocessed-AIR exercise).
  - 77 `tests::prove_elfs_tests::*` failures are the unchanged
    pre-existing `UnknownSyscall(5)` executor bug, present on main.

# Out of scope (follow-ups)

  - Fuse compute-DEEP + bucket-combine into a single per-LDE-point
    pass (eliminate the intermediate `Vec` for the current member).
  - Adaptive chunk sizing by LDE memory budget (replace fixed
    `chunk_size = table_parallelism()` with a bin-packing decision).
  - Disk-spill `StarkProof` + `ChunkBucketFri` between chunks for
    very-large-table workloads.
@diegokingston
Copy link
Copy Markdown
Collaborator Author

/bench 5

@diegokingston
Copy link
Copy Markdown
Collaborator Author

/bench 5

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant