Research/mmcs streaming#628
Conversation
… tests PR1 of the streaming-MMCS plan (see docs/mmcs-streaming-design.md for the full plan). This commit adds the module standalone — NOT wired into the prover yet — so the API and the 8-vector threat model can be reviewed in isolation before any production hot-path change. What's in: - `MmcsBuilder` / `Mmcs` / `MmcsOpening` types over the existing `IsMerkleTreeBackend` abstraction. - Plonky3-style layer-injection build: sort matrices by `padded_height` desc (ties by tag asc), layer 0 = largest matrix's leaves, compress pairs upward, inject smaller matrices at their height-layer via `compress(node, leaf)`. - Open(global_index) -> per-matrix leaves at the shifted index + sibling chain. Verify reconstructs the root bottom-up; rejects malformed or tampered openings. MVP restrictions (called out in the module doc): - Matrices must have *distinct* padded_heights (matches lambda-vm's chip topology where heights almost never collide). Same-height case is Phase 2 — would mix multiple matrices at layer 0 row-by-row. - No SIMD / parallel hashing yet. - Caller materializes full leaf digest arrays per matrix (no streaming chunked absorption yet; that's wrapped on top later). - Single root only — no caps in this iteration. Security: 8 vectors from the threat model, each as a test: v1 cross-matrix row swap (rejected via matrix-leaf order check) v2 padding-byte freedom (rejected at build: NotPowerOfTwo) v3 same-height matrices (rejected: DuplicateHeight — MVP restriction) v4 leaf re-labelling (rejected via tag binding in opening) v5 wrong leaf data (rejected via root mismatch) v6 index tampering (rejected via path divergence) v7 truncated path (rejected: siblings.len() vs depth check) v8 verifier-side spec mismatch (rejected via expected_specs check) Plus duplicate-tag and out-of-bounds checks. 13 tests, 0 failed. `make lint` clean across all three clippy configs. No prover/verifier code touched, no proof format change. Next steps (separate PRs): - PR2: wire into Round 1 main-trace commit, replace per-chip Merkle loop. - PR3: extend to aux trace + composition.
Drops the MVP `DuplicateHeight` restriction so the MMCS matches lambda-vm's actual chip topology (3+ CPU chunks all at 2^20, BITWISE at 2^20, etc.). Build pipeline (now general): - Sort by `padded_height` desc, ties by `tag` asc — verifier reproduces. - Layer 0 = first max-height matrix's leaves, then sequentially compress in every additional max-height matrix's leaves at the same row index. - Each upper layer: compress pairs of children, then inject every matrix whose `padded_height` matches that layer's length, in tag-asc order. Verify mirrors with a cursor over `matrix_leaves` (which the builder already orders by height desc then tag asc). At layer L we combine in every leaf at the current row position whose matrix has height equal to the current layer length; cursor must end exactly at the leaves' end (unconsumed leaves => topology mismatch => reject). Tests: - `same_height_pair_round_trips`: two matrices at max_height combine. - `lambda_vm_style_multi_chunk_round_trips`: 3 chunks at 2^3, 2 at 2^2, 1 at 2^0 — mirrors typical lambda-vm chip groupings. - `insertion_order_does_not_change_root`: permutation invariance. - `same_height_tampered_leaf_rejected`: tampering one of the combined leaves at the same layer still fails root match. 17 tests total, 0 failed. `make lint` clean.
Adds `#[cfg(feature = "parallel")]` paths to the MMCS finalize step: - Layer-0 same-height combine: row-parallel (each row independently folds K matrices in tag-asc order; K is small, the per-row sequential chain is short, rows scale across cores). - Pair compression upward: pair-parallel. - Matrix injection at non-leaf layers: row-parallel. Mirrors Plonky3's `first_digest_layer` + `compress_and_inject` parallel shape (we read their `merkle-tree/src/merkle_tree.rs` to confirm the algorithmic structure matches). Differences: - Plonky3 SIMD-packs WIDTH rows per hash call via PackedField. lambda-vm uses scalar Keccak so we fall back to per-row Rayon parallelism (similar throughput class for our chip topology, no SIMD code needed). - Plonky3 hashes raw matrix rows internally (multi-matrix same-height rows concatenated -> single hash). Ours takes pre-hashed leaves so a same-height combine is `compress(leaf_A, leaf_B)` (one extra compress per matrix per row vs Plonky3). The trade-off: ours lets the caller hash chip-by-chip and drop the LDE before the MMCS sees anything, giving better peak memory for lambda-vm's large per-chip LDEs. Worth reconsidering for PR2 if profiling shows the extra compresses matter. Helpers factored out so the parallel/serial cfg lives in one place per operation (`build_combined_layer`, `compress_pairs`, `inject_matrices`). Tests: same 17 cases pass both with and without `--features parallel`. `make lint` clean across all three configs.
Adds two `#[ignore]` tests that print concrete numbers when invoked
with `cargo test -p crypto --features parallel mmcs_bench --
--ignored --nocapture`. Two scenarios:
1. **Build time** comparison on a scaled-down lambda-vm topology
(8 chips: 3 at 2^14, 2 at 2^12, 2 at 2^10, 1 at 2^8).
Measured on this host:
N independent trees: 4481 µs
Single MMCS tree: 3176 µs
MMCS / N-trees ratio: 0.709 (MMCS is ~30% faster)
The savings come from sharing layer-0 work when multiple chips
live at max_height — fewer hash chains end up duplicated.
2. **Per-query opening hash count** for the same topology:
N independent trees: 102 hashes per query
Unified MMCS: 21 hashes per query
Reduction factor: ~4.9x
At production scale (~30 chips, max_h = 2^20) this projects to
~13x reduction in per-query Merkle hashes, which is the dominant
recursion-guest cost after #601's preprocessed-commitment cache.
Caveats:
- Topology is scaled-down to keep the bench fast (<1 s); production
numbers will differ but in the same direction.
- Build bench measures wall-clock with `Instant`, not statistically
rigorous like Criterion. Good enough as a sanity gate before PR2.
- Opening bench counts *Merkle path hashes only*, not leaf-hash work
or compressions inside `verify` — both equal between modes.
Validates the Phase B->C migration: MMCS build does NOT regress prover
time (it improves it), and the verifier-as-guest cycle saving is
real and measurable.
The MMCS leaf hash binds matrix identity via a `MatrixTag` (8 bytes per
matrix). Prover and verifier must derive the same tag for the same
chip-chunk; otherwise the Fiat-Shamir transcript diverges silently —
opaque verification failure with no clear diagnostic. Centralising the
tag derivation in one shared module turns "same tag" from a convention
into a compile-time guarantee.
Encoding:
MatrixTag = [chip_type_id : u32 LE] [chunk_index : u32 LE]
Chip type IDs are STABLE wire constants (verified by `tag_encoding_is_stable`
test pinning specific bytes). Append-only — never reassign, never reuse
removed IDs. Split tables share a low ID range (CPU=0..MEMW_REGISTER=9),
single-instance tables a mid range (100..107), and per-page tables get a
single ID (200) with `chunk_index` encoding the page index.
Tests cover:
- Uniqueness across all realistic (chip, chunk) pairs (10 split tables
× 64 chunks + 8 single + 256 page indices, ~944 distinct tags).
- Encoding stability (specific bytes pinned).
- Sensitivity (changing chip_type or chunk_index changes the tag).
No prover/verifier code touched yet. This module exists so PR2's
wire-up (which replaces N per-chip Merkle trees with one MMCS) can
import a single source of truth.
Bridges per-chip LDE columns to the unified MMCS over the main trace. Not yet wired into `multi_prove` — exists so the API + leaf-hash format can be reviewed and tested before the hot-path change in the next step. Module surface: - `compute_chip_leaves_with_tag(columns, tag) -> Vec<Commitment>`: per-row Keccak256 of `LEAF_DOMAIN_TAG || tag.0 || row_bytes_be`. Bit-reversed row order matches the existing FRI / Merkle layout. Caller can drop the LDE columns immediately after — memory peak is one chip's LDE at a time (same as today's per-chip Merkle build, no regression). - `build_main_trace_mmcs(entries)`: thin wrapper that pours `(MatrixTag, leaves)` pairs into the standalone `MmcsBuilder` from `crypto/merkle_tree/mmcs`. Returns one Mmcs with one root; per-query opens come from `mmcs.open(global_index)`. Soundness notes baked into the doc: - The leaf hash MUST include the chip tag. Without it, a single shared root cannot bind matrix identity (cross-chip row swap becomes a real attack). The legacy `keccak_leaves_bit_reversed` is unsafe for MMCS use; mixing the two formats would be a silent soundness bug. The test `leaves_differ_from_legacy_format` pins that the two encodings produce different bytes. - The domain tag `LAMBDAVM_MAIN_MMCS_LEAF_V1` is versioned so future changes to the leaf encoding can be detected by stale verifiers. 5 tests, 0 failed. `make lint` clean. Next step: replace the per-chip transcript-absorb loop in `multi_prove` Round 1 Phase A with a single MMCS build + absorb. That commit also drops `StarkProof.lde_trace_main_merkle_root` and changes the per-query opening format.
Foundation for the upcoming MMCS Phase C wire-up. No hot-path changes; all new types are defined but unused. Lands the surfaces both the prover-side wire-up commit and the verifier-side replay commit will consume next. - crypto/mmcs: add serde derives on MatrixTag and MmcsOpening so they can live in the proof format. - stark/mmcs_leaf: single source of truth for the main-trace MMCS leaf hash format (LEAF_DOMAIN_TAG + hash_tagged_row_bytes + hash_tagged_row). Prover and verifier will both call this; the prover-side compute_chip_leaves_with_tag adapter calls it row-wise. - stark/proof/stark: MainTraceOpening<F> struct (evaluations + evaluations_sym + MmcsOpening per row, paired with its symmetric counterpart). Not yet wired into DeepPolynomialOpening. - stark/prover: MainCommit<F> struct (shared Arc<Mmcs> + per-table MatrixTag + optional precomputed tree). Unused at the wire-up level but defined here as the keystone type. Marked allow(dead_code). - prover/lib: VmAirs::air_tags() returns the parallel Vec<MatrixTag> in air_trace_pairs / air_refs order. Prover and verifier must call this on identical VmAirs configurations.
Wires the parallel Vec<MatrixTag> through both prover and verifier signatures. The value is not consumed yet (let _ = main_tags), only validated against AIR count. Lays the surface for the upcoming Phase A MMCS wire-up without changing any cryptographic behaviour. - multi_prove(air_trace_pairs, main_tags, transcript, ...) - multi_verify(airs, multi_proof, main_tags, transcript, balance) - Both reject a mismatch between `main_tags.len()` and AIR count. - Single-AIR `prove` / `verify` synthesize a default tag. - stark::test_utils::multi_verify_ram helper hides the synthetic tag generation from stark internal tests; bulk-migrated all 37 call sites to use it. - New `mmcs_leaf::synth_main_tags(n)` + `synth_main_tags_for(slice)` helpers for lambda-vm tests that build ad-hoc AIR slices. - prover/src/lib.rs `prove` / `verify` wire `airs.air_tags()` to the shared MMCS-bound transcript. - lambda-vm tests bulk-updated to pass synthesized tags (values are ignored under the current `let _ = main_tags` no-op). Tests: stark 130/130 green; lambda-vm-prover non-ELF tests green (prove_elfs failures pre-date this change — they need ELF binaries not present in the working tree).
…t on per-table trees
Phase C wire-up of the streaming-MMCS plan. Non-preprocessed tables are now
committed under a single shared main-trace MMCS; preprocessed tables stay
OUT of the MMCS and keep their own per-table multiplicities Merkle tree
(in addition to the AIR-pinned precomputed tree).
Architecture:
- `MainCommit<F>` is now an enum:
* `Shared { mmcs (Arc), tag, padded_height }` for non-preprocessed
* `Preprocessed { multiplicities_tree, multiplicities_root,
precomputed_tree, precomputed_root, num_precomputed_cols }`
- `MainTraceOpening<F>` is now an enum:
* `Mmcs { evaluations, evaluations_sym, mmcs_opening, mmcs_opening_sym }`
* `Tree(PolynomialOpenings<F>)` for the preprocessed per-table tree
- `MultiProof` gains `main_mmcs_root: Commitment` and
`main_mmcs_spec: Vec<(MatrixTag, usize)>` (the MMCS is at multi-proof
level, hence single-AIR `prove` / `verify` now return / take MultiProof).
- `StarkProof.lde_trace_main_merkle_root: Option<Commitment>` is repurposed:
`Some` for preprocessed tables (per-table multiplicities root), `None`
for non-preprocessed tables (covered by `main_mmcs_root`).
Phase A absorb order (prover + verifier + `replay_transcript_phase_a` match):
for each table in spec-fixed order:
if preprocessed:
absorb precomputed_root (AIR-pinned, verifier rejects mismatch)
absorb multiplicities_root (per-table tree)
absorb main_mmcs_root (once, over Shared tables only)
Per-query main-trace opening:
- Shared tables: `mmcs.open(iota*2 << shift)` and `mmcs.open((iota*2+1) << shift)`,
where `shift = log2(max_height / this_padded_height)`. The verifier
rehashes `evaluations` with `mmcs_leaf::hash_tagged_row`, compares against
`matrix_leaves[table_idx]`, and authenticates against root+spec.
- Preprocessed tables: `verify_opening_pair` against
`StarkProof.lde_trace_main_merkle_root` (unchanged single-tree path).
Test-suite updates (single-AIR `prove` now returns MultiProof):
- crypto/stark/src/tests/small_trace_tests.rs: field access via `proof.proofs[0]`,
`MainTraceOpening` matched as enum.
- crypto/stark/src/tests/bus_tests/soundness_tests.rs: same.
Tests: stark 130/130 green. lambda-vm-prover non-ELF tests unchanged
(the 77 prove_elfs failures predate this change — same UnknownSyscall(5)
executor bug on `git stash` baseline).
10 tests covering the attack surface of the shared main-trace MMCS introduced in the previous commit. All operate on a baseline-valid two-table multi-proof over non-preprocessed AIRs (DummyAIR + BitFlagsAIR) so every main opening is `MainTraceOpening::Mmcs`. The preprocessed `MainTraceOpening::Tree` branch is exercised end-to-end by lambda-vm-prover `bitwise_tests`. Each test tampers with a single field and asserts verifier rejection: Field tampered Detection mechanism ----------------------------------------------------------------------- multi_proof.main_mmcs_root[0] transcript divergence multi_proof.main_mmcs_spec[0].1 (height) reproduced-spec check multi_proof.main_mmcs_spec[0].0 (tag) reproduced-spec check mmcs_opening.matrix_leaves[idx].1 (digest) rehash-vs-leaf check mmcs_opening.matrix_leaves[idx].0 (tag) leaf-tag check mmcs_opening.global_index g_primary/g_sym match mmcs_opening.siblings[0][0] MmcsOpening::verify evaluations[0] rehash mismatch main_tags slice swapped at verifier spec sort mismatch (baseline test) verifies cleanly Together these pin the soundness of every byte the verifier consults on the new path, locking in behaviour before the C2 streaming-builder work.
Adds `cli proof-size <ELF>` which generates (or loads via `--proof`) a
VmProof and reports the serialized byte size, broken down by component
across every sub-proof. Intended for CI to track proof-size regressions
and improvements (e.g. the streaming MMCS migration that just landed).
Usage:
cli proof-size <ELF> # human-readable table
cli proof-size <ELF> --json # machine-readable
cli proof-size <ELF> --proof bundle.bin # skip the re-prove
cli proof-size <ELF> --private-input file # stdin if needed
Sections reported (summed across all sub-proofs):
- main_mmcs_root / main_mmcs_spec (multi-proof header)
- per_table_main_merkle_root (preprocessed only)
- per_table_precomputed_merkle_root
- per_table_aux_merkle_root
- deep_poly_openings.main_trace_polys (MMCS vs Tree opens)
- deep_poly_openings.{precomputed,aux,composition_poly}
- fri_layers_merkle_roots / fri_query_list (FRI; usually dominant)
- trace_ood_evaluations / composition_poly_parts_ood_evaluation
- bus_public_inputs
- other (bundle delta — headers,
public_inputs, nonce, ...)
Encoding is bincode v1, matching `cli prove`'s output format so saved
bundles round-trip 1:1.
Cargo.lock updated only for the new `serde` + `serde_json` deps in
`bin/cli`. No core dep changes.
|
/bench 5 |
Benchmark — fib_iterative_8M (median of 5)Table parallelism: auto (cores / 3)
Commit: 880ec4d · Baseline: cached · Runner: self-hosted bench |
Codex Code ReviewFound issues:
I attempted |
| /// (a) the bytes are clearly versioned against any future change and | ||
| /// (b) they cannot collide with leaves of a different MMCS (aux trace, | ||
| /// composition, ...). Bump the suffix on any encoding change. | ||
| const LEAF_DOMAIN_TAG: &[u8] = b"LAMBDAVM_MAIN_MMCS_LEAF_V1"; |
There was a problem hiding this comment.
Medium — Duplicate domain tag (maintenance hazard)
This constant duplicates crypto/stark/src/mmcs_leaf.rs:32:
pub const LEAF_DOMAIN_TAG: &[u8] = b"LAMBDAVM_MAIN_MMCS_LEAF_V1";The module is explicitly "not yet wired into multi_prove," so the live prover uses mmcs_leaf::hash_tagged_row_bytes (which references mmcs_leaf::LEAF_DOMAIN_TAG). When this module eventually gets wired in as a replacement it must import that constant, not define its own copy. If one is bumped without the other, prover and verifier will silently compute different leaf hashes and every proof will fail.
Suggestion: delete the local copy now and use stark::mmcs_leaf::LEAF_DOMAIN_TAG; (or just call stark::mmcs_leaf::hash_tagged_row_bytes directly instead of re-implementing it).
|
|
||
| fn ser_len<T: serde::Serialize>(value: &T) -> usize { | ||
| // bincode v1 mirrors the encoding used by VmProof callers (bin/cli prove | ||
| // and prover tests), so per-section sums add up to the total bundle. |
There was a problem hiding this comment.
Medium — Silent failure corrupts the size report
unwrap_or(0) means any serialization error produces a 0 byte entry in the table. The percentage column will silently misrepresent the breakdown (and the s_other residual could underflow or wildly overcount).
Since bincode::serialize only fails on structurally invalid types (which is a programmer error, not a runtime condition), an expect is more appropriate here:
| // and prover tests), so per-section sums add up to the total bundle. | |
| bincode::serialize(value).expect("bincode serialization must succeed for VmProof types").len() |
| /// Currently unused at the call sites; defined here as the foundation | ||
| /// for the upcoming MMCS Phase C wire-up (see | ||
| /// `docs/mmcs-streaming-c1-spec.md`). | ||
| #[allow(dead_code)] |
There was a problem hiding this comment.
Low — Stale annotation
air_tags() is actively called by both prove_with_options_and_inputs and verify_with_options in this file, so #[allow(dead_code)] is incorrect and will confuse future readers into thinking it's an unused helper. Please remove it.
| /// for the upcoming MMCS Phase C wire-up (see | ||
| /// `docs/mmcs-streaming-c1-spec.md`). | ||
| #[allow(dead_code)] | ||
| pub fn air_tags(&self) -> Vec<crypto::merkle_tree::mmcs::MatrixTag> { |
There was a problem hiding this comment.
Medium — Unverified parallel-array ordering invariant
air_tags() returns tags in a hardcoded order (BITWISE first, DECODE second, …, then split-chip loops) that must match the order of air_refs() and air_trace_pairs() element-for-element. Any divergence — adding a chip to one but not the other, or reordering — causes a silent MMCS spec mismatch that the verifier catches only at proof-verification time, with no helpful error pointing here.
Consider adding a debug_assert_eq!(self.air_tags().len(), self.air_refs().len(), "air_tags and air_refs must be in sync") at least in multi_prove (or a compile-time marker using a unit test that asserts a known-length fixture), so regressions surface immediately.
| .map_err(map_mmcs_err)?; | ||
| } | ||
| } | ||
| let mmcs = builder.finalize().map_err(map_mmcs_err)?; |
There was a problem hiding this comment.
Low — All-preprocessed topology panics with an opaque error
If every AIR in the multi-proof is preprocessed, the loop above adds no matrices and builder.finalize() returns Err(MmcsError::Empty), which propagates as ProvingError::WrongParameter("MMCS: Empty"). Lambda-VM always has at least one non-preprocessed table, so this won't fire in production, but it's a sharp edge for anyone writing tests with only preprocessed AIRs. A doc comment on build_main_mmcs (or a dedicated ProvingError::NoSharedMmcsMatrices variant) would make the failure mode explicit.
Review: Research/mmcs-streamingOverview: This PR introduces a Plonky3-style Multi-Matrix Commitment Scheme (MMCS) that replaces N independent per-table Merkle trees with a single shared root over all non-preprocessed tables. The core MMCS implementation ( FindingsMedium — Duplicate leaf-hash implementation (link) Medium — Low — Stale Low — Low — All-preprocessed topology gives an opaque error (link) |
Adds a pure post-processing subcommand for CI to render proof-size
deltas between two `proof-size --json` reports. Mirrors the
`tooling/loc` workflow: prove on the baseline and on the PR, then diff
the JSONs to produce a comment-ready table.
cli proof-size base.elf --json > base.json
cli proof-size pr.elf --json > pr.json
cli proof-size-diff base.json pr.json --format github > comment.md
Formats:
- `text` — plain aligned table for terminal / logs
- `github` — Markdown with a collapsible <details> per-section block;
the headline row stays visible in the PR comment so reviewers see the
total delta at a glance
- `slack` — Slack-flavoured Markdown (code fence for the breakdown)
Per-section diff handles the typical edge cases:
- Section present in both → bytes delta + percentage
- Section new in current → `+N (new)`
- Section dropped in current → `-N (gone)`
- Section order follows `current`, then prev-only at the end (lossless)
The previously-anonymous `Report` struct in `cmd_proof_size` is hoisted
to a public-shape `ProofSizeReport { elf, total_vm_proof_bytes,
multi_proof_bytes, sub_proof_count, main_mmcs_spec_entries, sections }`
so the JSON schema is stable enough for CI to depend on. `section` is
now `String` (was `&'static str`) so the same struct round-trips
through deserialization.
3 unit tests in `proof_size_diff_tests` cover:
- Headline + per-section deltas in text format
- New-section + removed-section handling
- GitHub format collapsible block + percentage rendering
Build clean, 3/3 tests green, smoke-tested both formats with fake JSONs.
Mirror of the main-trace MMCS C1 work, applied to the auxiliary trace.
Pulls per-table aux Merkle roots out of the per-table forked transcripts
into a single shared aux MMCS root absorbed into the SHARED transcript
BEFORE per-table forking — every fork inherits the same aux binding
identically, and the proof carries one aux root instead of N.
Domain separation:
- `LEAF_DOMAIN_TAG_AUX = "LAMBDAVM_AUX_MMCS_LEAF_V1"` (alongside the
existing main tag, now also aliased as `LEAF_DOMAIN_TAG_MAIN`).
- `hash_tagged_row_bytes_aux` / `hash_tagged_row_aux` helpers use it.
- An aux-MMCS opening cannot authenticate a main leaf (or vice versa);
a `(tag, row)` pair under the two domains produces distinct digests.
Pinned by a new test in `mmcs_leaf::tests`.
Architecture:
- `MainTraceOpening` gains a sibling `AuxTraceOpening<E>` enum (Mmcs
variant only — there is no preprocessed-equivalent for aux).
- `MainCommit` gains a sibling `AuxCommit<E>::Shared { mmcs (Arc), tag,
padded_height }`. `Round1.aux: Option<AuxCommit<E>>` (`None` when an
AIR has no aux trace).
- `DeepPolynomialOpening.aux_trace_polys: Option<AuxTraceOpening<E>>`
(the old `PolynomialOpenings<E>` shape is gone).
- `StarkProof` drops `lde_trace_aux_merkle_root` entirely.
- `MultiProof` gains:
- `aux_mmcs_root: Option<Commitment>` — `None` when no AIR has aux.
- `aux_mmcs_spec: Vec<(MatrixTag, usize)>` — filtered to has-aux tables.
Phase C absorb order (prover + verifier match exactly):
1. Phase A: main MMCS root absorb (unchanged).
2. Phase B: sample LogUp challenges (unchanged).
3. NEW: build aux LDEs + tagged leaves for tables with aux,
build shared aux MMCS,
absorb its root into the SHARED transcript.
4. Fork per-table.
5. Per-table: bind `bus_public_inputs.table_contribution` (unchanged),
run rounds 2-4 (unchanged). No per-table aux root absorb.
Per-query open / verify path:
- Prover: `aux_commit.mmcs.open((iota*2) << shift)` and
`mmcs.open((iota*2+1) << shift)`, producing `AuxTraceOpening::Mmcs`.
- Verifier: `verify_aux_mmcs_pair_inner` rehashes evaluations with the
AUX domain, compares against `mmcs_opening.matrix_leaves[table_idx]`,
cross-checks `global_index`, and authenticates against root + spec.
- `verify_trace_openings` dispatches: `Some(_)` requires aux MMCS root
to be present; `(None, _)` is fine (table has no aux); `(Some, None)`
rejects (proof claims an opening but no MMCS exists).
Verifier spec validation:
- Reproduces `expected_aux_spec` from `airs.filter(has_aux_trace).map((tag, lde_size))`
in spec-fixed order, sorts by `(height desc, tag asc)` to match
`MmcsBuilder::finalize`, and rejects on any mismatch with the
proof-supplied `aux_mmcs_spec`.
Dead code cleanup:
- Removed `TableCommit<F>` entirely. Both main and aux now go through
their own enum-typed commits; nothing else used it.
Tests: stark 142/142 green (130 prior + the 10 main-MMCS soundness +
2 new aux leaf-hash tests). The 77 prove_elfs failures in
lambda-vm-prover predate this work (UnknownSyscall(5)).
…path 9 tests covering the aux MMCS attack surface (mirror of the existing main-MMCS suite). All use a baseline two-table multi-proof over two `LogReadOnlyRAP` AIRs so both tables contribute aux and therefore both participate in the shared MMCS. Field tampered Detection mechanism ----------------------------------------------------------------------- multi_proof.aux_mmcs_root[0] transcript divergence multi_proof.aux_mmcs_root = None root presence/spec check multi_proof.aux_mmcs_spec[0].1 (height) reproduced-spec check multi_proof.aux_mmcs_spec[0].0 (tag) reproduced-spec check mmcs_opening.matrix_leaves[idx].1 (digest) rehash-vs-leaf check mmcs_opening.global_index g_primary/g_sym match mmcs_opening.siblings[0][0] MmcsOpening::verify evaluations[0] += 1 rehash mismatch (baseline test) verifies cleanly Locks behaviour for the entire aux MMCS path — root absorb, spec sort, leaf rehash with the aux domain separator, and Merkle authentication.
…mentally Adds a streaming-build variant of `MmcsBuilder` that folds per-chip leaves at the MAX height into a single shared running layer-0 as they arrive, instead of holding every max-height chip's leaf vector alive simultaneously. The other heights still need per-chip leaves stored until `finalize` — see "Why max-height only?" below. Equivalence guarantee: identical root + spec + layer bytes to `MmcsBuilder` for the same input set (locked by 4 round-trip tests against the one-shot path, plus order-violation rejection tests). # Why "max-height only"? MMCS layer-0 at max height is built by left-folding every chip's leaves at row `i`. With no left-anchor to compose with, the running fold `acc = hash(acc, chip_k[i])` is mathematically equivalent to the one-shot `hash(hash(hash(chip_0[i], chip_1[i]), chip_2[i]), ...)`. For chips at heights BELOW max, the MMCS injection rule is `next[i] = hash(hash(hash(next[i], chip_0[i]), chip_1[i]), ...)`, which mixes the upward-compressed `next[i]` into the left-fold. Keccak (and any non-associative hash) makes it impossible to pre-fold the chips into a single summary and inject that summary later — the resulting digest would differ from the one-shot builder, breaking verifier compatibility. So we keep per-chip leaves for non-max heights and inject them in left-fold order at `finalize`. # Add order Callers MUST add matrices in (height desc, tag asc) order — the exact sort `MmcsBuilder::finalize` does internally. Out-of-order calls return `MmcsError::OutOfOrder`. Same-height tags must strictly ascend; duplicate tags at different heights still trip `DuplicateTag`. # Memory / wire-format - Peak savings vs one-shot: `(num_max_height_chips - 1) * max_height * node_size` worth of per-chip leaf storage that is folded immediately and dropped. For lambda-vm with ~5 chips at 2^20 that is ~128 MB of transient Vec<Commitment>. - Wire format unchanged: the produced `Mmcs` has the same `root()` and `spec()`. Layers Vec matches one-shot byte-for-byte. - Per-chip `leaf_digests` is empty on the streaming output — `Mmcs::open` is therefore unavailable until callers supply leaves at open time. This is the prover wire-up that comes next. # MmcsMatrix change `MmcsMatrix.padded_height` is now a stored field instead of reading `leaf_digests.len()`, so `spec()` reports the right value even when the streaming builder has emptied per-chip leaves. Tests: 9 new tests on top of the prior 17 MMCS tests; 26 total green.
`StreamingMmcsBuilder` discards per-chip leaves at build time, so
`Mmcs::open(global_index)` — which reads `matrix.leaf_digests[idx]` to
fill `MmcsOpening.matrix_leaves` — would return empty leaves and a bad
opening. This commit adds the complementary API:
pub fn open_with_leaves<F>(global_index, leaf_fn) -> MmcsOpening
where F: FnMut(matrix_idx, local_idx) -> B::Node
The closure provides each matrix's per-row leaf at the appropriate
shifted index. The prover-side use case (next commit) is to rehash a
row from a per-table LDE on demand, replacing the one-shot Mmcs's
internal leaf storage with on-the-fly rehashing.
`Mmcs::open` is now a thin wrapper around `open_with_leaves` whose
closure reads from `self.matrices[i].leaf_digests` — back-compat, no
caller change.
Equivalence pinned by a new test
`streaming_open_with_leaves_round_trips_against_one_shot`:
- Build lambda-vm-shaped topology two ways (one-shot + streaming).
- Compare roots + specs (already covered).
- For every global_index in [0, max_height): open one-shot, open
streaming via `open_with_leaves` feeding leaves from the input set,
assert `global_index`, `siblings`, and `matrix_leaves` byte-identical.
- Assert the streaming opening verifies against the streaming root.
27/27 MMCS tests green (1 new on top of 26).
Foundation for the per-chunk MMCS refactor: with `open_with_leaves`
available, the prover can stream-build per-chunk MMCSes and rehash
leaves from chunk-shared LDEs at open time.
Foundation for the per-chunk MMCS migration. The chunk-grouped main and aux MMCSes need to rehash chunk-mate LDE rows on demand at open time (the prover-side input to `Mmcs::open_with_leaves`). Without Arc-shared columns, each chunk-mate's open would have to copy ~600 MB of column data per CPU LDE — a non-starter. Change: - `LDETraceTable.main_columns: Vec<Vec<FE<F>>>` → `Arc<Vec<Vec<FE<F>>>>`. - Same for `aux_columns`. - `from_columns(Vec, Vec, ...)` still works (wraps internally with `Arc::new`) — all existing call sites unaffected. - New `from_columns_arc(Arc, Arc, ...)` for callers (next commit) that already hold Arc-shared column data. - New `main_columns_arc()` / `aux_columns_arc()` cheap-clone accessors for the chunk-shared MMCS open helpers. - `into_columns()` now returns `(Arc, Arc)` instead of `(Vec, Vec)`. No internal caller uses the old signature outside `LDETraceTable` construction (which uses Arc-aware path); external uses would have to migrate, but searching the workspace shows no such callers. No protocol change. No proof-format change. No memory cost: a single `Arc::new` on construction; deref is free; column reads via `[col][row]` still go through `Vec` Deref. Existing 151/151 stark tests pass without edits.
…thin chunk
Refactors the main + aux MMCSes from one-global to per-chunk: each
parallel chunk of K = `table_parallelism()` tables gets its own
streaming MMCS, with chunk-mate LDEs Arc-shared so the per-query open
path can rehash chunk-mate rows on demand. Final piece of the streaming
plan (after `StreamingMmcsBuilder` and `Mmcs::open_with_leaves`).
# Why per-chunk?
The streaming MMCS drops per-chip leaves at build time, so `Mmcs::open`
needs to recompute leaves on the fly via `open_with_leaves`. That
closure must produce leaves for every matrix in the MMCS spec — not
just the current table — because `MmcsOpening::verify` walks all
max-height matrices to reconstruct layer-0. Per-chunk grouping bounds
the cross-table LDE access to the K tables a chunk owns (already kept
alive together by the existing parallel R2-R4 loop), avoiding the
alternatives of (a) global Arc<Vec<Arc<Lde>>> threaded through every
fork, or (b) reordering iota sampling earlier.
# Proof format
`MultiProof`:
- `main_mmcs_root: Commitment` → `main_mmcs_roots: Vec<Option<Commitment>>`
- `main_mmcs_spec: Vec<(...)>` → `main_mmcs_specs: Vec<Vec<(...)>>`
- `aux_mmcs_root: Option<Commitment>` → `aux_mmcs_roots: Vec<Option<Commitment>>`
- `aux_mmcs_spec: Vec<(...)>` → `aux_mmcs_specs: Vec<Vec<(...)>>`
- + `chunk_size: u32` (pinned `table_parallelism()` so the verifier
chunks the AIR slice the same way the prover did).
`None` entries in the *_roots Vecs mark chunks with no MMCS-eligible
tables (all-preprocessed for main, no-aux for aux). Per-query openings
shrink: each carries ≤K matrix_leaves instead of N.
# Phase A / Phase C absorb order (prover + verifier match exactly)
Per chunk in chunk order:
for each table in spec order:
- absorb its preprocessed root (preprocessed only)
- absorb its per-table multiplicities root (preprocessed only)
- absorb that chunk's main MMCS root (Some) or skip (None)
After Phase A → sample LogUp challenges → Phase C aux mirrors main:
for each chunk in chunk order:
- absorb that chunk's aux MMCS root (Some) or skip (None)
Then fork per-table → per-table table_contribution → rounds 2-4.
# Data plumbing
`MainCommit::Shared` now carries `Arc<ChunkMainMmcsContext<F>>` +
`chunk_idx` instead of an `Arc<Mmcs>` directly. The context holds the
chunk's MMCS + Arc-cloned LDE columns for the chunk-mates in MMCS spec
sort order. Aux mirrors via `ChunkAuxMmcsContext<E>`.
`Lde { main, aux }` columns are now `Arc<Vec<Vec<FE>>>` (built on
B1's Arc-wrapped `LDETraceTable`). Each table's `Round1.lde_trace`
shares the same Arc as the chunk context — no duplication.
# Open path
`open_deep_composition_poly` dispatches on the chunk context's MMCS via
`mmcs.open_with_leaves`, with a closure that rehashes chunk-mate rows
via the new `rehash_main_chip_leaf` / `rehash_aux_chip_leaf` helpers
(read from chunk-shared LDE columns + matrix tag, hash with the
appropriate `LEAF_DOMAIN_TAG_*`).
# Verifier
`multi_verify` reads `chunk_size` from the proof, walks chunks of the
AIR slice, validates each chunk's expected spec against the supplied
one, and absorbs roots in chunk order. `verify_rounds_2_to_4` and
`verify_main_mmcs_pair` take `main_mmcs_root: Option<&Commitment>` (and
similarly aux): `Shared` opening with `None` root → reject.
# Tests
- Existing main + aux soundness suites adapted to the per-chunk shape
via `first_populated_main_chunk` / `first_populated_aux_chunk`
helpers + `proof.main_mmcs_roots[chunk_idx]` field access.
- New soundness test `tampered_chunk_size_rejected`: pinned chunk_size
mismatch must be rejected (Vec length cross-check fires).
- bin/cli/proof-size breakdown updated: now reports `main_mmcs_roots /
main_mmcs_specs / aux_mmcs_roots / aux_mmcs_specs / chunk_size` as
separate sections (multi-proof header), drops the obsolete
`per_table_aux_merkle_root` and pre-Vec `main_mmcs_root/spec` rows.
Results: 152/152 stark tests green (151 before + new
`tampered_chunk_size_rejected`); 27/27 crypto mmcs tests green; 3/3
cli unit tests green; lambda-vm-prover bitwise (preprocessed-path) +
non-ELF tests pass; the 77 prove_elfs failures are the same pre-existing
`UnknownSyscall(5)` executor bug present on main.
|
/bench 5 |
Third MMCS after main + aux. Each chunk's tables share a composition
MMCS built between R2 LDE build and R3 z-sample. Drops per-table
composition_poly_root from StarkProof entirely; the chunk root is
absorbed into each chunk-mate's fork once per chunk.
# Why structurally different from main/aux
Composition polys are built INSIDE the per-table R2-R4 loop after each
fork samples its own beta. Per-chunk MMCS therefore needs a join point
mid-rounds:
R2a (per-table, parallel): sample beta + build composition LDE +
tagged row-pair leaves
[chunk join] build chunk composition MMCS via
StreamingMmcsBuilder; absorb root into
each chunk-mate's fork
R2b → R3 → R4 (per-table): sample z, run OOD + FRI + opens
The old `prove_rounds_2_to_4` is now `prove_round_2a` (R2 LDE build)
+ `prove_rounds_2b_to_4` (everything after the chunk join). Verifier
mirrors: `replay_rounds_after_round_1` takes an Option<&Commitment>
chunk-comp root and absorbs it between beta and z (replacing the old
per-table `composition_poly_root` absorb).
# Leaf hash + helpers
Composition leaves hash a row PAIR (br_0 || br_1) rather than a single
row, so domain separation needs its own helpers:
- `LEAF_DOMAIN_TAG_COMPOSITION = "LAMBDAVM_COMP_MMCS_LEAF_V1"`.
- `hash_tagged_row_pair_bytes_composition` (bytes flavour).
- `hash_tagged_row_pair_composition` (elements flavour).
- `compute_tagged_leaves_row_pair_bit_reversed_composition` (prover-side
bulk-leaf computation).
- `rehash_comp_chip_leaf` (open-time on-demand leaf for chunk-mates).
# Proof format
`StarkProof`:
- drops `composition_poly_root: Commitment` (now at MultiProof / chunk
level).
`DeepPolynomialOpening`:
- `composition_poly: PolynomialOpenings<E>` → `CompositionTraceOpening<E>`
(new enum, single `Mmcs` variant with `evaluations`, `evaluations_sym`,
and ONE `mmcs_opening` — the row-pair leaf covers both rows).
`MultiProof`:
- `comp_mmcs_roots: Vec<Option<Commitment>>` (parallel to main/aux,
always Some in practice since every table has a composition poly).
- `comp_mmcs_specs: Vec<Vec<(MatrixTag, usize)>>` — chunk specs in MMCS
sort order. Padded height = lde_size / 2 (row-pair count).
# Architecture
- `ChunkCompMmcsContext<E>` — sister of main/aux chunk contexts.
- `CompCommit<E>::Shared { chunk_ctx, chunk_idx, tag, padded_height }`.
- `Round2<F>` now holds `Arc<Vec<Vec<FE>>>` LDE + `CompCommit`. No
per-table Merkle tree.
- `R2aResult<E>` — handoff carrier between R2a and the chunk join.
- `build_chunk_comp_mmcs` — mirror of `build_chunk_main_mmcs` /
`build_chunk_aux_mmcs`; uses `StreamingMmcsBuilder`.
# Open + verifier
- `open_composition_poly` produces `CompositionTraceOpening::Mmcs` via
`chunk_ctx.mmcs.open_with_leaves` + the rehash closure.
- `verify_comp_mmcs_pair_inner` rehashes the row-pair leaf, checks
`matrix_leaves[table_idx]`, authenticates against the chunk root.
- `step_4_verify_trace_and_composition_openings` threads
`comp_mmcs_root` + `comp_mmcs_spec` through to the per-query check.
- `multi_verify` adds per-chunk comp-spec validation (every chunk has
Some root; spec sorted height-desc, tag-asc).
# Tests
- 3 new composition soundness tests: tampered root / spec height /
missing root all rejected.
- 2 new mmcs_leaf tests: composition domain separates from main+aux;
bytes-flavour matches element-flavour.
- `cli proof-size` breakdown gains `comp_mmcs_roots (per-chunk)` and
`comp_mmcs_specs (per-chunk)` sections.
Results: 157/157 stark tests green (152 prior + 3 new comp soundness +
2 new mmcs_leaf comp tests). cli 3/3, lambda-vm-prover bitwise
(preprocessed) tests pass, 77 prove_elfs failures are the pre-existing
UnknownSyscall(5) executor bug present on main.
# Performance note
R2a and R2b-R4 within a chunk are sequential rather than per-table
parallel — `chunk_size = table_parallelism()` is small (typically 8),
and the dominant cost in R2 (constraint eval, FFT) already has internal
rayon parallelism. Keeping the chunk loop sequential avoids
`par_iter_mut()` on transcript slices that breaks with downstream
generic `IsStarkTranscript` bounds.
Mathematical primitive for batched FRI: combine N same-size evaluation vectors into one using successive powers of δ_fri. A bucket's chip-DEEP evaluations combine into one polynomial; FRI commits + opens only that combined polynomial, while per-chip openings (main/aux/comp) still authenticate each individual D_i value at the bucket-shared iotas. Singleton bucket is a fast path (no combination needed). Empty input is a usage bug (debug-asserted). Not yet wired up — prover + verifier integration lands in the follow-up chunk-join refactor. 4/4 new fri:: unit tests green. 157 prior stark tests unaffected.
|
/bench 5 |
1 similar comment
|
/bench 5 |
… combine Phase D of the MMCS streaming plan: one FRI commit per height bucket inside a chunk. Bucket-mates' DEEP composition polynomials are linearly combined with successive powers of δ_fri (sampled from the chunk-shared, post-OOD-broadcast transcript) and a single FRI commit + grinding + query phase runs on the combined polynomial. Drops 4 per-table proof fields; adds one `MultiProof` field. Built on top of (and is the final piece of) the streaming MMCS plan: trace-MMCS (b162180) and composition-MMCS (c22fca9) shave ~10-15% of proof bytes; batched FRI is where the rest of the proof-size win lives. # Proof format `StarkProof` drops: - `fri_layers_merkle_roots: Vec<Commitment>` - `fri_last_value: FieldElement<E>` - `query_list: Vec<FriDecommitment<E>>` - `nonce: Option<u64>` `MultiProof` gains: - `fri_chunk_buckets: Vec<Vec<ChunkBucketFri<E>>>` `ChunkBucketFri { lde_size, members, layer_roots, last_value, decommitments, nonce }` — bucket members listed in canonical chunk-local-index order so δ_fri^i powers are reproducible verifier-side. # Bucket-seed transcript (canonical OOD broadcast) For chunk c, after Phase A+B+C absorbs, the bucket seed extends a clone of the pre-fork shared state by appending, in exact order: 1. per chunk-mate j in chunk-local order: table_contribution_j (when bus_public_inputs is Some) 2. chunk composition MMCS root (when Some) 3. per chunk-mate j in chunk-local order: trace_ood_evaluations columns + composition_poly_parts_ood_evaluation Per bucket b: bucket_transcript = bucket_seed.clone(), append `(lde_size as u64).to_le_bytes()`, sample δ_fri, then standard commit_phase_from_evaluations / grinding / iotas interaction. The verifier reconstructs the bucket seed from `MultiProof` data alone (table contributions from `bus_public_inputs`, comp root from `comp_mmcs_roots`, OODs from each `StarkProof.trace_ood_evaluations` + `composition_poly_parts_ood_evaluation`). # Prover (multi_prove) R2b → R3.5 split into a light per-fork pass + heavy bucket-level pass: - `prove_rounds_2b_to_3_5(...)` per chunk-mate (now parallel via rayon `par_iter_mut()` on the chunk transcript slice): sample z, compute OOD, absorb own OOD, sample γ, build the DEEP coefficient layout (`DeepCoeffs { z, gammas, trace_term_coeffs, lde_size }`). Does NOT build the DEEP LDE. - Build `bucket_seed` (canonical replay on `pre_fork_transcript`). - Bucket chunk-mates by `lde_size` (first-encounter order on chunk-local index). - Per bucket: clone bucket_seed, append lde_size, sample δ_fri. Streaming combine: per member, `compute_deep_lde_with_coeffs` → fold into the bucket accumulator with δ_fri^i_local → drop. First member of every bucket reuses its DEEP LDE as the accumulator base (no zero-init copy in the singleton case). Then commit_phase_from_evaluations / grind / sample iotas / query_phase on the combined polynomial. - Per chunk-mate: open trace / aux / comp / precomputed at the bucket-shared iotas, assemble per-table `StarkProof`. Iotas come from the bucket cache; no transcript replay needed. # Verifier (multi_verify) - `replay_rounds_after_round_1` stops at γ — leaves `zetas`, `iotas`, and `grinding_seed` empty in the returned `Challenges`. - `verify_rounds_2_to_4` split into `replay_and_verify_step_2` (per-fork, returns `Option<Challenges>`) + `verify_step_4_at_iotas` (driven from `multi_verify` after the bucket FRI sets iotas). - `step_3_verify_fri` / `verify_query_and_sym_openings` replaced by `verify_bucket_fri_query` which takes the bucket's `layer_roots` + `last_value` + per-iota `FriDecommitment` externally and runs the standard FRI fold check against the combined-D evaluations. - `multi_verify` chunk loop: 1. Per chunk-mate: fork transcript → bind table_contribution → `replay_and_verify_step_2` → store Challenges. 2. Build `bucket_seed` from `MultiProof` data. 3. Validate `fri_chunk_buckets[chunk_idx]` structure: bucket count matches the expected lde_size grouping; per bucket lde_size and member tag order match. 4. Per bucket: replay layer-root absorbs + zetas + last_value + grinding + iotas (cached in `bucket_iotas_cache`). Reconstruct each member's D_i evaluations via the existing barycentric helper, combine with δ_fri^i, run `verify_bucket_fri_query` per iota. 5. Per chunk-mate: `verify_step_4_at_iotas` using cached iotas. # Performance considerations (F1 fixes) The first draft of D5 retained K × DEEP LDE simultaneously and lost per-chunk-mate parallelism in R2b-R3.5 (c22fca9 had serialised it): proving time on the workload regressed ~117% while memory only dropped ~10%. Three fixes shipped together: - F1.1 (streaming bucket combine): DEEP_i materialises one at a time inside the bucket loop and folds into the accumulator with δ_fri^i, then drops. Peak DEEP memory inside the bucket loop is 2 × |LDE| (current member + accumulator), restored from K × |LDE|. - F1.2 (restore per-chunk-mate parallelism): `chunk_transcripts. par_iter_mut().enumerate()` runs R2b → R3 → DEEP-coeffs concurrently per chunk-mate. `chunk_airs` + `chunk_num_rows` are extracted before the parallel iter so we don't capture the Sync-unfriendly `&mut TraceTable` inside `air_trace_pairs`. - F1.3 (verifier iota cache): `bucket_iotas_cache: Vec<Vec<usize>>` is populated during the FRI fold check pass and reused in the step 4 pass, eliminating a redundant clone+replay of the bucket transcript per chunk-mate. # Tests (D7) New `tests/bucket_fri_soundness_tests.rs` (7 tests): - baseline_phase_d_proof_verifies - tampered_bucket_last_value_rejected - tampered_bucket_layer_root_rejected - truncated_bucket_decommitments_rejected - missing_chunk_buckets_rejected - wrong_bucket_lde_size_rejected - swapped_member_order_rejected `bin/cli/proof-size` breakdown: drops `fri_layers_merkle_roots` and `fri_query_list`; adds `fri_chunk_buckets (per-chunk batched FRI)`. # Results - `cargo test -p stark --lib` → 168/168 (161 prior + 7 new bucket FRI soundness tests). - `cargo test -p crypto --lib` → 73/73. - `cargo test -p cli` → 3/3. - `cargo test -p lambda-vm-prover --lib tests::bitwise*` → 29/29 (preprocessed-AIR exercise). - 77 `tests::prove_elfs_tests::*` failures are the unchanged pre-existing `UnknownSyscall(5)` executor bug, present on main. # Out of scope (follow-ups) - Fuse compute-DEEP + bucket-combine into a single per-LDE-point pass (eliminate the intermediate `Vec` for the current member). - Adaptive chunk sizing by LDE memory budget (replace fixed `chunk_size = table_parallelism()` with a bin-packing decision). - Disk-spill `StarkProof` + `ChunkBucketFri` between chunks for very-large-table workloads.
|
/bench 5 |
|
/bench 5 |
No description provided.