perf: avoid cloning CPU witness MLEs #1342

Open: hero78119 wants to merge 1 commit into master from feat/no_clone

hero78119 (Collaborator) commented May 20, 2026

Problem

Left over from #923: the CPU trace commit cloned large witness MLEs before proving, adding avoidable memory traffic on the prover hot path.

Design Rationale

Keep committed witness MLEs behind Arc, and hand over ownership (by draining or during transport) where possible. This avoids deep clones without changing proof semantics.
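To illustrate the rationale, here is a minimal sketch of the difference between deep-cloning witness data and sharing it behind `Arc`. The `WitnessMle` type and all three function names are hypothetical stand-ins, not the actual ceno_zkvm types or API; only `std::sync::Arc` is real.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a witness MLE: a flat vector of evaluations.
#[derive(Clone, Debug, PartialEq)]
pub struct WitnessMle {
    pub evals: Vec<u64>,
}

/// Deep-clone variant (assumed "before" shape): every evaluation is copied.
pub fn commit_cloning(witness: &[WitnessMle]) -> Vec<WitnessMle> {
    witness.to_vec()
}

/// Arc variant (assumed "after" shape): each MLE is moved behind an `Arc`
/// once; the evaluation buffers themselves are not copied.
pub fn commit_shared(witness: Vec<WitnessMle>) -> Vec<Arc<WitnessMle>> {
    witness.into_iter().map(Arc::new).collect()
}

/// A read-only consumer takes new handles; `Arc::clone` only bumps a
/// reference count and leaves the underlying data in place.
pub fn share_with_prover(committed: &[Arc<WitnessMle>]) -> Vec<Arc<WitnessMle>> {
    committed.iter().map(Arc::clone).collect()
}
```

In this sketch the cost of `share_with_prover` scales with the number of MLEs, while `commit_cloning` scales with the total number of evaluations, which is where the memory traffic would come from.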

Change Highlights

  • ceno_zkvm: return Arc witness MLEs from trace commit and consume structural MLEs during transport.
  • ceno_zkvm: keep GPU trait shape aligned while preserving existing GPU behavior.
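The "consume during transport" idea can be sketched as follows, assuming a hypothetical `StructuralMle` type and hypothetical function names (none of these are the real ceno_zkvm signatures). The sketch drains a vector so each MLE is moved rather than copied, and recovers an owned value from an `Arc` only when no other handle exists.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a structural MLE.
#[derive(Clone, Debug, PartialEq)]
pub struct StructuralMle {
    pub evals: Vec<u64>,
}

/// Moves every MLE out of `structural` (leaving it empty) and wraps each
/// one in an `Arc`. The evaluation buffers are moved, not copied.
pub fn transport_consuming(structural: &mut Vec<StructuralMle>) -> Vec<Arc<StructuralMle>> {
    structural.drain(..).map(Arc::new).collect()
}

/// Recovers an owned MLE. If this is the only handle, the value is moved
/// out for free; otherwise it falls back to a deep clone.
pub fn into_owned(mle: Arc<StructuralMle>) -> StructuralMle {
    Arc::try_unwrap(mle).unwrap_or_else(|shared| (*shared).clone())
}
```

A caller that no longer needs its local copy would pass `&mut` access and let `transport_consuming` empty the vector, so nothing is left behind to clone from.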

Benchmark / Performance Impact

Operation

| Operation | master (s) | this PR (s) | Improvement (master -> this PR) |
| --- | --- | --- | --- |
| CPU proving, keccak e2e shard total | 6.942 | 6.596 | 4.98% faster |
| GPU proving, keccak e2e shard total | 1.191 | 1.186 | 0.44% faster |

Layer

| Layer | master (s) | this PR (s) | Improvement (master -> this PR) |
| --- | --- | --- | --- |
| N/A: shard-level proving total measured | N/A | N/A | no regression observed |

Benchmark command(s):

cargo run --config net.git-fetch-with-cli=true --release --package ceno_zkvm --bin e2e -- --platform=ceno --max-cycle-per-shard=1600 examples/target/riscv32im-ceno-zkvm-elf/release/examples/keccak_syscall
cargo run --config net.git-fetch-with-cli=true --features gpu --release --package ceno_zkvm --bin e2e -- --platform=ceno --max-cycle-per-shard=1600 examples/target/riscv32im-ceno-zkvm-elf/release/examples/keccak_syscall

Environment: local x86_64 Linux, release build, local ../ceno-gpu/cuda_hal patch for GPU validation.

raw data:

  • master: CPU shards 3.272s + 3.670s; GPU shards 0.624s + 0.568s
  • this PR: CPU shards 3.336s + 3.260s; GPU shards 0.593s + 0.593s
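For readers who want to recheck the totals, the following sketch shows how the shard totals and the percentage improvement can be derived from the raw per-shard times. The helper names are hypothetical. Applied to the CPU shards it reproduces the table row; for the GPU row, small differences in the last digit are possible, presumably because the listed shard times are rounded to milliseconds.

```rust
/// Sums per-shard proving times (in seconds).
pub fn shard_total(shards: &[f64]) -> f64 {
    shards.iter().sum()
}

/// Percentage reduction in time going from `baseline` to `candidate`.
pub fn percent_faster(baseline: f64, candidate: f64) -> f64 {
    (baseline - candidate) / baseline * 100.0
}
```

For example, `percent_faster(shard_total(&[3.272, 3.670]), shard_total(&[3.336, 3.260]))` evaluates the CPU row from the raw data above.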

Testing

cargo check --config net.git-fetch-with-cli=true --package ceno_zkvm --bin e2e
cargo run --config net.git-fetch-with-cli=true --release --package ceno_zkvm --bin e2e -- --platform=ceno --max-cycle-per-shard=1600 examples/target/riscv32im-ceno-zkvm-elf/release/examples/keccak_syscall
cargo run --config net.git-fetch-with-cli=true --features gpu --release --package ceno_zkvm --bin e2e -- --platform=ceno --max-cycle-per-shard=1600 examples/target/riscv32im-ceno-zkvm-elf/release/examples/keccak_syscall

Risks and Rollout

Low risk: prover-side ownership change only. Rollback is reverting the Arc witness-MLE plumbing.

Follow-ups (optional)

None.

Copilot Reviewer Directive (keep this section)

When Copilot reviews this PR, apply .github/copilot-instructions.md strictly.

hero78119 changed the title from "perf fix: unnecessary cpu mle cloned" to "perf: avoid cloning CPU witness MLEs" on May 20, 2026
hero78119 enabled auto-merge on May 20, 2026 at 13:00