perf: avoid cloning CPU witness MLEs by hero78119 · Pull Request #1342 · scroll-tech/ceno

hero78119 · 2026-05-20T12:12:28Z

Problem

Left over from #923: the CPU trace commit cloned large witness MLEs before proving, adding avoidable memory traffic on the prover hot path.

Design Rationale

Keep committed witness MLEs behind Arc and drain/transport ownership where possible, avoiding deep clones without changing proof semantics.
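The difference between a deep clone and an Arc handoff can be sketched in a few lines. This is only an illustration under stated assumptions: `Mle`, `commit_cloning`, and `commit_shared` are hypothetical stand-ins, not the actual ceno_zkvm types or functions.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a witness MLE: a flat table of evaluations.
#[derive(Clone, Debug, PartialEq)]
struct Mle {
    evals: Vec<u64>,
}

/// Deep-clone pattern: every evaluation buffer is copied before proving.
fn commit_cloning(witness: &[Mle]) -> Vec<Mle> {
    witness.to_vec()
}

/// Arc pattern: each MLE is moved into an Arc once; later handoffs
/// only copy a pointer and bump a reference count.
fn commit_shared(witness: Vec<Mle>) -> Vec<Arc<Mle>> {
    witness.into_iter().map(Arc::new).collect()
}

fn main() {
    let witness = vec![
        Mle { evals: vec![1, 2, 3, 4] },
        Mle { evals: vec![5, 6, 7, 8] },
    ];

    // The deep clone allocates a second buffer for each MLE.
    let cloned = commit_cloning(&witness);
    assert_eq!(cloned, witness);
    assert_ne!(cloned[0].evals.as_ptr(), witness[0].evals.as_ptr());

    // Moving into an Arc keeps the original evaluation buffer in place.
    let original_ptr = witness[0].evals.as_ptr();
    let shared = commit_shared(witness);
    assert_eq!(shared[0].evals.as_ptr(), original_ptr);

    // Handing the MLEs to a (hypothetical) prover shares the same allocation.
    let for_prover: Vec<Arc<Mle>> = shared.iter().map(Arc::clone).collect();
    assert!(Arc::ptr_eq(&shared[0], &for_prover[0]));
    assert_eq!(Arc::strong_count(&shared[0]), 2);
}
```

With this shape the cost of passing a committed witness to the prover is independent of the MLE size, which is the property the rationale above relies on.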

Change Highlights

ceno_zkvm: return Arc witness MLEs from trace commit and consume structural MLEs during transport.
ceno_zkvm: keep GPU trait shape aligned while preserving existing GPU behavior.
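As a rough sketch of what "consume structural MLEs during transport" could look like, the example below drains a vector instead of cloning it. The names `Mle`, `ProofInput`, and `transport_structural` are assumptions made for illustration and are not taken from the ceno_zkvm code.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an MLE: a flat table of evaluations.
#[derive(Debug)]
struct Mle {
    evals: Vec<u64>,
}

/// Hypothetical per-circuit input holding structural MLEs.
struct ProofInput {
    structural: Vec<Mle>,
}

/// Hypothetical transport step: moves the structural MLEs out of the
/// input (leaving it empty) and wraps each one in an Arc, so no
/// evaluation buffer is copied.
fn transport_structural(input: &mut ProofInput) -> Vec<Arc<Mle>> {
    input.structural.drain(..).map(Arc::new).collect()
}

fn main() {
    let mut input = ProofInput {
        structural: vec![Mle { evals: vec![0, 1, 2, 3] }],
    };
    let original_ptr = input.structural[0].evals.as_ptr();

    let transported = transport_structural(&mut input);

    // The input no longer owns the MLEs ...
    assert!(input.structural.is_empty());
    // ... and the transported MLE still points at the original buffer.
    assert_eq!(transported.len(), 1);
    assert_eq!(transported[0].evals.as_ptr(), original_ptr);
}
```

Draining is only appropriate when the caller does not need the structural MLEs afterwards; otherwise sharing through Arc is the safer choice.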

Benchmark / Performance Impact

Operation

Operation	master (s)	this PR (s)	Improvement (master -> this PR)
CPU proving, keccak e2e shard total	6.942	6.596	4.98% faster
GPU proving, keccak e2e shard total	1.191	1.186	0.44% faster

Layer

Layer	master (s)	this PR (s)	Improvement (master -> this PR)
N/A: only shard-level proving totals were measured	N/A	N/A	no regression observed

Benchmark command(s):

cargo run --config net.git-fetch-with-cli=true --release --package ceno_zkvm --bin e2e -- --platform=ceno --max-cycle-per-shard=1600 examples/target/riscv32im-ceno-zkvm-elf/release/examples/keccak_syscall
cargo run --config net.git-fetch-with-cli=true --features gpu --release --package ceno_zkvm --bin e2e -- --platform=ceno --max-cycle-per-shard=1600 examples/target/riscv32im-ceno-zkvm-elf/release/examples/keccak_syscall

Environment: local x86_64 Linux, release build, local ../ceno-gpu/cuda_hal patch for GPU validation.

raw data:

master: CPU shards 3.272s + 3.670s; GPU shards 0.624s + 0.568s
this PR: CPU shards 3.336s + 3.260s; GPU shards 0.593s + 0.593s
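For reference, the CPU row of the table can be reproduced from the raw shard timings above with a small calculation. The helper name `improvement_percent` is hypothetical, and the formula (relative reduction in total time) is an assumption about how the percentage was derived.

```rust
/// Relative speedup of `pr_s` over `master_s`, as a percentage of the
/// master time. Hypothetical helper, assumed formula.
fn improvement_percent(master_s: f64, pr_s: f64) -> f64 {
    (master_s - pr_s) / master_s * 100.0
}

fn main() {
    // Raw CPU shard timings from the PR description, in seconds.
    let master_cpu = 3.272 + 3.670;
    let pr_cpu = 3.336 + 3.260;

    let pct = improvement_percent(master_cpu, pr_cpu);
    // prints "CPU: 6.942s -> 6.596s (4.98% faster)"
    println!("CPU: {master_cpu:.3}s -> {pr_cpu:.3}s ({pct:.2}% faster)");
}
```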

Testing

cargo check --config net.git-fetch-with-cli=true --package ceno_zkvm --bin e2e
cargo run --config net.git-fetch-with-cli=true --release --package ceno_zkvm --bin e2e -- --platform=ceno --max-cycle-per-shard=1600 examples/target/riscv32im-ceno-zkvm-elf/release/examples/keccak_syscall
cargo run --config net.git-fetch-with-cli=true --features gpu --release --package ceno_zkvm --bin e2e -- --platform=ceno --max-cycle-per-shard=1600 examples/target/riscv32im-ceno-zkvm-elf/release/examples/keccak_syscall

Risks and Rollout

Low risk: prover-side ownership change only. To roll back, revert the Arc witness-MLE plumbing.

Follow-ups (optional)

None.

Copilot Reviewer Directive (keep this section)

When Copilot reviews this PR, apply .github/copilot-instructions.md strictly.

perf fix: unnecessary cpu mle cloned

f5b2dc0

hero78119 changed the title ~~perf fix: unnecessary cpu mle cloned~~ perf: avoid cloning CPU witness MLEs May 20, 2026

hero78119 enabled auto-merge May 20, 2026 13:00

perf: avoid cloning CPU witness MLEs #1342
hero78119 wants to merge 1 commit into master from feat/no_clone
