gpu: add device-backed WITGEN commit path #1339
Merged
…_mle_zero_padding
…/ceno into feat/prover_mle_zero_padding
hero78119 (Collaborator, Author) commented on May 19, 2026, on this snippet:

    #[cfg(feature = "gpu")]
    {
        if false

This is a leftover from debugging. Fixing it brings back concurrent proving and improves baseline end-to-end performance by 1.037x.
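The comment above describes a leftover `if false` guard inside the `#[cfg(feature = "gpu")]` block that kept the concurrent proving branch from ever running. The minimal Rust sketch below illustrates that effect under stated assumptions: `prove_chip`, `prove_sequential`, `prove_concurrent`, and `prove_all` are hypothetical names, the toy "proof" is a simple fold, and chips are proved on `std::thread::scope` threads rather than whatever scheduler ceno actually uses.

```rust
use std::thread;

/// Hypothetical per-chip proving step; a toy fold stands in for real work.
fn prove_chip(chip_id: usize, witness: &[u64]) -> u64 {
    witness
        .iter()
        .fold(chip_id as u64, |acc, &w| acc.wrapping_mul(31).wrapping_add(w))
}

/// Prove chips one after another on the calling thread.
fn prove_sequential(witnesses: &[Vec<u64>]) -> Vec<u64> {
    witnesses
        .iter()
        .enumerate()
        .map(|(id, w)| prove_chip(id, w))
        .collect()
}

/// Prove each chip on its own scoped thread; results stay in chip order.
fn prove_concurrent(witnesses: &[Vec<u64>]) -> Vec<u64> {
    thread::scope(|s| {
        // Spawn every chip first so the proofs actually run in parallel.
        let handles: Vec<_> = witnesses
            .iter()
            .enumerate()
            .map(|(id, w)| s.spawn(move || prove_chip(id, w)))
            .collect();
        // Join in spawn order, which preserves chip order.
        handles
            .into_iter()
            .map(|h| h.join().expect("chip proving thread panicked"))
            .collect()
    })
}

/// A leftover `if false` guard behaves like `concurrent == false`:
/// the concurrent branch is compiled but never taken.
fn prove_all(witnesses: &[Vec<u64>], concurrent: bool) -> Vec<u64> {
    if concurrent {
        prove_concurrent(witnesses)
    } else {
        prove_sequential(witnesses)
    }
}
```

In this sketch both paths return results in chip order, so the guard changes only scheduling, not output.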
kunxian-xia approved these changes on May 19, 2026
Problem
GPU_WITGEN,CACHE=1 can produce witness traces on the GPU, but the prover path still needs a clean device-resident commit flow. The goal is to keep GPU-generated witness data usable through commit without falling back to replay/deferred raw-cache logic or unnecessary host materialization.
Design Rationale
This PR treats GPU witness output as the source of truth for the commit path: traces are normalized into device-backed row-major metadata, committed through the GPU PCS path, and released once q'/commit no longer needs the raw backing. The post-commit proving flow stays aligned with the existing CPU_WITGEN path, so correctness-sensitive transcript, opening, and proof-assembly logic remains shared.
The design avoids retaining replay plans as a second witness source. This keeps ownership simpler: GPU witness generation owns the raw device buffers until q'/commit construction, then releases them before chip-proving memory pressure grows.
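The ownership model in the design rationale (GPU witness generation owns the raw buffers until commit, then releases them) maps naturally onto Rust move semantics. The following is a minimal sketch, not ceno's implementation: `RawWitnessBuffer`, `RowMajorTrace`, `Commitment`, and `commit` are hypothetical, a `Vec<u64>` stands in for device memory, the column-major input layout is an assumption, and the digest is a toy fold rather than a PCS commitment. Because `commit` takes the trace by value, the raw backing is dropped as soon as the commitment exists; the `Drop` counter makes that release observable.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts released raw buffers so the release point is observable.
static RELEASED_BUFFERS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a raw device buffer from GPU witgen
/// (a Vec here; a real one would wrap a device allocation).
struct RawWitnessBuffer {
    values: Vec<u64>,
}

impl Drop for RawWitnessBuffer {
    fn drop(&mut self) {
        RELEASED_BUFFERS.fetch_add(1, Ordering::SeqCst);
    }
}

/// Hypothetical row-major view: row r occupies values[r*num_cols..(r+1)*num_cols].
struct RowMajorTrace {
    raw: RawWitnessBuffer,
    num_rows: usize,
    num_cols: usize,
}

/// Hypothetical commitment: the only thing that outlives commit.
#[derive(Debug, PartialEq)]
struct Commitment {
    num_rows: usize,
    num_cols: usize,
    digest: u64,
}

impl RowMajorTrace {
    /// Normalize column-major witness output (assumed layout) into row-major order.
    fn from_columns(columns: &[Vec<u64>]) -> Self {
        let num_cols = columns.len();
        let num_rows = columns.first().map_or(0, |c| c.len());
        assert!(columns.iter().all(|c| c.len() == num_rows), "ragged columns");
        let mut values = Vec::with_capacity(num_rows * num_cols);
        for r in 0..num_rows {
            for col in columns {
                values.push(col[r]);
            }
        }
        RowMajorTrace { raw: RawWitnessBuffer { values }, num_rows, num_cols }
    }

    fn row(&self, r: usize) -> &[u64] {
        &self.raw.values[r * self.num_cols..(r + 1) * self.num_cols]
    }
}

/// Takes the trace by value: the raw buffer is dropped (released) on return.
fn commit(trace: RowMajorTrace) -> Commitment {
    // Toy digest in place of a real PCS commitment.
    let digest = trace
        .raw
        .values
        .iter()
        .fold(0u64, |acc, &v| acc.rotate_left(7) ^ v);
    Commitment { num_rows: trace.num_rows, num_cols: trace.num_cols, digest }
}
```

Taking ownership in `commit` means the compiler rejects any later use of the raw trace, which mirrors the goal of not retaining a second witness source after commit.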
Change Highlights
- ceno_zkvm: add a GPU witness/device-backed trace commit path for GPU_WITGEN,CACHE=1.
- ceno_zkvm: keep the post-commit proving and opening flow shared with the existing GPU prover path.
- ceno_zkvm: release shard GPU witness caches after proof construction.
- gkr_iop: support GPU-side batched main-constraint proving integration.
CI Benchmark Summary
Compared CI benchmark runs:
- GPU_WITGEN: original PR benchmark numbers, kept for context.
- CPU_WITGEN: 26067686212, branch feat/witgen_gpu, CENO_GPU_ENABLE_WITGEN=0
- CPU_WITGEN (baseline): 26037135648, branch feat/update_dep, CENO_GPU_ENABLE_WITGEN=0
Benchmark / Performance Impact
This is performance-sensitive. CI benchmark runs are used for comparable end-to-end numbers because local wall time depends heavily on runner scheduling and GPU availability.
Operation
Layer
Benchmark command(s):
# ceno-reth-benchmark CI, GPU_WITGEN,CACHE=1 and CPU_WITGEN,CACHE=1 comparison runs
Environment (CPU/GPU, core count, Rust toolchain, commit hash):
CI benchmark runner metadata and commit hashes are recorded in the linked workflow runs.
Raw data:
Testing
Risks and Rollout
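Rollback in this PR depends on CENO_GPU_ENABLE_WITGEN choosing between GPU_WITGEN and the existing CPU_WITGEN path; the CPU_WITGEN benchmark runs listed earlier set CENO_GPU_ENABLE_WITGEN=0. The sketch below shows one way such a switch could be parsed, and it is an assumption throughout: `WitgenMode`, `witgen_mode_from`, and `witgen_mode_from_env` are hypothetical, and treating "0" or "false" as CPU_WITGEN, any other value as GPU_WITGEN, and an unset variable as a caller-supplied default are guesses that may not match ceno's actual rules.

```rust
/// The two witness-generation paths named in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WitgenMode {
    /// GPU_WITGEN: witness traces are generated on the GPU.
    Gpu,
    /// CPU_WITGEN: witnesses generated on the CPU, then proved on the existing GPU path.
    Cpu,
}

/// Hypothetical parsing rule (assumption, not ceno's code): "0" or "false"
/// selects CPU_WITGEN, any other value selects GPU_WITGEN, and an unset
/// variable falls back to `default`.
fn witgen_mode_from(value: Option<&str>, default: WitgenMode) -> WitgenMode {
    match value.map(str::trim) {
        Some("0") | Some("false") => WitgenMode::Cpu,
        Some(_) => WitgenMode::Gpu,
        None => default,
    }
}

/// Reads CENO_GPU_ENABLE_WITGEN from the process environment.
fn witgen_mode_from_env(default: WitgenMode) -> WitgenMode {
    let value = std::env::var("CENO_GPU_ENABLE_WITGEN").ok();
    witgen_mode_from(value.as_deref(), default)
}
```

Keeping the parsing separate from the environment read lets the rule be tested without mutating process-wide state.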
Set CENO_GPU_ENABLE_WITGEN=0 to return to the existing CPU_WITGEN GPU proving path.
Follow-ups (optional)
Copilot Reviewer Directive (keep this section)
When Copilot reviews this PR, apply .github/copilot-instructions.md strictly.