Feat/shrink cpu byte alu#644
Conversation
…layout, and document the deviations
|
/bench 5 |
Benchmark — fib_iterative_8M (median of 5)Table parallelism: auto (cores / 3)
Commit: 00ec7a0 · Baseline: cached · Runner: self-hosted bench |
Codex Code ReviewFound one issue. High - CPU32 padding rows can emit disconnected bus operations prover/src/tables/cpu32.rs uses This can inject extra register reads/writes through MEMW without a corresponding executed Verification note: I tried to run the focused prover tests, but the container’s Rust toolchain failed before build because |
| let rvd = step.get_main_evaluation_element(0, rvd_col).clone(); | ||
| let pc = step.get_main_evaluation_element(0, pc_col).clone(); | ||
| // expected = (pc + len) for low word; (pc) for high word. | ||
| // The carry from low → high is omitted (pc < 2^31 in practice). |
There was a problem hiding this comment.
Medium — missing carry propagation for rvd on BRANCH rows
The high-word constraint for the JAL/JALR return address (rvd = pc + instruction_length) silently discards any carry-out from the low word:
// expected (high word) = pc[1] ← no carry from pc[0] + len
let expected = if high { pc } else { pc + &half_len + &half_len };For a program counter near 0xFFFF_FFFC, adding 4 overflows the 32-bit low word, and rvd[1] would be written as pc[1] instead of pc[1] + 1. This is the same assumption that exists in NextPcAddConstraint's carry_1 computation (which does propagate carry_0 correctly in the IS_BIT enforcement, but only for the non-branching path).
A maliciously crafted trace with pc[0] = 0xFFFF_FFFE and a 4-byte instruction would satisfy this constraint with rvd[1] = pc[1] (off by one), while the next_pc constraint would record the correct wrapped value. The prover won't generate such a trace for real ELF binaries, but there is no cryptographic enforcement.
Recommendation: Either propagate carry here (mirroring the compute_carry_0 / compute_carry_1 pattern), or add a range constraint on pc[0] that rules out overflow (pc[0] + max_instruction_length < 2^32).
| let carry = match self.carry_idx { | ||
| 0 => self.compute_carry_0(step), | ||
| 1 => self.compute_carry_1(step), | ||
| _ => panic!("Invalid carry index"), |
There was a problem hiding this comment.
Low — use unreachable! instead of panic!
The _ => panic!("Invalid carry index") arm is dead code: carry_idx is already validated as <= 1 by the assert! in new. Using panic! here signals a recoverable runtime error; unreachable! communicates the correct intent (this is a logic bug, not a runtime condition) and may allow the compiler to eliminate the arm entirely in release builds.
| _ => panic!("Invalid carry index"), | |
| _ => unreachable!("carry_idx validated <= 1 at construction"), |
| constraints.push(add_hi.boxed()); | ||
|
|
||
| // IS_BIT(invert) | ||
| let (is_bit, next) = new_is_bit_constraints(&[cols::INVERT], idx); |
There was a problem hiding this comment.
Low — eq column has no explicit IS_BIT constraint
invert gets an IS_BIT constraint here, but the eq column does not. eq is soundly forced to be a bit via the ZERO bus lookup (ZERO[Σ diff[i]] → eq), which can only produce 0 or 1. However, this makes the EQ table's constraint set non-self-contained: its soundness depends on the ZERO bus receiver in the BITWISE table enforcing a binary output domain.
Adding IS_BIT(eq) here makes the constraints self-contained and removes the implicit dependency on the bus receiver's co-domain. The constraint is degree 2 and cheap.
| alu_op::AND => self.a & self.b, | ||
| alu_op::OR => self.a | self.b, | ||
| alu_op::XOR => self.a ^ self.b, | ||
| other => panic!("BYTEWISE only handles AND/OR/XOR, got opcode {other}"), |
There was a problem hiding this comment.
Low — panic! in trace-generation path aborts proof creation ungracefully
This panic! (and the identical one at line 81) fires if the trace builder passes an unexpected opcode. Since this is inside trace generation (not constraint evaluation), a panic would crash the prover process rather than returning a recoverable error.
Consider returning Result<_, ProverError> from compute_res / collect_bitwise_ops and propagating the error up through the trace builder, so a malformed trace produces a proper error instead of an abort.
| cols::PREV_PC_TIMESTAMP_BORROW, | ||
| cols::PC_DOUBLE_READ, | ||
| cols::PREV_PC_TIMESTAMP_BORROW, | ||
| ]; |
There was a problem hiding this comment.
Low — BRANCH_COND removed from IS_BIT enforcement
BRANCH_COND was previously in BIT_FLAG_COLUMNS. It is no longer there; its binary-ness is now argued compositionally:
BRANCH,mem_flags(=JALR on BRANCH rows), andres[0](from EQ chip) are all bits.- Therefore
BRANCH·JALR + BRANCH·(1−JALR)·res[0] ∈ {0,1}⟹branch_cond ∈ {0,1}.
This is sound for valid traces. The NextPcAddConstraint uses (1 − branch_cond) in a degree-3 product; if branch_cond were ever outside {0,1} (e.g., via a bug in the bus receiver domain or a future refactor breaking the mutex), that constraint could be satisfied spuriously without the prover noticing.
Adding BRANCH_COND back to BIT_FLAG_COLUMNS costs one degree-2 constraint and makes this invariant explicit and self-contained.
Code Review — Feat/shrink-cpu-byte-alu (#644)This is a significant, well-structured CPU table rework. The reduction from ~76 to ~39 columns, the unified ALU/MEMORY buses, and the delegation of Constraint CompletenessThe overall constraint set is correct and complete for valid traces:
Issues Found
Inline comments posted for each issue. Minor Notes
|
This PR implements the CPU rework specified in #624.
The CPU table is reduced from ~76 to ~39 columns by moving word
instructions to a dedicated table and collapsing the per-chip ALU/memory
buses into two shared buses.
Summary
Dense
packed_decode.The per-opcode one-hot ALU selectors and the
*_ext_bit/arg1columns were removed.
Each CPU row now carries a single
packed_decodefield containingflags, register indices, and the
alu_flags/mem_flagsbytes. Thisvalue is checked against the
DECODEtable.Unified buses.
The per-chip
Lt/Mul/Dvrm/Shiftbuses, together with theload/store path, were collapsed into two shared buses:
ALU[out; in1, in2, flags]MEMORY[out; timestamp, address, value, flags]The CPU now dispatches every arithmetic, comparison, and shift operation
through
ALU, and every memory operation throughMEMORY.BYTE_ALUlookup.Bitwise
AND/OR/XORnow use a singleBYTE_ALU[opsel, X, Y] -> outlookup served by theBITWISEtable.Word instructions delegated to
CPU32.*Winstructions are delegated to a newCPU32table, which has itsown decode and 32-bit execution logic.
In the main CPU table, these rows now only delegate execution, advance
the PC, and send the corresponding
CPU32lookup.New chips.
Added the following chips:
CPU32: handles word instructions.EQ: handlesBEQ/BNE.BYTEWISE: handlesAND/OR/XORthroughBYTE_ALU.STORE: split fromMEMW;MEMORYnow dispatches betweenLOADandSTORE.