Skip to content

Feat/shrink cpu byte alu#644

Open
jotabulacios wants to merge 19 commits into
mainfrom
feat/shrink-cpu-byte-alu
Open

Feat/shrink cpu byte alu#644
jotabulacios wants to merge 19 commits into
mainfrom
feat/shrink-cpu-byte-alu

Conversation

@jotabulacios
Copy link
Copy Markdown
Collaborator

@jotabulacios jotabulacios commented Jun 2, 2026

This PR implements the CPU rework specified in #624.

The CPU table is reduced from ~76 to ~39 columns by moving word
instructions to a dedicated table and collapsing the per-chip ALU/memory
buses into two shared buses.

Summary

  • Dense packed_decode.
    The per-opcode one-hot ALU selectors and the *_ext_bit / arg1
    columns were removed.
    Each CPU row now carries a single packed_decode field containing
    flags, register indices, and the alu_flags / mem_flags bytes. This
    value is checked against the DECODE table.

  • Unified buses.
    The per-chip Lt / Mul / Dvrm / Shift buses, together with the
    load/store path, were collapsed into two shared buses:

    • ALU[out; in1, in2, flags]
    • MEMORY[out; timestamp, address, value, flags]

    The CPU now dispatches every arithmetic, comparison, and shift operation
    through ALU, and every memory operation through MEMORY.

  • BYTE_ALU lookup.
    Bitwise AND / OR / XOR now use a single BYTE_ALU[opsel, X, Y] -> out lookup served by the BITWISE table.

  • Word instructions delegated to CPU32.
    *W instructions are delegated to a new CPU32 table, which has its
    own decode and 32-bit execution logic.
    In the main CPU table, these rows now only delegate execution, advance
    the PC, and send the corresponding CPU32 lookup.

  • New chips.
    Added the following chips:

    • CPU32: handles word instructions.
    • EQ: handles BEQ / BNE.
    • BYTEWISE: handles AND / OR / XOR through BYTE_ALU.
    • STORE: split from MEMW; MEMORY now dispatches between LOAD and
      STORE.

Comment thread prover/src/constraints/cpu.rs Outdated
@diegokingston
Copy link
Copy Markdown
Collaborator

/bench 5

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 2, 2026

Benchmark — fib_iterative_8M (median of 5)

Table parallelism: auto (cores / 3)

Metric main PR Δ
Peak heap 51835 MB 35336 MB -16499 MB (-31.8%) 🟢
Prove time 24.520s 18.744s -5.776s (-23.6%) 🟢

🎉 Improvement detected — heap or time decreased by more than 5%.

✅ Low variance (time: 0.7%, heap: 1.0%)

Commit: 00ec7a0 · Baseline: cached · Runner: self-hosted bench

@jotabulacios jotabulacios marked this pull request as ready for review June 2, 2026 19:14
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 2, 2026

Codex Code Review

Found one issue.

High - CPU32 padding rows can emit disconnected bus operations

prover/src/tables/cpu32.rs uses READ_REGISTER1, READ_REGISTER2, WRITE_REGISTER, and ALU directly as bus multiplicities, but the constraints only bit-check those flags and MU; they never enforce flag => MU at prover/src/tables/cpu32.rs. Since padded CPU32 rows have MU = 0, a malicious witness can set one of these flags on a row that is not connected to a main CPU CPU32 delegation or DECODE lookup, then emit MEMW/ALU interactions from an otherwise inactive row.

This can inject extra register reads/writes through MEMW without a corresponding executed *W instruction, weakening execution soundness. Add constraints like (1 - MU) * flag = 0 for READ_REGISTER1, READ_REGISTER2, WRITE_REGISTER, ALU, ADD, and SUB or make all CPU32 bus multiplicities gated by MU.

Verification note: I tried to run the focused prover tests, but the container’s Rust toolchain failed before build because /home/runner/.rustup/tmp is read-only.

let rvd = step.get_main_evaluation_element(0, rvd_col).clone();
let pc = step.get_main_evaluation_element(0, pc_col).clone();
// expected = (pc + len) for low word; (pc) for high word.
// The carry from low → high is omitted (pc < 2^31 in practice).
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium — missing carry propagation for rvd on BRANCH rows

The high-word constraint for the JAL/JALR return address (rvd = pc + instruction_length) silently discards any carry-out from the low word:

// expected (high word) = pc[1]   ← no carry from pc[0] + len
let expected = if high { pc } else { pc + &half_len + &half_len };

For a program counter near 0xFFFF_FFFC, adding 4 overflows the 32-bit low word, and rvd[1] would be written as pc[1] instead of pc[1] + 1. This is the same assumption that exists in NextPcAddConstraint's carry_1 computation (which does propagate carry_0 correctly in the IS_BIT enforcement, but only for the non-branching path).

A maliciously crafted trace with pc[0] = 0xFFFF_FFFE and a 4-byte instruction would satisfy this constraint with rvd[1] = pc[1] (off by one), while the next_pc constraint would record the correct wrapped value. The prover won't generate such a trace for real ELF binaries, but there is no cryptographic enforcement.

Recommendation: Either propagate carry here (mirroring the compute_carry_0 / compute_carry_1 pattern), or add a range constraint on pc[0] that rules out overflow (pc[0] + max_instruction_length < 2^32).

let carry = match self.carry_idx {
0 => self.compute_carry_0(step),
1 => self.compute_carry_1(step),
_ => panic!("Invalid carry index"),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Low — use unreachable! instead of panic!

The _ => panic!("Invalid carry index") arm is dead code: carry_idx is already validated as <= 1 by the assert! in new. Using panic! here signals a recoverable runtime error; unreachable! communicates the correct intent (this is a logic bug, not a runtime condition) and may allow the compiler to eliminate the arm entirely in release builds.

Suggested change
_ => panic!("Invalid carry index"),
_ => unreachable!("carry_idx validated <= 1 at construction"),

Comment thread prover/src/tables/eq.rs
constraints.push(add_hi.boxed());

// IS_BIT(invert)
let (is_bit, next) = new_is_bit_constraints(&[cols::INVERT], idx);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Low — eq column has no explicit IS_BIT constraint

invert gets an IS_BIT constraint here, but the eq column does not. eq is soundly forced to be a bit via the ZERO bus lookup (ZERO[Σ diff[i]] → eq), which can only produce 0 or 1. However, this makes the EQ table's constraint set non-self-contained: its soundness depends on the ZERO bus receiver in the BITWISE table enforcing a binary output domain.

Adding IS_BIT(eq) here makes the constraints self-contained and removes the implicit dependency on the bus receiver's co-domain. The constraint is degree 2 and cheap.

alu_op::AND => self.a & self.b,
alu_op::OR => self.a | self.b,
alu_op::XOR => self.a ^ self.b,
other => panic!("BYTEWISE only handles AND/OR/XOR, got opcode {other}"),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Low — panic! in trace-generation path aborts proof creation ungracefully

This panic! (and the identical one at line 81) fires if the trace builder passes an unexpected opcode. Since this is inside trace generation (not constraint evaluation), a panic would crash the prover process rather than returning a recoverable error.

Consider returning Result<_, ProverError> from compute_res / collect_bitwise_ops and propagating the error up through the trace builder, so a malformed trace produces a proper error instead of an abort.

cols::PREV_PC_TIMESTAMP_BORROW,
cols::PC_DOUBLE_READ,
cols::PREV_PC_TIMESTAMP_BORROW,
];
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Low — BRANCH_COND removed from IS_BIT enforcement

BRANCH_COND was previously in BIT_FLAG_COLUMNS. It is no longer there; its binary-ness is now argued compositionally:

  • BRANCH, mem_flags (=JALR on BRANCH rows), and res[0] (from EQ chip) are all bits.
  • Therefore BRANCH·JALR + BRANCH·(1−JALR)·res[0] ∈ {0,1}branch_cond ∈ {0,1}.

This is sound for valid traces. The NextPcAddConstraint uses (1 − branch_cond) in a degree-3 product; if branch_cond were ever outside {0,1} (e.g., via a bug in the bus receiver domain or a future refactor breaking the mutex), that constraint could be satisfied spuriously without the prover noticing.

Adding BRANCH_COND back to BIT_FLAG_COLUMNS costs one degree-2 constraint and makes this invariant explicit and self-contained.

@claude
Copy link
Copy Markdown

claude Bot commented Jun 2, 2026

Code Review — Feat/shrink-cpu-byte-alu (#644)

This is a significant, well-structured CPU table rework. The reduction from ~76 to ~39 columns, the unified ALU/MEMORY buses, and the delegation of *W instructions to CPU32 are all sound architectural choices. The new chips (EQ, BYTEWISE, CPU32, STORE) are clean and well-documented.

Constraint Completeness

The overall constraint set is correct and complete for valid traces:

  • All 12 IS_BIT flags cover the top-level flag columns.
  • The six word_instr · {MEMORY,BRANCH,ECALL,...} = 0 decode mutexes are present.
  • ADD/SUB fast-path templates, the unified ALU/MEMORY bus fingerprints, and the DECODE/BRANCH/ECALL lookups are correctly wired.

Issues Found

Severity Location Issue
Medium constraints/cpu.rs:413 BranchRvdConstraint omits carry from low→high word for JAL/JALR return address — not cryptographically enforced
Low constraints/cpu.rs:46 BRANCH_COND removed from IS_BIT enforcement; soundness is compositional and fragile under future refactors
Low tables/eq.rs:317 eq column has no explicit IS_BIT constraint; soundness depends on the ZERO bus receiver's output domain
Low constraints/cpu.rs:546 _ => panic!(...) should be unreachable!() — it is dead code guarded by an assert! at construction
Low tables/bytewise.rs:69,81 panic! in trace-generation path aborts the prover ungracefully on invalid opcode; should return Result

Inline comments posted for each issue.

Minor Notes

  • MemFlagsBitConstraint (non-MEMORY rows) correctly covers mem_flags as JALR. The dependency chain is non-obvious but sound — a brief cross-reference comment pointing to the mutex would help future readers.
  • NextPcAddConstraint's compute_carry_1 correctly propagates carry_0 for next_pc, but BranchRvdConstraint doesn't mirror this for rvd (the medium issue above).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants