Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions spec/.gitignore
Original file line number Diff line number Diff line change
@@ -1,2 +1,3 @@
dist/*
interaction_count.json
ebook.pdf
10 changes: 10 additions & 0 deletions spec/about_ecalls.typ
Original file line number Diff line number Diff line change
Expand Up @@ -22,3 +22,13 @@ When `ECALL` is executed, it is assumed that:
- the return value is written to `A0`,
where `A0`-`A7` are symbolic names for the registers `x10`-`x17`
#footnote([RISC-V - Register sets; en.wikipedia.org, #link("https://web.archive.org/web/20260209053447/https://en.wikipedia.org/wiki/RISC-V#Register_sets")[[src]]]).

= ECALL number overview

We provide a list of supported ECALL numbers.
Negative numbers (represented as 2s complement 64-bit numbers), are used for our own custom accelerators/extensions.

/ 64: `write` (@commit)
/ 93: `exit` (@halt)
/ -1: `SHA256` (@sha256)
/ -2: `KECCAK` (@keccak)
6 changes: 4 additions & 2 deletions spec/bitwise.typ
Original file line number Diff line number Diff line change
Expand Up @@ -26,19 +26,21 @@ and convenience functionalities over small domains.

The #bitwise chip is comprised of #nr_variables variables that are expressed using #nr_columns columns.
Of these, the _input_ and _output_ variables (#nr_precomputed in total) are precomputed.

#render_chip_variable_table(chip, config)

*Note*: This table contains one row for every possible value of `(X, Y, Z)`.
As such, it has length $2^8 dot 2^8 dot 2^4 = 2^(20)$.

We use the ALU operation descriptors from @decode to identify the operations in the `BYTE_ALU` interaction.
Since each of the three columns is only $2^16$ rows long, they can be combined in a single $2^20$ column (with room to spare).

= Lookup
This chip adds the following interactions to the lookup:
#render_constraint_table(chip, config)

= Notes/Optimizations
The following ideas may prove to be optimizations for the #bitwise chip:
+ Extend `IS_BYTE[X]` to `ARE_BYTES[X, Y]`, such that two bytes are range checked at once.
When only a single check is required, one can still execute `IS_BYTE[X] := ARE_BYTES[X, 0]`.
+ Drop `MSB8` column, and instead define the `MSB8` lookup as `MSB8<X> := MSB16[256X]`.
Note: currently, `MSB8` also implicity range checks the input `X` (the lookup fails if `X` is not a `Byte`).
This optimization should only be executed when all chips leveraging `MSB8` do _not_ need this implicit range check.
Expand Down
35 changes: 26 additions & 9 deletions spec/book.typ
Original file line number Diff line number Diff line change
Expand Up @@ -18,30 +18,37 @@
)),
("TEMPLATES", (
("is_bit.typ", [`IS_BIT` template], <isbit>),
("is_byte.typ", [`IS_BYTE` template], <isbyte>),
("sign.typ", [`SIGN` template], <sign>),
("add.typ", [`ADD`/`SUB` template], <add>),
("neg.typ", [`NEG` template], <neg>),
)),
("MEMORY", (
("memw.typ", [`MEMW` chip], <memw>),
)),
("CPU", (
("decode.typ", [`DECODE` table], <decode>),
("cpu.typ", [`CPU` chip], <cpu>),
("cpu32.typ", [`CPU32` chip], <cpu32>),
)),
("ALU", (
("shift.typ", [`SHIFT` chip], <shift>),
("branch.typ", [`BRANCH` chip], <branch>),
("lt.typ", [`LT` chip], <lt>),
("eq.typ", [`EQ` chip], <eq>),
("mul.typ", [`MUL` chip], <mul>),
("dvrm.typ", [`DVRM` chip], <dvrm>),
("load.typ", [`LOAD` chip], <load>),
("bitwise.typ", [`BITWISE` chips], <bitwise>),
("bytewise.typ", [`BYTEWISE` chip], <bytewise>)
)),
("MEMORY", (
("memw.typ", [`MEMW` chip], <memw>),
("load.typ", [`LOAD` chip], <load>),
("store.typ", [`STORE` chip], <store>),
)),
("ECALLS", (
("about_ecalls.typ", [About `ECALL`], <ecall>),
("halt.typ", [`HALT` chip], <halt>),
("commit.typ", [`COMMIT` chip], <commit>),
("sha256.typ", [`SHA256` accelerator], <sha256>),
("keccak.typ", [`KECCAK` accelerator], <keccak>),
))
)
)
Expand Down Expand Up @@ -105,6 +112,7 @@

// Invisibly include another chapter, so that its labels can be resolved
#let xref-include(f) = {
show ref: none
context {
place(hide(box(width: auto, height: 0%, strip-all(include "/" + f))))
}
Expand Down Expand Up @@ -142,7 +150,7 @@
} else {
rf.supplement
}
[#supplement#numbering(fig.numbering, ..counter.at(lbl))]
[#supplement #numbering(fig.numbering, ..counter.at(lbl))]
}
cross-link("/" + ch, reference: shiroa-label, link-content)
}
Expand All @@ -169,13 +177,22 @@
}
})
let cond() = _toplevel.final() == file
project.with(..args, title: context meta_sections.find(x => x.at(0) == _toplevel.final()).at(1), cond: cond)([
#show ref: it => context if _toplevel.final() == file {
xref(it)
}
show ref: it => context if cond() { xref(it) }
let title = context {
// Strip raw, because shiroa already makes the title raw
show raw: it => it.text
meta_sections.find(x => x.at(0) == _toplevel.final()).at(1)
}
project.with(..args, title: title, description: plain-text(meta_sections.find(x => x.at(0) == file).at(1)), cond: cond)([
#context _xref-included.final().pairs().map(((key, value)) => context if value and cond() {
xref-include(key)
}).join()
#metadata(json("interaction_count.json").sum(default: (:)))<interaction_count>

#let chapter-index = meta_sections.position(x => x.at(0) == file) + 1
#set heading(numbering: (..args) => [#chapter-index.#numbering("1.1", ..args)])
#counter(heading).update(0)

#body
])
}
Expand Down
5 changes: 5 additions & 0 deletions spec/branch.typ
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,11 @@ The #branch chip is comprised of #nr_variables variables that are expressed usin

#render_chip_assumptions(chip, config)

Some of the assumptions can be checked with only arithmetic constraints, so we
provide these below.

#render_constraint_table(chip, config, groups: "assumptions")

= Constraints

We constrain `next_pc` to be $#`base_address` + #`offset`$,
Expand Down
27 changes: 27 additions & 0 deletions spec/build_shiroa.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
#!/usr/bin/env bash

set -euo pipefail

# cd into the script directory
cd "$(dirname "${BASH_SOURCE[0]}")"

# Clean up potential old file
rm -f interaction_count.json

# Always clean up after ourselves
trap 'rm -f interaction_count.json' EXIT

# Query the ebook version for the proper counts
typst query ebook.typ '<interaction_count>' --field value > interaction_count.json

# Check if there's enough memory available for a parallel shiroa build
# 20GiB as comfortable baseline
available_kb=$(awk '/MemAvailable/ { print $2 }' /proc/meminfo)
required_kb=$((20 * 1024 * 1024))
if [ "$available_kb" -lt "$required_kb" ]; then
echo "Falling back to single-thread"
export RAYON_NUM_THREADS=1
fi

# And build
shiroa build
38 changes: 38 additions & 0 deletions spec/bytewise.typ
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
#import "/book.typ": book-page, rj
#import "/src.typ": load_config, load_chip
#import "/chip.typ": (
render_chip_assumptions,
render_chip_variable_table,
total_nr_variables,
total_nr_instantiated_columns,
compute_nr_interactions,
render_constraint_table,
render_chip_padding_table,
)

#let config = load_config()
#let chip = load_chip("src/bytewise.toml", config)

#show: book-page(chip.name)
#let bytewise = raw(chip.name)

The #bytewise chip is an ALU chip that decomposes the input `DWordWL` values into bytes and
performs a `BITWISE` operation pairwise (AND, OR, XOR).
The `BITWISE` lookup inherently performs a range check, so no further constraints are necessary.

= Variables
#let nr_variables = total_nr_variables(chip)
#let nr_columns = total_nr_instantiated_columns(chip, config)
#let nr_interactions = compute_nr_interactions(chip)

The #bytewise chip is comprised of #nr_variables variables that are expressed using #nr_columns columns and leverages #nr_interactions interaction(s):
#render_chip_variable_table(chip, config)

= Constraints

#render_constraint_table(chip, config)

= Padding

The chip can be padded with the following values:
#render_chip_padding_table(chip, config)
26 changes: 17 additions & 9 deletions spec/chip.typ
Original file line number Diff line number Diff line change
Expand Up @@ -14,12 +14,15 @@
.map(pair => pair.at(1))
.flatten()
.map(var => {
let (label, factor) = if type(var.type) == array {
(var.type.at(0), var.type.at(1))
} else {
(var.type, 1)
let (factor, var_type) = (1, var.type)
while type(var_type) == array {
assert(var_type.len() == 2, message: "invalid var (sub)type length: " + repr(var.type))
assert(type(var_type.at(1)) == int, message: "invalid var (sub)type length: " + repr(var.type))
factor *= var_type.at(1)
var_type = var_type.at(0)
}
config.variables.types.filter(type => type.label == label).first().subtypes.len() * factor

config.variables.types.filter(type => type.label == var_type).first().subtypes.len() * factor
})
.sum()
}
Expand Down Expand Up @@ -48,14 +51,19 @@
// store it as metadata under the `<interaction_count>` label
// with tag `chip.name`. This tag is overwritten by `name` when specified.
#let set_nr_interactions(chip, name: none) = {
// Skip when building shiroa, since the web/chapter structure fails to converge properly
import "book.typ": is-shiroa
if is-shiroa {
return
}
if name == none {
name = chip.name
}
name = chip.name
}

let constraints = chip
.constraints
.values()
.flatten()
.sum(default: ())

// nr. of direct interactions
let nr-direct-interactions = constraints
Expand Down Expand Up @@ -290,7 +298,7 @@
} else if type(groups) == str {
groups = (groups,)
}
assert(groups.all(group => group in all_groups), message: "unknown group")
assert(groups.all(group => group in all_groups), message: "unknown group: " + repr(groups))
let selected_constraints = groups.map(g => ((g): chip.constraints.at(g))).join()

// Find the group definition in the constraint_groups
Expand Down
84 changes: 57 additions & 27 deletions spec/cpu.typ
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@
#let cpu = raw(chip.name)

The #cpu chip coordinates memory accesses and dispatches to other chips for arithmetic and logical operations.
It bases its decisions on the entry of the `DECODE` table (@decode) corresponding the the current program counter (PC).
It bases its decisions on the entry of the `DECODE` table (@decode) corresponding the current program counter (PC).

= Variables
#let nr_variables = total_nr_variables(chip)
Expand All @@ -30,62 +30,92 @@ The #cpu chip is comprised of #nr_variables variables that are expressed using #
= Assumptions
#render_chip_assumptions(chip, config)

Additionally, the following constraints can be used to provide defense-in-depth
validation of the assumptions.

#render_constraint_table(chip, config, groups: "assumptions")

= Constraints

First, we perform a decoding lookup for the current PC.
Instructions having the `word_instr` flag set are not decoded here, as they are delegated to the `CPU32` chip.
In that case, we ensure that the current row of the CPU cannot have any other observable effects.

#render_constraint_table(chip, config, groups: "decode")

== Range checks

We constrain all columns to have the appropriate ranges.
The flags and register indices looked up from the decoding need to be checked,
as they are communicated through the interaction in a packed form.
All values in `packed_decode` need to be checked to ensure
the packing is correct for the interaction.
In contrast, we know ahead of time that decoding will ensure proper range checks for `pc` and `imm`.
Similarly, since `next_pc` will propagate through the memory argument and be looked up
in the instruction decoding on the next cycle, it is forced to be in the correct range.#rj[is this true, do we need this elsewhere for chip assumptions?]
For the auxiliary columns, we need to check the limbs of `arg1`, `arg2`, and `res`.
in the instruction decoding on the next cycle, it is forced to be in the correct range;
the final value for `next_pc` is similarly fixed by the memory finalization.
For the auxiliary columns, we need to check the limbs of `res`, since
`rv1` and `rv2` are enforced by the memory argument, and `rvd` is correct by the correctness of the dependent chips.
The ranges of the other auxiliary columns are enforced through later constraints.
#rj[Make sure we argue for every column here]
#rj[is `rvd` still sufficiently constrained? (can also be done through the memory argument like `pc`?)]

#render_constraint_table(chip, config, groups: "range")

== ALU

The ALU functionality is then obtained through judicious dispatching to the corresponding chips.
The ALU functionality is then obtained through delegation to the `ALU` signature, backed by the various ALU chips,
or by using the appropriate template.
For the pure ALU path, `arg2` is computed as `rv2 + imm`, which relies on @cpu:a:arg2-multiplex to
be either `rv2` or `imm`, depending on the instruction.
The other contributions for `arg2` are specific to the (mutually exclusive, @cpu:a:mem-branch-mutex)
`MEMORY` and `BRANCH` flags:
- For the `MEMORY` path, we want the output of the ALU to be $#`rv1` + #`imm`$, as that is the
address at which the memory access occurs.
- For the `BRANCH` path, we want the ALU output to reflect the branch condition (or just be inactive for JALR).

#render_constraint_table(chip, config, groups: "alu")

== Memory
== Memory<cpu:memory>

The interactions with the memory, both for register loading and storing, as for `LOAD` and `STORE` instructions are handled.
Note that since registers need no byte-addressing, we store them in the memory argument with `Word` limbs.
The timestamps are ensured to be disjoint for disjoint memory locations.
Note that since registers need no byte-addressing, we store them in the memory argument with `Word` limbs,
simultaneously ensuring that register reads are properly range checked as long as all writes are.
The `pc` register behaves very predictably with respect to its timestamps and when it is being read,
so for performance reasons, we inline its memory interactions directly into the #cpu chip.

Potentially overlapping memory accesses are ensured to have disjoint timestamps.
One consequence of that is that `next_pc` is written at `timestamp + 1`
to ensure the access is disjoint with the `pc` read into `rv1` as part of the `AUIPC` instruction.
to ensure the access is disjoint with the `pc` read into `rv1` as part of the `AUIPC` instruction (see @cpu:c:read_rv1 and @decode:decoding-overview).
Constraints regarding whether `pc_double_read` corresponds to an `AUIPC` instruction are not necessary,
as regardless of its value, the old timestamp is guaranteed smaller than the new timestamp,
and the integrity of the memory argument therefore ensures the correctness of this bit.

#render_constraint_table(chip, config, groups: "mem")
The memory interaction itself is handled by the `MEMORY` signature,
which will read the `mem_flags` argument to perform either a `LOAD` or a `STORE`.
We refer to the previous section's description of `arg2` for how
the address is computed.

== System
The value to (potentially) be written back to `rd` is stored in `rvd`,
which can either come from the ALU --- in case of an ALU operation or a JALR branch ---
or from the MEMORY interaction.

The interactions with the wider system.
#render_constraint_table(chip, config, groups: "mem")

#render_constraint_table(chip, config, groups: "sys")
== Branching

== Input and output to the ALU
A branch is expressed by having the `BRANCH` flag set to 1.
Since `BRANCH` and `MEMORY` are mutually exclusive (@cpu:a:mem-branch-mutex),
we can repurpose the `mem_flags` field to indicate a JALR instruction.
When JALR is not set, we have a conditional branch that is decided upon by the result of the ALU instructions,
as set in the `res` variable.
As such, we can set `branch_cond` appropriately as multiplicity flag for the `BRANCH` chip.

We constrain `arg1`, `arg2` and `rvd` to correspond to the wanted values,
including the appropriate sign/zero extension, depending on `word_instr`.
#render_constraint_table(chip, config, groups: "branch")

#render_constraint_table(chip, config, groups: "ext")
== System

== Other constraints
For @cpu:c:is_equal, note that @cpu:c:sub sets `res` to be the difference between `arg1` and `arg2` whenever `BEQ` is $1$.
Given that this difference is $0$ when both are equal, @cpu:c:is_equal ensures `is_equal` is set to $1$ if and only if $#`arg1` = #`arg2`$ and `BEQ` is set.
The interactions with the wider system go through the `ECALL` interface.
Since we treat `EBREAK` instructions as unprovable traps, we avoid emitting `DECODE` rows
for these, and do not need any further handling in the CPU.

#render_constraint_table(chip, config, groups: "misc")
#render_constraint_table(chip, config, groups: "sys")

#rj[Document the choice to not have a multiplicity column here for padding]

= Padding

Expand All @@ -94,4 +124,4 @@ in the DECODE table, at the _odd_ address 1, only reachable through a HALT ecall

#render_chip_padding_table(chip, config)

This approach minimizes the number of dependent lookups, increasing only multiplicities in the DECODE table and the IS_BYTE lookup.
This approach minimizes the number of dependent lookups, increasing only multiplicities in the `DECODE` table and the `IS_BYTE` and `IS_HALF` lookups.
Loading
Loading