Skip to content

spec: update constraint ID system#570

Open
erik-3milabs wants to merge 19 commits into
spec/mainfrom
spec/constraint_id
Open

spec: update constraint ID system#570
erik-3milabs wants to merge 19 commits into
spec/mainfrom
spec/constraint_id

Conversation

@erik-3milabs
Copy link
Copy Markdown
Collaborator

This PR introduces content-derived constraint IDs, i.e., the ID of a constraint is derived from its content, rather than its index in the constraint list. This

  1. permit addition, removing and reorganizing constraints without changing any IDs, and
  2. enforces ID updates upon constraint modification

i.e., two constraints with the same ID are 100-ε% certain to be identical.

Note: the ID is derived in a way that updating a variable name does not lead to a name update. This is at the expense of an ID update when variables are reordered.

Before:
image

After:
image

@erik-3milabs erik-3milabs self-assigned this Apr 28, 2026
@erik-3milabs erik-3milabs added the spec Updates and improvements to the spec document label Apr 28, 2026
Comment thread spec/src.typ Outdated
Comment thread spec/chip.typ Outdated
Comment thread spec/src.typ Outdated
@claude
Copy link
Copy Markdown

claude Bot commented Apr 28, 2026

Review

This PR adds stable, content-derived constraint IDs to the spec rendering pipeline: an FNV-1a hash of each constraint's structure is encoded into a short 4-char human-readable tag (e.g. BRH-1J3K). It also restructures the constraint table (separate tag/counter columns, per-constraint dividers) and adds short code fields to all chip TOML files.

The approach is sound. Three bugs found:


Medium — Crash on empty variable groups (src.typ:191–197)

.sum() without a default: on an empty array returns none in Typst, not an empty dict. If any chip has a variable group with zero entries, the inner .sum() returns none, causing a crash on the outer .sum(). Fix: add default: (:) to both calls.

Low — z-fill panics for ≥100 constraints in one group (chip.typ:310)

"0" * (2 - s.len()) produces a negative repeat count when the counter hits 100+. Typst panics on negative string repeat. Fix: calc.max(0, 2 - s.len()).

Low — Hash input ambiguity from unescaped separator (src.typ:182)

The chip name, group, and type are joined with , without escaping. A comma in any field makes two distinct inputs hash identically, potentially causing ID collisions. Current names are safe, but it's a latent fragility. Fix: use a separator that cannot appear in those values (e.g. \x00).

@github-actions
Copy link
Copy Markdown

Codex Code Review

No actionable issues found in the PR diff.

The changes are limited to the Typst spec rendering/tagging logic and chip metadata. I did not find security-relevant runtime changes, VM behavior changes, or significant performance concerns.

I could not run a full Typst/shiroa render because neither typst nor shiroa is installed in this environment.

@erik-3milabs erik-3milabs marked this pull request as draft April 28, 2026 13:28
@erik-3milabs erik-3milabs marked this pull request as ready for review April 28, 2026 13:37
@github-actions
Copy link
Copy Markdown

Codex Code Review

Findings:

  • Low - spec/tooling/chip.py:990 ignores the new TOML code field. It sets self.code = data.get("name", ""), so the validator checks name twice and never validates code. Invalid code values would pass Python tooling, and any tooling using Chip.code will see the full chip name instead of the short code added in this PR. Use data.get("code", self.name) and report self.code in the identifier error.

No security issues found in the PR diff. I did not run a Typst render because typst is not installed in this environment.

Comment thread spec/tooling/chip.py Outdated
Comment thread spec/tooling/chip.py Outdated
Comment thread spec/src.typ
@claude
Copy link
Copy Markdown

claude Bot commented Apr 28, 2026

Review: spec/constraint_id

Overview

Replaces sequential constraint indices with content-derived IDs using FNV-1a hashing. Variables are identified by (chip, group, position, type) so renaming a variable doesn't change constraint IDs, but reordering does. The 4-character Base32 ID (20 bits of entropy) is stable across reorganisation and signals stale IDs when a constraint's semantics change. TOML chips all gain a code shorthand field.

Bugs

[High] chip.py reads wrong key for codeself.code = data.get("name", "") always resolves to the chip name; the new code field (e.g. "BRH") is never read or validated. See inline comment on line 990.

[Medium] tag excluded from constraint digestdigestable_constraint omits the tag field for interaction and template constraints. Two constraints calling different tables with identical inputs/outputs/iters receive the same ID, breaking the core invariant. See inline comment on lines 210-213.

[Low] Wrong variable in error messagereporter.asserts(self.code.isidentifier(), f"Invalid identifier: {self.name!r}") prints the name, not the code. See inline comment on line 994.

FNV-1a implementation

The two-limb 64-bit multiplication is mathematically correct: carry propagation from lo into hi is handled, and the hash[1] * prime[1] term (≥ 2⁶⁴) is correctly discarded. No issues here.

Minor observations

  • CONSTRAINT_ID_CHAR_COUNT = 4 gives 20 bits of ID space. With ~50 constraints per chip the birthday-collision probability is negligible (~0.1%), so this is fine for the current scale.
  • bytes-to-hex's z-fill silently produces wrong output for strings longer than 2 chars, but since all inputs are single bytes (0–255) this can't trigger in practice.

erik-3milabs and others added 3 commits April 28, 2026 15:43
Copy link
Copy Markdown
Collaborator

@RobinJadoul RobinJadoul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very brief look only, so far; haven't properly looked at the canonicalization/serialization of the constraints yet

Comment thread spec/src.typ
///
/// This implementation operates on two 32-bit limbs, rather than a single
/// 64-bit limb, since Typst does not support u64s.
#let FNV-1a(bytes) = {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reasoning for not using a typst package for hashing?
e.g. digestify should be running in wasm, so speed should not be a massive concern (probably)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • digestify is available under the MIT license. When I skimmed the license last week, I thought that using it might require our spec to also be released under the MIT (or compatible) license. Having a closer look now, this might not actually be the case. Still, I'm not a legal expert, so I decided to avoid it just in case.
  • The other hashing package (jumble) is slower than this implementation.
  • It was fun to learn a bit about non-cryptographic hashing, and implement this technique here.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MIT licensed is fine, it only comes in to play if we're redistributing the licensed thing itself

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

One reason a standard hash function could be nice is if we want to be able to recompute the tags in another environment (e.g. python when doing some code extraction or even type checking).
On the other hand, then FNV shouldn't be too hard to implement elsewhere either, so your call.

Comment thread spec/chip.typ Outdated

let indices = (("",) + iters_of(constraint).map(it => it.at(0))).join(".")

let z-fill(s) = "0" * calc.max(2 - s.len(), 0) + s
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Appears a few times, maybe lift to the top level.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is used twice (in src.typ and chip.typ). Since those two files don't have any shared dependencies/don't depend on one another, I'd have to create a new file that both can depend on. I didn't feel like doing this just for this single one-line function.

Comment thread spec/chip.typ Outdated
Comment thread spec/src.typ
}

/// Converts a byte array to a hexadecimal string
#let bytes-to-hex(bytes) = {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there much need to carry hex around instead of going straight from bytes to base 32?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not really. dropped it

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I suppose this function can disappear then too

Comment thread spec/chip.typ

let indices = (("",) + iters_of(constraint).map(it => it.at(0))).join(".")

let pad-width() = calc.max(calc.ceil(calc.log(counter(figure.where(kind: counter-kind)).final().at(0))), 2)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we technically a + 1 inside of the log, since e.g. ceil(log(10)) is still only 1

Comment thread spec/chip.typ
let lbl = [#chip.name\-A]
show figure: (it) => align(left, block[#lbl#context with_index(it.counter.display())])
cref(assumption)[#figure(kind: chip.name + "assumption", numbering: (i) => [#lbl#i], supplement: [], [])]
let code = if "code" in chip {chip.code} else {chip.name}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can do the same as further down

Suggested change
let code = if "code" in chip {chip.code} else {chip.name}
let code = chip.at("code", default: chip.name)

Comment thread spec/src.typ
///
/// This implementation operates on two 32-bit limbs, rather than a single
/// 64-bit limb, since Typst does not support u64s.
#let FNV-1a(bytes) = {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MIT licensed is fine, it only comes in to play if we're redistributing the licensed thing itself

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

One reason a standard hash function could be nice is if we want to be able to recompute the tags in another environment (e.g. python when doing some code extraction or even type checking).
On the other hand, then FNV shouldn't be too hard to implement elsewhere either, so your call.

Comment thread spec/src.typ
}

/// Converts a byte array to a hexadecimal string
#let bytes-to-hex(bytes) = {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I suppose this function can disappear then too

Comment thread spec/src.typ
}
}

let flattened_type = lower(flatten_vartype(var.type))
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why lowercase?

Comment thread spec/src.typ

let flattened_type = lower(flatten_vartype(var.type))
let input = (chip, group, str(idx), flattened_type).join("\x00")
digest_to_id(nchf(input))
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could carry the whole digest for a variable ID, since it never sees the outside

Comment thread spec/src.typ
.map(cat => (str(cat): replace_variable_with_ID(c.at(cat))))
.sum(default: (:))

repr(id_tagged)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may depend a bit too much on typst internals for canonicalization
Maybe json.encode can work, but then there's still a potential ordering issue

Comment thread spec/src.typ

// Replace variable names with their ID
let digestable_constraint(c) = {
let CONSTRAINT_CAT_TO_SCOPE = (
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it more future-proof to include-list or exclude-list here?

Comment thread spec/src.typ
)

assert(c.kind in CONSTRAINT_CAT_TO_SCOPE)
let id_tagged = CONSTRAINT_CAT_TO_SCOPE
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's include the kind here too

Comment thread spec/src.typ
Comment on lines +230 to +232
"interaction": ("tag", "iter", "input", "output", "multiplicity"),
"template": ("tag", "iter", "input", "output", "cond"),
"arith": ("iter", "poly")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't include iters

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, allowing iter variables to be renamed without changing the ID could be nice

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also might want to deal with summations similarly

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

spec Updates and improvements to the spec document

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants