spec: update constraint ID system#570
Conversation
ReviewThis PR adds stable, content-derived constraint IDs to the spec rendering pipeline: an FNV-1a hash of each constraint's structure is encoded into a short 4-char human-readable tag (e.g. The approach is sound. Three bugs found: Medium — Crash on empty variable groups (src.typ:191–197)
Low —
|
Codex Code ReviewNo actionable issues found in the PR diff. The changes are limited to the Typst spec rendering/tagging logic and chip metadata. I did not find security-relevant runtime changes, VM behavior changes, or significant performance concerns. I could not run a full Typst/shiroa render because neither |
Codex Code ReviewFindings:
No security issues found in the PR diff. I did not run a Typst render because |
Review: spec/constraint_idOverviewReplaces sequential constraint indices with content-derived IDs using FNV-1a hashing. Variables are identified by (chip, group, position, type) so renaming a variable doesn't change constraint IDs, but reordering does. The 4-character Base32 ID (20 bits of entropy) is stable across reorganisation and signals stale IDs when a constraint's semantics change. TOML chips all gain a Bugs[High] [Medium] [Low] Wrong variable in error message — FNV-1a implementationThe two-limb 64-bit multiplication is mathematically correct: carry propagation from Minor observations
|
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
RobinJadoul
left a comment
There was a problem hiding this comment.
Very brief look only, so far; haven't properly looked at the canonicalization/serialization of the constraints yet
| /// | ||
| /// This implementation operates on two 32-bit limbs, rather than a single | ||
| /// 64-bit limb, since Typst does not support u64s. | ||
| #let FNV-1a(bytes) = { |
There was a problem hiding this comment.
Any reasoning for not using a typst package for hashing?
e.g. digestify should be running in wasm, so speed should not be a massive concern (probably)
There was a problem hiding this comment.
digestifyis available under the MIT license. When I skimmed the license last week, I thought that using it might require our spec to also be released under the MIT (or compatible) license. Having a closer look now, this might not actually be the case. Still, I'm not a legal expert, so I decided to avoid it just in case.- The other hashing package (
jumble) is slower than this implementation. - It was fun to learn a bit about non-cryptographic hashing, and implement this technique here.
There was a problem hiding this comment.
MIT licensed is fine, it only comes in to play if we're redistributing the licensed thing itself
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
One reason a standard hash function could be nice is if we want to be able to recompute the tags in another environment (e.g. python when doing some code extraction or even type checking).
On the other hand, then FNV shouldn't be too hard to implement elsewhere either, so your call.
|
|
||
| let indices = (("",) + iters_of(constraint).map(it => it.at(0))).join(".") | ||
|
|
||
| let z-fill(s) = "0" * calc.max(2 - s.len(), 0) + s |
There was a problem hiding this comment.
Appears a few times, maybe lift to the top level.
There was a problem hiding this comment.
It is used twice (in src.typ and chip.typ). Since those two files don't have any shared dependencies/don't depend on one another, I'd have to create a new file that both can depend on. I didn't feel like doing this just for this single one-line function.
| } | ||
|
|
||
| /// Converts a byte array to a hexadecimal string | ||
| #let bytes-to-hex(bytes) = { |
There was a problem hiding this comment.
Is there much need to carry hex around instead of going straight from bytes to base 32?
There was a problem hiding this comment.
not really. dropped it
There was a problem hiding this comment.
And I suppose this function can disappear then too
99898d2 to
a27e78f
Compare
|
|
||
| let indices = (("",) + iters_of(constraint).map(it => it.at(0))).join(".") | ||
|
|
||
| let pad-width() = calc.max(calc.ceil(calc.log(counter(figure.where(kind: counter-kind)).final().at(0))), 2) |
There was a problem hiding this comment.
I think we technically a + 1 inside of the log, since e.g. ceil(log(10)) is still only 1
| let lbl = [#chip.name\-A] | ||
| show figure: (it) => align(left, block[#lbl#context with_index(it.counter.display())]) | ||
| cref(assumption)[#figure(kind: chip.name + "assumption", numbering: (i) => [#lbl#i], supplement: [], [])] | ||
| let code = if "code" in chip {chip.code} else {chip.name} |
There was a problem hiding this comment.
Can do the same as further down
| let code = if "code" in chip {chip.code} else {chip.name} | |
| let code = chip.at("code", default: chip.name) |
| /// | ||
| /// This implementation operates on two 32-bit limbs, rather than a single | ||
| /// 64-bit limb, since Typst does not support u64s. | ||
| #let FNV-1a(bytes) = { |
There was a problem hiding this comment.
MIT licensed is fine, it only comes in to play if we're redistributing the licensed thing itself
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
One reason a standard hash function could be nice is if we want to be able to recompute the tags in another environment (e.g. python when doing some code extraction or even type checking).
On the other hand, then FNV shouldn't be too hard to implement elsewhere either, so your call.
| } | ||
|
|
||
| /// Converts a byte array to a hexadecimal string | ||
| #let bytes-to-hex(bytes) = { |
There was a problem hiding this comment.
And I suppose this function can disappear then too
| } | ||
| } | ||
|
|
||
| let flattened_type = lower(flatten_vartype(var.type)) |
|
|
||
| let flattened_type = lower(flatten_vartype(var.type)) | ||
| let input = (chip, group, str(idx), flattened_type).join("\x00") | ||
| digest_to_id(nchf(input)) |
There was a problem hiding this comment.
We could carry the whole digest for a variable ID, since it never sees the outside
| .map(cat => (str(cat): replace_variable_with_ID(c.at(cat)))) | ||
| .sum(default: (:)) | ||
|
|
||
| repr(id_tagged) |
There was a problem hiding this comment.
This may depend a bit too much on typst internals for canonicalization
Maybe json.encode can work, but then there's still a potential ordering issue
|
|
||
| // Replace variable names with their ID | ||
| let digestable_constraint(c) = { | ||
| let CONSTRAINT_CAT_TO_SCOPE = ( |
There was a problem hiding this comment.
Is it more future-proof to include-list or exclude-list here?
| ) | ||
|
|
||
| assert(c.kind in CONSTRAINT_CAT_TO_SCOPE) | ||
| let id_tagged = CONSTRAINT_CAT_TO_SCOPE |
There was a problem hiding this comment.
Let's include the kind here too
| "interaction": ("tag", "iter", "input", "output", "multiplicity"), | ||
| "template": ("tag", "iter", "input", "output", "cond"), | ||
| "arith": ("iter", "poly") |
There was a problem hiding this comment.
Also, allowing iter variables to be renamed without changing the ID could be nice
There was a problem hiding this comment.
We also might want to deal with summations similarly
This PR introduces content-derived constraint IDs, i.e., the ID of a constraint is derived from its content, rather than its index in the constraint list. This
i.e., two constraints with the same ID are 100-ε% certain to be identical.
Note: the ID is derived in a way that updating a variable name does not lead to a name update. This is at the expense of an ID update when variables are reordered.
Before:

After:
