From 73e1fec23250b1ede6357e35994b06ff2fea3543 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Silv=C3=A8re=20Gangloff?= Date: Thu, 21 May 2026 23:07:12 +0900 Subject: [PATCH 1/7] Create Encoding.lean --- .../Machines/SingleTapeTuring/Encoding.lean | 49 +++++++++++++++++++ 1 file changed, 49 insertions(+) create mode 100644 Cslib/Computability/Machines/SingleTapeTuring/Encoding.lean diff --git a/Cslib/Computability/Machines/SingleTapeTuring/Encoding.lean b/Cslib/Computability/Machines/SingleTapeTuring/Encoding.lean new file mode 100644 index 000000000..2ad75e4fc --- /dev/null +++ b/Cslib/Computability/Machines/SingleTapeTuring/Encoding.lean @@ -0,0 +1,49 @@ +Here is the exact code for the new file. + +You can create a new file named Encoding.lean in the Cslib/Computability/Turing/ directory and copy-paste this directly into it. I've included the standard copyright header that matches the rest of the files in that folder. + +Lean +/- +Copyright (c) 2026 Silvère Gangloff. All rights reserved. +Released under Apache 2.0 license as described in the file LICENSE. +Authors: Bolton Bailey, Pim Spelier, Daan van Gent +-/ + +import Mathlib.Tactic.Basic + +/-! +# Turing Machine Tape Encodings + +This file defines the `TapeEncodable` typeclass, which provides a framework +for encoding arbitrary types onto a Turing machine tape. This allows us to +define computability for functions on types other than just `List Symbol`. +-/ + +namespace Turing + +variable {Symbol : Type} + +/-- +A typeclass for types that can be encoded onto a Turing machine tape. +Provides a canonical way to translate back and forth between a type `α` +and a `List Symbol`, alongside a proof that decoding an encoded value succeeds. +-/ +class TapeEncodable (α : Type) (Symbol : Type) where + /-- Translates the type into a tape-compatible list of symbols -/ + encode : α → List Symbol + /-- Attempts to parse a list of symbols back into the type -/ + decode : List Symbol → Option α + /-- Proof that decoding a freshly encoded value yields the original value -/ + decode_encode_eq : ∀ (a : α), decode (encode a) = some a + +/-- +The trivial encoding for `List Symbol` itself. +This ensures backward compatibility with machines that already operate +directly on tape strings. +-/ +instance : TapeEncodable (List Symbol) Symbol where + encode := id + decode := some + decode_encode_eq _ := rfl + +end Turing From 826da9a2e3b2fef47cd379d62b62eeb93b459fc0 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Silv=C3=A8re=20Gangloff?= Date: Thu, 21 May 2026 23:07:38 +0900 Subject: [PATCH 2/7] Update Basic.lean --- Cslib/Computability/Machines/SingleTapeTuring/Basic.lean | 1 + 1 file changed, 1 insertion(+) diff --git a/Cslib/Computability/Machines/SingleTapeTuring/Basic.lean b/Cslib/Computability/Machines/SingleTapeTuring/Basic.lean index 477ab60b1..2e1c76a98 100644 --- a/Cslib/Computability/Machines/SingleTapeTuring/Basic.lean +++ b/Cslib/Computability/Machines/SingleTapeTuring/Basic.lean @@ -9,6 +9,7 @@ module public import Cslib.Foundations.Data.BiTape public import Cslib.Foundations.Data.RelatesInSteps public import Mathlib.Algebra.Polynomial.Eval.Defs +public import Cslib.Computability.Turing.Encoding /-! # Single-Tape Turing Machines From 9fe102eeeee85322363ba2ecd7a882c14cdd3e2a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Silv=C3=A8re=20Gangloff?= Date: Tue, 26 May 2026 10:52:02 +0900 Subject: [PATCH 3/7] AI leftcover cleaning --- Cslib/Computability/Machines/SingleTapeTuring/Encoding.lean | 5 ----- 1 file changed, 5 deletions(-) diff --git a/Cslib/Computability/Machines/SingleTapeTuring/Encoding.lean b/Cslib/Computability/Machines/SingleTapeTuring/Encoding.lean index 2ad75e4fc..1131ab6f3 100644 --- a/Cslib/Computability/Machines/SingleTapeTuring/Encoding.lean +++ b/Cslib/Computability/Machines/SingleTapeTuring/Encoding.lean @@ -1,8 +1,3 @@ -Here is the exact code for the new file. - -You can create a new file named Encoding.lean in the Cslib/Computability/Turing/ directory and copy-paste this directly into it. I've included the standard copyright header that matches the rest of the files in that folder. - -Lean /- Copyright (c) 2026 Silvère Gangloff. All rights reserved. Released under Apache 2.0 license as described in the file LICENSE. From 3e16d6bb349667dbe120fff4c1a17a82e9a61ef3 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Silv=C3=A8re=20Gangloff?= Date: Tue, 26 May 2026 10:52:30 +0900 Subject: [PATCH 4/7] fix: name change --- Cslib/Computability/Machines/SingleTapeTuring/Encoding.lean | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Cslib/Computability/Machines/SingleTapeTuring/Encoding.lean b/Cslib/Computability/Machines/SingleTapeTuring/Encoding.lean index 1131ab6f3..89824bccb 100644 --- a/Cslib/Computability/Machines/SingleTapeTuring/Encoding.lean +++ b/Cslib/Computability/Machines/SingleTapeTuring/Encoding.lean @@ -1,7 +1,7 @@ /- Copyright (c) 2026 Silvère Gangloff. All rights reserved. Released under Apache 2.0 license as described in the file LICENSE. -Authors: Bolton Bailey, Pim Spelier, Daan van Gent +Authors: Silvère Gangloff -/ import Mathlib.Tactic.Basic From 09067ea61a7f822786336a6e48d0d84f318de6e6 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Silv=C3=A8re=20Gangloff?= Date: Tue, 26 May 2026 11:16:41 +0900 Subject: [PATCH 5/7] fix: Change TabeEncodable -> StringEncodable --- .../Machines/SingleTapeTuring/Encoding.lean | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/Cslib/Computability/Machines/SingleTapeTuring/Encoding.lean b/Cslib/Computability/Machines/SingleTapeTuring/Encoding.lean index 89824bccb..eb1004a91 100644 --- a/Cslib/Computability/Machines/SingleTapeTuring/Encoding.lean +++ b/Cslib/Computability/Machines/SingleTapeTuring/Encoding.lean @@ -9,8 +9,8 @@ import Mathlib.Tactic.Basic /-! # Turing Machine Tape Encodings -This file defines the `TapeEncodable` typeclass, which provides a framework -for encoding arbitrary types onto a Turing machine tape. This allows us to +This file defines the `StringEncodable` typeclass, which provides a framework +for encoding arbitrary types onto a string type. This allows us to define computability for functions on types other than just `List Symbol`. -/ @@ -19,11 +19,11 @@ namespace Turing variable {Symbol : Type} /-- -A typeclass for types that can be encoded onto a Turing machine tape. +A typeclass for types that can be encoded onto a string type. Provides a canonical way to translate back and forth between a type `α` and a `List Symbol`, alongside a proof that decoding an encoded value succeeds. -/ -class TapeEncodable (α : Type) (Symbol : Type) where +class StringEncodable (α : Type) (Symbol : Type) where /-- Translates the type into a tape-compatible list of symbols -/ encode : α → List Symbol /-- Attempts to parse a list of symbols back into the type -/ @@ -36,7 +36,7 @@ The trivial encoding for `List Symbol` itself. This ensures backward compatibility with machines that already operate directly on tape strings. -/ -instance : TapeEncodable (List Symbol) Symbol where +instance : StringEncodable (List Symbol) Symbol where encode := id decode := some decode_encode_eq _ := rfl From 6373e2a851b859db880302b001d0fb4e009d2656 Mon Sep 17 00:00:00 2001 From: Sfgangloff Date: Wed, 27 May 2026 18:26:27 +0900 Subject: [PATCH 6/7] feat: split to string encoding and encodable + examples of string encodable types and encodings --- Cslib.lean | 1 + Cslib/Computability/Encoding.lean | 216 ++++++++++++++++++ .../Machines/SingleTapeTuring/Basic.lean | 2 +- .../Machines/SingleTapeTuring/Encoding.lean | 44 ---- encoding-instances-report.md | 175 ++++++++++++++ 5 files changed, 393 insertions(+), 45 deletions(-) create mode 100644 Cslib/Computability/Encoding.lean delete mode 100644 Cslib/Computability/Machines/SingleTapeTuring/Encoding.lean create mode 100644 encoding-instances-report.md diff --git a/Cslib.lean b/Cslib.lean index b504973c3..f479ad35f 100644 --- a/Cslib.lean +++ b/Cslib.lean @@ -22,6 +22,7 @@ public import Cslib.Computability.Automata.NA.Prod public import Cslib.Computability.Automata.NA.Sum public import Cslib.Computability.Automata.NA.ToDA public import Cslib.Computability.Automata.NA.Total +public import Cslib.Computability.Encoding public import Cslib.Computability.Languages.Congruences.BuchiCongruence public import Cslib.Computability.Languages.Congruences.RightCongruence public import Cslib.Computability.Languages.ExampleEventuallyZero diff --git a/Cslib/Computability/Encoding.lean b/Cslib/Computability/Encoding.lean new file mode 100644 index 000000000..8087e1724 --- /dev/null +++ b/Cslib/Computability/Encoding.lean @@ -0,0 +1,216 @@ +/- +Copyright (c) 2026 Silvère Gangloff. All rights reserved. +Released under Apache 2.0 license as described in the file LICENSE. +Authors: Silvère Gangloff +-/ + +module + +public import Mathlib.Tactic.Basic +public import Mathlib.Data.Fintype.Defs + +/-! +# String Encodings + +This file defines: + +* `StringEncoding α Symbol` — the primitive notion: a concrete encoder/decoder + pair for `α` onto `List Symbol`, together with a roundtrip proof. A single + type can carry many encodings (for example, unary and binary encodings of + `Nat` over the `Bool` alphabet are both provided below). + +* `StringEncodable α Symbol` — the *property* that some encoding exists. + Useful for declarations that only need the existence of an encoding without + committing to a particular choice. + +The concrete encodings provided here are: + +* `StringEncoding.listSymbol`: identity encoding of `List Symbol`. +* `StringEncoding.bool`: single-bit encoding of `Bool`. +* `StringEncoding.natUnary`: self-delimiting unary `Nat` encoding. +* `StringEncoding.natBinary`: Least significant bit first binary `Nat` encoding. +* `StringEncoding.option`: derived `Option α` encoding given any encoding of `α`. +* `StringEncoding.natUnaryPair`: a nontrivial example, `Nat × Nat` via two + concatenated unary codes. +-/ + +@[expose] public section + +namespace Encodable + +variable {Symbol : Type} [Fintype Symbol] + +/-- +The primitive notion of encoding: a function `α → List Symbol` together with +a partial inverse `List Symbol → Option α` and a proof that the inverse +recovers any encoded value. The alphabet `Symbol` must be finite — without +this constraint the notion is vacuous (one could take `Symbol = α`). +-/ +structure StringEncoding (α : Type) (Symbol : Type) [Fintype Symbol] where + /-- The encoder. -/ + encode : α → List Symbol + /-- The decoder; returns `none` on inputs that aren't valid codes. -/ + decode : List Symbol → Option α + /-- Decoding recovers an encoded value. -/ + decode_encode_eq : ∀ (a : α), decode (encode a) = some a + +/-- +A type is `StringEncodable α Symbol` when some `StringEncoding α Symbol` exists. +This is a property, the specific choice of encoding is *not* +part of the typeclass and must be supplied separately when it matters. +-/ +class StringEncodable (α : Type) (Symbol : Type) [Fintype Symbol] : Prop where + /-- An encoding of `α` exists. -/ + nonempty_encoding : Nonempty (StringEncoding α Symbol) + +/-- Promote a concrete `StringEncoding` to the existence property `StringEncodable`. -/ +theorem StringEncodable.of_encoding {α : Type} {Symbol : Type} [Fintype Symbol] + (e : StringEncoding α Symbol) : StringEncodable α Symbol := + ⟨⟨e⟩⟩ + +namespace StringEncoding + +/-! ## Generic trivial encoding -/ + +/-- +The trivial identity encoding for `List Symbol`. Compatible with machines +that already operate directly on tape strings. +-/ +def listSymbol : StringEncoding (List Symbol) Symbol where + encode := id + decode := some + decode_encode_eq _ := rfl + +instance : StringEncodable (List Symbol) Symbol := .of_encoding listSymbol + +/-! ## Basic encodings over the `Bool` alphabet -/ + +/-- Single-bit encoding of `Bool`: `b ↦ [b]`. -/ +def bool : StringEncoding Bool Bool where + encode b := [b] + decode + | [b] => some b + | _ => none + decode_encode_eq _ := rfl + +instance : StringEncodable Bool Bool := .of_encoding bool + +/-- +Unary encoding of `Nat` over the `Bool` alphabet: +`n ↦ replicate n true ++ [false]`. The trailing `false` makes the encoding +self-delimiting — a decoder can locate the end of the code without an +external length field. This is what makes `natUnaryPair` below work. +-/ +def natUnary : StringEncoding Nat Bool where + encode n := List.replicate n true ++ [false] + decode l := some (l.takeWhile id).length + decode_encode_eq n := by + induction n with + | zero => rfl + | succ k _ => simp [List.replicate_succ] + +/-! ### Binary encoding of `Nat` -/ + +/-- Least significant bit first binary encoder for `Nat`, used by `natBinary`. -/ +def encodeBinaryNat : Nat → List Bool + | 0 => [] + | n+1 => + have : (n+1) / 2 < n+1 := Nat.div_lt_self (Nat.succ_pos n) (by omega) + decide ((n+1) % 2 = 1) :: encodeBinaryNat ((n+1) / 2) + +/-- Least significant bit first binary decoder for `Nat`, used by `natBinary`. -/ +def decodeBinaryNat : List Bool → Nat + | [] => 0 + | b :: bs => (if b then 1 else 0) + 2 * decodeBinaryNat bs + +lemma decodeBinaryNat_encodeBinaryNat (n : Nat) : + decodeBinaryNat (encodeBinaryNat n) = n := by + induction n using Nat.strongRecOn with + | _ n ih => + match n with + | 0 => rw [encodeBinaryNat, decodeBinaryNat] + | k+1 => + have hlt : (k+1) / 2 < k+1 := Nat.div_lt_self (Nat.succ_pos k) (by omega) + rw [encodeBinaryNat, decodeBinaryNat, ih ((k+1) / 2) hlt] + by_cases h : (k+1) % 2 = 1 + · simp [h]; omega + · simp [h]; omega + +/-- +Least significant bit first binary encoding of `Nat` over the `Bool` alphabet. +-/ +def natBinary : StringEncoding Nat Bool where + encode := encodeBinaryNat + decode l := some (decodeBinaryNat l) + decode_encode_eq n := by simp [decodeBinaryNat_encodeBinaryNat] + +instance : StringEncodable Nat Bool := .of_encoding natUnary + +/-! ## Derived / composite encodings -/ + +/-- +Tagged encoding of `Option α`: prefix `false` for `none`, prefix `true` for +`some a` followed by `e.encode a`. Parameterized by the underlying encoding +`e` — different choices of `e` yield different `Option`-encodings. +-/ +def option {α : Type} (e : StringEncoding α Bool) : StringEncoding (Option α) Bool where + encode + | .none => [false] + | .some a => true :: e.encode a + decode + | [] => none + | false :: _ => some none + | true :: rest => (e.decode rest).map some + decode_encode_eq + | .none => rfl + | .some a => by + change Option.map some (e.decode (e.encode a)) = some (some a) + rw [e.decode_encode_eq] + rfl + +instance {α : Type} [h : StringEncodable α Bool] : StringEncodable (Option α) Bool := by + obtain ⟨e⟩ := h.nonempty_encoding + exact .of_encoding (option e) + +/-! ## Encoding of `Nat × Nat` -/ + +lemma takeWhile_id_encode_natUnary_append (a : Nat) (rest : List Bool) : + (List.replicate a true ++ false :: rest).takeWhile id = List.replicate a true := by + induction a with + | zero => rfl + | succ n ih => simp [List.replicate_succ, ih] + +lemma dropWhile_id_encode_natUnary_append (a : Nat) (rest : List Bool) : + (List.replicate a true ++ false :: rest).dropWhile id = false :: rest := by + induction a with + | zero => rfl + | succ n ih => simp [List.replicate_succ, ih] + +/-- +Encoding of `Nat × Nat` as the concatenation of two unary `Nat`-encodings. +The canonical illustration of why self-delimiting codes matter: the decoder +splits the flat list at the first `false` without any auxiliary length field. +-/ +def natUnaryPair : StringEncoding (Nat × Nat) Bool where + encode := fun ⟨a, b⟩ => natUnary.encode a ++ natUnary.encode b + decode l := + let first := (l.takeWhile id).length + match l.dropWhile id with + | [] => none + | _ :: rest => some (first, (rest.takeWhile id).length) + decode_encode_eq := by + rintro ⟨a, b⟩ + have hencode : natUnary.encode a ++ natUnary.encode b = + List.replicate a true ++ false :: (List.replicate b true ++ [false]) := by + change (List.replicate a true ++ [false]) ++ (List.replicate b true ++ [false]) = _ + simp + have htake_b : (List.replicate b true ++ [false]).takeWhile id = List.replicate b true := by + simp [takeWhile_id_encode_natUnary_append b []] + simp only [hencode, takeWhile_id_encode_natUnary_append, dropWhile_id_encode_natUnary_append, + htake_b, List.length_replicate] + +instance : StringEncodable (Nat × Nat) Bool := .of_encoding natUnaryPair + +end StringEncoding + +end Encodable diff --git a/Cslib/Computability/Machines/SingleTapeTuring/Basic.lean b/Cslib/Computability/Machines/SingleTapeTuring/Basic.lean index 2e1c76a98..a26ebdb80 100644 --- a/Cslib/Computability/Machines/SingleTapeTuring/Basic.lean +++ b/Cslib/Computability/Machines/SingleTapeTuring/Basic.lean @@ -9,7 +9,7 @@ module public import Cslib.Foundations.Data.BiTape public import Cslib.Foundations.Data.RelatesInSteps public import Mathlib.Algebra.Polynomial.Eval.Defs -public import Cslib.Computability.Turing.Encoding +public import Cslib.Computability.Encoding /-! # Single-Tape Turing Machines diff --git a/Cslib/Computability/Machines/SingleTapeTuring/Encoding.lean b/Cslib/Computability/Machines/SingleTapeTuring/Encoding.lean deleted file mode 100644 index eb1004a91..000000000 --- a/Cslib/Computability/Machines/SingleTapeTuring/Encoding.lean +++ /dev/null @@ -1,44 +0,0 @@ -/- -Copyright (c) 2026 Silvère Gangloff. All rights reserved. -Released under Apache 2.0 license as described in the file LICENSE. -Authors: Silvère Gangloff --/ - -import Mathlib.Tactic.Basic - -/-! -# Turing Machine Tape Encodings - -This file defines the `StringEncodable` typeclass, which provides a framework -for encoding arbitrary types onto a string type. This allows us to -define computability for functions on types other than just `List Symbol`. --/ - -namespace Turing - -variable {Symbol : Type} - -/-- -A typeclass for types that can be encoded onto a string type. -Provides a canonical way to translate back and forth between a type `α` -and a `List Symbol`, alongside a proof that decoding an encoded value succeeds. --/ -class StringEncodable (α : Type) (Symbol : Type) where - /-- Translates the type into a tape-compatible list of symbols -/ - encode : α → List Symbol - /-- Attempts to parse a list of symbols back into the type -/ - decode : List Symbol → Option α - /-- Proof that decoding a freshly encoded value yields the original value -/ - decode_encode_eq : ∀ (a : α), decode (encode a) = some a - -/-- -The trivial encoding for `List Symbol` itself. -This ensures backward compatibility with machines that already operate -directly on tape strings. --/ -instance : StringEncodable (List Symbol) Symbol where - encode := id - decode := some - decode_encode_eq _ := rfl - -end Turing diff --git a/encoding-instances-report.md b/encoding-instances-report.md new file mode 100644 index 000000000..c9203053f --- /dev/null +++ b/encoding-instances-report.md @@ -0,0 +1,175 @@ +# Report: Encoding.lean refactor and example encodings + +## Context + +PR review feedback from SamuelSchlesinger on the `StringEncodable` typeclass: + +> I think it would be nice if we had instances for common types, which I +> understand is hard to do with a generic Symbol. Feels not so useful without +> that. + +In parallel, the file `StringEncoding.lean` had been moved from +`Cslib/Computability/Machines/SingleTapeTuring/Encoding.lean` to +`Cslib/Computability/Encoding.lean`, leaving a broken import in +`Machines/SingleTapeTuring/Basic.lean` and an untracked file in the new +location. + +After an initial pass adding instances directly to a class-based +`StringEncodable`, the user pointed out a deeper design issue: for Turing +machines you typically want to *commit to a particular encoding* (unary, +binary, …), and the same type can carry several. Bundling encoder/decoder as +typeclass data forces a single canonical choice, which is the wrong shape. + +This report documents the cleanup and the refactor. + +## Changes + +### 1. File location + +Kept `StringEncoding.lean` at `Cslib/Computability/Encoding.lean`. + +**Why:** `StringEncoding α Symbol` and `StringEncodable α Symbol` are generic +notions with no dependency on Turing-machine specifics. Other machine +models (URM, lambda calculus, etc.) could reasonably consume them. Putting +the file under `Machines/SingleTapeTuring/` was overly narrow; the top +level of `Computability/` reflects its scope correctly. + +### 2. Fixed broken imports / module wiring + +- `Cslib/Computability/Machines/SingleTapeTuring/Basic.lean` line 12: + `Cslib.Computability.Turing.Encoding` → `Cslib.Computability.Encoding`. +- `Cslib.lean`: added `public import Cslib.Computability.Encoding` in + alphabetical position. +- `Cslib/Computability/Encoding.lean`: added the `module` declaration and + an `@[expose] public section` so it can be `public import`-ed by other + modules (which `Basic.lean` requires). + +### 3. Finiteness of the alphabet + +Both `StringEncoding` and `StringEncodable` now carry a `[Fintype Symbol]` +instance argument. Without finiteness the abstraction is vacuous — one can +always take `Symbol = α` and encode `a ↦ [a]`. Using `Fintype` (rather than +the weaker `Finite`) matches the constraint already used by +`SingleTapeTM Symbol [Inhabited Symbol] [Fintype Symbol]`, so an encoding +slots into a Turing machine without an extra hypothesis. Required a new +import: `Mathlib.Data.Fintype.Defs`. + +### 4. Design refactor: `StringEncoding` as the primitive notion + +Previously `StringEncodable` was a class bundling encoder + decoder + +correctness proof — i.e. data masquerading as a property. This forces a +single canonical encoding per type, which is wrong for computability +because the choice of encoding matters (e.g. unary vs. binary `Nat`). + +The new design splits these: + +```lean +structure StringEncoding (α : Type) (Symbol : Type) where + encode : α → List Symbol + decode : List Symbol → Option α + decode_encode_eq : ∀ a, decode (encode a) = some a + +class StringEncodable (α : Type) (Symbol : Type) : Prop where + nonempty_encoding : Nonempty (StringEncoding α Symbol) +``` + +- `StringEncoding α Symbol` is a **value-level** record. A type can carry many + encodings, named distinctly (`StringEncoding.natUnary`, `StringEncoding.natBinary`, + …). Code that depends on a specific representation takes a `StringEncoding` + argument explicitly. + +- `StringEncodable α Symbol` is a **`Prop`-valued** typeclass — purely the + existence claim "some encoding exists". It carries no representation + choice and so cannot accidentally pick one for you. + +- `StringEncodable.of_encoding` promotes any concrete `StringEncoding` to the + existence property. + +### 5. Concrete encodings + +All instances live over the `Bool` alphabet (the textbook computability +convention). + +#### Trivial / basic + +- **`StringEncoding.listSymbol`** — identity encoding of `List Symbol`. Generic + in `Symbol`; the only one that does not fix the alphabet. + +- **`StringEncoding.bool`** — `b ↦ [b]`. Sanity-check level. + +- **`StringEncoding.natUnary`** — `n ↦ replicate n true ++ [false]`. The + trailing `false` makes this **self-delimiting** — a decoder can locate + the end of the code without an external length field. This is the + property that makes `StringEncoding.natUnaryPair` (below) work. + +- **`StringEncoding.natBinary`** — LSB-first binary encoding via `encodeBinaryNat` + / `decodeBinaryNat`. Exponentially more compact than `natUnary` but + *not* self-delimiting, so it cannot be naively concatenated. Correctness + proved by strong induction. Provided as the second `Nat`-encoding to + demonstrate that multiple choices coexist as plain values. + +#### Composite / derived + +- **`StringEncoding.option (e : StringEncoding α Bool) : StringEncoding (Option α) Bool`** — + tagged with a leading bit. Note this is a *function* on encodings, not + an instance: different choices of `e` give different `Option`-encodings. + A `StringEncodable` instance is still derived (because *some* encoding + exists once we have one for `α`). + +- **`StringEncoding.natUnaryPair`** (nontrivial example) — `Nat × Nat` as the + concatenation of two unary codes. The canonical illustration of why + self-delimiting codes matter: the decoder splits the flat list at the + first `false` without any auxiliary length field. Proof factored + through two lemmas describing how a unary `Nat` encoding interacts with + any trailing list. + +### 6. `StringEncodable` instances + +Provided for `List Symbol`, `Bool`, `Nat` (via `natUnary` as the default +witness), `Option α` (given `StringEncodable α Bool`), and `Nat × Nat`. +These only register *existence* — the specific `Nat`-encoding is not +fixed by the instance, and downstream code that cares must take an +explicit `StringEncoding Nat Bool` argument. + +## Design choices and trade-offs + +- **Public, not private, helpers.** Lean's new module system disallows + exposing a public definition whose RHS directly aliases a `private` + helper. The helper functions (`encodeBinaryNat`, `decodeBinaryNat`, + `takeWhile_id_encode_natUnary_append`, …) are therefore public but + prefixed and docstring-tagged as implementation support. A + finer-grained encapsulation could be done with explicit `protected` + declarations if desired. + +- **`Bool` alphabet, not generic.** Useful instances over a fully + generic `Symbol` require extra structure (separators, etc.) — exactly + the difficulty the reviewer noted. Committing to `Bool` gives concrete + examples immediately. A follow-up could add encodings over richer + alphabets (`Char`, `Fin n`, …) or factor out a "self-delimiting" + predicate. + +- **Two `Nat` encodings, one default witness.** The `StringEncodable` + instance uses `natUnary`. This is a witness, not a canonical choice — + any code that depends on the representation should not consult + `StringEncodable` and should instead take a `StringEncoding Nat Bool` + argument. + +- **Single file, not split.** Encodings were added inline rather than in + `Encoding/Instances.lean`. The file is still under 200 lines; the + split would be premature. + +## Verification + +All three files were checked via the Lean LSP and compile with no errors: + +- `Cslib/Computability/Encoding.lean` +- `Cslib/Computability/Machines/SingleTapeTuring/Basic.lean` +- `Cslib.lean` + +A full `lake build` was attempted but failed unrelated to the code: the +machine is out of disk space (38 MiB free on the volume, mathlib cache +extraction failing with `No space left on device`). The Lean LSP — which +incrementally compiles in-memory and does not need to write `.olean` +files — confirms the code is well-formed. + +No commits were made. From bb1e0df9c4f22d78b6efada876de3be2e4be9a55 Mon Sep 17 00:00:00 2001 From: Sfgangloff Date: Wed, 27 May 2026 18:27:12 +0900 Subject: [PATCH 7/7] fix: remove .md --- encoding-instances-report.md | 175 ----------------------------------- 1 file changed, 175 deletions(-) delete mode 100644 encoding-instances-report.md diff --git a/encoding-instances-report.md b/encoding-instances-report.md deleted file mode 100644 index c9203053f..000000000 --- a/encoding-instances-report.md +++ /dev/null @@ -1,175 +0,0 @@ -# Report: Encoding.lean refactor and example encodings - -## Context - -PR review feedback from SamuelSchlesinger on the `StringEncodable` typeclass: - -> I think it would be nice if we had instances for common types, which I -> understand is hard to do with a generic Symbol. Feels not so useful without -> that. - -In parallel, the file `StringEncoding.lean` had been moved from -`Cslib/Computability/Machines/SingleTapeTuring/Encoding.lean` to -`Cslib/Computability/Encoding.lean`, leaving a broken import in -`Machines/SingleTapeTuring/Basic.lean` and an untracked file in the new -location. - -After an initial pass adding instances directly to a class-based -`StringEncodable`, the user pointed out a deeper design issue: for Turing -machines you typically want to *commit to a particular encoding* (unary, -binary, …), and the same type can carry several. Bundling encoder/decoder as -typeclass data forces a single canonical choice, which is the wrong shape. - -This report documents the cleanup and the refactor. - -## Changes - -### 1. File location - -Kept `StringEncoding.lean` at `Cslib/Computability/Encoding.lean`. - -**Why:** `StringEncoding α Symbol` and `StringEncodable α Symbol` are generic -notions with no dependency on Turing-machine specifics. Other machine -models (URM, lambda calculus, etc.) could reasonably consume them. Putting -the file under `Machines/SingleTapeTuring/` was overly narrow; the top -level of `Computability/` reflects its scope correctly. - -### 2. Fixed broken imports / module wiring - -- `Cslib/Computability/Machines/SingleTapeTuring/Basic.lean` line 12: - `Cslib.Computability.Turing.Encoding` → `Cslib.Computability.Encoding`. -- `Cslib.lean`: added `public import Cslib.Computability.Encoding` in - alphabetical position. -- `Cslib/Computability/Encoding.lean`: added the `module` declaration and - an `@[expose] public section` so it can be `public import`-ed by other - modules (which `Basic.lean` requires). - -### 3. Finiteness of the alphabet - -Both `StringEncoding` and `StringEncodable` now carry a `[Fintype Symbol]` -instance argument. Without finiteness the abstraction is vacuous — one can -always take `Symbol = α` and encode `a ↦ [a]`. Using `Fintype` (rather than -the weaker `Finite`) matches the constraint already used by -`SingleTapeTM Symbol [Inhabited Symbol] [Fintype Symbol]`, so an encoding -slots into a Turing machine without an extra hypothesis. Required a new -import: `Mathlib.Data.Fintype.Defs`. - -### 4. Design refactor: `StringEncoding` as the primitive notion - -Previously `StringEncodable` was a class bundling encoder + decoder + -correctness proof — i.e. data masquerading as a property. This forces a -single canonical encoding per type, which is wrong for computability -because the choice of encoding matters (e.g. unary vs. binary `Nat`). - -The new design splits these: - -```lean -structure StringEncoding (α : Type) (Symbol : Type) where - encode : α → List Symbol - decode : List Symbol → Option α - decode_encode_eq : ∀ a, decode (encode a) = some a - -class StringEncodable (α : Type) (Symbol : Type) : Prop where - nonempty_encoding : Nonempty (StringEncoding α Symbol) -``` - -- `StringEncoding α Symbol` is a **value-level** record. A type can carry many - encodings, named distinctly (`StringEncoding.natUnary`, `StringEncoding.natBinary`, - …). Code that depends on a specific representation takes a `StringEncoding` - argument explicitly. - -- `StringEncodable α Symbol` is a **`Prop`-valued** typeclass — purely the - existence claim "some encoding exists". It carries no representation - choice and so cannot accidentally pick one for you. - -- `StringEncodable.of_encoding` promotes any concrete `StringEncoding` to the - existence property. - -### 5. Concrete encodings - -All instances live over the `Bool` alphabet (the textbook computability -convention). - -#### Trivial / basic - -- **`StringEncoding.listSymbol`** — identity encoding of `List Symbol`. Generic - in `Symbol`; the only one that does not fix the alphabet. - -- **`StringEncoding.bool`** — `b ↦ [b]`. Sanity-check level. - -- **`StringEncoding.natUnary`** — `n ↦ replicate n true ++ [false]`. The - trailing `false` makes this **self-delimiting** — a decoder can locate - the end of the code without an external length field. This is the - property that makes `StringEncoding.natUnaryPair` (below) work. - -- **`StringEncoding.natBinary`** — LSB-first binary encoding via `encodeBinaryNat` - / `decodeBinaryNat`. Exponentially more compact than `natUnary` but - *not* self-delimiting, so it cannot be naively concatenated. Correctness - proved by strong induction. Provided as the second `Nat`-encoding to - demonstrate that multiple choices coexist as plain values. - -#### Composite / derived - -- **`StringEncoding.option (e : StringEncoding α Bool) : StringEncoding (Option α) Bool`** — - tagged with a leading bit. Note this is a *function* on encodings, not - an instance: different choices of `e` give different `Option`-encodings. - A `StringEncodable` instance is still derived (because *some* encoding - exists once we have one for `α`). - -- **`StringEncoding.natUnaryPair`** (nontrivial example) — `Nat × Nat` as the - concatenation of two unary codes. The canonical illustration of why - self-delimiting codes matter: the decoder splits the flat list at the - first `false` without any auxiliary length field. Proof factored - through two lemmas describing how a unary `Nat` encoding interacts with - any trailing list. - -### 6. `StringEncodable` instances - -Provided for `List Symbol`, `Bool`, `Nat` (via `natUnary` as the default -witness), `Option α` (given `StringEncodable α Bool`), and `Nat × Nat`. -These only register *existence* — the specific `Nat`-encoding is not -fixed by the instance, and downstream code that cares must take an -explicit `StringEncoding Nat Bool` argument. - -## Design choices and trade-offs - -- **Public, not private, helpers.** Lean's new module system disallows - exposing a public definition whose RHS directly aliases a `private` - helper. The helper functions (`encodeBinaryNat`, `decodeBinaryNat`, - `takeWhile_id_encode_natUnary_append`, …) are therefore public but - prefixed and docstring-tagged as implementation support. A - finer-grained encapsulation could be done with explicit `protected` - declarations if desired. - -- **`Bool` alphabet, not generic.** Useful instances over a fully - generic `Symbol` require extra structure (separators, etc.) — exactly - the difficulty the reviewer noted. Committing to `Bool` gives concrete - examples immediately. A follow-up could add encodings over richer - alphabets (`Char`, `Fin n`, …) or factor out a "self-delimiting" - predicate. - -- **Two `Nat` encodings, one default witness.** The `StringEncodable` - instance uses `natUnary`. This is a witness, not a canonical choice — - any code that depends on the representation should not consult - `StringEncodable` and should instead take a `StringEncoding Nat Bool` - argument. - -- **Single file, not split.** Encodings were added inline rather than in - `Encoding/Instances.lean`. The file is still under 200 lines; the - split would be premature. - -## Verification - -All three files were checked via the Lean LSP and compile with no errors: - -- `Cslib/Computability/Encoding.lean` -- `Cslib/Computability/Machines/SingleTapeTuring/Basic.lean` -- `Cslib.lean` - -A full `lake build` was attempted but failed unrelated to the code: the -machine is out of disk space (38 MiB free on the volume, mathlib cache -extraction failing with `No space left on device`). The Lean LSP — which -incrementally compiles in-memory and does not need to write `.olean` -files — confirms the code is well-formed. - -No commits were made.