SecurityLab-UCD · EYH0602 · Feb 6, 2026 · Feb 6, 2026 · Feb 6, 2026 · Feb 6, 2026
diff --git a/AGENTS.md b/AGENTS.md
@@ -20,7 +20,7 @@ Future metrics (e.g., audio coverage, state-graph coverage) will follow the same
 │       ├── cov_base.py          # Abstract protocols: CoverageItem, Coverage, CoverageMonitor
 │       ├── frame.py             # Frame dataclass (PIL Image wrapper with average-hash)
 │       ├── dedup.py             # Deduplication algorithms (pHash, SSIM [deprecated])
-│       ├── frame_cov.py         # FrameCoverage, FrameMonitor, BKFrameMonitor, BK-tree
+│       ├── frame_cov.py         # FrameCoverage, FrameMonitor, BKFrameMonitor, BK-tree, UnionFind
 │       ├── loader.py            # MP4 loading: bulk, lazy (generator), last-n
 │       ├── writer.py            # MP4 writing: imageio and OpenCV backends
 │       ├── stitch.py            # Panorama stitching of unique frames
@@ -59,7 +59,7 @@ See [docs/design.md](docs/design.md) for the coverage framework architecture, fr
 | `cov_base.py` | `CoverageItem`, `Coverage[T]`, `CoverageMonitor[T]` protocols/ABC |
 | `frame.py` | `Frame` dataclass (PIL Image + average-hash) |
 | `dedup.py` | `is_dup()`, `dedup_unique_frames()`, `dedup_unique_hashes()`, `ssim_dedup()` [deprecated] |
-| `frame_cov.py` | `FrameCoverage`, `FrameMonitor`, `BKFrameMonitor`, `get_frame_cov()` |
+| `frame_cov.py` | `FrameCoverage`, `FrameMonitor`, `BKFrameMonitor`, `get_frame_cov()`, `_UnionFind`, `_BKTree` |
 | `loader.py` | `load_mp4()`, `load_mp4_lazy()`, `load_mp4_last_n()` |
 | `writer.py` | `write_mp4()`, `write_mp4_cv2()` |
 | `stitch.py` | `stitch_images()` (panorama via AffineStitcher) |

diff --git a/docs/design.md b/docs/design.md
diff --git a/docs/frame_cov.md b/docs/frame_cov.md
@@ -0,0 +1,206 @@
+# Design
+
+## Coverage Framework
+
+`gamecov` is built on a generic `Protocol`-based framework (`cov_base.py`) that decouples coverage types from monitoring logic.
+
+### Protocols
+
+- **`CoverageItem`** — anything hashable and stringable. All coverage data points must satisfy this protocol.
+- **`Coverage[T]`** — an execution trace exposing:
+    - `.trace` — ordered list of all items encountered.
+    - `.coverage` — deduplicated set of unique items.
+    - `.path_id` — SHA1 fingerprint of the unique coverage set.
+- **`CoverageMonitor[T]`** — accumulates coverage across sessions:
+    - `.add_cov(cov)` — merge new coverage into the monitor.
+    - `.is_seen(cov)` — check whether a path has already been recorded.
+    - `.item_seen` / `.path_seen` — running totals.
+
+Frame coverage (`FrameCoverage`, `FrameMonitor`, `BKFrameMonitor`) is the first concrete implementation. The framework is designed to support future metrics such as audio coverage or state-graph coverage with no changes to the monitoring interface.
+
+### Key Invariants
+
+- `len(item_seen)` is **monotonically non-decreasing** — `add_cov()` can only grow the set of distinct hashes, never shrink it. This property is verified by the `test_monotone*` test suite.
+- `coverage_count` (connected-component count, `BKFrameMonitor` only) is **order-independent**: the same set of hashes always produces the same count regardless of insertion order. It may transiently decrease when a bridging hash merges two clusters.
+
+## Why Frame Coverage Is a Valid Fuzzing Metric
+
+A useful fuzzing coverage metric must satisfy two properties:
+
+1. **Monotonicity** — coverage never decreases as more inputs are observed. A fuzzer can safely interpret "coverage stopped growing" as saturation.
+2. **Order-independence** — the final coverage value depends only on *which* inputs were observed, not *when*. This makes coverage comparable across runs with different scheduling strategies.
+
+`gamecov` provides two monitors. We justify each property for both.
+
+### Definitions
+
+Let *H* = {h₁, h₂, …} be the universe of pHash values (64-bit vectors).
+Define the **neighbourhood graph** G(S) for a set S ⊆ H as:
+
+- Vertices: S
+- Edges: {(a, b) | hamming(a, b) ≤ RADIUS}
+
+Hamming distance is a **metric** (non-negative, symmetric, zero iff equal, and satisfies the triangle inequality). This makes G(S) a well-defined undirected graph for any S.
+
+### Monotonicity
+
+**`item_seen` (both monitors).** `add_cov` only ever *inserts* hashes into `item_seen`; it never removes them. Therefore `|item_seen|` is monotonically non-decreasing across successive `add_cov` calls.
+
+*Proof.* Each call iterates over `cov.coverage`. A hash is added to `item_seen` if and only if it was not already present (the exact-duplicate check short-circuits). No code path removes elements from the set. ∎
+
+This is the metric used by `FrameMonitor.coverage_count` (which returns `len(item_seen)`). It is also monotonic in `BKFrameMonitor`, but `BKFrameMonitor` uses a different primary metric (see below).
+
+### Order-Independence (`BKFrameMonitor`)
+
+**`coverage_count` = number of connected components of G(item_seen).**
+
+*Claim.* For any fixed set S of hashes, the connected-component count cc(G(S)) is uniquely determined by S, regardless of the order in which the hashes were inserted.
+
+*Proof.* The graph G(S) is defined purely by the set S and the distance predicate hamming(a, b) ≤ RADIUS. Neither depends on insertion order. The number of connected components is a property of the graph, not of how it was constructed.
+
+The implementation maintains this invariant incrementally via union-find:
+
+1. When a new hash x is inserted, `find_all_within(x, RADIUS)` returns **all** existing hashes within the radius (not just the first match).
+2. x is unioned with every such neighbour. After the union step, any path of radius-edges connecting x to any existing component is faithfully captured.
+3. Because union-find tracks *all* edges, not just first-seen ones, the resulting component structure is identical to computing cc(G(S)) from scratch.
+
+This is verified empirically by `test_order_independent_coverage`, which asserts identical `coverage_count` across original, reversed, and randomly shuffled insertion orders. ∎
+
+**Why `FrameMonitor` is order-dependent.** The greedy first-seen-wins dedup skips a hash if *any* existing hash is within RADIUS. Because the "within RADIUS" relation is **not transitive** (a is near b, b is near c, but a may not be near c), the set of retained hashes depends on which hash was encountered first. Different orderings can yield different retained sets and therefore different counts.
+
+### Non-Monotonicity of `coverage_count` Is Expected
+
+`BKFrameMonitor.coverage_count` may *decrease* when a new hash bridges two previously separate components. For example:
+
+```
+Before:  {A}  {B}       (2 components, hamming(A,B) > RADIUS)
+Insert C where hamming(A,C) ≤ RADIUS and hamming(B,C) ≤ RADIUS
+After:   {A, B, C}      (1 component)
+```
+
+This is correct: the new hash genuinely reduces the number of distinct visual clusters. In a fuzzing context, `coverage_count` decreasing means the fuzzer discovered that two previously-distinct regions are actually connected — this is valuable information, not a metric error. A fuzzer should track `coverage_count` (clusters explored) alongside `len(item_seen)` (total distinct observations) and use both signals.
+
+### Cross-Run Comparability
+
+Because `coverage_count` depends only on the set of observed hashes:
+
+- Two fuzzing campaigns over the same game can be directly compared: the campaign with more connected components explored more visually distinct regions.
+- Merging coverage from two campaigns is straightforward: take the union of their hash sets and recompute components. The result equals what a single campaign observing all those hashes would report.
+- The metric is **idempotent**: adding a recording that contributes no new hashes changes nothing.
+
+These properties make `BKFrameMonitor.coverage_count` suitable as a fuzzing progress metric analogous to edge coverage in traditional software fuzzing.
+
+### Summary of Metric Properties
+
+| Property | `FrameMonitor` (`len(item_seen)`) | `BKFrameMonitor` (`coverage_count`) |
+|----------|-----------------------------------|-------------------------------------|
+| Monotonic | Yes | No (may decrease on bridge) |
+| Order-independent | No (greedy first-seen-wins) | Yes (graph-theoretic) |
+| Cross-run comparable | No | Yes |
+| Mergeable | No | Yes (set union) |
+| Idempotent | Yes | Yes |
+
+`BKFrameMonitor` is the recommended monitor for production fuzzing. `FrameMonitor` remains available for backward compatibility and as a simpler baseline.
+
+## Frame Coverage
+
+### Perceptual Hashing
+
+Each video frame is hashed with pHash (`imagehash.phash`, 8x8 by default). Two frames are considered duplicates if the Hamming distance between their hashes is within `RADIUS` (default 5 bits). This tolerates minor visual differences (compression artifacts, slight camera movement) while distinguishing meaningfully different game states.
+
+The `Frame` dataclass uses `imagehash.average_hash` for Python `__hash__`/set membership, while deduplication logic uses the more discriminating pHash.
+
+### Pipeline
+
+```
+MP4 recording
+  |
+  v
+loader.py -- load_mp4_lazy() / load_mp4() / load_mp4_last_n()
+  |
+  v
+Iterable[Frame]           (PIL Image wrapped with average-hash)
+  |
+  v
+dedup.py -- dedup_unique_hashes()    (pHash + Hamming distance)
+  |
+  v
+FrameCoverage             (.coverage -> set[ImageHash], .path_id -> SHA1)
+  |
+  v
+FrameMonitor / BKFrameMonitor       (.add_cov() accumulates unique items)
+  |
+  v
+Coverage statistics       (.item_seen, .path_seen)
+```
+
+### Loading Strategies
+
+| Function            | Behavior                       | Use Case                           |
+| ------------------- | ------------------------------ | ---------------------------------- |
+| `load_mp4()`        | Decode all frames into memory  | Small videos, random access needed |
+| `load_mp4_lazy()`   | Generator, one frame at a time | Large videos, memory-constrained   |
+| `load_mp4_last_n()` | Seek + decode last _n_ frames  | Tail sampling                      |
+
+All loaders use `imageio.v3` with the PyAV plugin.
+
+## BK-Tree Optimization
+
+The naive `FrameMonitor` checks each new hash against all previously seen hashes — O(N\*M) per session. `BKFrameMonitor` uses a [Burkhard-Keller tree](https://en.wikipedia.org/wiki/BK-tree) that indexes hashes by Hamming distance in a metric space.
+
+### How it works
+
+1. Each image hash is packed into an integer via `numpy.packbits`.
+2. The BK-tree stores these integers. Distances are computed with `(x ^ y).bit_count()` (popcount = Hamming distance).
+3. On lookup, the triangle inequality prunes branches: for a query point _x_ with radius _r_ at a node with distance _d_, only children with keys in [d-r, d+r] need to be visited.
+
+### Order-Independent Coverage via Union-Find
+
+The greedy first-seen-wins dedup in `FrameMonitor` is **order-dependent**: processing the same recordings in different orders can yield different coverage counts (because the "is duplicate" relation is not transitive).
+
+`BKFrameMonitor` solves this with a union-find (disjoint-set) structure:
+
+1. **Every** distinct hash is inserted into the BK-tree (no greedy skip).
+2. On insertion, `find_all_within(x, radius)` locates all existing neighbours.
+3. The new hash is unioned with every neighbour in the union-find.
+4. `coverage_count` = number of connected components = number of disjoint clusters.
+
+Because the Hamming-distance graph depends only on which hashes exist (not insertion order), the connected-component count is fully order-independent.
+
+**Trade-off**: `coverage_count` may transiently _decrease_ when a new hash bridges two previously separate components. `len(item_seen)` (total distinct hashes) remains monotonically non-decreasing.
+
+### Performance
+
+Benchmarked on the SMB dataset with `N_MAX=500` recordings:
+
+| Monitor          | Time  |
+| ---------------- | ----- |
+| `FrameMonitor`   | ~237s |
+| `BKFrameMonitor` | ~187s |
+
+~21% speedup, and the gap widens as the number of accumulated hashes grows.
+
+## Stitching
+
+`stitch_images()` combines unique frames into a panorama using `stitching.AffineStitcher`. Feature detection uses SIFT (default) or ORB. The confidence threshold controls how aggressively frames are matched (range 0.4-0.6). This is primarily a visualization tool — the coverage measurement itself does not depend on stitching.
+
+## Configuration
+
+| Environment Variable | Default | Description                                        |
+| -------------------- | ------- | -------------------------------------------------- |
+| `RADIUS`             | `5`     | Hamming distance threshold for frame deduplication |
+| `N_MAX`              | `100`   | Max recordings to process in monotonicity tests    |
+
+## Dependencies
+
+| Library          | Purpose                                   |
+| ---------------- | ----------------------------------------- |
+| `imageio` + `av` | Video decoding/encoding via PyAV          |
+| `pillow`         | Image representation                      |
+| `imagehash`      | Perceptual hashing (pHash, average hash)  |
+| `opencv-python`  | Color conversion, video writing           |
+| `stitching`      | Panorama stitching                        |
+| `numpy`          | Numerical operations                      |
+| `returns`        | Functional error handling (`Result` type) |
+| `typer`          | CLI framework                             |
+| `hypothesis`     | Property-based testing                    |
diff --git a/src/gamecov/cov_base.py b/src/gamecov/cov_base.py
@@ -48,6 +48,16 @@ def is_seen(self, cov: Coverage[T]) -> bool:
     def add_cov(self, cov: Coverage[T]) -> None:
         """Add a new execution coverage record to the monitor."""
 
+    @property
+    def coverage_count(self) -> int:
+        """Number of unique coverage items.
+
+        The default implementation returns ``len(self.item_seen)``.
+        Subclasses may override to provide order-independent metrics
+        (e.g., connected-component count via union-find).
+        """
+        return len(self.item_seen)
+
     def reset(self) -> None:
         """Reset the monitor state."""
         self.path_seen.clear()