sketches: structural Clone (KLL, CMS-heap) + KLL direct bit-exact reconstruction #60

Merged
zzylol merged 2 commits into main from kll-direct-reconstruction
May 26, 2026

@zzylol (Contributor) commented May 26, 2026

Summary

Avoid serialize→deserialize / extract→rebuild round-trips when copying or
decoding the portable sketches.

KLL

  1. Structural KllSketch::clone. clone() previously did a full msgpack
    serialize → deserialize round-trip. The backing KLL<f64> already derives
    Clone, so clone() now clones it structurally instead: same sketch, no rmp
    encode/decode, far fewer allocations.

  2. KLL::from_portable_state — direct reconstruction. Rebuilds a sketch
    directly from its portable wire form (k + level-ordered items + the
    levels[] boundary array) by placing items into the internal buffer at the
    correct offset and fixing up the level boundaries, instead of replaying every
    retained item through update(). The result is bit-exact: its quantiles are
    identical to the source's, whereas the replay path was a lossy statistical
    reconstruction. It is also several times faster. Exposed via
    KllSketch::from_portable_state(...) for proto/envelope decoders; the
    empty-sketch case is handled.
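
For reviewers who want the shape of direct reconstruction without opening the diff, here is a minimal sketch. It is not the code in this PR: `PortableKll`, `ToyKll`, the right-aligned buffer layout, and the explicit `capacity` argument are all assumptions made for illustration, and the capacity-cache recompute is left out.

```rust
/// Hypothetical portable (wire) form; not the crate's actual type.
#[derive(Debug, Clone, PartialEq)]
pub struct PortableKll {
    pub k: u16,
    /// Retained items in level order, level 0 first.
    pub items: Vec<f64>,
    /// Level i occupies items[levels[i]..levels[i + 1]]; levels[0] == 0.
    pub levels: Vec<usize>,
}

/// Hypothetical in-memory layout: items sit right-aligned in a fixed buffer,
/// and `levels` holds absolute offsets into that buffer.
#[derive(Debug, Clone, PartialEq)]
pub struct ToyKll {
    k: u16,
    buf: Vec<f64>,
    levels: Vec<usize>,
}

impl ToyKll {
    /// Rebuild by copying items to the right offset and shifting boundaries.
    /// No item is replayed through an update path, so nothing is re-compacted.
    pub fn from_portable_state(state: &PortableKll, capacity: usize) -> Result<Self, String> {
        let n = state.items.len();
        if n > capacity {
            return Err("more items than buffer capacity".into());
        }
        // An empty sketch may arrive with no boundaries; treat it as one empty level.
        let rel: Vec<usize> = if state.levels.is_empty() {
            vec![0, 0]
        } else {
            state.levels.clone()
        };
        let well_formed = rel.len() >= 2
            && rel[0] == 0
            && rel[rel.len() - 1] == n
            && rel.windows(2).all(|w| w[0] <= w[1]);
        if !well_formed {
            return Err("inconsistent level boundaries".into());
        }
        let offset = capacity - n;
        let mut buf = vec![0.0; capacity];
        buf[offset..].copy_from_slice(&state.items);
        // Fix up boundaries: wire offsets are relative, buffer offsets are absolute.
        let levels = rel.iter().map(|b| b + offset).collect();
        Ok(Self { k: state.k, buf, levels })
    }

    /// Inverse of `from_portable_state`, used here to check the round trip.
    pub fn to_portable_state(&self) -> PortableKll {
        let offset = self.levels[0];
        PortableKll {
            k: self.k,
            items: self.buf[offset..].to_vec(),
            levels: self.levels.iter().map(|b| b - offset).collect(),
        }
    }

    /// Items currently held at level `i`.
    pub fn level(&self, i: usize) -> &[f64] {
        &self.buf[self.levels[i]..self.levels[i + 1]]
    }
}
```

Because the items and boundaries are copied verbatim, converting back with `to_portable_state` yields the same wire form, which is the property that makes the reconstruction bit-exact in this toy model.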

CMS-with-heap

  1. Structural CountMinSketchWithHeap::clone. CMSHeap's fields
    (CountMin, HHHeap) both derive Clone, so CMSHeap now derives Clone
    and the wrapper clones the backend structurally instead of extracting the
    count matrix + top-k heap to wire form and rebuilding it on every clone.
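
The difference between the two clone strategies can be sketched with toy stand-ins. Everything below (`ToyCountMin`, `ToyHeap`, `ToyCmsHeap`, `ToyWrapper`, their fields, and `clone_via_wire`) is hypothetical and only mirrors the structure described above; the real types and wire form may differ.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy stand-ins for the count matrix and the top-k heap (assumed fields).
#[derive(Debug, Clone, PartialEq)]
struct ToyCountMin {
    rows: usize,
    cols: usize,
    counts: Vec<u64>, // row-major rows * cols matrix
}

#[derive(Debug, Clone, PartialEq)]
struct ToyHeap {
    entries: Vec<(String, u64)>,
}

// Both fields derive Clone, so the backend can derive it too.
#[derive(Debug, Clone, PartialEq)]
struct ToyCmsHeap {
    cms: ToyCountMin,
    heap: ToyHeap,
}

/// Wrapper whose derived `clone()` copies the backend structurally.
#[derive(Debug, Clone, PartialEq)]
pub struct ToyWrapper {
    backend: ToyCmsHeap,
}

impl ToyWrapper {
    /// Assumes `cols > 0`.
    pub fn new(rows: usize, cols: usize) -> Self {
        Self {
            backend: ToyCmsHeap {
                cms: ToyCountMin { rows, cols, counts: vec![0; rows * cols] },
                heap: ToyHeap { entries: Vec::new() },
            },
        }
    }

    pub fn add(&mut self, key: &str, count: u64) {
        let cms = &mut self.backend.cms;
        for row in 0..cms.rows {
            // One hashed cell per row, seeded by the row index.
            let mut h = DefaultHasher::new();
            (row, key).hash(&mut h);
            let col = (h.finish() as usize) % cms.cols;
            cms.counts[row * cms.cols + col] += count;
        }
        let entries = &mut self.backend.heap.entries;
        if let Some(i) = entries.iter().position(|(k, _)| k == key) {
            entries[i].1 += count;
        } else {
            entries.push((key.to_string(), count));
        }
    }

    /// The slower pattern: extract a wire-like form, then rebuild from it.
    pub fn clone_via_wire(&self) -> Self {
        let cms = &self.backend.cms;
        let matrix: Vec<Vec<u64>> = cms.counts.chunks(cms.cols).map(|r| r.to_vec()).collect();
        let topk: Vec<(String, u64)> = self.backend.heap.entries.clone();
        Self {
            backend: ToyCmsHeap {
                cms: ToyCountMin {
                    rows: matrix.len(),
                    cols: cms.cols,
                    counts: matrix.into_iter().flatten().collect(),
                },
                heap: ToyHeap { entries: topk },
            },
        }
    }
}
```

In this toy model both paths produce equal values; the derived `clone()` simply skips the intermediate per-row allocations and the rebuild step.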

Test

  • from_portable_state_reproduces_source_exactly asserts the reconstructed KLL
    sketch's quantiles match the source exactly across [0.0 .. 1.0], plus the
    empty-sketch round-trip.
  • CMS-with-heap clone/round-trip is covered by the existing suite
    (count_min_sketch_with_heap_round_trip, count_min_with_heap_wire_shape).
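
An exact-match check over a grid of ranks can be sketched as below. This is an illustration under assumptions, not the test added in this PR: `first_quantile_mismatch` and `sorted_quantile` are hypothetical helpers, and the sorted-slice quantile merely stands in for a sketch query. Comparing with `f64::to_bits` rather than `==` means NaN results (for example from an empty sketch) compare equal to themselves.

```rust
/// Returns the first rank in {0/steps, 1/steps, ..., 1} at which the two
/// quantile functions disagree bit-for-bit, or None if they always agree.
pub fn first_quantile_mismatch<A, B>(a: A, b: B, steps: u32) -> Option<f64>
where
    A: Fn(f64) -> f64,
    B: Fn(f64) -> f64,
{
    (0..=steps)
        .map(|i| f64::from(i) / f64::from(steps))
        .find(|&q| a(q).to_bits() != b(q).to_bits())
}

/// Toy quantile over an already-sorted slice, standing in for a sketch query.
/// Returns NaN for an empty slice.
pub fn sorted_quantile(sorted: &[f64], q: f64) -> f64 {
    if sorted.is_empty() {
        return f64::NAN;
    }
    let idx = ((sorted.len() - 1) as f64 * q).round() as usize;
    sorted[idx]
}
```

A bit-exact reconstruction should return `None` from such a check, while a lossy one is expected to report the first rank where the answers diverge.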

🤖 Generated with Claude Code

@zzylol requested a review from GordonYuanyc May 26, 2026 19:12
… state

KllSketch::clone serialized to msgpack and deserialized back on every
clone. The backing KLL<f64> derives Clone, so clone it structurally
instead — identical sketch, no rmp encode/decode and far fewer
allocations.

Add KLL::from_portable_state to rebuild a sketch directly from its
portable wire form (k + level-ordered items + the levels[] boundary
array): place the items straight into the internal buffer at the correct
offset and fix up the level boundaries, then recompute the capacity
cache. This replaces reconstructing a sketch by replaying every retained
item through update() — it is bit-exact (identical quantiles, vs a lossy
statistical reconstruction) and several times faster (no per-item
compaction/sort/RNG). Exposed via KllSketch::from_portable_state for
proto/envelope decoders; the empty-sketch case is handled.

Add a test asserting from_portable_state reproduces the source sketch's
quantiles exactly across the quantile range, plus the empty-sketch case.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@zzylol force-pushed the kll-direct-reconstruction branch from 42561d6 to 9b57222 May 26, 2026 19:14
CMSHeap's fields (CountMin and HHHeap) both derive Clone, so derive Clone on
CMSHeap and clone CountMinSketchWithHeap structurally instead of extracting the
count matrix + top-k heap to wire form and rebuilding the backend on every
clone. Same sketch, far less work per clone.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@zzylol changed the title from "KLL: structural clone + direct bit-exact reconstruction from portable state" to "sketches: structural Clone (KLL, CMS-heap) + KLL direct bit-exact reconstruction" May 26, 2026
@zzylol merged commit 92d25a6 into main May 26, 2026
3 checks passed
@zzylol deleted the kll-direct-reconstruction branch May 26, 2026 19:30