sketches: structural Clone (KLL, CMS-heap) + KLL direct bit-exact reconstruction #60
Merged
… state

KllSketch::clone serialized to msgpack and deserialized back on every clone. The backing KLL<f64> derives Clone, so clone it structurally instead: identical sketch, no rmp encode/decode, and far fewer allocations.

Add KLL::from_portable_state to rebuild a sketch directly from its portable wire form (k + level-ordered items + the levels[] boundary array): place the items straight into the internal buffer at the correct offset and fix up the level boundaries, then recompute the capacity cache. This replaces reconstructing a sketch by replaying every retained item through update(); it is bit-exact (identical quantiles, vs a lossy statistical reconstruction) and several times faster (no per-item compaction/sort/RNG). Exposed via KllSketch::from_portable_state for proto/envelope decoders; the empty-sketch case is handled.

Add a test asserting from_portable_state reproduces the source sketch's quantiles exactly across the quantile range, plus the empty-sketch case.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
42561d6 to 9b57222
CMSHeap's fields (CountMin and HHHeap) both derive Clone, so derive Clone on CMSHeap and clone CountMinSketchWithHeap structurally instead of extracting the count matrix + top-k heap to wire form and rebuilding the backend on every clone. Same sketch, far less work per clone.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Avoid serialize→deserialize / extract→rebuild round-trips when copying or
decoding the portable sketches.
KLL
Structural KllSketch::clone. clone() previously did a full msgpack serialize → deserialize round-trip. The backing KLL<f64> already derives Clone, so this clones it structurally instead: same sketch, no rmp encode/decode, far fewer allocations.
KLL::from_portable_state: direct reconstruction. Rebuilds a sketch directly from its portable wire form (k + level-ordered items + the levels[] boundary array) by placing items into the internal buffer at the correct offset and fixing up the level boundaries, instead of replaying every retained item through update(). This is bit-exact (identical quantiles to the source; the replay path was a lossy statistical reconstruction) and several× faster. Exposed via KllSketch::from_portable_state(...) for proto/envelope decoders; the empty-sketch case is handled.
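The direct-reconstruction idea can be sketched with a toy type. Everything below is an illustrative assumption rather than the crate's code: `ToyKll`, its fields, the tail-aligned buffer layout, and the `max(k, items.len())` capacity rule are all hypothetical stand-ins. It only shows the shape of the technique: copy the level-ordered items into the buffer at an offset, shift the `levels[]` boundaries by that offset, and handle the empty sketch.

```rust
/// Toy stand-in for a KLL sketch's internal layout (hypothetical, not the crate's type).
#[derive(Clone, Debug, PartialEq)]
struct ToyKll {
    k: usize,
    /// Backing buffer: free slots at the front, retained items at the tail (assumed layout).
    buf: Vec<f64>,
    /// `levels[i]..levels[i + 1]` is the range of `buf` that holds level `i`.
    levels: Vec<usize>,
}

impl ToyKll {
    /// Rebuild directly from the portable form; no item is replayed through an update path.
    /// Returns `None` if the boundary array is inconsistent with the items.
    fn from_portable_state(k: usize, items: &[f64], levels: &[usize]) -> Option<Self> {
        // Portable boundaries must start at 0, end at items.len(), and never decrease.
        if levels.first() != Some(&0) || levels.last() != Some(&items.len()) {
            return None;
        }
        if levels.windows(2).any(|w| w[0] > w[1]) {
            return None;
        }
        // Assumed capacity rule for this toy: at least k slots.
        let capacity = items.len().max(k);
        let offset = capacity - items.len();
        let mut buf = vec![0.0; capacity];
        buf[offset..].copy_from_slice(items); // place items at the correct offset
        let levels = levels.iter().map(|b| b + offset).collect(); // fix up boundaries
        Some(Self { k, buf, levels })
    }

    /// Inverse of `from_portable_state`: strip the offset again.
    fn to_portable_state(&self) -> (usize, Vec<f64>, Vec<usize>) {
        let offset = self.levels[0];
        let levels = self.levels.iter().map(|b| b - offset).collect();
        (self.k, self.buf[offset..].to_vec(), levels)
    }

    /// Weighted quantile: an item at level `i` counts with weight 2^i.
    fn quantile(&self, q: f64) -> Option<f64> {
        let mut weighted: Vec<(f64, u64)> = Vec::new();
        for (lvl, w) in self.levels.windows(2).enumerate() {
            for &x in &self.buf[w[0]..w[1]] {
                weighted.push((x, 1u64 << lvl));
            }
        }
        if weighted.is_empty() {
            return None; // empty sketch
        }
        weighted.sort_by(|a, b| a.0.total_cmp(&b.0));
        let total: u64 = weighted.iter().map(|p| p.1).sum();
        let target = (q.clamp(0.0, 1.0) * total as f64).ceil().max(1.0) as u64;
        let mut acc = 0;
        for (x, w) in weighted {
            acc += w;
            if acc >= target {
                return Some(x);
            }
        }
        None
    }
}
```

Because the retained items and their level assignment are copied verbatim, the rebuilt toy sketch answers every quantile query exactly as the source does, which is the property the PR describes as bit-exact.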
CMS-with-heap
CountMinSketchWithHeap::clone. CMSHeap's fields (CountMin, HHHeap) both derive Clone, so CMSHeap now derives Clone and the wrapper clones the backend structurally instead of extracting the count matrix + top-k heap to wire form and rebuilding it on every clone.
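To illustrate the difference between the two clone paths, here is a minimal sketch using hypothetical toy types (`ToyCountMin`, `ToyHeap`, `ToyCmsHeap`, `ToyWire`); none of these names or field layouts come from the crate. The derived `Clone` copies the fields directly, while `clone_via_wire` mimics the older extract-then-rebuild path, which produces the same value with extra flattening and re-chunking work.

```rust
/// Toy count matrix (hypothetical stand-in for the CountMin backend).
#[derive(Clone, Debug, PartialEq)]
struct ToyCountMin {
    width: usize,
    rows: Vec<Vec<u64>>,
}

/// Toy top-k heap (hypothetical stand-in for the heavy-hitter heap).
#[derive(Clone, Debug, PartialEq)]
struct ToyHeap {
    capacity: usize,
    entries: Vec<(String, u64)>,
}

/// Both fields derive Clone, so the composite can derive it too.
#[derive(Clone, Debug, PartialEq)]
struct ToyCmsHeap {
    counts: ToyCountMin,
    heavy: ToyHeap,
}

/// Assumed wire shape for this toy: a flattened matrix plus the top-k list.
struct ToyWire {
    width: usize,
    depth: usize,
    flat_counts: Vec<u64>,
    heap_capacity: usize,
    top_k: Vec<(String, u64)>,
}

impl ToyCmsHeap {
    /// Extract the sketch into the toy wire form.
    fn to_wire(&self) -> ToyWire {
        ToyWire {
            width: self.counts.width,
            depth: self.counts.rows.len(),
            flat_counts: self.counts.rows.iter().flatten().copied().collect(),
            heap_capacity: self.heavy.capacity,
            top_k: self.heavy.entries.clone(),
        }
    }

    /// Rebuild the backend from the toy wire form.
    fn from_wire(w: &ToyWire) -> Self {
        let rows = if w.width == 0 {
            vec![Vec::new(); w.depth]
        } else {
            w.flat_counts.chunks(w.width).map(|c| c.to_vec()).collect()
        };
        Self {
            counts: ToyCountMin { width: w.width, rows },
            heavy: ToyHeap { capacity: w.heap_capacity, entries: w.top_k.clone() },
        }
    }

    /// The older-style path: extract to wire form, then rebuild.
    fn clone_via_wire(&self) -> Self {
        Self::from_wire(&self.to_wire())
    }
}
```

In this toy, `sketch.clone()` and `sketch.clone_via_wire()` compare equal; the structural path simply skips the intermediate `ToyWire` value and its allocations.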
Test
from_portable_state_reproduces_source_exactly asserts the reconstructed KLL sketch's quantiles match the source exactly across [0.0 .. 1.0], plus the empty-sketch round-trip.
(count_min_sketch_with_heap_round_trip, count_min_with_heap_wire_shape).

🤖 Generated with Claude Code