Commit 6b24b26

Integrate sketchlib-rust for KLL quantile sketches
1 parent 0226a98 commit 6b24b26

24 files changed

Lines changed: 2263 additions & 442 deletions

Cargo.lock

Lines changed: 15 additions & 1 deletion

asap-common/sketch-core/Cargo.toml

Lines changed: 5 additions & 0 deletions
@@ -9,3 +9,8 @@ serde.workspace = true
 rmp-serde = "1.1"
 xxhash-rust = { version = "0.8", features = ["xxh32"] }
 dsrs = { git = "https://github.com/ProjectASAP/datasketches-rs" }
+sketchlib-rust = { git = "https://github.com/ProjectASAP/sketchlib-rust" }
+clap = { version = "4.0", features = ["derive"] }
+
+[dev-dependencies]
+ctor = "0.2"

asap-common/sketch-core/report.md

Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
# Report

This report compares the **legacy** sketch implementations in `sketch-core` with the new **sketchlib-rust** backends for:

- `CountMinSketch`
- `CountMinSketchWithHeap` (Count-Min portion)
- `KllSketch`
- `HydraKllSketch` (via `KllSketch`)

### Fidelity harness

The fidelity binary now selects backends via CLI flags instead of environment variables.

| Goal                     | Command                                                                                                      |
|--------------------------|--------------------------------------------------------------------------------------------------------------|
| Default (all sketchlib)  | `cargo run -p sketch-core --bin sketchlib_fidelity`                                                          |
| All legacy               | `cargo run -p sketch-core --bin sketchlib_fidelity -- --cms-impl legacy --kll-impl legacy --cmwh-impl legacy` |
| Legacy KLL only          | `cargo run -p sketch-core --bin sketchlib_fidelity -- --cms-impl sketchlib --kll-impl legacy --cmwh-impl sketchlib` |

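To illustrate the flag semantics in the table above, here is a minimal std-only sketch of per-sketch backend selection. The `Backend` enum, the `Backends` struct, and the `parse_backends` helper are hypothetical names that do not come from the commit; the real binary presumably relies on `clap` (added to `Cargo.toml` in this commit), so treat this as an assumption-laden illustration. Only the flag names and the `legacy` / `sketchlib` values are taken from the table.

```rust
/// Hypothetical backend selector; the names here are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Legacy,
    Sketchlib,
}

/// Hypothetical container for the three flags listed in the table.
#[derive(Debug, PartialEq, Eq)]
struct Backends {
    cms: Backend,
    kll: Backend,
    cmwh: Backend,
}

/// Maps a flag value to a backend, rejecting anything else.
fn parse_backend(value: &str) -> Result<Backend, String> {
    match value {
        "legacy" => Ok(Backend::Legacy),
        "sketchlib" => Ok(Backend::Sketchlib),
        other => Err(format!("unknown backend: {}", other)),
    }
}

/// Parses `--cms-impl`, `--kll-impl`, and `--cmwh-impl`.
/// Every flag defaults to sketchlib, matching the "Default" row.
fn parse_backends<I, S>(args: I) -> Result<Backends, String>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let mut out = Backends {
        cms: Backend::Sketchlib,
        kll: Backend::Sketchlib,
        cmwh: Backend::Sketchlib,
    };
    let mut it = args.into_iter();
    while let Some(flag) = it.next() {
        let flag = flag.as_ref();
        // Pick the field that this flag controls.
        let slot = match flag {
            "--cms-impl" => &mut out.cms,
            "--kll-impl" => &mut out.kll,
            "--cmwh-impl" => &mut out.cmwh,
            other => return Err(format!("unknown flag: {}", other)),
        };
        let value = it
            .next()
            .ok_or_else(|| format!("missing value for {}", flag))?;
        *slot = parse_backend(value.as_ref())?;
    }
    Ok(out)
}
```

Under these assumptions, `parse_backends(["--kll-impl", "legacy"])` would select the legacy KLL backend and leave the other two on sketchlib, which corresponds to the "Legacy KLL only" row.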
### Unit tests

Unit tests always run with **legacy** backends enabled (the test ctor calls
`force_legacy_mode_for_tests()`), so you only need:

```bash
cargo test -p sketch-core
```

## Results

### CountMinSketch (accuracy vs exact counts)

#### depth=3

| width | n      | domain | Mode           | Pearson corr   | MAPE (%) | RMSE (%) |
|-------|--------|--------|----------------|----------------|----------|----------|
| 1024  | 100000 | 1000   | Legacy         | 0.9998451189   | 24.48    | 52.76    |
| 1024  | 100000 | 1000   | sketchlib-rust | 0.9998387103   | 24.36    | 54.11    |

#### depth=5

| width | n      | domain | Mode           | Pearson corr   | MAPE (%) | RMSE (%) |
|-------|--------|--------|----------------|----------------|----------|----------|
| 2048  | 200000 | 2000   | Legacy         | 0.9999733814   | 8.75     | 29.94    |
| 2048  | 200000 | 2000   | sketchlib-rust | 0.9999744627   | 8.37     | 28.84    |
| 2048  | 50000  | 500    | Legacy         | 1.0000000000   | 0.00     | 0.00     |
| 2048  | 50000  | 500    | sketchlib-rust | 1.0000000000   | 0.00     | 0.00     |

#### depth=7

| width | n      | domain | Mode           | Pearson corr   | MAPE (%) | RMSE (%) |
|-------|--------|--------|----------------|----------------|----------|----------|
| 4096  | 200000 | 2000   | Legacy         | 0.9999993694   | 0.20     | 3.69     |
| 4096  | 200000 | 2000   | sketchlib-rust | 0.9999993499   | 0.21     | 4.27     |

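The three accuracy columns above can be computed from paired exact and estimated counts. The sketch below shows one plausible set of definitions; it is not taken from the harness. In particular, treating "RMSE (%)" as the root-mean-square of the per-key relative errors is an assumption (the report does not define it), and the function names are hypothetical. All exact counts are assumed to be nonzero. For example, exact counts `[10, 20, 40]` with estimates `[11, 22, 44]` have a 10% relative error on every key, so both percentage metrics come out at about 10 and the correlation at about 1.

```rust
/// Pearson correlation between exact counts and estimates.
/// Returns NaN when either series is constant.
fn pearson(exact: &[f64], est: &[f64]) -> f64 {
    let n = exact.len() as f64;
    let mean_x = exact.iter().sum::<f64>() / n;
    let mean_y = est.iter().sum::<f64>() / n;
    let (mut cov, mut var_x, mut var_y) = (0.0_f64, 0.0_f64, 0.0_f64);
    for (x, y) in exact.iter().zip(est) {
        cov += (x - mean_x) * (y - mean_y);
        var_x += (x - mean_x).powi(2);
        var_y += (y - mean_y).powi(2);
    }
    cov / (var_x.sqrt() * var_y.sqrt())
}

/// Mean absolute percentage error; assumes every exact count is nonzero.
fn mape_percent(exact: &[f64], est: &[f64]) -> f64 {
    let sum: f64 = exact
        .iter()
        .zip(est)
        .map(|(x, y)| ((y - x) / x).abs())
        .sum();
    100.0 * sum / exact.len() as f64
}

/// ASSUMPTION: "RMSE (%)" is read here as the root-mean-square of the
/// relative errors, times 100. The harness may normalize differently.
fn rmse_percent(exact: &[f64], est: &[f64]) -> f64 {
    let sum_sq: f64 = exact
        .iter()
        .zip(est)
        .map(|(x, y)| ((y - x) / x).powi(2))
        .sum();
    100.0 * (sum_sq / exact.len() as f64).sqrt()
}
```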
---

### CountMinSketchWithHeap (top-k + CMS accuracy on exact top-k)

The heap is maintained by local updates; recall is measured against the **true** top-k at the end of the stream.

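As a rough illustration of the recall measurement described above, the following sketch compares the keys a heap reports with the true top-k derived from exact counts. The function names, the `u64` key type, and the tie-breaking rule are assumptions made for this example, not details from the harness.

```rust
use std::collections::{HashMap, HashSet};

/// Returns the k keys with the largest exact counts.
/// Ties are broken by key so the result is deterministic (an assumption).
fn true_top_k(exact: &HashMap<u64, u64>, k: usize) -> HashSet<u64> {
    let mut items: Vec<(u64, u64)> = exact.iter().map(|(&key, &count)| (key, count)).collect();
    items.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    items.into_iter().take(k).map(|(key, _)| key).collect()
}

/// Fraction of the true top-k keys that appear among the reported keys.
fn top_k_recall(reported: &[u64], exact: &HashMap<u64, u64>, k: usize) -> f64 {
    let truth = true_top_k(exact, k);
    if truth.is_empty() {
        return 0.0;
    }
    // Deduplicate reported keys so a repeated key is not counted twice.
    let reported: HashSet<u64> = reported.iter().copied().collect();
    let hits = truth.intersection(&reported).count();
    hits as f64 / truth.len() as f64
}
```

With exact counts `{1: 100, 2: 90, 3: 80, 4: 5, 5: 1}`, `k = 3`, and reported keys `[1, 4, 3]`, two of the three true heavy hitters are found, so the recall is 2/3.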
#### depth=3

| width | n      | domain | heap_size | Mode           | Top-k recall | Pearson (top-k) | MAPE (%) | RMSE (%) |
|-------|--------|--------|-----------|----------------|--------------|-----------------|----------|----------|
| 1024  | 100000 | 1000   | 10        | Legacy         | 0.40         | 0.9571          | 0.174    | 0.319    |
| 1024  | 100000 | 1000   | 10        | sketchlib-rust | 0.40         | 1.0000          | 0.000    | 0.000    |

#### depth=5

| width | n      | domain | heap_size | Mode           | Top-k recall | Pearson (top-k) | MAPE (%) | RMSE (%) |
|-------|--------|--------|-----------|----------------|--------------|-----------------|----------|----------|
| 2048  | 200000 | 2000   | 20        | Legacy         | 0.60         | 0.9964          | 0.045    | 0.101    |
| 2048  | 200000 | 2000   | 20        | sketchlib-rust | 0.60         | 0.9982          | 0.021    | 0.067    |
| 2048  | 200000 | 2000   | 50        | Legacy         | 0.40         | 0.9999983       | 5.60     | 16.49    |
| 2048  | 200000 | 2000   | 50        | sketchlib-rust | 0.40         | 0.9999990       | 3.90     | 12.95    |

---

### KllSketch (quantiles, absolute rank error)

For each quantile \(q\), we compute the sketch estimate `est_value`, then:
`abs_rank_error = |rank_fraction(exact_sorted_values, est_value) - q|`.

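The formula above can be sketched directly in Rust. The names `rank_fraction` and `abs_rank_error` mirror the identifiers in the formula, but the bodies below are an assumption about how they might be implemented; in particular, whether the rank counts values less than or equal to `est_value` (as here) or strictly less than it is not stated in the report.

```rust
/// Fraction of values in `sorted` that are <= `value`.
/// `sorted` must be in ascending order. ASSUMPTION: inclusive rank.
fn rank_fraction(sorted: &[f64], value: f64) -> f64 {
    // Binary search for the first element greater than `value`.
    let count = sorted.partition_point(|&x| x <= value);
    count as f64 / sorted.len() as f64
}

/// Absolute difference between the empirical rank of the sketch's
/// estimate and the requested quantile `q`.
fn abs_rank_error(sorted: &[f64], est_value: f64, q: f64) -> f64 {
    (rank_fraction(sorted, est_value) - q).abs()
}
```

For the exact values 1.0 through 100.0, an estimate of 53.0 for `q = 0.5` has rank 0.53, giving an absolute rank error of about 0.03.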
#### k=20

| n_updates | Mode           | q=0.5   | q=0.9   | q=0.99  |
|-----------|----------------|---------|---------|---------|
| 200000    | Legacy         | 0.0104  | 0.0145  | 0.0028  |
| 200000    | sketchlib-rust | 0.0275  | 0.0470  | 0.0061  |
| 50000     | Legacy         | 0.0131  | 0.0091  | 0.0054  |
| 50000     | sketchlib-rust | 0.0110  | 0.0116  | 0.0031  |

#### k=50

| n_updates | Mode           | q=0.5   | q=0.9   | q=0.99  |
|-----------|----------------|---------|---------|---------|
| 200000    | Legacy         | 0.0013  | 0.0021  | 0.0012  |
| 200000    | sketchlib-rust | 0.0101  | 0.0044  | 0.0074  |

#### k=200

| n_updates | Mode           | q=0.5   | q=0.9   | q=0.99  |
|-----------|----------------|---------|---------|---------|
| 200000    | Legacy         | 0.0021  | 0.0036  | 0.0000  |
| 200000    | sketchlib-rust | 0.0015  | 0.0001  | 0.0002  |

---

### HydraKllSketch (per-key quantiles, mean/max absolute rank error across 50 keys)

#### rows=2, cols=64

| k   | n      | domain | Mode           | q=0.5 (mean / max) | q=0.9 (mean / max) |
|-----|--------|--------|----------------|--------------------|--------------------|
| 20  | 200000 | 200    | Legacy         | 0.0170 / 0.0546    | 0.0165 / 0.0452    |
| 20  | 200000 | 200    | sketchlib-rust | 0.0254 / 0.0629    | 0.0546 / 0.0942    |

#### rows=3, cols=128

| k   | n      | domain | Mode           | q=0.5 (mean / max) | q=0.9 (mean / max) |
|-----|--------|--------|----------------|--------------------|--------------------|
| 20  | 200000 | 200    | Legacy         | 0.0166 / 0.0591    | 0.0114 / 0.0304    |
| 20  | 200000 | 200    | sketchlib-rust | 0.0216 / 0.0534    | 0.0238 / 0.1087    |
| 50  | 200000 | 200    | Legacy         | 0.0099 / 0.0352    | 0.0087 / 0.0330    |
| 50  | 200000 | 200    | sketchlib-rust | 0.0119 / 0.0458    | 0.0119 / 0.0296    |
| 20  | 100000 | 100    | Legacy         | 0.0141 / 0.0574    | 0.0149 / 0.0471    |
| 20  | 100000 | 100    | sketchlib-rust | 0.0202 / 0.0621    | 0.0287 / 0.0779    |
