| 1 | +# Report |
| 2 | + |
| 3 | +This report compares the **legacy** sketch implementations in `sketch-core` with the new **sketchlib-rust** backends for: |
| 4 | + |
| 5 | +- `CountMinSketch` |
| 6 | +- `CountMinSketchWithHeap` (Count-Min portion) |
| 7 | +- `KllSketch` |
| 8 | +- `HydraKllSketch` (via `KllSketch`) |
| 9 | + |
| 10 | + |
| 11 | + |
| 12 | + |
| 13 | +### Fidelity harness |
| 14 | + |
| 15 | +The fidelity binary now selects backends via CLI flags instead of environment variables. |
| 16 | + |
| 17 | +| Goal | Command | |
| 18 | +|--------------------------|--------------------------------------------------------------------------------------------------------------| |
| 19 | +| Default (all sketchlib) | `cargo run -p sketch-core --bin sketchlib_fidelity` | |
| 20 | +| All legacy | `cargo run -p sketch-core --bin sketchlib_fidelity -- --cms-impl legacy --kll-impl legacy --cmwh-impl legacy` | |
| 21 | +| Legacy KLL only | `cargo run -p sketch-core --bin sketchlib_fidelity -- --cms-impl sketchlib --kll-impl legacy --cmwh-impl sketchlib` | |
| 22 | + |
| 23 | +### Unit tests |
| 24 | + |
| 25 | +Unit tests always run with **legacy** backends enabled (the test ctor calls |
| 26 | +`force_legacy_mode_for_tests()`), so you only need: |
| 27 | + |
| 28 | +```bash |
| 29 | +cargo test -p sketch-core |
| 30 | +``` |
| 31 | + |
| 32 | +## Results |
| 33 | + |
| 34 | +### CountMinSketch (accuracy vs exact counts) |
| 35 | + |
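The report does not spell out how the three accuracy columns are computed, so the following is only a sketch under stated assumptions: Pearson correlation is taken between exact and estimated counts, MAPE is the mean absolute relative error, and "RMSE (%)" is read as the root mean square of the relative errors. The function names `pearson`, `mape_percent`, and `rmspe_percent` are hypothetical and are not taken from the fidelity harness.

```rust
/// Pearson correlation between two equally long slices.
/// Returns NaN if either slice has zero variance.
fn pearson(x: &[f64], y: &[f64]) -> f64 {
    let n = x.len() as f64;
    let mx = x.iter().sum::<f64>() / n;
    let my = y.iter().sum::<f64>() / n;
    let (mut cov, mut vx, mut vy) = (0.0, 0.0, 0.0);
    for (a, b) in x.iter().zip(y) {
        cov += (a - mx) * (b - my);
        vx += (a - mx).powi(2);
        vy += (b - my).powi(2);
    }
    cov / (vx.sqrt() * vy.sqrt())
}

/// Relative errors (est - exact) / exact, skipping keys whose exact count is zero.
fn relative_errors(exact: &[f64], est: &[f64]) -> Vec<f64> {
    exact
        .iter()
        .zip(est)
        .filter(|(e, _)| **e != 0.0)
        .map(|(e, s)| (s - e) / e)
        .collect()
}

/// Mean absolute percentage error, in percent.
fn mape_percent(exact: &[f64], est: &[f64]) -> f64 {
    let errs = relative_errors(exact, est);
    100.0 * errs.iter().map(|r| r.abs()).sum::<f64>() / errs.len() as f64
}

/// Root mean square of the relative errors, in percent.
/// Assumption: this is what the "RMSE (%)" column reports.
fn rmspe_percent(exact: &[f64], est: &[f64]) -> f64 {
    let errs = relative_errors(exact, est);
    100.0 * (errs.iter().map(|r| r * r).sum::<f64>() / errs.len() as f64).sqrt()
}

fn main() {
    // Toy data: one key is overestimated by 10%, as Count-Min can only overcount.
    let exact = [100.0, 200.0, 400.0];
    let est = [110.0, 200.0, 400.0];
    println!("pearson = {:.6}", pearson(&exact, &est));
    println!("MAPE    = {:.3}%", mape_percent(&exact, &est));
    println!("RMSPE   = {:.3}%", rmspe_percent(&exact, &est));
}
```

Under this reading the root-mean-square figure can never be smaller than MAPE, which matches every row in the tables, but the harness source remains the authority on the exact definitions.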
| 36 | +#### depth=3 |
| 37 | + |
| 38 | +| width | n | domain | Mode | Pearson corr | MAPE (%) | RMSE (%) | |
| 39 | +|-------|--------|--------|----------------|----------------|----------|----------| |
| 40 | +| 1024 | 100000 | 1000 | Legacy | 0.9998451189 | 24.48 | 52.76 | |
| 41 | +| 1024 | 100000 | 1000 | sketchlib-rust | 0.9998387103 | 24.36 | 54.11 | |
| 42 | + |
| 43 | +#### depth=5 |
| 44 | + |
| 45 | +| width | n | domain | Mode | Pearson corr | MAPE (%) | RMSE (%) | |
| 46 | +|-------|--------|--------|----------------|----------------|----------|----------| |
| 47 | +| 2048 | 200000 | 2000 | Legacy | 0.9999733814 | 8.75 | 29.94 | |
| 48 | +| 2048 | 200000 | 2000 | sketchlib-rust | 0.9999744627 | 8.37 | 28.84 | |
| 49 | +| 2048 | 50000 | 500 | Legacy | 1.0000000000 | 0.00 | 0.00 | |
| 50 | +| 2048 | 50000 | 500 | sketchlib-rust | 1.0000000000 | 0.00 | 0.00 | |
| 51 | + |
| 52 | +#### depth=7 |
| 53 | + |
| 54 | +| width | n | domain | Mode | Pearson corr | MAPE (%) | RMSE (%) | |
| 55 | +|-------|--------|--------|----------------|----------------|----------|----------| |
| 56 | +| 4096 | 200000 | 2000 | Legacy | 0.9999993694 | 0.20 | 3.69 | |
| 57 | +| 4096 | 200000 | 2000 | sketchlib-rust | 0.9999993499 | 0.21 | 4.27 | |
| 58 | + |
| 59 | +--- |
| 60 | + |
| 61 | +### CountMinSketchWithHeap (top-k + CMS accuracy on exact top-k) |
| 62 | + |
| 63 | +The heap is maintained by local updates; recall is measured against the **true** top-k at the end of the stream. |
| 64 | + |
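As an illustration of the recall measurement described above, here is a minimal sketch that derives the true top-k from exact counts and compares it with the keys a heap reports. The helper names `true_top_k` and `top_k_recall`, the `u64` key type, and the tie-breaking rule are assumptions for this example, not the harness's actual code.

```rust
use std::collections::{HashMap, HashSet};

/// Keys of the k largest exact counts.
/// Ties are broken by ascending key so the result is deterministic (an assumption).
fn true_top_k(counts: &HashMap<u64, u64>, k: usize) -> HashSet<u64> {
    let mut entries: Vec<(u64, u64)> = counts.iter().map(|(&key, &c)| (key, c)).collect();
    entries.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    entries.into_iter().take(k).map(|(key, _)| key).collect()
}

/// Fraction of the true top-k keys that also appear in the reported set.
fn top_k_recall(reported: &HashSet<u64>, truth: &HashSet<u64>) -> f64 {
    if truth.is_empty() {
        return 0.0;
    }
    reported.intersection(truth).count() as f64 / truth.len() as f64
}

fn main() {
    // Exact end-of-stream counts for five keys.
    let counts: HashMap<u64, u64> =
        [(1, 50), (2, 40), (3, 30), (4, 20), (5, 10)].into_iter().collect();
    let truth = true_top_k(&counts, 2);

    // Suppose the heap ended up holding keys 1 and 3.
    let reported: HashSet<u64> = HashSet::from([1, 3]);
    println!("recall = {:.2}", top_k_recall(&reported, &truth));
}
```

Because the heap only sees local updates, a key that becomes heavy late in the stream can be missing from it, which is one way recall can fall below 1.0 even when the underlying counts are accurate.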
| 65 | +#### depth=3 |
| 66 | + |
| 67 | +| width | n | domain | heap_size | Mode | Top-k recall | Pearson (top-k) | MAPE (%) | RMSE (%) | |
| 68 | +|-------|--------|--------|-----------|----------------|--------------|-----------------|----------|----------| |
| 69 | +| 1024 | 100000 | 1000 | 10 | Legacy | 0.40 | 0.9571 | 0.174 | 0.319 | |
| 70 | +| 1024 | 100000 | 1000 | 10 | sketchlib-rust | 0.40 | 1.0000 | 0.000 | 0.000 | |
| 71 | + |
| 72 | +#### depth=5 |
| 73 | + |
| 74 | +| width | n | domain | heap_size | Mode | Top-k recall | Pearson (top-k) | MAPE (%) | RMSE (%) | |
| 75 | +|-------|--------|--------|-----------|----------------|--------------|-----------------|----------|----------| |
| 76 | +| 2048 | 200000 | 2000 | 20 | Legacy | 0.60 | 0.9964 | 0.045 | 0.101 | |
| 77 | +| 2048 | 200000 | 2000 | 20 | sketchlib-rust | 0.60 | 0.9982 | 0.021 | 0.067 | |
| 78 | +| 2048 | 200000 | 2000 | 50 | Legacy | 0.40 | 0.9999983 | 5.60 | 16.49 | |
| 79 | +| 2048 | 200000 | 2000 | 50 | sketchlib-rust | 0.40 | 0.9999990 | 3.90 | 12.95 | |
| 80 | + |
| 81 | +--- |
| 82 | + |
| 83 | +### KllSketch (quantiles, absolute rank error) |
| 84 | + |
| 85 | +For each quantile `q`, we compute the sketch estimate `est_value`, then: |
| 86 | +`abs_rank_error = |rank_fraction(exact_sorted_values, est_value) - q|`. |
| 87 | + |
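The formula above can be sketched in a few lines of Rust. This is a minimal illustration, assuming `rank_fraction` means the fraction of exact values less than or equal to the estimate; the harness may use a strict inequality or a different tie rule.

```rust
/// Fraction of values in `sorted` that are <= `value`.
/// Assumes `sorted` is ascending and non-empty.
fn rank_fraction(sorted: &[f64], value: f64) -> f64 {
    // partition_point performs a binary search for the first element > value.
    let count = sorted.partition_point(|&x| x <= value);
    count as f64 / sorted.len() as f64
}

/// Absolute rank error of a quantile estimate `est_value` for target quantile `q`.
fn abs_rank_error(sorted: &[f64], est_value: f64, q: f64) -> f64 {
    (rank_fraction(sorted, est_value) - q).abs()
}

fn main() {
    // Exact data: the integers 1..=100, already sorted.
    let exact: Vec<f64> = (1..=100).map(f64::from).collect();

    // A sketch that returns 52.0 for the median sits at rank 0.52,
    // so its absolute rank error for q = 0.5 is about 0.02.
    let err = abs_rank_error(&exact, 52.0, 0.5);
    println!("abs_rank_error = {err:.4}");
}
```

Measuring error in rank space rather than value space keeps the numbers comparable across data distributions, which is why the tables report values in the range 0 to 1.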
| 88 | +#### k=20 |
| 89 | + |
| 90 | +| n_updates | Mode | q=0.5 | q=0.9 | q=0.99 | |
| 91 | +|-----------|----------------|---------|---------|---------| |
| 92 | +| 200000 | Legacy | 0.0104 | 0.0145 | 0.0028 | |
| 93 | +| 200000 | sketchlib-rust | 0.0275 | 0.0470 | 0.0061 | |
| 94 | +| 50000 | Legacy | 0.0131 | 0.0091 | 0.0054 | |
| 95 | +| 50000 | sketchlib-rust | 0.0110 | 0.0116 | 0.0031 | |
| 96 | + |
| 97 | +#### k=50 |
| 98 | + |
| 99 | +| n_updates | Mode | q=0.5 | q=0.9 | q=0.99 | |
| 100 | +|-----------|----------------|---------|---------|---------| |
| 101 | +| 200000 | Legacy | 0.0013 | 0.0021 | 0.0012 | |
| 102 | +| 200000 | sketchlib-rust | 0.0101 | 0.0044 | 0.0074 | |
| 103 | + |
| 104 | +#### k=200 |
| 105 | + |
| 106 | +| n_updates | Mode | q=0.5 | q=0.9 | q=0.99 | |
| 107 | +|-----------|----------------|---------|---------|---------| |
| 108 | +| 200000 | Legacy | 0.0021 | 0.0036 | 0.0000 | |
| 109 | +| 200000 | sketchlib-rust | 0.0015 | 0.0001 | 0.0002 | |
| 110 | + |
| 111 | +--- |
| 112 | + |
| 113 | +### HydraKllSketch (per-key quantiles, mean/max absolute rank error across 50 keys) |
| 114 | + |
| 115 | +#### rows=2, cols=64 |
| 116 | + |
| 117 | +| k | n | domain | Mode | q=0.5 (mean / max) | q=0.9 (mean / max) | |
| 118 | +|-----|--------|--------|----------------|--------------------|--------------------| |
| 119 | +| 20 | 200000 | 200 | Legacy | 0.0170 / 0.0546 | 0.0165 / 0.0452 | |
| 120 | +| 20 | 200000 | 200 | sketchlib-rust | 0.0254 / 0.0629 | 0.0546 / 0.0942 | |
| 121 | + |
| 122 | +#### rows=3, cols=128 |
| 123 | + |
| 124 | +| k | n | domain | Mode | q=0.5 (mean / max) | q=0.9 (mean / max) | |
| 125 | +|-----|--------|--------|----------------|--------------------|--------------------| |
| 126 | +| 20 | 200000 | 200 | Legacy | 0.0166 / 0.0591 | 0.0114 / 0.0304 | |
| 127 | +| 20 | 200000 | 200 | sketchlib-rust | 0.0216 / 0.0534 | 0.0238 / 0.1087 | |
| 128 | +| 50 | 200000 | 200 | Legacy | 0.0099 / 0.0352 | 0.0087 / 0.0330 | |
| 129 | +| 50 | 200000 | 200 | sketchlib-rust | 0.0119 / 0.0458 | 0.0119 / 0.0296 | |
| 130 | +| 20 | 100000 | 100 | Legacy | 0.0141 / 0.0574 | 0.0149 / 0.0471 | |
| 131 | +| 20 | 100000 | 100 | sketchlib-rust | 0.0202 / 0.0621 | 0.0287 / 0.0779 | |