-
Notifications
You must be signed in to change notification settings - Fork 417
RFC: integrate PiPNN as an alternative graph-index build algorithm #1049
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
f330d73
e8d1cd2
4ba6d4a
4fe210f
77950b3
018dbe9
6121119
61f1ea5
d090895
7de117e
da8b2ac
d0baaba
e1c2b8c
4de515c
d37bedc
eb52c60
7faab09
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,303 @@ | ||
| # Integrate PiPNN as an Alternative Graph-Index Build Algorithm | ||
|
|
||
| | | | | ||
| |---|---| | ||
| | **Authors** | Weiyao Luo | | ||
| | **Contributors** | DiskANN team | | ||
| | **Created** | 2026-05-11 | | ||
| | **Updated** | 2026-05-14 | | ||
|
|
||
| ## Summary | ||
|
|
||
| Add **PiPNN** ([Pick-in-Partitions Nearest Neighbors](https://arxiv.org/abs/2602.21247)) as a second algorithm for **DiskANN's disk-index full-build path** — both initial builds and full rebuilds. PiPNN writes its graph in **Vamana's existing disk format** (same headers, node/edge layout, frozen-point convention), so the disk writer, PQ codes, and search pipeline are unchanged. The graph **contents themselves** differ from what Vamana would build at the same parameters — the two algorithms construct graphs differently — but recall@k matches at equivalent L (see Benchmarks). PiPNN is **up to 6.3× faster to build** on measured workloads. Vamana stays the default; PiPNN is opt-in behind a Cargo feature. In-memory PiPNN build is intentionally out of scope — DiskANN's in-mem path is for streaming per-point construction, which PiPNN's batch algorithm cannot do. | ||
|
|
||
| ## Motivation | ||
|
|
||
| ### How DiskANN builds today | ||
|
|
||
| DiskANN uses one algorithm — **Vamana** — which inserts points one-by-one with greedy search + `RobustPrune`. Clients use it in three modes: | ||
|
|
||
| | Mode | Description | | ||
| |---|---| | ||
| | Incremental | Per-point inserts on an in-memory graph (Vamana). Disk index not mutated. | | ||
| | Full rebuild | Rebuild the entire graph from a snapshot; produces an immutable disk index. | | ||
| | Partitioned full rebuild | Shard, build, merge — bounds peak RAM (`build_merged_vamana_index`). | | ||
|
|
||
| PiPNN is a faster substitute for modes 2 and 3. Mode 1 stays with Vamana — PiPNN's batch design has no efficient per-point insert (see "Batch-only" below). | ||
|
|
||
| ### What PiPNN does | ||
|
|
||
| A four-phase **batch** builder: | ||
|
|
||
| 1. **Partition (RBC).** Randomized Ball Carving recursively splits the dataset into small *overlapping* leaf clusters; each point lands in `fanout` of its nearest leaders at every level. Recursion stops at a leaf-size cap (`c_max`, ~256–1024). | ||
| 2. **Local k-NN per leaf (GEMM).** For each leaf, compute the full pairwise distance matrix as one batched GEMM (intra-leaf N×N where N ≈ `c_max`), then extract per-point top-`leaf_k`. This is structurally different from a 1×N flat scan ([#1036](https://github.com/microsoft/DiskANN/issues/1036)) — the batching across `c_max²` evaluations is where PiPNN's wall-clock advantage comes from. | ||
| 3. **HashPrune merge.** Merge edges from all leaves into a per-point reservoir of bounded size (`l_max` ~64–128), keyed by an LSH angular bucket for diversity. Naturally streamable — see memory mitigation. | ||
| 4. **Optional final RobustPrune.** Same algorithm Vamana uses, applied as a single pass when the workload wants more geometric diversification. | ||
|
|
||
| Output: `Vec<Vec<u32>>` adjacency lists, handed to the existing disk writer. PQ training and search are unchanged. | ||
|
|
||
| ### Problem statement | ||
|
|
||
| Vamana full builds at 10M+ are the bottleneck: | ||
|
|
||
| | Dataset | Vamana build | | ||
| |---|---:| | ||
| | Enron 1M (384d) | 70s | | ||
| | BigANN 10M (128d) | 358s | | ||
| | Enron 10M (384d) | 844s | | ||
|
|
||
| PiPNN completes the same builds **up to 6.3× faster** at matching recall (numbers below). | ||
|
|
||
| #### Trade-off hypothesis | ||
|
|
||
| > Given a fixed worker (CPU/RAM/SSD), PiPNN delivers higher build throughput than Vamana at matching recall *when its working set fits*. Below that threshold, the three-tier dispatch (one-shot → disk-edges → merged-shards) keeps PiPNN at or under Vamana's RAM footprint. | ||
|
|
||
| On BigANN 10M, 16 threads: | ||
|
|
||
| | RAM budget | PiPNN strategy | PiPNN build | Vamana build | | ||
| |---:|---|---:|---:| | ||
| | ≥ 12 GB | one-shot | 80–133s | 358s | | ||
| | 6–12 GB | disk-edges | ~126s | 358s | | ||
| | 3–6 GB | merged-shards | ~332s | ~358s (partitioned) | | ||
|
Comment on lines
+57
to
+61
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you elaborate on the
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
|
|
||
| **Neither algorithm converts surplus RAM into faster builds.** PiPNN's wall-clock is bottlenecked by HashPrune + GEMM; once the working set fits, more RAM doesn't help. PiPNN trades a higher *minimum* RAM budget for a faster build at that budget. Validation: see **M4**. | ||
|
|
||
| ### Hybrid update model (Stage 2 direction) | ||
|
|
||
| Both algorithms write the same disk format, so a graph built by either can be loaded and (once in memory) extended by Vamana. Production update story: | ||
|
|
||
| - **Full rebuild → PiPNN.** Several times faster than Vamana. | ||
| - **Incremental insert → in-memory Vamana.** Unchanged — applies to the in-memory graph; the disk index file is not mutated in place. | ||
| - **Rebuild triggers.** Embedding rotation, schema/parameter retuning, large batch inserts, or periodic safety rebuilds — not just gradual recall drift. DiskANN's claim that incremental updates keep recall healthy still stands; PiPNN just makes the eventual rebuild cheaper. | ||
|
|
||
| This is why we don't need PiPNN to support `insert(point)` — the disk format is the integration point between batch and incremental. | ||
|
Comment on lines
+69
to
+73
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't understand what this section is providing a solution for. Is the key takeaway here that building the graph with one algorithm doesn't preclude us from using another one in the future?
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes, basically the two graph are quality identical and disk format identical |
||
|
|
||
| ### Two-stage rollout | ||
|
|
||
| - **Stage 1 (this RFC).** Land PiPNN as an alternative builder for the disk-index full-build path (initial builds *and* rebuilds), behind a `pipnn` Cargo feature. Vamana stays default. | ||
| - **Stage 2 (separate proposal, gated by Stage-1 milestones).** Retire Vamana's full-build path; keep Vamana for in-memory incremental inserts. | ||
|
|
||
| In-memory PiPNN build is not in any stage — see "Out of scope". | ||
|
|
||
| ### Goals (Stage 1) | ||
|
|
||
| 1. **Pluggable disk builder** — a selector that routes Vamana (default) vs. PiPNN (opt-in). No behavior change at existing call sites. | ||
| 2. **Disk-format compatibility** — PiPNN emits Vamana's on-disk schema (node/edge layout, headers, frozen-point convention, PQ/SQ codes all unchanged), so search loads PiPNN-built indexes with no code change. The graph **contents** differ from Vamana's by design — the two algorithms construct graphs differently — but recall@k is equivalent across tested datasets. | ||
| 3. **API backward compatibility** — `DiskIndexBuilder`, `IndexConfiguration`, JSON schema all stay additive. | ||
| 4. **Feature parity for the full-build role** — deliver the Vamana disk-build capabilities PiPNN still lacks (quantization, label filters). | ||
| 5. **Memory mitigation** — a three-tier build path that brings PiPNN's peak RSS to or under Vamana's at a documented build-time cost. | ||
|
|
||
| ## Proposal | ||
|
|
||
| ### Workspace structure | ||
|
|
||
| Add a crate `diskann-pipnn` depending on `diskann`, `diskann-linalg`, `diskann-vector`, `diskann-quantization`, `diskann-utils`. **It does not depend on `diskann-disk`** — that would form a cycle with the consumer-side feature gate. | ||
|
|
||
| ```text | ||
| diskann, diskann-linalg, diskann-quantization, diskann-vector, diskann-utils | ||
| │ (shared deps, no edges to builders) | ||
| ┌───────────┴────────────┐ | ||
| diskann-pipnn diskann-disk | ||
| │ ↑ feature "pipnn" | ||
| └───→ Vec<Vec<u32>> ────┘ | ||
| ``` | ||
|
|
||
| ### `BuildAlgorithm` enum | ||
|
|
||
| ```rust | ||
| #[derive(Debug, Clone, Default, PartialEq, Serialize, Deserialize)] | ||
| #[serde(tag = "algorithm")] | ||
| pub enum BuildAlgorithm { | ||
| #[default] | ||
| Vamana, | ||
|
|
||
| #[cfg(feature = "pipnn")] | ||
| PiPNN { | ||
| c_max: usize, c_min: usize, p_samp: f64, | ||
| fanout: Vec<usize>, leaf_k: usize, replicas: usize, | ||
| l_max: usize, num_hash_planes: usize, | ||
| final_prune: bool, leader_cap: usize, | ||
| saturate_after_prune: bool, | ||
| num_threads: usize, // 0 = all logical CPUs | ||
| }, | ||
| } | ||
| ``` | ||
|
|
||
| `DiskIndexBuildParameters` gains a `build_algorithm` field; JSON config gains an optional `build_algorithm` block. | ||
|
|
||
| **`num_threads` is not a RAM knob.** Per-thread overhead is ~16–20 MB (stripe + leaf-build scratch). Use `build_ram_limit_gb` to bound RAM. | ||
|
|
||
| **Feature-flag deserialization is config-only, not index-file.** Index files built by either algorithm load with or without the `pipnn` feature. A JSON config naming `"algorithm": "PiPNN"` fed to a binary built without the feature fails fast with `unknown variant 'PiPNN'`; configs that omit `build_algorithm` parse identically across feature builds. | ||
|
|
||
| ### Builder dispatch | ||
|
|
||
| ```rust | ||
| match build_parameters.build_algorithm() { | ||
| BuildAlgorithm::Vamana => self.build_inmem_vamana_index().await, | ||
| #[cfg(feature = "pipnn")] | ||
| BuildAlgorithm::PiPNN { .. } => self.build_inmem_pipnn_index().await, | ||
| } | ||
| ``` | ||
|
|
||
| The PiPNN branch produces `Vec<Vec<u32>>` via `diskann_pipnn::builder::build_typed` and hands it to the existing `DiskIndexWriter`. | ||
|
|
||
| ### Compatibility | ||
|
|
||
| | Surface | Status | | ||
| |---|---| | ||
| | On-disk graph format | unchanged | | ||
| | PQ/SQ codes on disk | unchanged | | ||
| | Search API | unchanged | | ||
| | Public Rust types | additive only (new field with default) | | ||
| | Benchmark JSON config | additive only (new optional field) | | ||
|
|
||
| ### Feature gating | ||
|
|
||
| `diskann-disk` gains a `pipnn` Cargo feature, off by default. With it off, `BuildAlgorithm::PiPNN` does not exist at the type level — no runtime branch, no `diskann-pipnn` dependency. | ||
|
|
||
| ## Trade-offs | ||
|
|
||
| ### Batch-only (algorithmic, not implementation) | ||
|
|
||
| The PiPNN paper "eliminates search from graph-building" — partition first, then one batched GEMM per leaf. The batching advantage **requires knowing leaf membership before computing distances**; at batch size 1, PiPNN reduces to per-point distance work no faster than Vamana's greedy insert. Two phases (partition, leaf k-NN) are batch-by-design; HashPrune and final RobustPrune happen to be online, but you've already paid for the batch phases by the time you reach them. | ||
|
|
||
| ### Memory vs. build speed | ||
|
|
||
| PiPNN holds more working memory than Vamana — dominated by the **HashPrune reservoir** (`l_max × 8 B` per point). On BigANN 10M (`c_max=256, fanout=[10,3], leaf_k=3, l_max=64`): | ||
|
|
||
| | | PiPNN one-shot | Vamana | | ||
| |---|---:|---:| | ||
| | Peak RSS | 10.8 GB | 6.3 GB | | ||
|
|
||
| Mitigation via the three-tier build (dispatched by the existing `build_ram_limit_gb` knob): | ||
|
|
||
| | Strategy | Peak RSS | Build | Recall@10 L=50 | Trigger | | ||
| |---|---:|---:|---:|---| | ||
| | One-shot | 10.8 GB | 133s | 95.00% | RAM ≥ ~32 GB | | ||
| | Disk-edges | 6.4 GB | 126s | 95.00% | RAM 8–32 GB | | ||
| | Merged shards | 3.3 GB | 332s | 95.31% | RAM 4–8 GB | | ||
|
|
||
| Disk-edges matches Vamana's RAM at ~3× the build speed. Merged-shards uses *less* RAM than Vamana (3.3 vs. 6.3 GB) at a 2.5× build-time cost. | ||
|
|
||
| ### Alternatives considered | ||
|
|
||
| | | Choice | Rejected because | | ||
| |---|---|---| | ||
| | A | (Chosen) Add PiPNN behind a feature flag | — | | ||
| | B | Replace Vamana immediately | PiPNN lacks checkpoint / full quantization / label-filter parity; production-validation gap | | ||
| | C | Separate top-level crate/binary | Duplicates PQ training, disk writer, search pipeline — maintenance burden, no compatibility benefit | | ||
|
|
||
| ### Algorithm risks | ||
|
|
||
| Recall depends on partition overlap (`fanout`) and reservoir size (`l_max`) — a larger parameter space than Vamana's `R`/`L_build`. Mitigation: keep Vamana as default and ship reference parameter sets per workload class. | ||
|
|
||
| ## Benchmark Results | ||
|
|
||
| Azure `Standard_L16s_v3`, 16 threads, NVMe, `RUSTFLAGS=-C target-cpu=native`. | ||
|
|
||
| ### Build time | ||
|
|
||
| | Dataset | Vamana | PiPNN | Speedup | | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Are the Vamana results here with the spherical/scalar quantizer?
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. no, the result here is fp build |
||
| |---|---:|---:|---:| | ||
| | Enron 1M (384d) | 70s | 13s | 5.4× | | ||
| | BigANN 10M (128d) | 358s | 80s | 4.5× | | ||
| | Enron 10M (384d) | 844s | 133s | 6.3× | | ||
|
|
||
| ### BigANN 10M — recall × QPS | ||
|
|
||
| Default PiPNN (`c_max=256, fanout=[10,3], leaf_k=3, l_max=64, pq_chunks=64`) vs. Vamana (`R=64, L=64, pq_chunks=64`): | ||
|
|
||
| | L | PiPNN R@10 | PiPNN QPS | Vamana R@10 | Vamana QPS | | ||
| |---|---:|---:|---:|---:| | ||
| | 10 | 77.76% | 10,670 | 79.23% | 11,618 | | ||
| | 50 | 96.31% | 5,574 | 97.10% | 5,940 | | ||
| | 100 | 98.61% | 3,430 | 99.01% | 3,568 | | ||
|
|
||
| With a higher-recall config (`c_max=512, fanout=[10,4], l_max=128, final_prune`), PiPNN matches/exceeds Vamana at L=50 (97.22%) and L=100 (99.21%) at 143s (still 2.5× faster). | ||
|
|
||
| ### Enron 10M (384d) — recall × QPS | ||
|
|
||
| PiPNN (`c_max=256, fanout=[8,3], leaf_k=2, l_max=64, pq_chunks=192`) vs. Vamana (`R=64, L=72, pq_chunks=192`): | ||
|
|
||
| | L | PiPNN R@1000 | PiPNN QPS | Vamana R@1000 | Vamana QPS | | ||
| |---|---:|---:|---:|---:| | ||
| | 1000 | 89.99% | 378 | 89.33% | 384 | | ||
| | 2000 | 96.46% | 192 | 95.36% | 195 | | ||
| | 3000 | 97.74% | 129 | 96.68% | 130 | | ||
|
|
||
| PiPNN beats Vamana on recall at every L at parity QPS — and 6.3× faster build. | ||
|
|
||
| ## Future Work — Stage-1 Milestones | ||
|
|
||
| Stage 1 covers build-from-scratch and full rebuilds with PiPNN. M0 ships in this RFC; M1–M6 are follow-on work, parallelizable where dependencies allow. | ||
|
|
||
| ### M0 — Skeleton integration (this RFC) | ||
| Crate, `BuildAlgorithm` enum, dispatch behind `pipnn` Cargo feature. JSON config gains optional `build_algorithm`. CI smoke test (SIFT-1M) with `--features pipnn`. | ||
|
|
||
| ### M1 — Quantization parity | ||
| PiPNN's leaf-build computes pairwise distances via batched GEMM, so its quantization options are constrained by what hardware GEMM units accelerate: **int8** (AVX-512-VNNI, AMX-INT8), **fp16** (AVX-512-FP16), and **bf16** (AVX-512-BF16, AMX-BF16). 1-bit is not in that set — GEMM hardware does multiply-accumulate, not XOR+popcount. | ||
|
|
||
| Vamana's existing quantized path is **SQ1 (1-bit) Hamming**, which is a fundamentally different kernel (XOR+popcount, not MAC). PiPNN cannot directly reuse Vamana's SQ1 quantizer for the same speedup story; porting requires adapting the quantization pipeline to a GEMM-compatible format and computing distance through the quantized-GEMM dispatch. | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why is supporting quantized GEMM in M1? This feels like a nice to have that can come later no? Unless it's needed for RAM savings...
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. supporting quantized gemm also unblock using fp16/u8 native gemm, but the proiority could be modified |
||
|
|
||
| Scope: | ||
| - Extend PiPNN with **SQ_8 (int8)** quantization via the trained `ScalarQuantizer` from `diskann-quantization`, producing the packed int8 buffers the quantized-GEMM interface consumes. | ||
| - fp16 and bf16 are natural follow-ons through the same dispatch path. | ||
|
|
||
| **Pass:** SQ_8 PiPNN recall within 0.5% of FP-PiPNN on BigANN 10M and Enron 10M. | ||
|
|
||
| ### M2 — Label-filtered indexes | ||
| Run filter benchmark configs with `BuildAlgorithm::PiPNN`; confirm filter-recall within ±1% of Vamana. Partition may need label-aware leaf assignment for high-cardinality labels. | ||
|
|
||
| ### M3 — Three-tier memory dispatch | ||
| Implement and validate the disk-edges + merged-shards paths selected by `build_ram_limit_gb`. **Pass:** at `build_ram_limit_gb=4`, PiPNN-merged on BigANN 10M has peak RSS ≤ 4 GB and recall within 1% of one-shot. | ||
|
|
||
| Two disk-edges variants are on the table: (i) materialize all leaf edges then stream HashPrune (current prototype), or (ii) interleave leaf-build + HashPrune in chunks. The second avoids full edge-set materialization at the cost of a second partition pass. | ||
|
|
||
| ### M4 — Fixed-resource trade-off validation | ||
| Validates the **trade-off hypothesis** from the Problem Statement. | ||
|
|
||
| - **Setup.** Lock CPU/SSD on a fixed worker; enforce RAM via cgroups. Sweep RAM `{3, 6, 8, 12, 16, 24, 32}` GB on BigANN 10M; include rows for Enron 10M and a 100M-scale dataset. | ||
| - **Cells.** Vamana one-shot, Vamana partitioned, PiPNN one-shot, PiPNN disk-edges, PiPNN merged-shards. OOM cells are valid results. | ||
| - **Metrics.** Wall-clock, peak RSS, CPU util, SSD bytes, recall@K, QPS — reported as **vectors/min/worker** for cross-shape comparison. | ||
| - **Pass.** Documented matrix with a clearly-better algorithm (or tie) per budget at matching recall. Surprises are Stage-1 blockers. | ||
|
|
||
| ### M5 — Production validation: recall × QPS × dimensionality | ||
| End-to-end on the full workload mix. Datasets: BigANN, Enron, plus one production-representative. Scales 10M and 100M (billion if hardware permits). Metrics `squared_l2` and `cosine_normalized`. **Pass:** per cell, PiPNN recall@K within ±1% of Vamana's at matching QPS, or higher QPS at matching recall. | ||
|
|
||
| ### M6 — Operational readiness | ||
| Telemetry (per-phase timing + RSS via existing OTel tracer), permanent docs replacing experimental `CLAUDE.md` notes, runbook (OOM, partition timeout, `l_max` saturation), default parameter recommendations per workload class. | ||
|
|
||
| ### M7 — Abstraction convergence with `diskann` *(Stage-2 entry gate)* | ||
|
|
||
| Stage-1 lands PiPNN as a separate crate that touches several primitives that ought to be shared with the rest of `diskann`. Stage 2 doesn't commit to retiring Vamana's full-rebuild path until those primitives converge — otherwise we accept two parallel implementations of the same low-level building blocks long-term. | ||
|
|
||
| - **a. Pruning.** Vamana's `RobustPrune` / `occlude_list` (currently `pub(crate)` in `diskann/src/graph/internal/prune.rs`) is exposed via a shared pruning surface — either promoted to `pub` or extracted to a `diskann-prune` crate. PiPNN's `final_prune_from_candidates` (currently a duplicate of that logic in `diskann-pipnn/src/builder.rs`) is replaced by a call into the shared API. PiPNN's HashPrune (LSH-bucketed reservoir merge) co-locates in the same shared pruning module for code-organization consistency. Whether Vamana can usefully consume HashPrune is an exploration item — if yes, Vamana reuses; if no, the code still lives in the right place. | ||
|
|
||
| - **b. Quantized GEMM.** PiPNN's quantized leaf-build distance consumes the **quantized-GEMM interface** that `diskann-quantization` is developing for multi-vectors (int8/bf16/fp16 inputs, f32 accumulation). Whichever side ships first contributes: if the multi-vector quantized GEMM lands first, PiPNN reuses; if PiPNN ships its GEMM solution first, that solution is contributed back to `diskann-quantization` so other components (including a future Vamana variant if it wants) can adopt it. Either direction, no PiPNN-internal quantized distance kernel remains. | ||
|
|
||
| - **c. GEMM dispatch.** `diskann-pipnn/src/gemm.rs` (today a thin `faer` wrapper) consumes the same shared GEMM / quantized-GEMM interface per (b). PiPNN's leaf-build shapes (m=n=256, k=128 / 384) and partition-stripe shapes (m≈1000, n up to 65k, k=128 / 384) are included in that interface's coverage tests. | ||
|
|
||
| - **d. Distance kernels.** PiPNN's per-leaf and per-stripe distance computations fully delegate to `diskann_vector::distance` for both FP and quantized metrics. No bypass paths. | ||
|
|
||
| **Validation:** a diff of `diskann-pipnn/src/` showing only algorithm-specific code (partition / scheduler / builder orchestration), no primitive duplication. | ||
|
|
||
| ### Deferred to Stage 2 | ||
|
|
||
| - **Hybrid update model validation.** The Stage-2 loop (PiPNN build → incremental Vamana inserts → recall decay → rebuild) belongs with the Stage-2 proposal that adopts the model. | ||
|
|
||
| - **Checkpoint / resume.** Vamana's streaming checkpoint design doesn't fit PiPNN's batch phases, and operational value is lower (PiPNN's BigANN-10M build is ~80s). | ||
|
|
||
| ### Out of scope: not part of any stage | ||
|
|
||
| - **In-memory PiPNN build.** The in-mem `DiskANNIndex` exists for streaming construction, which PiPNN can't do efficiently. If a non-streaming in-mem consumer ever wants PiPNN's speed: build to disk, then load. | ||
| - **Build-time PQ distance kernel.** Not used by Vamana in production today. | ||
| - **PiPNN incremental insert/delete API.** The hybrid update model (Vamana inserts on the in-memory graph, PiPNN for full rebuilds) removes the need. | ||
| - **Frozen-point semantics.** PiPNN writes the medoid as the single frozen start point — already byte-compatible with Vamana's default. | ||
| - **Multi-vector index support.** Revisit only if a production workload requires it. | ||
|
|
||
| ## References | ||
|
|
||
| 1. [PiPNN: Pick-in-Partitions Nearest Neighbors (arXiv:2602.21247)](https://arxiv.org/abs/2602.21247) | ||
| 2. [Vamana / DiskANN (NeurIPS 2019)](https://papers.nips.cc/paper/9527-rand-nsg-fast-accurate-billion-point-nearest-neighbor-search-on-a-single-node.pdf) | ||
| 3. Existing disk index layout: `diskann-disk/src/storage/` | ||
| 4. Existing Vamana builder: `diskann-disk/src/build/builder/build.rs` | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you help me replicate these results using #856 for the inmem index? If I use the following to compare:
{ "search_directories": [ "<snip>" ], "output_directory": null, "jobs": [ { "type": "async-index-build", "content": { "search_phase": { "groundtruth": "sift-1m.gt.bin", "num_threads": [ 16 ], "queries": "sift-1m.queries.u8.bin", "reps": 5, "runs": [ { "recall_k": 10, "search_l": [ 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 ], "search_n": 10 } ], "search-type": "topk" }, "source": { "alpha": 1.2000000476837158, "backedge_ratio": 1.0, "data": "sift-1m.u8.bin", "data_type": "uint8", "distance": "squared_l2", "index-source": "Build", "insert_retry": null, "l_build": 50, "max_degree": 32, "multi_insert": null, "num_threads": 16, "save_path": null, "start_point_strategy": "medoid" } } }, { "type": "async-index-build", "content": { "search_phase": { "groundtruth": "sift-1m.gt.bin", "num_threads": [ 16 ], "queries": "sift-1m.queries.u8.bin", "reps": 5, "runs": [ { "recall_k": 10, "search_l": [ 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 ], "search_n": 10 } ], "search-type": "topk" }, "source": { "alpha": 1.2000000476837158, "backedge_ratio": 1.0, "data": "sift-1m.u8.bin", "data_type": "uint8", "distance": "squared_l2", "index-source": "Build", "insert_retry": null, "l_build": 50, "max_degree": 32, "multi_insert": null, "num_threads": 16, "save_path": null, "pipnn": { "algorithm": "PiPNN" }, "start_point_strategy": "medoid" } } } ] }on my laptop, I get 11.2 seconds for index build with the inmem index and 15.6 seconds with PiPNN with worse final recall. Note that I am using the default hyper-parameters for pipnn as I don't have a good idea on how to configure them. The display for PiPNN is as follows:
Can you provide suggestions on which hyper-parameters to use? Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Mark, with R 32, L 50, pipnn is quite close to vamana. You can use this config:
it should match vaman's build speed and recall
And when increase max_degree to 64 L build 72 (pipnn l_max 72), the build time for vamana is 13s and pipnn's remains 7s