bf_tree migration away from diskann-providers#1020
Conversation
b94f4df to
44be934
Compare
Codecov Report❌ Patch coverage is
❌ Your patch status has failed because the patch coverage (84.96%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## main #1020 +/- ##
==========================================
+ Coverage 89.46% 90.55% +1.08%
==========================================
Files 459 473 +14
Lines 85482 89653 +4171
==========================================
+ Hits 76474 81181 +4707
+ Misses 9008 8472 -536
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
ca413d4 to
1051152
Compare
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Migrates the async bf_tree provider away from diskann-providers by extracting it into a dedicated diskann-bf_tree-provider crate, while also modernizing quantization (PQ → spherical) and simplifying deletion semantics (hard deletes).
Changes:
- Extracts bf_tree + caching into
diskann-bf_tree-providerand updates workspace/CI accordingly. - Replaces PQ-based quant vector store with spherical quantization (
Poly<dyn Quantizer>) + updated serialization. - Simplifies delete handling by removing the delete-provider generic and switching to hard deletes.
Reviewed changes
Copilot reviewed 18 out of 20 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| diskann-providers/src/model/graph/provider/async_/table_delete_provider.rs | Makes delete-table operations public and removes bf_tree-specific bitmap (de)serialization helpers/tests. |
| diskann-providers/src/model/graph/provider/async_/mod.rs | Adjusts module exports, removes bf_tree/caching modules, re-exports postprocess utilities. |
| diskann-providers/src/model/graph/provider/async_/caching/utils.rs | Updates import path and minor formatting in tests. |
| diskann-providers/src/model/graph/provider/async_/caching/provider.rs | Refactors imports/bounds formatting; adds Future import. |
| diskann-providers/src/model/graph/provider/async_/caching/mod.rs | Removes caching module root from diskann-providers. |
| diskann-providers/src/model/graph/provider/async_/caching/example.rs | Updates test imports to use diskann_providers / local module paths. |
| diskann-providers/src/model/graph/provider/async_/caching/bf_cache.rs | Updates ConfigError import path and test formatting. |
| diskann-providers/src/model/graph/provider/async_/bf_tree/vector_provider.rs | Adds hard-delete helper and adjusts TestCallCount assertions. |
| diskann-providers/src/model/graph/provider/async_/bf_tree/quant_vector_provider.rs | Switches PQ → spherical quantization + new query/distance computer adapter + test updates. |
| diskann-providers/src/model/graph/provider/async_/bf_tree/provider.rs | Removes delete-provider generic, implements hard delete, switches postprocess to CopyIds, updates quantizer serialization. |
| diskann-providers/src/model/graph/provider/async_/bf_tree/neighbor_provider.rs | Updates TestCallCount import path. |
| diskann-providers/src/model/graph/provider/async_/bf_tree/mod.rs | Switches re-exports to new bf_tree_provider module and removes QuantAccessor re-export. |
| diskann-providers/Cargo.toml | Removes bf_tree feature/deps from diskann-providers. |
| diskann-bf_tree-provider/src/lib.rs | Introduces new crate entry point with provider + caching modules. |
| diskann-bf_tree-provider/src/caching/mod.rs | Adds caching module root (example tests disabled with TODO). |
| diskann-bf_tree-provider/Cargo.toml | Adds new crate manifest + dependencies for bf_tree provider and spherical quantization. |
| Cargo.toml | Adds diskann-bf_tree-provider to workspace members. |
| .github/workflows/ci.yml | Removes bf_tree from explicitly tested feature list. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
1219bcc to
cbea923
Compare
|
@copilot would you mind giving this a review |
Reviewed the current branch at |
0ad39b8 to
83e6231
Compare
hildebrandmw
left a comment
There was a problem hiding this comment.
Thanks Jordan, we can do quite a bit of cleanup on the crate structure and item names which I think would really start to tighten up the crate.
One question I had for you is what the expected end state is: are we targeting a full clean up or an incremental, non-published state while we continue working? I ask because the bf-tree provider inherited a bit of cruft from earlier iterations of the code, and I'm wondering how aggressiver we want to be in this PR in particular, considering that we still need integration tests and benchmarks.
harsha-simhadri
left a comment
There was a problem hiding this comment.
Thanks for the PR, Jordan.
Could you wire it through diskann-benchmarks and run it in streaming mode to replicate the recall results in the IP-diskann paper? That would be good E2E validation of the provider. Please all latency and total runtime to analyze performance.
I'll take care of that after I finish addressing Mark's feedback. |
Squashed 30 commits for rebase. Key changes: - Replace PQ with spherical quantization in bf_tree-provider - Remove D generic parameter from BfTreeProvider - Add QuantAccessor and quantized search support - Implement DeletionCheck traits - Remove hybrid computer (quant distances work alone) - Remove benches/ directory - Add streaming benchmark support - Various fixes and cleanups Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
0d60e42 to
a4c41f3
Compare
hildebrandmw
left a comment
There was a problem hiding this comment.
One more small round, then looks good to me. Thanks!
…d verify that 1..5 are returned
Migrate bf_tree provider to standalone
diskann-bftreecrateMotivation
The long-term goal is to remove
diskann-providersentirely. The bf_tree provider was one of several components keeping that crate alive — it depended ondiskann-providersfor generic delete infrastructure (DeletionCheck/AsDeletionCheck/RemoveDeletedIdsAndCopy) and PQ quantization types. This PR extracts bf_tree into a standalonediskann-bftreecrate, cuts those dependencies, simplifies the generics, and switches from PQ to spherical quantization — all to reduce the surface area ofdiskann-providersand move closer to removing it.What Changed
Crate Extraction & Rename
Moved the bf_tree provider from
diskann-providers/src/model/graph/provider/async_/bf_tree/into a standalonediskann-bftreecrate (flat module structure:lib.rs,provider.rs,vectors.rs,quant.rs,neighbors.rs). Updated workspaceCargo.tomland CI configuration.PQ → Spherical Quantization
QuantVectorProvider: StoresPoly<dyn Quantizer>(spherical) instead ofArc<FixedChunkPQTable>.QuantQueryComputer: Newtype wrapper adapting spherical'sOpaque-based API toPreprocessedDistanceFunction<&[u8], f32>.Quantizer::serialize/iface::try_deserialize(flatbuffers format).QueryLayout::FullPrecisionfor better recall with low-bit quantizers.benches/directory).Remove
D(Delete Provider) ParameterBfTreeProvider<T, Q, D>→BfTreeProvider<T, Q>.delete(key)API replace bitmap-based soft deletes.delete()removes from all three trees (neighbors, full_vectors, quant_vectors).delete_bitmap_serde.rs,NoDeletes/TableBasedDeletesfrom constructors.Simplify to Two Strategies
Removed the
Hybridstrategy entirely — onlyFullPrecisionandQuantizedremain.Quantizedsearch usesRerankpost-processor to re-rank candidates using full-precision vectors.Reduce
diskann-providersCouplingFullPrecisionandQuantizedmarker structs moved todiskann::graph::strategy(canonical home), re-exported fromdiskann-providers.NoStoredefined locally indiskann-bftree.TestCallCountduplicated locally (conditional compile: real counter in test, no-op in release).AsKeytrait added locally to replace scatteredbytes_ofcalls.ToRankedError HandlingReplaced blanket
ANNErrorwith ranked error types for vector access:VectorError(Deleted|NotFound): transient errors from bf_tree reads.VectorUnavailable: implementsTransientError<ANNError>— acknowledge (skip) or escalate.AccessError=RankedError<VectorUnavailable, ANNError>: type alias used asGetError.on_elements_unordered: skips transient errors, propagates real errors.status_by_internal_id: transient →ElementStatus::Deleted, real → propagate.get_delete_element: escalates (delete target must exist).Rerank::post_process: acknowledges transient (skips candidate), propagates real errors.Other Cleanups
new_emptyremoved —newrequires start points upfront.Mapworking set for prune (avoids full refetch).SearchAccessorError→Infallible(search accessor cannot fail).create_test_quantizerinquant.rs).Poly<dyn Quantizer>instead ofFixedChunkPQTable.Dparameter removed,new_emptyremoved,newrequires start points.GetErroris nowAccessError(ranked) instead ofANNError.Net Impact
diskann-bftreeis a self-contained crate with two generic parameters (T,Q)FullPrecisionandQuantized(withRerank)DeletionCheck/AsDeletionCheck/RemoveDeletedIdsAndCopyOpen Items
NeighborAccessortrait.Debugimpls indiskann-quantizationfirst.