bf_tree migration away from diskann-providers#1020

Merged
JordanMaples merged 7 commits into main from jordanmaples/bf_tree_migration
May 21, 2026

@JordanMaples
Contributor

@JordanMaples JordanMaples commented May 5, 2026

Migrate bf_tree provider to standalone diskann-bftree crate

Motivation

The long-term goal is to remove diskann-providers entirely. The bf_tree provider was one of several components keeping that crate alive — it depended on diskann-providers for generic delete infrastructure (DeletionCheck/AsDeletionCheck/RemoveDeletedIdsAndCopy) and PQ quantization types. This PR extracts bf_tree into a standalone diskann-bftree crate, cuts those dependencies, simplifies the generics, and switches from PQ to spherical quantization — all to reduce the surface area of diskann-providers and move closer to removing it.

What Changed

Crate Extraction & Rename

Moved the bf_tree provider from diskann-providers/src/model/graph/provider/async_/bf_tree/ into a standalone diskann-bftree crate (flat module structure: lib.rs, provider.rs, vectors.rs, quant.rs, neighbors.rs). Updated workspace Cargo.toml and CI configuration.

PQ → Spherical Quantization

  • QuantVectorProvider: Stores Poly<dyn Quantizer> (spherical) instead of Arc<FixedChunkPQTable>.
  • QuantQueryComputer: Newtype wrapper adapting spherical's Opaque-based API to PreprocessedDistanceFunction<&[u8], f32>.
  • Serialization: Uses Quantizer::serialize / iface::try_deserialize (flatbuffers format).
  • Distance layout: Uses QueryLayout::FullPrecision for better recall with low-bit quantizers.
  • Removed criterion/iai benchmarks (benches/ directory).
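The newtype-adapter idea behind QuantQueryComputer can be sketched as follows. This is only an illustration under stated assumptions: `Opaque`, `OpaqueDistance`, `PreprocessedDistance`, `QueryAdapter`, and `HammingQuery` are hypothetical stand-ins defined here, not the actual diskann-quantization or diskann traits, and the Hamming distance is a toy replacement for spherical distance computation.

```rust
/// Hypothetical stand-in for an opaque quantized code handed to the quantizer.
pub struct Opaque<'a>(pub &'a [u8]);

/// Hypothetical stand-in for the quantizer-side query computer API.
pub trait OpaqueDistance {
    fn distance(&self, code: Opaque<'_>) -> f32;
}

/// Hypothetical stand-in for the distance trait the graph search expects.
pub trait PreprocessedDistance<T> {
    fn evaluate(&self, item: T) -> f32;
}

/// Newtype adapter: wraps an opaque-based computer and exposes it
/// through the byte-slice interface.
pub struct QueryAdapter<C>(pub C);

impl<'a, C: OpaqueDistance> PreprocessedDistance<&'a [u8]> for QueryAdapter<C> {
    fn evaluate(&self, item: &'a [u8]) -> f32 {
        // Wrap the raw bytes and delegate to the inner computer.
        self.0.distance(Opaque(item))
    }
}

/// Toy computer (assumption): Hamming distance to a stored query code.
pub struct HammingQuery(pub Vec<u8>);

impl OpaqueDistance for HammingQuery {
    fn distance(&self, code: Opaque<'_>) -> f32 {
        self.0
            .iter()
            .zip(code.0)
            .map(|(a, b)| (a ^ b).count_ones())
            .sum::<u32>() as f32
    }
}
```

The adapter holds no state of its own, so a wrapper of this kind should add no runtime cost beyond the delegated call.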

Remove D (Delete Provider) Parameter

  • BfTreeProvider<T, Q, D> → BfTreeProvider<T, Q>.
  • Hard deletes via bf_tree's native delete(key) API replace bitmap-based soft deletes.
  • delete() removes from all three trees (neighbors, full_vectors, quant_vectors).
  • Removed delete_bitmap_serde.rs, NoDeletes/TableBasedDeletes from constructors.
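As a rough illustration of the hard-delete behavior described above, the sketch below removes a key from three stores in one call. `ToyProvider` and its methods are hypothetical, and `BTreeMap` stands in for the bf_tree instances; the real provider's types, signatures, and operation order may differ.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the provider: three ordered maps play the
/// role of the neighbors, full_vectors and quant_vectors trees.
#[derive(Default)]
pub struct ToyProvider {
    neighbors: BTreeMap<u32, Vec<u32>>,
    full_vectors: BTreeMap<u32, Vec<f32>>,
    quant_vectors: BTreeMap<u32, Vec<u8>>,
}

impl ToyProvider {
    /// Insert a point into all three stores.
    pub fn insert(&mut self, id: u32, full: Vec<f32>, quant: Vec<u8>, adj: Vec<u32>) {
        self.full_vectors.insert(id, full);
        self.quant_vectors.insert(id, quant);
        self.neighbors.insert(id, adj);
    }

    /// Hard delete: remove the key from every store.
    /// Returns true if any store held the key.
    pub fn delete(&mut self, id: u32) -> bool {
        let n = self.neighbors.remove(&id).is_some();
        let f = self.full_vectors.remove(&id).is_some();
        let q = self.quant_vectors.remove(&id).is_some();
        n || f || q
    }

    /// After a hard delete there is no tombstone; the key is simply absent.
    pub fn contains(&self, id: u32) -> bool {
        self.full_vectors.contains_key(&id)
    }
}
```

Unlike a bitmap-based soft delete, nothing remains to be filtered at search time, but a deleted id cannot be restored without reinserting its data.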

Simplify to Two Strategies

Removed the Hybrid strategy entirely — only FullPrecision and Quantized remain. Quantized search uses Rerank post-processor to re-rank candidates using full-precision vectors.
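A minimal sketch of the re-ranking step, assuming squared Euclidean distance and a lookup closure that returns `None` for unavailable vectors. The `rerank` function and its signature are hypothetical and are not the crate's Rerank post-processor.

```rust
/// Squared Euclidean distance between two equal-length slices.
fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Re-rank candidate ids by exact distance to `query` and keep the best `k`.
/// `fetch` returns None for vectors that are unavailable; those are skipped.
pub fn rerank<F>(query: &[f32], candidates: &[u32], k: usize, fetch: F) -> Vec<(u32, f32)>
where
    F: Fn(u32) -> Option<Vec<f32>>,
{
    let mut scored: Vec<(u32, f32)> = candidates
        .iter()
        .filter_map(|&id| fetch(id).map(|v| (id, squared_l2(query, &v))))
        .collect();
    // Ascending by exact distance; total_cmp gives a total order on f32.
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.truncate(k);
    scored
}
```

The quantized search supplies the candidate list; only those candidates need a full-precision read, which keeps the expensive lookups proportional to the candidate count rather than the index size.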

Reduce diskann-providers Coupling

  • FullPrecision and Quantized marker structs moved to diskann::graph::strategy (canonical home), re-exported from diskann-providers.
  • NoStore defined locally in diskann-bftree.
  • TestCallCount duplicated locally (conditional compile: real counter in test, no-op in release).
  • AsKey trait added locally to replace scattered bytes_of calls.
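One way a local key trait could centralize id-to-bytes conversion is sketched below. The trait shape, the associated type, and the fixed little-endian encoding are all assumptions for illustration; the AsKey trait in diskann-bftree may look different.

```rust
/// Hypothetical key trait: expose an id as the bytes a key-value tree expects.
pub trait AsKey {
    /// Owned byte representation of the key.
    type Bytes: AsRef<[u8]>;

    fn as_key(&self) -> Self::Bytes;
}

impl AsKey for u32 {
    type Bytes = [u8; 4];

    fn as_key(&self) -> [u8; 4] {
        // Little-endian is an assumption of this sketch, chosen so the
        // encoding does not depend on the host platform.
        self.to_le_bytes()
    }
}

/// Example call site: build a lookup key without spelling out the byte cast.
pub fn lookup_key<K: AsKey>(id: &K) -> Vec<u8> {
    id.as_key().as_ref().to_vec()
}
```

Routing every conversion through one trait means a change to the key encoding touches a single impl instead of each call site.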

ToRanked Error Handling

Replaced blanket ANNError with ranked error types for vector access:

  • VectorError (Deleted | NotFound): transient errors from bf_tree reads.
  • VectorUnavailable: implements TransientError<ANNError> — acknowledge (skip) or escalate.
  • AccessError = RankedError<VectorUnavailable, ANNError>: type alias used as GetError.
  • on_elements_unordered: skips transient errors, propagates real errors.
  • status_by_internal_id: transient → ElementStatus::Deleted, real → propagate.
  • get_delete_element: escalates (delete target must exist).
  • Rerank::post_process: acknowledges transient (skips candidate), propagates real errors.
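The acknowledge/escalate split described above can be sketched with a two-rank error enum. Everything here is a simplified stand-in: `Ranked`, `FatalError`, and `collect_available` are hypothetical names, and the real RankedError and TransientError items in diskann may have different shapes.

```rust
/// Hypothetical two-rank error: `Transient` may be skipped, `Fatal` must propagate.
#[derive(Debug, PartialEq)]
pub enum Ranked<T, E> {
    Transient(T),
    Fatal(E),
}

/// Transient read failures, mirroring the Deleted | NotFound split.
#[derive(Debug, PartialEq)]
pub enum VectorError {
    Deleted,
    NotFound,
}

/// Stand-in for a non-recoverable error type.
#[derive(Debug, PartialEq)]
pub struct FatalError(pub String);

impl<T, E> Ranked<T, E> {
    /// Acknowledge: swallow a transient error, keep a fatal one.
    pub fn acknowledge(self) -> Result<(), E> {
        match self {
            Ranked::Transient(_) => Ok(()),
            Ranked::Fatal(e) => Err(e),
        }
    }

    /// Escalate: convert a transient error into a fatal one.
    pub fn escalate(self, f: impl FnOnce(T) -> E) -> E {
        match self {
            Ranked::Transient(t) => f(t),
            Ranked::Fatal(e) => e,
        }
    }
}

/// Visit each id, skipping transient failures and stopping on the first fatal one.
pub fn collect_available<F>(ids: &[u32], mut get: F) -> Result<Vec<(u32, Vec<f32>)>, FatalError>
where
    F: FnMut(u32) -> Result<Vec<f32>, Ranked<VectorError, FatalError>>,
{
    let mut out = Vec::new();
    for &id in ids {
        match get(id) {
            Ok(v) => out.push((id, v)),
            // A transient error is acknowledged and skipped; a fatal one returns early.
            Err(e) => e.acknowledge()?,
        }
    }
    Ok(out)
}
```

A search path would acknowledge, since a concurrently deleted candidate is harmless to skip, while a path like get_delete_element would escalate, because a missing delete target is a real failure for that caller.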

Other Cleanups

  • new_empty removed — new requires start points upfront.
  • Uses Map working set for prune (avoids full refetch).
  • SearchAccessorError → Infallible (search accessor cannot fail).
  • Broken intra-doc links fixed.
  • Test quantizer helper deduplicated (create_test_quantizer in quant.rs).

⚠️ Breaking Changes

  • Serialization format: Persisted indexes using the PQ format will not load. Full reindex required.
  • Training API: Callers must provide Poly<dyn Quantizer> instead of FixedChunkPQTable.
  • Constructor signatures: D parameter removed, new_empty removed, new requires start points.
  • Delete semantics: Deletes are immediate and permanent (hard delete from bf_tree).
  • Error types: GetError is now AccessError (ranked) instead of ANNError.

Net Impact

  • -36 net lines (17 files changed, +1607 / -1643)
  • diskann-bftree is a self-contained crate with two generic parameters (T, Q)
  • Two indexing strategies: FullPrecision and Quantized (with Rerank)
  • No dependency on DeletionCheck / AsDeletionCheck / RemoveDeletedIdsAndCopy
  • 23 unit tests + 2 doc tests covering both strategies (insert, search, delete, save/load)

Open Items

  • Delete ordering: Discuss whether current delete operation order is sufficient for concurrent access (see review thread).
  • NeighborAccessor ToRanked: Follow-up to add ranked errors to the core NeighborAccessor trait.
  • QueryComputer Debug: Deferred — needs Debug impls in diskann-quantization first.

@JordanMaples JordanMaples force-pushed the jordanmaples/bf_tree_migration branch from b94f4df to 44be934 Compare May 5, 2026 20:58
@codecov-commenter

codecov-commenter commented May 5, 2026

Codecov Report

❌ Patch coverage is 84.96732% with 46 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.55%. Comparing base (a47a1e6) to head (70da9e8).
⚠️ Report is 10 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| diskann-bftree/src/lib.rs | 51.21% | 20 Missing ⚠️ |
| diskann-bftree/src/quant.rs | 91.52% | 20 Missing ⚠️ |
| diskann-bftree/src/vectors.rs | 76.00% | 6 Missing ⚠️ |

❌ Your patch status has failed because the patch coverage (84.96%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1020      +/-   ##
==========================================
+ Coverage   89.46%   90.55%   +1.08%     
==========================================
  Files         459      473      +14     
  Lines       85482    89653    +4171     
==========================================
+ Hits        76474    81181    +4707     
+ Misses       9008     8472     -536     

| Flag | Coverage Δ |
| --- | --- |
| miri | 90.55% <84.96%> (+1.08%) ⬆️ |
| unittests | 90.51% <84.96%> (+1.42%) ⬆️ |

| Files with missing lines | Coverage Δ |
| --- | --- |
| diskann-bftree/src/neighbors.rs | 92.21% <100.00%> (ø) |
| diskann-bftree/src/provider.rs | 91.93% <ø> (ø) |
| ...roviders/src/model/graph/provider/async_/common.rs | 84.79% <ø> (-1.76%) ⬇️ |
| ...del/graph/provider/async_/table_delete_provider.rs | 97.41% <ø> (ø) |
| diskann-bftree/src/vectors.rs | 93.44% <76.00%> (ø) |
| diskann-bftree/src/lib.rs | 51.21% <51.21%> (ø) |
| diskann-bftree/src/quant.rs | 91.52% <91.52%> (ø) |

... and 84 files with indirect coverage changes

Comment thread diskann-bf_tree-provider/src/provider/delete_bitmap_serde.rs Outdated
Comment thread diskann-bf_tree-provider/src/provider/quant_vector_provider.rs
@JordanMaples JordanMaples force-pushed the jordanmaples/bf_tree_migration branch from ca413d4 to 1051152 Compare May 11, 2026 18:26
@JordanMaples JordanMaples marked this pull request as ready for review May 11, 2026 21:00
@JordanMaples JordanMaples requested review from a team and Copilot May 11, 2026 21:00
Contributor

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Migrates the async bf_tree provider away from diskann-providers by extracting it into a dedicated diskann-bf_tree-provider crate, while also modernizing quantization (PQ → spherical) and simplifying deletion semantics (hard deletes).

Changes:

  • Extracts bf_tree + caching into diskann-bf_tree-provider and updates workspace/CI accordingly.
  • Replaces PQ-based quant vector store with spherical quantization (Poly<dyn Quantizer>) + updated serialization.
  • Simplifies delete handling by removing the delete-provider generic and switching to hard deletes.

Reviewed changes

Copilot reviewed 18 out of 20 changed files in this pull request and generated 5 comments.

Show a summary per file
| File | Description |
| --- | --- |
| diskann-providers/src/model/graph/provider/async_/table_delete_provider.rs | Makes delete-table operations public and removes bf_tree-specific bitmap (de)serialization helpers/tests. |
| diskann-providers/src/model/graph/provider/async_/mod.rs | Adjusts module exports, removes bf_tree/caching modules, re-exports postprocess utilities. |
| diskann-providers/src/model/graph/provider/async_/caching/utils.rs | Updates import path and minor formatting in tests. |
| diskann-providers/src/model/graph/provider/async_/caching/provider.rs | Refactors imports/bounds formatting; adds Future import. |
| diskann-providers/src/model/graph/provider/async_/caching/mod.rs | Removes caching module root from diskann-providers. |
| diskann-providers/src/model/graph/provider/async_/caching/example.rs | Updates test imports to use diskann_providers / local module paths. |
| diskann-providers/src/model/graph/provider/async_/caching/bf_cache.rs | Updates ConfigError import path and test formatting. |
| diskann-providers/src/model/graph/provider/async_/bf_tree/vector_provider.rs | Adds hard-delete helper and adjusts TestCallCount assertions. |
| diskann-providers/src/model/graph/provider/async_/bf_tree/quant_vector_provider.rs | Switches PQ → spherical quantization + new query/distance computer adapter + test updates. |
| diskann-providers/src/model/graph/provider/async_/bf_tree/provider.rs | Removes delete-provider generic, implements hard delete, switches postprocess to CopyIds, updates quantizer serialization. |
| diskann-providers/src/model/graph/provider/async_/bf_tree/neighbor_provider.rs | Updates TestCallCount import path. |
| diskann-providers/src/model/graph/provider/async_/bf_tree/mod.rs | Switches re-exports to new bf_tree_provider module and removes QuantAccessor re-export. |
| diskann-providers/Cargo.toml | Removes bf_tree feature/deps from diskann-providers. |
| diskann-bf_tree-provider/src/lib.rs | Introduces new crate entry point with provider + caching modules. |
| diskann-bf_tree-provider/src/caching/mod.rs | Adds caching module root (example tests disabled with TODO). |
| diskann-bf_tree-provider/Cargo.toml | Adds new crate manifest + dependencies for bf_tree provider and spherical quantization. |
| Cargo.toml | Adds diskann-bf_tree-provider to workspace members. |
| .github/workflows/ci.yml | Removes bf_tree from explicitly tested feature list. |

Comment thread diskann-providers/src/model/graph/provider/async_/mod.rs Outdated
Comment thread diskann-providers/src/model/graph/provider/async_/table_delete_provider.rs Outdated
Comment thread diskann-providers/src/model/graph/provider/async_/table_delete_provider.rs Outdated
Comment thread diskann-providers/src/model/graph/provider/async_/table_delete_provider.rs Outdated
Comment thread diskann-providers/src/model/graph/provider/async_/table_delete_provider.rs Outdated
@JordanMaples JordanMaples marked this pull request as draft May 11, 2026 22:51
@JordanMaples JordanMaples linked an issue May 12, 2026 that may be closed by this pull request
@JordanMaples JordanMaples force-pushed the jordanmaples/bf_tree_migration branch from 1219bcc to cbea923 Compare May 12, 2026 20:47
@JordanMaples JordanMaples marked this pull request as ready for review May 12, 2026 20:49
@JordanMaples
Contributor Author

@copilot would you mind giving this a review

Contributor

Copilot AI commented May 12, 2026

@copilot would you mind giving this a review

Reviewed the current branch at cbea923: cargo test -p diskann-bf_tree and cargo clippy -p diskann-bf_tree --all-targets -- -D warnings both pass locally, and CI run 25761279810 shows no failed jobs. I don’t see additional blockers right now.

@JordanMaples JordanMaples force-pushed the jordanmaples/bf_tree_migration branch from 0ad39b8 to 83e6231 Compare May 12, 2026 21:59
Contributor

@hildebrandmw hildebrandmw left a comment

Thanks Jordan, we can do quite a bit of cleanup on the crate structure and item names which I think would really start to tighten up the crate.

One question I had for you is what the expected end state is: are we targeting a full cleanup or an incremental, non-published state while we continue working? I ask because the bf-tree provider inherited a bit of cruft from earlier iterations of the code, and I'm wondering how aggressive we want to be in this PR in particular, considering that we still need integration tests and benchmarks.

Comment thread Cargo.toml Outdated
Comment thread diskann-providers/src/model/graph/provider/async_/mod.rs Outdated
Comment thread diskann-bf_tree/Cargo.toml Outdated
Comment thread diskann-bf_tree/src/lib.rs Outdated
Comment thread diskann-bf_tree/src/provider/vector_provider.rs Outdated
Comment thread diskann-bftree/src/provider.rs
Comment thread diskann-bf_tree/src/provider/bf_tree_provider.rs Outdated
Comment thread diskann-bf_tree/src/provider/bf_tree_provider.rs Outdated
Comment thread diskann-bf_tree/src/provider/bf_tree_provider.rs Outdated
Comment thread diskann-bf_tree/src/provider/bf_tree_provider.rs Outdated
Comment thread diskann-bftree/src/provider.rs
Contributor

@harsha-simhadri harsha-simhadri left a comment

Thanks for the PR, Jordan.

Could you wire it through diskann-benchmarks and run it in streaming mode to replicate the recall results in the IP-diskann paper? That would be good E2E validation of the provider. Please add latency and total runtime to analyze performance.

@JordanMaples
Contributor Author

Thanks for the PR, Jordan.

Could you wire it through diskann-benchmarks and run it in streaming mode to replicate the recall results in the IP-diskann paper? That would be good E2E validation of the provider. Please add latency and total runtime to analyze performance.

I'll take care of that after I finish addressing Mark's feedback.

Squashed 30 commits for rebase. Key changes:
- Replace PQ with spherical quantization in bf_tree-provider
- Remove D generic parameter from BfTreeProvider
- Add QuantAccessor and quantized search support
- Implement DeletionCheck traits
- Remove hybrid computer (quant distances work alone)
- Remove benches/ directory
- Add streaming benchmark support
- Various fixes and cleanups

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@JordanMaples JordanMaples force-pushed the jordanmaples/bf_tree_migration branch from 0d60e42 to a4c41f3 Compare May 18, 2026 15:43
@JordanMaples JordanMaples requested a review from hildebrandmw May 19, 2026 14:55
Comment thread diskann-bftree/Cargo.toml
Contributor

@hildebrandmw hildebrandmw left a comment

One more small round, then looks good to me. Thanks!

Comment thread diskann-bftree/src/lib.rs
Comment thread diskann-bftree/src/provider.rs Outdated
Comment thread diskann-bftree/src/provider.rs Outdated
Comment thread diskann-bftree/src/provider.rs Outdated
Comment thread diskann-bftree/src/provider.rs Outdated
@JordanMaples JordanMaples enabled auto-merge (squash) May 21, 2026 15:26
@JordanMaples JordanMaples merged commit 5443ca0 into main May 21, 2026
25 checks passed
@JordanMaples JordanMaples deleted the jordanmaples/bf_tree_migration branch May 21, 2026 15:53
Development

Successfully merging this pull request may close these issues.

Extract bftree provider into its own crate with appropriate tests.

6 participants