Skip to content

Predict IVF-PQ FP16 overflow and auto-switch to FP32#2246

Open
huuanhhuyn wants to merge 12 commits into
NVIDIA:mainfrom
huuanhhuyn:ivfpq_fp16_overflow
Open

Predict IVF-PQ FP16 overflow and auto-switch to FP32#2246
huuanhhuyn wants to merge 12 commits into
NVIDIA:mainfrom
huuanhhuyn:ivfpq_fp16_overflow

Conversation

@huuanhhuyn

@huuanhhuyn huuanhhuyn commented Jun 16, 2026

Copy link
Copy Markdown

Large-magnitude unnormalized datasets (e.g. SIFT_1M) contains vectors whose norm overflowed the FP16 internal search types of IVF-PQ.

This PR detects it by sampling a fraction from the dataset, compute its L2 squared norm and estimate the max squared distance from the samples. When the distance reaches FP16 overflow range, the internal types fall back to FP32.

We computed max norm on the top 20k samples of the dataset

@copy-pr-bot

copy-pr-bot Bot commented Jun 16, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a new ivf_pq_fp16_overflow.cuh header that samples dataset rows to estimate the maximum squared norm via map-reduce, derives a distance upper bound, and compares it against a defensive FP16 threshold. The CAGRA build(...) function includes this header and calls estimate_fp16_overflow to auto-upgrade distance dtypes to FP32 when overflow is detected.

Changes

FP16 Overflow Detection and CAGRA Integration

Layer / File(s) Summary
FP16 overflow detection header: sampler and threshold logic
cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh
Adds includes for RAFT CUDA utilities, MDSpan, row-sampling, RNG, and distance utilities. Implements detail::estimate_max_squared_norm that samples a saturated fraction of dataset rows into device memory, computes mapped squared norms per row, reduces to the global maximum via map-reduce, synchronizes, and returns the scalar. Implements helpers::estimate_fp16_overflow that early-exits for empty datasets and CosineExpanded metrics, defines an FP16 max-based overflow threshold with defensive margin, derives a distance upper bound (scaled by 4.0 for L2Expanded), and returns whether the bound exceeds the threshold.
CAGRA build: include and FP16 guard wiring
cpp/src/neighbors/detail/cagra/cagra_build.cuh
Adds the include for ivf_pq_fp16_overflow.cuh at the top of the file. Inserts a pre-build overflow-risk check in build(...) that, when ivf_pq_params is detected and either internal_distance_dtype or coarse_search_dtype is FP16, calls estimate_fp16_overflow and, on detected overflow risk, logs a warning and overrides both dtypes to CUDA_R_32F while leaving lut_dtype unchanged.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Predict IVF-PQ FP16 overflow and auto-switch to FP32' accurately and concisely describes the main changes: FP16 overflow prediction and automatic precision switching in IVF-PQ operations.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The pull request description clearly explains the problem (FP16 overflow in IVF-PQ for large datasets), the solution (sampling-based overflow detection), and implementation details (sampling strategy and fallback to FP32).

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh`:
- Line 39: Remove the debug printf statement at line 39 in the
ivf_pq_fp16_overflow.cuh file. This printf executes within a loop that iterates
over every dimension of every sampled row, causing severe performance
degradation and output flooding. Simply delete the entire printf line as it
serves no purpose in production code.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 086831a1-8cdd-42f2-aaaf-8ac33b75c173

📥 Commits

Reviewing files that changed from the base of the PR and between 6672103 and faa1609.

📒 Files selected for processing (2)
  • cpp/src/neighbors/detail/cagra/cagra_build.cuh
  • cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh

Comment thread cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh (1)

70-88: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

HIGH: Missing validation could cause kernel launch with zero grid blocks.

If n_rows == 0 (empty dataset) or if sample_fraction is extremely small with a moderate dataset size (e.g., n_rows=5000, sample_fraction=0.0001), n_sample becomes 0, resulting in grid_size = 0. Launching a kernel with zero blocks is invalid.

Consider adding a minimum sample size guard:

Suggested fix
   int64_t n_sample = static_cast<int64_t>(sample_fraction * static_cast<double>(n_rows));
   if (n_rows <= 1000) {
     n_sample = n_rows;  // for small datasets, just use them all and skip the sampling overhead
   } else if (n_rows > 100000) {
     // cap the sample size to 100k for speed and keep memory use within the limited workspace resource
     n_sample = 100000;
   }
+  
+  // Handle empty dataset or degenerate sample size
+  if (n_sample <= 0) { return 0.0f; }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh` around lines 70 - 88, Add
a minimum sample size guard to prevent grid_size from being zero when launching
the kernel. After the existing size constraint checks for n_rows (the checks for
n_rows <= 1000 and n_rows > 100000), add a condition to ensure n_sample is at
least 1, since if n_sample is 0, the subsequent calculation of grid_size will be
0, resulting in an invalid kernel launch with zero blocks. This guards against
both empty datasets (n_rows == 0) and extremely small sample fractions.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh`:
- Around line 70-88: Add a minimum sample size guard to prevent grid_size from
being zero when launching the kernel. After the existing size constraint checks
for n_rows (the checks for n_rows <= 1000 and n_rows > 100000), add a condition
to ensure n_sample is at least 1, since if n_sample is 0, the subsequent
calculation of grid_size will be 0, resulting in an invalid kernel launch with
zero blocks. This guards against both empty datasets (n_rows == 0) and extremely
small sample fractions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a794a6f7-8a2b-4f29-ac12-49b5d25d10ab

📥 Commits

Reviewing files that changed from the base of the PR and between faa1609 and 3b5130f.

📒 Files selected for processing (1)
  • cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh

@mfoerste4 mfoerste4 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @huuanhhuyn for the PR. Please utilize raft for the norm computation if possible.

Comment thread cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh Outdated
Comment thread cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh Outdated
Comment thread cpp/src/neighbors/detail/cagra/cagra_build.cuh Outdated
@huuanhhuyn huuanhhuyn requested a review from mfoerste4 June 22, 2026 10:26

@mfoerste4 mfoerste4 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @huuanhhuyn for the PR!

Comment thread cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh Outdated

@tfeher tfeher left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Huy for the PR! It looks good, but I have one concern regarding the "defensive margin". Let's see what others think about this questien.

Comment thread cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh Outdated
@tfeher tfeher added bug Something isn't working non-breaking Introduces a non-breaking change labels Jun 26, 2026
@tfeher

tfeher commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

/ok to test 7f41b35

@achirkin achirkin left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @huuanhhuyn, thanks for tackling this long-standing problem!
I have some doubts about the approach though. Have you considered using the IVF-PQ search or index itself to decide if FP16 is permissible? Examples:

  1. Call IVF-PQ search on a small subset of queries and detect if any of the results are infinite/invalid; if so, adjust the search params to increase one or both internal dtypes. This is agnostic of the selected distance type.
  2. Calculate the distance maximum like you do here, but use the cluster centers of the IVF-PQ index rather than the source data. This gives a small number of points that cover the dataset well.
  3. A combination of 1+2: call IVF-PQ search using its own cluster centers as queries.
  4. Calculate per-component min/max of the cluster centers to get a bounding box; use the diagonal of the bounding box as a proxy for the spread of points in the dataset.

Comment thread cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh Outdated
Comment on lines +32 to +34
float estimate_max_squared_norm(
raft::resources const& handle,
raft::mdspan<const DataT, raft::matrix_extent<int64_t>, raft::row_major, Accessor> dataset)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the other distance types? Would it make sense to generalize it to pass the whatever distance type is requested by the constructed index?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This catches the overflow on IVF-PQ search only. build_knn_graph by IVF_PQ only allows 3 distance types cagra_build.cuh#L1635 (L2Expanded, CosineExpanded, InnerProduct).

HNSW excludes further the use of Cosine distance hnsw.hpp#L123

I have refactored the code for explicitness with a switch around line 104 of the same file.

Tested: overflow detected with the remaining distance types - L2Expanded and InnerProduct

Comment thread cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh Outdated
@huuanhhuyn huuanhhuyn force-pushed the ivfpq_fp16_overflow branch from 7f41b35 to 3ddd98a Compare June 29, 2026 12:42
@huuanhhuyn huuanhhuyn requested a review from a team as a code owner June 29, 2026 12:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working non-breaking Introduces a non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants