Predict IVF-PQ FP16 overflow and auto-switch to FP32 by huuanhhuyn · Pull Request #2246 · NVIDIA/cuvs

huuanhhuyn · 2026-06-16T07:22:29Z

Large-magnitude unnormalized datasets (e.g. SIFT_1M) contains vectors whose norm overflowed the FP16 internal search types of IVF-PQ.

This PR detects it by sampling a fraction from the dataset, compute its L2 squared norm and estimate the max squared distance from the samples. When the distance reaches FP16 overflow range, the internal types fall back to FP32.

We computed max norm on the top 20k samples of the dataset

copy-pr-bot · 2026-06-16T07:22:32Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-06-16T07:26:23Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds a new ivf_pq_fp16_overflow.cuh header that samples dataset rows to estimate the maximum squared norm via map-reduce, derives a distance upper bound, and compares it against a defensive FP16 threshold. The CAGRA build(...) function includes this header and calls estimate_fp16_overflow to auto-upgrade distance dtypes to FP32 when overflow is detected.

Changes

FP16 Overflow Detection and CAGRA Integration

Layer / File(s)	Summary
FP16 overflow detection header: sampler and threshold logic `cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh`	Adds includes for RAFT CUDA utilities, MDSpan, row-sampling, RNG, and distance utilities. Implements `detail::estimate_max_squared_norm` that samples a saturated fraction of dataset rows into device memory, computes mapped squared norms per row, reduces to the global maximum via map-reduce, synchronizes, and returns the scalar. Implements `helpers::estimate_fp16_overflow` that early-exits for empty datasets and `CosineExpanded` metrics, defines an FP16 max-based overflow threshold with defensive margin, derives a distance upper bound (scaled by `4.0` for `L2Expanded`), and returns whether the bound exceeds the threshold.
CAGRA build: include and FP16 guard wiring `cpp/src/neighbors/detail/cagra/cagra_build.cuh`	Adds the include for `ivf_pq_fp16_overflow.cuh` at the top of the file. Inserts a pre-build overflow-risk check in `build(...)` that, when `ivf_pq_params` is detected and either `internal_distance_dtype` or `coarse_search_dtype` is FP16, calls `estimate_fp16_overflow` and, on detected overflow risk, logs a warning and overrides both dtypes to `CUDA_R_32F` while leaving `lut_dtype` unchanged.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'Predict IVF-PQ FP16 overflow and auto-switch to FP32' accurately and concisely describes the main changes: FP16 overflow prediction and automatic precision switching in IVF-PQ operations.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description check	✅ Passed	The pull request description clearly explains the problem (FP16 overflow in IVF-PQ for large datasets), the solution (sampling-based overflow detection), and implementation details (sampling strategy and fallback to FP32).

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh`:
- Line 39: Remove the debug printf statement at line 39 in the
ivf_pq_fp16_overflow.cuh file. This printf executes within a loop that iterates
over every dimension of every sampled row, causing severe performance
degradation and output flooding. Simply delete the entire printf line as it
serves no purpose in production code.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 086831a1-8cdd-42f2-aaaf-8ac33b75c173

📥 Commits

Reviewing files that changed from the base of the PR and between 6672103 and faa1609.

📒 Files selected for processing (2)

cpp/src/neighbors/detail/cagra/cagra_build.cuh
cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh (1)
70-88: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

HIGH: Missing validation could cause kernel launch with zero grid blocks.

If n_rows == 0 (empty dataset) or if sample_fraction is extremely small with a moderate dataset size (e.g., n_rows=5000, sample_fraction=0.0001), n_sample becomes 0, resulting in grid_size = 0. Launching a kernel with zero blocks is invalid.

Consider adding a minimum sample size guard:
Suggested fix
   int64_t n_sample = static_cast<int64_t>(sample_fraction * static_cast<double>(n_rows));
   if (n_rows <= 1000) {
     n_sample = n_rows;  // for small datasets, just use them all and skip the sampling overhead
   } else if (n_rows > 100000) {
     // cap the sample size to 100k for speed and keep memory use within the limited workspace resource
     n_sample = 100000;
   }
+  
+  // Handle empty dataset or degenerate sample size
+  if (n_sample <= 0) { return 0.0f; }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh` around lines 70 - 88, Add
a minimum sample size guard to prevent grid_size from being zero when launching
the kernel. After the existing size constraint checks for n_rows (the checks for
n_rows <= 1000 and n_rows > 100000), add a condition to ensure n_sample is at
least 1, since if n_sample is 0, the subsequent calculation of grid_size will be
0, resulting in an invalid kernel launch with zero blocks. This guards against
both empty datasets (n_rows == 0) and extremely small sample fractions.
Source: Coding guidelines

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh`:
- Around line 70-88: Add a minimum sample size guard to prevent grid_size from
being zero when launching the kernel. After the existing size constraint checks
for n_rows (the checks for n_rows <= 1000 and n_rows > 100000), add a condition
to ensure n_sample is at least 1, since if n_sample is 0, the subsequent
calculation of grid_size will be 0, resulting in an invalid kernel launch with
zero blocks. This guards against both empty datasets (n_rows == 0) and extremely
small sample fractions.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a794a6f7-8a2b-4f29-ac12-49b5d25d10ab

📥 Commits

Reviewing files that changed from the base of the PR and between faa1609 and 3b5130f.

📒 Files selected for processing (1)

cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh

mfoerste4

Thanks @huuanhhuyn for the PR. Please utilize raft for the norm computation if possible.

mfoerste4

LGTM, thanks @huuanhhuyn for the PR!

tfeher

Thanks Huy for the PR! It looks good, but I have one concern regarding the "defensive margin". Let's see what others think about this questien.

tfeher · 2026-06-26T15:41:52Z

/ok to test 7f41b35

achirkin

Hi @huuanhhuyn, thanks for tackling this long-standing problem!
I have some doubts about the approach though. Have you considered using the IVF-PQ search or index itself to decide if FP16 is permissible? Examples:

Call IVF-PQ search on a small subset of queries and detect if any of the results are infinite/invalid; if so, adjust the search params to increase one or both internal dtypes. This is agnostic of the selected distance type.
Calculate the distance maximum like you do here, but use the cluster centers of the IVF-PQ index rather than the source data. This gives a small number of points that cover the dataset well.
A combination of 1+2: call IVF-PQ search using its own cluster centers as queries.
Calculate per-component min/max of the cluster centers to get a bounding box; use the diagonal of the bounding box as a proxy for the spread of points in the dataset.

achirkin · 2026-06-29T06:21:01Z

+float estimate_max_squared_norm(
+  raft::resources const& handle,
+  raft::mdspan<const DataT, raft::matrix_extent<int64_t>, raft::row_major, Accessor> dataset)


What about the other distance types? Would it make sense to generalize it to pass the whatever distance type is requested by the constructed index?

This catches the overflow on IVF-PQ search only. build_knn_graph by IVF_PQ only allows 3 distance types cagra_build.cuh#L1635 (L2Expanded, CosineExpanded, InnerProduct).

HNSW excludes further the use of Cosine distance hnsw.hpp#L123

I have refactored the code for explicitness with a switch around line 104 of the same file.

Tested: overflow detected with the remaining distance types - L2Expanded and InnerProduct

… SIFT 1M)

Sampling is not necessary anyway.

huuanhhuyn requested a review from a team as a code owner June 16, 2026 07:22

github-project-automation Bot added this to Unstructured Data Processing Jun 16, 2026

coderabbitai Bot reviewed Jun 16, 2026

View reviewed changes

Comment thread cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh Outdated

coderabbitai Bot reviewed Jun 16, 2026

View reviewed changes

mfoerste4 suggested changes Jun 16, 2026

View reviewed changes

Comment thread cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh Outdated

Comment thread cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh Outdated

huuanhhuyn commented Jun 17, 2026

View reviewed changes

Comment thread cpp/src/neighbors/detail/cagra/cagra_build.cuh Outdated

huuanhhuyn requested a review from mfoerste4 June 22, 2026 10:26

mfoerste4 approved these changes Jun 22, 2026

View reviewed changes

Comment thread cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh Outdated

tfeher requested changes Jun 26, 2026

View reviewed changes

Comment thread cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh Outdated

tfeher added bug Something isn't working non-breaking Introduces a non-breaking change labels Jun 26, 2026

achirkin requested changes Jun 29, 2026

View reviewed changes

huuanhhuyn added 11 commits June 29, 2026 12:36

IVF-PQ FP16 overflow detection

19bc18f

Explain the use of mapping

01f234a

Move params fallback to params init phase

08302cf

Choose sampling equation

1586a42

Use raft built-in kernels and remove manual one.

431d3ac

Add kDelay param to adjust the sampling fraction growth speed

7fcdb1d

Reduce kSaturation to 20k (just sufficient to detect FP16 overflow in…

e09cb73

… SIFT 1M)

Explain uniform sampling decision

c4b344c

Avoid expensive random sampling over the full dataset.

49ee283

Sampling is not necessary anyway.

Edit comments

e4ee618

Fix sampling number and remove defensive margin

3ddd98a

huuanhhuyn force-pushed the ivfpq_fp16_overflow branch from 7f41b35 to 3ddd98a Compare June 29, 2026 12:42

huuanhhuyn requested a review from a team as a code owner June 29, 2026 12:42

Refactor for explicitness of supported distance types

d94fe86

Uh oh!

Conversation

huuanhhuyn commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

copy-pr-bot Bot commented Jun 16, 2026

Uh oh!

coderabbitai Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

mfoerste4 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mfoerste4 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

tfeher left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

tfeher commented Jun 26, 2026

Uh oh!

achirkin left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

achirkin Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

huuanhhuyn Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

huuanhhuyn commented Jun 16, 2026 •

edited

Loading

coderabbitai Bot commented Jun 16, 2026 •

edited

Loading