TurboQuant again! by connortsui20 · Pull Request #7829 · vortex-data/vortex

connortsui20 · 2026-05-07T14:33:07Z

Summary

Tracking issue: #7830

Moves TurboQuant out of vortex-tensor into a new vortex-turboquant crate.

The first commit was mostly copying and pasting a bunch of code, as well as adding the unpack method to replace canonicalization. The second commit was cleaning up everything holistically.

A lot of the code in vortex-tensor was reviewed pretty lightly because we knew that it was unstable, but now that we are more certain about the implementation (not necessarily about the exact design, but the actual implementation of the TQ algorithms), I think it is worth reviewing everything as a whole.

Testing

These tests were mostly there before, but now there are more!

codspeed-hq · 2026-05-07T22:02:17Z

Merging this PR will not alter performance

⚠️

Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

✅ 1208 untouched benchmarks

Comparing ct/tq-pull-out (11ac1c0) with develop (f3d5f09)

connortsui20 · 2026-05-07T22:25:45Z

codex has some findings that I will have to investigate tomorrow:

[P1] Lazy TQPack still cannot be used as the write-time path. initialize() only registers the TQUnpack array plugin, and TQPack’s array serialization returns None / deserialization bails, so writing TQPack::try_new_array(...).into_array() will fail before pack_vector runs. The file test currently writes an already-executed TurboQuant array, so it does not cover the actual lazy write path. See lib.rs (line 81), pack.rs (line 158), and file.rs (line 30).

[P2] vortex_turboquant::initialize() is not self-contained for lazy TQUnpack file reads. TQUnpack deserialization requires the parent dtype to downcast as AnyVector, but initialize() no longer registers vortex_tensor::vector::Vector. The current file tests mask this by separately calling vortex_tensor::initialize(&session). A session that only calls the TurboQuant initializer can register TQUnpack but still fail to deserialize its Vector parent dtype. See lib.rs (line 76) and unpack.rs (line 167).

[P2] The public SORF dimension padding path uses unchecked next_power_of_two(). SorfMatrix::try_new and SorfTransform::return_dtype can panic in debug builds on oversized dimensions instead of returning VortexResult, and validate_sorf_options does not reject the bad dimension before serialized metadata reaches this path. Use checked_next_power_of_two() and reject zero/overflow in validation. See rotation.rs (line 79) and vtable.rs (line 96).
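As a minimal sketch of the fix suggested in the last finding, the padding step could go through `checked_next_power_of_two()` and report zero or overflow as an error. The helper name `padded_dimension`, the `u32` dimension type, and the `String` error are assumptions made for illustration; the real code would presumably return a `VortexResult` from `SorfMatrix::try_new` and `validate_sorf_options`.

```rust
/// Hypothetical helper (not part of vortex-turboquant): pad a vector
/// dimension up to the next power of two without panicking.
///
/// `u32::next_power_of_two()` panics in debug builds and wraps to 0 in
/// release builds when the result does not fit, so the checked variant is
/// used and the failure is surfaced as an error instead.
fn padded_dimension(dimension: u32) -> Result<u32, String> {
    // `0u32.checked_next_power_of_two()` returns `Some(1)`, so zero has to be
    // rejected explicitly.
    if dimension == 0 {
        return Err("dimension must be non-zero".to_string());
    }
    dimension.checked_next_power_of_two().ok_or_else(|| {
        format!("dimension {dimension} cannot be padded to a power of two within u32")
    })
}
```

Under these assumptions, `padded_dimension(768)` yields `Ok(1024)`, while `padded_dimension(0)` and `padded_dimension(u32::MAX)` both yield an `Err` rather than panicking.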

joseph-isaacs · 2026-05-08T09:34:49Z

Please can we think a little about the reviewer? This is a 4k+ PR with a large move and a new feature.

joseph-isaacs

Please can this be split up?

gatesn · 2026-05-08T12:07:35Z

+    }
+}
+
+pub(super) fn serialize_config(config: &TurboQuantConfig) -> Vec<u8> {


Seems a bit weird that these aren't on the TQScalarFnMetadata impl? Don't mind that much 🤷. You could also just inline them at the call site, they're so small.

I mean we only use the struct as an intermediate type, these are not really methods

gatesn · 2026-05-08T12:11:58Z

+}
+
+impl ScalarFnVTable for TQUnpack {
+    type Options = TurboQuantConfig;


Do you need this? Isn't this entirely encapsulated by the child's ext dtype metadata?

You should model this as e.g. options the user must provide in their SQL query SELECT tq.unpack(..., <options>)

We do actually need this in order to (lossily) reconstruct the original vectors:

/// Configuration for lossy TurboQuant packing.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct TurboQuantConfig {
    bit_width: u8,
    seed: u64,
    num_rounds: u8,
}

None of this information can be taken from the dtype
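To illustrate why a value like `seed` has to travel with the encoded data, here is a toy sketch of a seeded, self-inverse transform. It is not the TurboQuant algorithm: the sign-flip transform, the SplitMix64 generator, and the names `signs_from_seed` and `flip_signs` are all assumptions chosen only to show that the decoder must be handed the same seed the encoder used.

```rust
/// SplitMix64, a small deterministic generator used here purely for illustration.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical: derive one +1.0 / -1.0 sign per element from the seed.
fn signs_from_seed(seed: u64, len: usize) -> Vec<f32> {
    let mut rng = SplitMix64(seed);
    (0..len)
        .map(|_| if rng.next_u64() & 1 == 0 { 1.0 } else { -1.0 })
        .collect()
}

/// Hypothetical: multiply each element by its seeded sign. Applying the
/// function twice with the same seed restores the input exactly, because
/// every sign is its own inverse.
fn flip_signs(values: &[f32], seed: u64) -> Vec<f32> {
    let signs = signs_from_seed(seed, values.len());
    values.iter().zip(signs).map(|(v, s)| v * s).collect()
}
```

Calling `flip_signs` on the encoded values with the original seed recovers the input, whereas a different seed would generally apply a different set of flips. Nothing in the element type or length of the encoded vector reveals which seed was used, which is the point being made about the dtype.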

robert3005

Can we use PRs as atomic units instead of commits, please? GitHub isn't made for reviewing individual commits.

connortsui20 · 2026-05-08T12:58:17Z

@joseph-isaacs how do you suggest I split this PR up?

The best alternative here that I can think of is that I split this into a) copy and paste a bunch of code from vortex-tensor, b) clean up, and c) add the scalar functions, which is frankly a waste of time if we know most of the code is going to change.

This is all due to the fact that we decided to change the design quite significantly but the implementation is still mostly the same.

connortsui20 · 2026-05-08T15:07:53Z

I will clean up my git history and we can figure out if it makes sense to split changes out of this or if everything together is required for context for reviewing

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>

It's done now.

connortsui20 changed the title ~~Ct/tq pull out~~ TurboQuant again! May 7, 2026

connortsui20 added the changelog/feature A new feature label May 7, 2026

connortsui20 mentioned this pull request May 7, 2026

Tracking Issue: TurboQuant #7830

Open

7 tasks

connortsui20 force-pushed the ct/tq-pull-out branch 2 times, most recently from 507349c to 51f347e Compare May 7, 2026 17:32

gatesn reviewed May 7, 2026

Comment thread vortex-turboquant/src/lib.rs Outdated

Comment thread vortex-turboquant/src/vector/normalize.rs Outdated

connortsui20 force-pushed the ct/tq-pull-out branch 2 times, most recently from 25c7339 to f473276 Compare May 7, 2026 21:54

connortsui20 force-pushed the ct/tq-pull-out branch from 49e8eec to c098583 Compare May 7, 2026 22:12

connortsui20 marked this pull request as ready for review May 7, 2026 22:15

connortsui20 requested a review from gatesn May 7, 2026 22:16

connortsui20 commented May 7, 2026

Comment thread vortex-turboquant/src/scalar_fns/pack.rs Outdated

connortsui20 force-pushed the ct/tq-pull-out branch from c098583 to 37fd9e4 Compare May 7, 2026 22:20

connortsui20 force-pushed the ct/tq-pull-out branch from 37fd9e4 to 12b4aa0 Compare May 8, 2026 02:45

joseph-isaacs reviewed May 8, 2026

Comment thread vortex-turboquant/src/scalar_fns/encode.rs

joseph-isaacs previously requested changes May 8, 2026

gatesn reviewed May 8, 2026

robert3005 previously requested changes May 8, 2026

connortsui20 force-pushed the ct/tq-pull-out branch from edcebd9 to 2cf49f7 Compare May 8, 2026 15:53

connortsui20 added 3 commits May 8, 2026 14:14

refactor(tensor): prepare SORF and vector APIs for TurboQuant

46f33e3

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>

feat: add vortex-turboquant crate

d61d4a3

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>

perf(turboquant): specialize pack and unpack for mask variants

3c3836f

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>

connortsui20 force-pushed the ct/tq-pull-out branch from 2cf49f7 to 3c3836f Compare May 8, 2026 18:23

connortsui20 added 2 commits May 8, 2026 14:48

rename pack/unpack to encode/decode

5314399

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>

move code around and remove scalar fn plugin

11ac1c0

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>

connortsui20 force-pushed the ct/tq-pull-out branch from 0ae8703 to 11ac1c0 Compare May 8, 2026 19:43

gatesn approved these changes May 8, 2026

gatesn enabled auto-merge (squash) May 8, 2026 19:55

gatesn merged commit ff12040 into develop May 8, 2026
68 checks passed

gatesn deleted the ct/tq-pull-out branch May 8, 2026 19:57

4 participants