Validate short view strings in separate buffer in arrow-row #10250

Merged
Jefffrey merged 8 commits into apache:main from Jefffrey:row-short-utf8-validation
Jul 5, 2026

Conversation

Jefffrey commented Jul 1, 2026

Contributor

Which issue does this PR close?

Rationale for this change

Reduce the size of the output string view values buffer when decoding from the row format with UTF-8 validation enabled.

What changes are included in this PR?

Introduce a separate Vec that collects the short (inline) strings for one-shot UTF-8 validation. Previously, short and long strings were mixed in the values buffer to make UTF-8 validation easier, which meant the output view array used more memory than strictly required.
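
A minimal sketch of the idea described above, under stated assumptions. This is not the code from `arrow-row/src/variable.rs`: the function names, the `MAX_INLINE_LEN` constant (12 bytes, the inline limit of an Arrow view) and the plain `Vec<u8>` buffers are all hypothetical. Short strings go to a scratch buffer that exists only for validation, so the returned values buffer holds long strings only. The character-boundary check is an addition of this sketch, so that validating the concatenation also implies each individual value is valid.

```rust
/// Assumed inline limit: values of at most 12 bytes fit inside a 16-byte view.
const MAX_INLINE_LEN: usize = 12;

/// Validates `buf` as UTF-8 in one pass, then checks that every value ends on
/// a character boundary, so no value splits a multi-byte character.
fn validate_concatenated(buf: &[u8], ends: &[usize]) -> Result<(), String> {
    let text = std::str::from_utf8(buf).map_err(|e| e.to_string())?;
    if ends.iter().all(|&end| text.is_char_boundary(end)) {
        Ok(())
    } else {
        Err("a value boundary splits a UTF-8 character".to_string())
    }
}

/// Hypothetical decoder: returns the bytes that would land in the output
/// values buffer (long strings only) after validating all inputs.
fn decode_long_values(inputs: &[&[u8]]) -> Result<Vec<u8>, String> {
    let mut values: Vec<u8> = Vec::new(); // long strings, kept in the output
    let mut long_ends: Vec<usize> = Vec::new();
    let mut short_scratch: Vec<u8> = Vec::new(); // short strings, validation only
    let mut short_ends: Vec<usize> = Vec::new();

    for bytes in inputs {
        if bytes.len() <= MAX_INLINE_LEN {
            short_scratch.extend_from_slice(bytes);
            short_ends.push(short_scratch.len());
        } else {
            values.extend_from_slice(bytes);
            long_ends.push(values.len());
        }
    }

    // One validation pass per buffer instead of one per value.
    validate_concatenated(&short_scratch, &short_ends)?;
    validate_concatenated(&values, &long_ends)?;
    Ok(values)
}
```

In this sketch the scratch buffer is dropped once validation finishes, which is the memory saving the description refers to: the short strings never occupy space in the values buffer that is handed to the output array.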

Are these changes tested?

Existing tests.

Are there any user-facing changes?

No.

github-actions bot added the arrow (Changes to the arrow crate) label Jul 1, 2026

Jefffrey commented Jul 1, 2026

Contributor Author

run benchmark row_format
env:
BENCH_FILTER: convert_rows.*string\sview

Comment thread arrow-row/src/variable.rs
}

-fn decode_binary_view_inner(
+fn decode_binary_view_inner<const VALIDATE_UTF8: bool>(

Contributor Author

decided to take this opportunity to hoist the validate_utf8 option into a generic since it was used in the hot loop
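
To illustrate the kind of change described here, a minimal sketch follows. It is not the PR's code: `count_accepted` and `count_accepted_dyn` are hypothetical names, and the loop body is a stand-in for the real decoding work. With a `const` generic the compiler builds one copy of the loop per flag value, so the check inside the loop is typically resolved at compile time rather than evaluated for every row.

```rust
/// Hypothetical hot loop: counts the rows that would be accepted.
/// `VALIDATE_UTF8` is a compile-time constant, so each instantiation
/// contains only the branch it needs.
fn count_accepted<const VALIDATE_UTF8: bool>(rows: &[&[u8]]) -> usize {
    let mut accepted = 0;
    for row in rows {
        if VALIDATE_UTF8 && std::str::from_utf8(row).is_err() {
            continue; // reject invalid UTF-8 only when validation is compiled in
        }
        accepted += 1;
    }
    accepted
}

/// The runtime flag is inspected once, outside the loop, to pick an instantiation.
fn count_accepted_dyn(rows: &[&[u8]], validate_utf8: bool) -> usize {
    if validate_utf8 {
        count_accepted::<true>(rows)
    } else {
        count_accepted::<false>(rows)
    }
}
```

The trade-off in a design like this is slightly more generated code (two copies of the loop) in exchange for removing a per-row branch.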

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4855871093-774-psktj 6.12.85+ #1 SMP Mon May 11 08:17:35 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing row-short-utf8-validation (23069fd) to fbe75a3 (merge-base) diff
BENCH_NAME=row_format
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench row_format
BENCH_FILTER=convert_rows.*string\sview
Results will be posted here when complete


@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                         main                                   row-short-utf8-validation
-----                                         ----                                   -------------------------
convert_rows 4096 string view(1..100, 0)      1.01     79.1±0.34µs        ? ?/sec    1.00     78.1±0.30µs        ? ?/sec
convert_rows 4096 string view(1..100, 0.5)    1.02     69.1±0.38µs        ? ?/sec    1.00     67.7±0.42µs        ? ?/sec
convert_rows 4096 string view(10, 0)          1.00     43.2±0.07µs        ? ?/sec    1.00     43.1±0.06µs        ? ?/sec
convert_rows 4096 string view(100, 0)         1.02     96.6±0.18µs        ? ?/sec    1.00     94.8±0.07µs        ? ?/sec
convert_rows 4096 string view(100, 0.5)       1.02     61.9±0.13µs        ? ?/sec    1.00     60.8±0.16µs        ? ?/sec
convert_rows 4096 string view(30, 0)          1.00     58.1±0.07µs        ? ?/sec    1.00     58.1±0.04µs        ? ?/sec

Resource Usage

base (merge-base)

| Metric | Value |
| --- | --- |
| Wall time | 65.0s |
| Peak memory | 9.3 MiB |
| Avg memory | 7.4 MiB |
| CPU user | 58.6s |
| CPU sys | 0.0s |
| Peak spill | 0 B |

branch

| Metric | Value |
| --- | --- |
| Wall time | 60.0s |
| Peak memory | 12.4 MiB |
| Avg memory | 7.9 MiB |
| CPU user | 57.1s |
| CPU sys | 0.0s |
| Peak spill | 0 B |

@alamb alamb left a comment

Contributor

Looks good to me -- thanks @Jefffrey

One thing codex pointed out is that the benchmarks don't currently exercise UTF-8 validation, so the benchmark may not be representative.

I made a PR to add this benchmark if you want to run it here

Comment thread arrow-row/src/variable.rs
}
let mut values = MutableBuffer::new(values_capacity);
let mut view_utf8_validation_buffer = if VALIDATE_UTF8 {
Vec::with_capacity(inline_capacity)

Contributor

so this is a new allocation, but given the benchmark it seems to be more than paid for with the additional performance

Comment thread arrow-row/src/lib.rs
}

#[test]
fn test_values_buffer_smaller_when_utf8_validation_disabled() {

Contributor

maybe it is worth updating this test to verify that when utf8 validation is enabled, the values buffer only contains data for the long strings (not the short strings)

Contributor Author

good point, repurposed the test

Jefffrey pushed a commit that referenced this pull request Jul 2, 2026
# Which issue does this PR close?

- related to #10250 from @Jefffrey

# Rationale for this change

The existing row format benchmark measures `convert_rows` using rows
created directly by `RowConverter::convert_columns`, which skips UTF-8
validation during decode.

# What changes are included in this PR?

This adds a `convert_rows_validated` benchmark that decodes rows parsed
through `RowParser`, exercising the UTF-8 validation path. It also
derives `Clone` for `Rows` so the benchmark setup can reuse the prepared
rows when converting to binary.

# Are these changes tested?

CI.

I also ran it locally like
```shell
cargo bench --bench row_format -- convert_rows_validated
```


# Are there any user-facing changes?

No breaking changes. `Rows` now implements `Clone`.

Jefffrey commented Jul 2, 2026

Contributor Author

run benchmark row_format
env:
BENCH_FILTER: convert_rows.*string\sview

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4865651716-806-kdfqv 6.12.85+ #1 SMP Mon May 11 08:17:35 UTC 2026 aarch64 GNU/Linux

Comparing row-short-utf8-validation (6d67b70) to c7dc6b8 (merge-base) diff
BENCH_NAME=row_format
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench row_format
BENCH_FILTER=convert_rows.*string\sview
Results will be posted here when complete


@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                                main                                   row-short-utf8-validation
-----                                                ----                                   -------------------------
convert_rows 4096 string view(1..100, 0)             1.21     77.9±0.38µs        ? ?/sec    1.00     64.7±0.05µs        ? ?/sec
convert_rows 4096 string view(1..100, 0.5)           1.61     67.3±0.34µs        ? ?/sec    1.00     41.9±0.05µs        ? ?/sec
convert_rows 4096 string view(10, 0)                 1.01     42.9±0.14µs        ? ?/sec    1.00     42.7±0.07µs        ? ?/sec
convert_rows 4096 string view(100, 0)                1.00     95.7±0.12µs        ? ?/sec    1.01     96.9±0.31µs        ? ?/sec
convert_rows 4096 string view(100, 0.5)              1.07     61.8±0.17µs        ? ?/sec    1.00     57.6±0.10µs        ? ?/sec
convert_rows 4096 string view(30, 0)                 1.00     57.9±0.05µs        ? ?/sec    1.01     58.6±0.05µs        ? ?/sec
convert_rows_parsed 4096 string view(1..100, 0)      1.00     85.9±0.49µs        ? ?/sec    1.07     91.8±0.83µs        ? ?/sec
convert_rows_parsed 4096 string view(1..100, 0.5)    1.25     72.0±0.36µs        ? ?/sec    1.00     57.4±0.11µs        ? ?/sec
convert_rows_parsed 4096 string view(10, 0)          1.00     44.0±0.13µs        ? ?/sec    1.23     54.1±0.45µs        ? ?/sec
convert_rows_parsed 4096 string view(100, 0)         1.00    109.9±0.12µs        ? ?/sec    1.21    132.8±0.23µs        ? ?/sec
convert_rows_parsed 4096 string view(100, 0.5)       1.00     69.2±0.11µs        ? ?/sec    1.16     80.5±0.20µs        ? ?/sec
convert_rows_parsed 4096 string view(30, 0)          1.00     62.0±0.03µs        ? ?/sec    1.14     70.8±0.70µs        ? ?/sec

Resource Usage

base (merge-base)

| Metric | Value |
| --- | --- |
| Wall time | 125.0s |
| Peak memory | 10.5 MiB |
| Avg memory | 8.1 MiB |
| CPU user | 118.7s |
| CPU sys | 0.0s |
| Peak spill | 0 B |

branch

| Metric | Value |
| --- | --- |
| Wall time | 120.0s |
| Peak memory | 13.3 MiB |
| Avg memory | 9.1 MiB |
| CPU user | 115.1s |
| CPU sys | 0.0s |
| Peak spill | 0 B |

Jefffrey commented Jul 2, 2026

Contributor Author

run benchmark row_format
env:
BENCH_FILTER: convert_rows.*string\sview

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4865768440-807-lgw4q 6.12.85+ #1 SMP Mon May 11 08:17:35 UTC 2026 aarch64 GNU/Linux

Comparing row-short-utf8-validation (6d67b70) to c7dc6b8 (merge-base) diff
BENCH_NAME=row_format
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench row_format
BENCH_FILTER=convert_rows.*string\sview
Results will be posted here when complete


@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                                main                                   row-short-utf8-validation
-----                                                ----                                   -------------------------
convert_rows 4096 string view(1..100, 0)             1.20     77.9±0.36µs        ? ?/sec    1.00     64.7±0.05µs        ? ?/sec
convert_rows 4096 string view(1..100, 0.5)           1.61     67.2±0.36µs        ? ?/sec    1.00     41.9±0.04µs        ? ?/sec
convert_rows 4096 string view(10, 0)                 1.01     42.8±0.06µs        ? ?/sec    1.00     42.6±0.06µs        ? ?/sec
convert_rows 4096 string view(100, 0)                1.01     96.3±0.25µs        ? ?/sec    1.00     95.8±0.15µs        ? ?/sec
convert_rows 4096 string view(100, 0.5)              1.08     61.5±0.13µs        ? ?/sec    1.00     56.7±0.18µs        ? ?/sec
convert_rows 4096 string view(30, 0)                 1.00     58.0±0.09µs        ? ?/sec    1.00     58.1±0.08µs        ? ?/sec
convert_rows_parsed 4096 string view(1..100, 0)      1.00     86.0±0.31µs        ? ?/sec    1.06     91.4±0.88µs        ? ?/sec
convert_rows_parsed 4096 string view(1..100, 0.5)    1.25     71.2±0.35µs        ? ?/sec    1.00     56.8±0.11µs        ? ?/sec
convert_rows_parsed 4096 string view(10, 0)          1.00     44.3±0.08µs        ? ?/sec    1.20     53.2±0.16µs        ? ?/sec
convert_rows_parsed 4096 string view(100, 0)         1.00    112.0±0.30µs        ? ?/sec    1.17    130.8±0.23µs        ? ?/sec
convert_rows_parsed 4096 string view(100, 0.5)       1.00     68.8±0.12µs        ? ?/sec    1.14     78.4±0.19µs        ? ?/sec
convert_rows_parsed 4096 string view(30, 0)          1.00     62.0±0.04µs        ? ?/sec    1.13     70.4±1.15µs        ? ?/sec

Resource Usage

base (merge-base)

| Metric | Value |
| --- | --- |
| Wall time | 125.0s |
| Peak memory | 9.9 MiB |
| Avg memory | 7.9 MiB |
| CPU user | 118.8s |
| CPU sys | 0.0s |
| Peak spill | 0 B |

branch

| Metric | Value |
| --- | --- |
| Wall time | 120.0s |
| Peak memory | 12.7 MiB |
| Avg memory | 9.1 MiB |
| CPU user | 114.1s |
| CPU sys | 0.0s |
| Peak spill | 0 B |

Jefffrey commented Jul 2, 2026

Contributor Author

i can reproduce the regressions locally

Details
convert_rows 4096 string view(10, 0)
                        time:   [29.906 µs 29.955 µs 30.007 µs]
                        change: [−2.7118% −2.2274% −1.7427%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  12 (12.00%) high mild

convert_rows_parsed 4096 string view(10, 0)
                        time:   [40.673 µs 40.705 µs 40.740 µs]
                        change: [+23.796% +24.170% +24.558%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

convert_rows 4096 string view(30, 0)
                        time:   [38.753 µs 38.961 µs 39.200 µs]
                        change: [+1.8488% +2.1492% +2.4907%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe

convert_rows_parsed 4096 string view(30, 0)
                        time:   [60.500 µs 60.579 µs 60.672 µs]
                        change: [+46.025% +46.395% +46.758%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 18 outliers among 100 measurements (18.00%)
  9 (9.00%) high mild
  9 (9.00%) high severe

convert_rows 4096 string view(100, 0)
                        time:   [48.704 µs 48.803 µs 48.905 µs]
                        change: [+2.3709% +2.6566% +2.9531%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) low mild
  6 (6.00%) high mild
  1 (1.00%) high severe

convert_rows_parsed 4096 string view(100, 0)
                        time:   [75.288 µs 75.331 µs 75.380 µs]
                        change: [+36.667% +36.813% +36.944%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe

convert_rows 4096 string view(100, 0.5)
                        time:   [42.943 µs 43.025 µs 43.099 µs]
                        change: [+2.5279% +2.7698% +3.0016%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild

convert_rows_parsed 4096 string view(100, 0.5)
                        time:   [57.405 µs 57.465 µs 57.534 µs]
                        change: [+24.690% +25.082% +25.470%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 18 outliers among 100 measurements (18.00%)
  2 (2.00%) low severe
  1 (1.00%) low mild
  12 (12.00%) high mild
  3 (3.00%) high severe

convert_rows 4096 string view(1..100, 0)
                        time:   [51.030 µs 51.123 µs 51.216 µs]
                        change: [+1.7675% +2.0033% +2.2338%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild

convert_rows_parsed 4096 string view(1..100, 0)
                        time:   [75.340 µs 75.442 µs 75.553 µs]
                        change: [+38.508% +38.931% +39.346%] (p = 0.00 < 0.05)
                        Performance has regressed.

convert_rows 4096 string view(1..100, 0.5)
                        time:   [37.833 µs 37.877 µs 37.921 µs]
                        change: [−0.3127% −0.0730% +0.1526%] (p = 0.56 > 0.05)
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild

convert_rows_parsed 4096 string view(1..100, 0.5)
                        time:   [50.394 µs 50.441 µs 50.489 µs]
                        change: [+25.642% +25.928% +26.215%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low severe
  4 (4.00%) low mild
  1 (1.00%) high severe

might need to think more on this

Comment thread arrow-row/src/variable.rs Outdated
-// truncate inline string in values buffer if validate_utf8 is false
-if !validate_utf8 && decoded_len <= inline_str_max_len {
+if VALIDATE_UTF8 {
+view_utf8_validation_buffer.extend_from_slice(val);

Contributor Author

oh wait i think i only need to append this val in the len check below 🤦

Contributor Author

fixed; that was an embarrassing mistake, thankfully the new benchmarks helped catch it 🙏

Jefffrey commented Jul 2, 2026

Contributor Author

run benchmark row_format
env:
BENCH_FILTER: convert_rows.*string\sview

Comment thread arrow-row/src/variable.rs
Comment on lines -342 to +340
-let mut views = BufferBuilder::<u128>::new(len);
-for row in rows {
+let mut views = vec![0_u128; len];
+for (i, row) in rows.iter_mut().enumerate() {

Contributor Author

For good measure I also removed BufferBuilder in favour of a Vec 🚀
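
A rough sketch of the pattern in the diff above, with loud assumptions: `inline_view` and `build_views` are hypothetical helpers, only inline (short) values are handled, and the treatment of missing values is a choice made for this sketch rather than something taken from the PR. The vector is allocated once, zero-filled, and each view is written by index instead of being appended through a builder.

```rust
/// Packs a short value into a 16-byte view: a little-endian u32 length
/// followed by up to 12 bytes of inline data, zero-padded.
fn inline_view(bytes: &[u8]) -> u128 {
    assert!(bytes.len() <= 12, "only short values fit inline");
    let mut raw = [0u8; 16];
    raw[..4].copy_from_slice(&(bytes.len() as u32).to_le_bytes());
    raw[4..4 + bytes.len()].copy_from_slice(bytes);
    u128::from_le_bytes(raw)
}

/// Pre-sizes the views with zeros and fills them in place.
/// In this sketch a `None` row simply keeps its all-zero view.
fn build_views(rows: &[Option<&[u8]>]) -> Vec<u128> {
    let mut views = vec![0_u128; rows.len()];
    for (i, row) in rows.iter().enumerate() {
        if let Some(bytes) = row {
            views[i] = inline_view(bytes);
        }
    }
    views
}
```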

Jefffrey commented Jul 2, 2026

Contributor Author

run benchmark row_format
env:
BENCH_FILTER: convert_rows.*string\sview

@adriangbot


🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4867056569-809-c989r 6.12.85+ #1 SMP Mon May 11 08:17:35 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing row-short-utf8-validation (f520cc0) to d969025 (merge-base) diff
BENCH_NAME=row_format
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench row_format
BENCH_FILTER=convert_rows.*string\sview
Results will be posted here when complete


@adriangbot


🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4867072905-810-kw6cr 6.12.85+ #1 SMP Mon May 11 08:17:35 UTC 2026 aarch64 GNU/Linux


Comparing row-short-utf8-validation (f520cc0) to d969025 (merge-base) diff
BENCH_NAME=row_format
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench row_format
BENCH_FILTER=convert_rows.*string\sview
Results will be posted here when complete


@adriangbot


🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)


group                                                main                                   row-short-utf8-validation
-----                                                ----                                   -------------------------
convert_rows 4096 string view(1..100, 0)             1.21     78.4±0.29µs        ? ?/sec    1.00     65.0±0.11µs        ? ?/sec
convert_rows 4096 string view(1..100, 0.5)           1.67     67.9±0.37µs        ? ?/sec    1.00     40.7±0.07µs        ? ?/sec
convert_rows 4096 string view(10, 0)                 1.02     43.2±0.16µs        ? ?/sec    1.00     42.2±0.30µs        ? ?/sec
convert_rows 4096 string view(100, 0)                1.00     95.8±0.19µs        ? ?/sec    1.02     98.0±0.13µs        ? ?/sec
convert_rows 4096 string view(100, 0.5)              1.10     61.9±0.12µs        ? ?/sec    1.00     56.3±0.11µs        ? ?/sec
convert_rows 4096 string view(30, 0)                 1.00     57.9±0.07µs        ? ?/sec    1.00     58.2±0.07µs        ? ?/sec
convert_rows_parsed 4096 string view(1..100, 0)      1.00     86.1±0.38µs        ? ?/sec    1.04     89.2±0.34µs        ? ?/sec
convert_rows_parsed 4096 string view(1..100, 0.5)    1.01     71.3±0.35µs        ? ?/sec    1.00     70.6±0.50µs        ? ?/sec
convert_rows_parsed 4096 string view(10, 0)          1.00     43.9±0.25µs        ? ?/sec    1.13     49.7±0.11µs        ? ?/sec
convert_rows_parsed 4096 string view(100, 0)         1.02    114.5±0.82µs        ? ?/sec    1.00    112.4±0.20µs        ? ?/sec
convert_rows_parsed 4096 string view(100, 0.5)       1.00     69.1±0.15µs        ? ?/sec    1.00     68.9±0.22µs        ? ?/sec
convert_rows_parsed 4096 string view(30, 0)          1.00     62.0±0.07µs        ? ?/sec    1.01     62.7±0.63µs        ? ?/sec

Resource Usage

Metric         base (merge-base)    branch
------         -----------------    ------
Wall time      125.0s               120.0s
Peak memory    12.0 MiB             12.1 MiB
Avg memory     8.2 MiB              8.7 MiB
CPU user       118.7s               116.1s
CPU sys        0.0s                 0.0s
Peak spill     0 B                  0 B

@adriangbot


🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)


group                                                main                                   row-short-utf8-validation
-----                                                ----                                   -------------------------
convert_rows 4096 string view(1..100, 0)             1.21     79.0±0.26µs        ? ?/sec    1.00     65.4±0.06µs        ? ?/sec
convert_rows 4096 string view(1..100, 0.5)           1.67     67.9±0.33µs        ? ?/sec    1.00     40.8±0.05µs        ? ?/sec
convert_rows 4096 string view(10, 0)                 1.02     42.8±0.10µs        ? ?/sec    1.00     41.8±0.04µs        ? ?/sec
convert_rows 4096 string view(100, 0)                1.00     96.0±0.14µs        ? ?/sec    1.02     98.2±0.26µs        ? ?/sec
convert_rows 4096 string view(100, 0.5)              1.11     61.9±0.12µs        ? ?/sec    1.00     55.9±0.10µs        ? ?/sec
convert_rows 4096 string view(30, 0)                 1.00     57.9±0.05µs        ? ?/sec    1.00     58.1±0.06µs        ? ?/sec
convert_rows_parsed 4096 string view(1..100, 0)      1.00     85.7±0.39µs        ? ?/sec    1.04     89.1±0.44µs        ? ?/sec
convert_rows_parsed 4096 string view(1..100, 0.5)    1.01     71.3±0.39µs        ? ?/sec    1.00     70.6±0.35µs        ? ?/sec
convert_rows_parsed 4096 string view(10, 0)          1.00     43.7±0.04µs        ? ?/sec    1.14     49.7±0.05µs        ? ?/sec
convert_rows_parsed 4096 string view(100, 0)         1.00    110.3±0.19µs        ? ?/sec    1.02    112.0±0.10µs        ? ?/sec
convert_rows_parsed 4096 string view(100, 0.5)       1.00     69.0±0.15µs        ? ?/sec    1.00     69.0±0.18µs        ? ?/sec
convert_rows_parsed 4096 string view(30, 0)          1.00     62.0±0.04µs        ? ?/sec    1.00     61.9±0.05µs        ? ?/sec

Resource Usage

Metric         base (merge-base)    branch
------         -----------------    ------
Wall time      125.0s               120.0s
Peak memory    12.0 MiB             13.3 MiB
Avg memory     7.9 MiB              8.8 MiB
CPU user       118.7s               116.1s
CPU sys        0.0s                 0.0s
Peak spill     0 B                  0 B

@Jefffrey

Jefffrey commented Jul 2, 2026

Contributor Author

I think the slowdown in convert_rows_parsed 4096 string view(10, 0) is because it is a pathological case: since all the strings are inlined, we are stuck in a loop of

  1. decode into values buffer
  2. copy from values buffer into view_utf8_validation_buffer
  3. truncate values buffer
  4. repeat

so the values buffer is an unnecessary intermediary in this copy
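
The loop above can be sketched as follows, assuming a plain byte copy stands in for the real row decoding. `decode_all_short` and `INLINE_MAX_LEN` are hypothetical names, and the `copies` counter exists only to make the double copy visible.

```rust
/// Assumed inline limit for a view (hypothetical constant name).
const INLINE_MAX_LEN: usize = 12;

/// Mimics the decode loop and counts how many times bytes are copied.
fn decode_all_short(rows: &[&[u8]]) -> (Vec<u8>, Vec<u8>, usize) {
    let mut values = Vec::new();
    let mut validation = Vec::new();
    let mut copies = 0;
    for row in rows {
        let start = values.len();
        // 1. decode into values buffer (a plain copy in this sketch)
        values.extend_from_slice(row);
        copies += 1;
        if row.len() <= INLINE_MAX_LEN {
            // 2. copy from values buffer into the validation buffer
            validation.extend_from_slice(&values[start..]);
            copies += 1;
            // 3. truncate values buffer so it keeps only long strings
            values.truncate(start);
        }
        // 4. repeat for the next row
    }
    (values, validation, copies)
}
```

When every row is short, `values` ends up empty and each row has been copied twice, which matches the regression seen only in the all-inline benchmark.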

I could try to optimize for the case where we have more short strings than long strings (based on some ratio of values_capacity to inline_capacity): copy directly into view_utf8_validation_buffer first, then copy the long strings out into the values buffer (essentially reversing the copy order). Not sure it's worth the complexity, though 🤔

@alamb

alamb commented Jul 2, 2026

Contributor

I now think that validate_utf8 is a somewhat rare code path in the Rows code, as it happens after externalizing the Rows to bytes and then parsing them back (I understood this after messing with #10259)

Since most systems are likely going to be parsing row data back from trusted sources (themselves, for example), maybe we could just offer an unsafe API on RowParser that simply trusts the data (and skips the utf8 check entirely)

This would probably be even faster than this optimized utf8 check 🤔
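
To illustrate the trade-off with standard-library calls only: the checked path scans every byte, while the trusted path skips the scan and pushes the guarantee onto the caller. `decode_checked` and `decode_trusted` are hypothetical names for this sketch; they are not an existing or proposed RowParser API.

```rust
use std::string::FromUtf8Error;

/// Checked path: scans the bytes and rejects invalid UTF-8.
fn decode_checked(bytes: Vec<u8>) -> Result<String, FromUtf8Error> {
    String::from_utf8(bytes)
}

/// Trusted path: no scan at all.
///
/// # Safety
/// The caller must guarantee that `bytes` is valid UTF-8, for example
/// because the same process produced it earlier.
unsafe fn decode_trusted(bytes: Vec<u8>) -> String {
    // SAFETY: validity is the caller's obligation (see above).
    unsafe { String::from_utf8_unchecked(bytes) }
}
```

Marking the function `unsafe` keeps the default path validated and makes callers opt in explicitly when they know the bytes came from a trusted source.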

@Jefffrey

Jefffrey commented Jul 3, 2026

Contributor Author

I now think that validate_utf8 is a somewhat rare code path in the Rows code, as it happens after externalizing the Rows to bytes and then parsing them back (I understood this after messing with #10259)

Since most systems are likely going to be parsing row data back from trusted sources (themselves, for example), maybe we could just offer an unsafe API on RowParser that simply trusts the data (and skips the utf8 check entirely)

This would probably be even faster than this optimized utf8 check 🤔

@Jefffrey Jefffrey merged commit 8544614 into apache:main Jul 5, 2026
14 checks passed
@Jefffrey Jefffrey deleted the row-short-utf8-validation branch July 5, 2026 00:46
@Jefffrey

Jefffrey commented Jul 5, 2026

Contributor Author

thanks @alamb

Labels

arrow (Changes to the arrow crate), performance

Development

Successfully merging this pull request may close these issues.

Improve arrow-row --> StringView/BinaryView memory usage

3 participants