bench(parquet): add `ListArray` benchmarks for runtime and peak memory by HippoBaro · Pull Request #9846 · apache/arrow-rs

HippoBaro · 2026-04-28T22:48:45Z

Which issue does this PR close?

Contributes to Column performance: run-proportional read/write cost #9731
Dependency of feat(parquet): selective null padding for list child readers #9848

Rationale for this change

Existing benchmarks have some gaps in the types of columns they exercise. Additionally, I would like to improve the memory efficiency of the read/decode path in terms of RSS requirements, especially for sparse inputs, and we currently do not have any infrastructure to measure that.

What changes are included in this PR?

Extend the existing `arrow_reader` runtime benchmarks with `Int32` and `FixedBinary32` list columns alongside the existing `StringList`, with parameterized null density (0%, 50%, 90%, 99%). The prior benchmarks only covered string lists, which didn't surface costs specific to fixed-width and primitive element types.

Add a new `arrow_reader_peak_memory` benchmark that measures peak heap usage during `ListArrayReader::consume_batch` using a thread-local tracking allocator. It captures how RSS-efficient we are when materializing a column into its final Arrow in-memory representation.
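For readers unfamiliar with the technique, here is a minimal std-only sketch of what a thread-local tracking allocator can look like. It is not the code from this PR: `TrackingAllocator`, `measure_peak`, and the counter names are hypothetical, and the sketch assumes a platform where const-initialized thread-locals do not themselves allocate.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;

thread_local! {
    // Bytes currently live on this thread (hypothetical counter).
    static CURRENT: Cell<usize> = const { Cell::new(0) };
    // High-water mark of CURRENT since the last reset.
    static PEAK: Cell<usize> = const { Cell::new(0) };
}

/// Hypothetical wrapper: forwards to `System` and updates per-thread counters.
pub struct TrackingAllocator;

fn on_alloc(size: usize) {
    // `try_with` avoids a panic if the thread-local is gone during thread teardown.
    let _ = CURRENT.try_with(|cur| {
        let now = cur.get().saturating_add(size);
        cur.set(now);
        let _ = PEAK.try_with(|peak| peak.set(peak.get().max(now)));
    });
}

fn on_dealloc(size: usize) {
    // Saturating: memory may be freed on a different thread than it was allocated on.
    let _ = CURRENT.try_with(|cur| cur.set(cur.get().saturating_sub(size)));
}

unsafe impl GlobalAlloc for TrackingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            on_alloc(layout.size());
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        on_dealloc(layout.size());
    }
    // The default `realloc` and `alloc_zeroed` call the two methods above,
    // so the counters stay consistent without overriding them.
}

#[global_allocator]
static GLOBAL: TrackingAllocator = TrackingAllocator;

/// Runs `f` and returns its result plus the peak number of bytes allocated
/// above the starting level on the calling thread.
pub fn measure_peak<T>(f: impl FnOnce() -> T) -> (T, usize) {
    let baseline = CURRENT.with(|c| c.get());
    PEAK.with(|p| p.set(baseline));
    let out = f();
    let peak = PEAK.with(|p| p.get());
    (out, peak.saturating_sub(baseline))
}
```

Because the counters are thread-local, allocations made by other threads do not pollute the measurement, which is presumably why a per-thread design suits a benchmark harness. Wrapping the measured call, for example `measure_peak(|| reader.consume_batch())` with a hypothetical `reader`, would yield the peak for that call alone.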

Are these changes tested?

All tests passing.

Are there any user-facing changes?

None.

HippoBaro · 2026-04-28T23:13:48Z

This PR adds benchmarks with a custom Measurement. In this case, we measure RSS and cumulative memory usage (target arrow_reader_peak_memory).

I am not sure if the benchmark bot will like those new units:

arrow_array_reader/ListArray_peak_memory/Int32List/no NULLs
                        time:   [836.51 KiB 836.51 KiB 836.51 KiB]
arrow_array_reader/ListArray_peak_memory/Int32List/half NULLs
                        time:   [482.01 KiB 482.01 KiB 482.01 KiB]
arrow_array_reader/ListArray_peak_memory/Int32List/90pct NULLs
                        time:   [271.95 KiB 271.95 KiB 271.95 KiB]
arrow_array_reader/ListArray_peak_memory/Int32List/99pct NULLs
                        time:   [217.96 KiB 217.96 KiB 217.96 KiB]
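Since the harness is built around time, a memory measurement has to bring its own unit scaling. The following std-only sketch shows one way raw byte counts could be scaled to binary units with five significant digits, similar in shape to the values above. `scale_bytes` and `format_bytes` are hypothetical helpers, not part of Criterion or of this PR.

```rust
/// Picks the largest binary unit that keeps the scaled value at or above 1.
pub fn scale_bytes(bytes: f64) -> (f64, &'static str) {
    const UNITS: [&str; 4] = ["B", "KiB", "MiB", "GiB"];
    let mut value = bytes;
    let mut idx = 0;
    while value >= 1024.0 && idx < UNITS.len() - 1 {
        value /= 1024.0;
        idx += 1;
    }
    (value, UNITS[idx])
}

/// Formats a byte count with roughly five significant digits.
pub fn format_bytes(bytes: f64) -> String {
    let (value, unit) = scale_bytes(bytes);
    // Fewer decimals as the integer part grows, so the total width stays stable.
    let decimals: usize = if value < 10.0 {
        4
    } else if value < 100.0 {
        3
    } else if value < 1000.0 {
        2
    } else {
        1
    };
    format!("{:.*} {}", decimals, value, unit)
}
```

For instance, `format_bytes(1536.0 * 1024.0)` yields `"1.5000 MiB"`.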

etseidl · 2026-05-06T22:02:37Z

    let schema = build_test_schema();

    let mut group = c.benchmark_group("arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array");
-    let mandatory_f16_leaf_desc = schema.column(17);


One quick ask: could you move the new schema elements to the bottom so we don't have to renumber everything? 🙏 😅

Will do! 🙇

etseidl · 2026-05-06T22:35:54Z

Not your problem, but I find it funny that the peak memory test prints out:

arrow_array_reader/ListArray_allocated_bytes/Int32List/no NULLs
                        time:   [10.458 MiB 10.458 MiB 10.458 MiB]
arrow_array_reader/ListArray_allocated_bytes/Int32List/half NULLs
                        time:   [5.9797 MiB 5.9797 MiB 5.9797 MiB]

I looked at the Criterion source and "time:" is sadly hard coded :(

etseidl

Looks legit to me. Thanks @HippoBaro.

HippoBaro · 2026-05-07T02:11:24Z

Thanks @etseidl! Sorry for the spurious diff. It's rebased and fixed.

> I looked at the Criterion source and "time:" is sadly hard coded :(

I also tried to change that 😅 It’s also quite sad that Criterion won’t let me measure multiple things per kernel. In essence, these tests are not any different from the CPU-equivalent ones, they just measure a different thing 🤷

alamb · 2026-05-07T14:18:47Z

The MSRV failure is unrelated to this PR. For more details:

msrv check failing on main due to tonic@0.14.6 #9938

alamb · 2026-05-07T18:03:38Z

Merged up to get a clean CI run

# Which issue does this PR close?

- Contributes to #9731
- Depends on #9847
- Depends on #9846

# Rationale for this change

Parquet list decoding currently materializes padding for null or empty parent lists and then copies the child array to filter that padding back out. This is expensive for nested list columns, especially sparse lists and fixed-width children, where memory can scale with decoded levels instead of actual emitted child values.

# What changes are included in this PR?

This PR makes list child readers emit compact child arrays directly by pushing selective null padding into the leaf `RecordReader`. It also builds definition-level validity bitmaps word-at-a-time, sizes child buffers after levels are decoded, and adds list runtime and peak-memory benchmarks across element types and null densities.

# Are these changes tested?

The new logic is covered extensively by the existing list reader tests, which now exercise the production `PrimitiveArrayReader` path with in-memory Parquet pages (see #9846), and `BooleanBufferBuilder::append_word` has targeted unit coverage.

Benchmark results:

```
Name                                              Before      After       Delta
ListArray/StringList/no NULLs                     7.3395 ms   5.8001 ms   (-21.0%)
ListArray/StringList/half NULLs                   4.1255 ms   3.4274 ms   (-16.9%)
ListArray/Int32List/90pct NULLs                   1.0366 ms   975.63 us   (-5.9%)
ListArray/Fixed32List/no NULLs                    4.9298 ms   3.3137 ms   (-32.8%)
ListArray/Fixed32List/half NULLs                  2.8998 ms   2.7258 ms   (-6.0%)
ListArray/Fixed32List/90pct NULLs                 1.0556 ms   995.82 us   (-5.7%)
ListArray/Fixed32List/99pct NULLs                 508.65 us   467.16 us   (-8.2%)

Name                                              Before      After       Delta
ListArray_peak_memory/Int32List/no NULLs          836.51 KiB  574.79 KiB  (-31.3%)
ListArray_peak_memory/Int32List/half NULLs        482.01 KiB  336.29 KiB  (-30.2%)
ListArray_peak_memory/Int32List/90pct NULLs       271.95 KiB  175.32 KiB  (-35.5%)
ListArray_peak_memory/Int32List/99pct NULLs       217.96 KiB  120.04 KiB  (-44.9%)
ListArray_peak_memory/DoubleList/no NULLs         1.2399 MiB  715.31 KiB  (-43.7%)
ListArray_peak_memory/DoubleList/half NULLs       753.66 KiB  400.39 KiB  (-46.9%)
ListArray_peak_memory/DoubleList/90pct NULLs      380.62 KiB  190.89 KiB  (-49.8%)
ListArray_peak_memory/DoubleList/99pct NULLs      315.21 KiB  121.61 KiB  (-61.4%)
ListArray_peak_memory/Fixed32List/no NULLs        3.8031 MiB  1.5760 MiB  (-58.6%)
ListArray_peak_memory/Fixed32List/half NULLs      2.1710 MiB  849.94 KiB  (-61.8%)
ListArray_peak_memory/Fixed32List/90pct NULLs     1.0017 MiB  277.35 KiB  (-73.0%)
ListArray_peak_memory/Fixed32List/99pct NULLs     898.69 KiB  130.93 KiB  (-85.4%)
ListArray_peak_memory/StringList/no NULLs         3.7925 MiB  2.4715 MiB  (-34.8%)
ListArray_peak_memory/StringList/half NULLs       1.2541 MiB  772.94 KiB  (-39.8%)
ListArray_peak_memory/StringList/90pct NULLs      296.63 KiB  188.96 KiB  (-36.3%)
ListArray_peak_memory/StringList/99pct NULLs      226.75 KiB  120.37 KiB  (-46.9%)

Name                                              Before      After       Delta
ListArray_allocated_bytes/Int32List/no NULLs      10.458 MiB  6.8018 MiB  (-35.0%)
ListArray_allocated_bytes/Int32List/half NULLs    5.9797 MiB  4.0127 MiB  (-32.9%)
ListArray_allocated_bytes/Int32List/90pct NULLs   2.9985 MiB  1.8210 MiB  (-39.3%)
ListArray_allocated_bytes/Int32List/99pct NULLs   2.5579 MiB  1.3733 MiB  (-46.3%)
ListArray_allocated_bytes/DoubleList/no NULLs     16.083 MiB  8.6546 MiB  (-46.2%)
ListArray_allocated_bytes/DoubleList/half NULLs   8.9134 MiB  4.8497 MiB  (-45.6%)
ListArray_allocated_bytes/DoubleList/90pct NULLs  4.3656 MiB  2.0179 MiB  (-53.8%)
ListArray_allocated_bytes/DoubleList/99pct NULLs  3.7482 MiB  1.3903 MiB  (-62.9%)
ListArray_allocated_bytes/Fixed32List/no NULLs    49.441 MiB  19.505 MiB  (-60.5%)
ListArray_allocated_bytes/Fixed32List/half NULLs  26.846 MiB  10.459 MiB  (-61.0%)
ListArray_allocated_bytes/Fixed32List/90pct NULLs 12.483 MiB  3.1127 MiB  (-75.1%)
ListArray_allocated_bytes/Fixed32List/99pct NULLs 10.895 MiB  1.4980 MiB  (-86.3%)
ListArray_allocated_bytes/StringList/no NULLs     47.519 MiB  21.743 MiB  (-54.2%)
ListArray_allocated_bytes/StringList/half NULLs   19.097 MiB  10.478 MiB  (-45.1%)
ListArray_allocated_bytes/StringList/90pct NULLs  3.4203 MiB  2.1165 MiB  (-38.1%)
ListArray_allocated_bytes/StringList/99pct NULLs  2.6424 MiB  1.3777 MiB  (-47.9%)
```

# Are there any user-facing changes?

None.

---------

Signed-off-by: Hippolyte Barraud <hippolyte.barraud@datadoghq.com>
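
To illustrate what building a validity bitmap word-at-a-time can mean, here is a small std-only sketch. It is not the arrow-rs implementation: `pack_validity` and `count_valid` are hypothetical names, and the sketch assumes `i16` definition levels, treats a slot as valid when its level equals the maximum definition level, and uses least-significant-bit-first ordering as Arrow bitmaps do.

```rust
/// Packs definition levels into 64-bit validity words, one word per 64 levels.
/// Bit `i` of a word is set when the corresponding level equals `max_def_level`.
pub fn pack_validity(def_levels: &[i16], max_def_level: i16) -> Vec<u64> {
    let mut words = Vec::with_capacity(def_levels.len().div_ceil(64));
    for chunk in def_levels.chunks(64) {
        // Accumulate up to 64 bits in a register, then store the word once,
        // instead of touching the output buffer for every single bit.
        let mut word = 0u64;
        for (bit, &level) in chunk.iter().enumerate() {
            word |= u64::from(level == max_def_level) << bit;
        }
        words.push(word);
    }
    words
}

/// Counts valid slots; trailing padding bits in the last word are always zero.
pub fn count_valid(words: &[u64]) -> usize {
    words.iter().map(|w| w.count_ones() as usize).sum()
}
```

Knowing the valid count up front is what would allow a reader to size child buffers after the levels are decoded, rather than over-allocating for padding.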

github-actions bot added the `parquet` label (Changes to the parquet crate) Apr 28, 2026

HippoBaro mentioned this pull request Apr 29, 2026

feat(parquet): selective null padding for list child readers #9848

Merged

etseidl reviewed May 6, 2026

etseidl approved these changes May 6, 2026

HippoBaro force-pushed the unpadded_child_mode_bench branch from 45bfbf2 to 989c468 Compare May 7, 2026 02:08

Merge branch 'main' into unpadded_child_mode_bench

172a665

alamb merged commit 7abb225 into apache:main May 7, 2026
17 checks passed

alamb mentioned this pull request Jun 3, 2026

msrv check failing on main due to tonic@0.14.6 #9938

Closed

HippoBaro mentioned this pull request Jun 5, 2026

Pluggable page spilling API for the Parquet ArrowWriter (PageStore) #10020

Merged
