Perf: Pre-size buffer allocations to avoid intermediate allocations#10262
Perf: Pre-size buffer allocations to avoid intermediate allocations#10262Rich-T-kid wants to merge 3 commits into
Conversation
|
pretty big descriptions for a (+3,-2) PR 😅 |
Jefffrey
left a comment
There was a problem hiding this comment.
pretty big descriptions for a (+3,-2) PR 😅
you love to see it 🙂
|
run benchmark flight |
|
🤖 Arrow criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing rich-T-kid/re-use-buffers (53f1c95) to 32bba5a (merge-base) diff File an issue against this benchmark runner |
|
🤖 Arrow criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
|
Updating to get the "use single threaded executor" in the benchmarks |
|
run benchmark flight |
1 similar comment
|
run benchmark flight |
|
Running twice to make sure results are reproducable |
|
🤖 Arrow criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing rich-T-kid/re-use-buffers (7197cd2) to 8c7df18 (merge-base) diff File an issue against this benchmark runner |
|
🤖 Arrow criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing rich-T-kid/re-use-buffers (7197cd2) to 8c7df18 (merge-base) diff File an issue against this benchmark runner |
|
🤖 Arrow criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
|
🤖 Arrow criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
81783c4 to
a44192f
Compare
|
run benchmark flight |
|
🤖 Arrow criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing rich-T-kid/re-use-buffers (a44192f) to 8c7df18 (merge-base) diff File an issue against this benchmark runner |
|
🤖 Arrow criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
|
run benchmark flight |
|
🤖 Arrow criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing rich-T-kid/re-use-buffers (a44192f) to 8c7df18 (merge-base) diff File an issue against this benchmark runner |
|
🤖 Arrow criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
|
with 🤔 benchmarks aren't as strong as id expect but the amount of data being worked with isn't huge. Seems to be mostly noise due to |
Which issue does this PR close?
Rationale for this change
TLDR: Its useful to pre-allocate vectors when you know the amount of data it will require
When
IpcDataGeneratoruses theIpcBodySink::Writevariant, record batch buffer bytes are written directly into aVec. If thatVecis undersized, it repeatedly reallocates and copies bytes into a larger buffer, growing exponentially (1, 4, 16, 32 ... KB ... MB) and paying two costs on each reallocation:For large batches this cascade is expensive, and paying it fresh on every record batch chunk compounds the problem further. Since
FlightDataEncoder::split_batch_for_grpc_responsesplits record batches into roughly equal-sized chunks, we exploit this by using the previous buffer's final capacity as an estimate for the next call, keeping a correctly-sizedVecalive across iterations and avoiding repeated reallocation on the hot path.why not pre-allocate the buffers using an estimate with the length
split_batch_for_grpc_responseuses?Using the final capacity rather than the uncompressed dictionary size is intentional, since IPC encoding and compression both affect the actual bytes written, the final capacity naturally adapts to whatever encoding and compression settings are in effect rather than consistently overprovisioning.
What changes are included in this PR?
Are these changes tested?
n/a
Are there any user-facing changes?
no