perf(gzip): index BytesView directly in crc32_update#388
Open
mizchi wants to merge 1 commit into
Open
Conversation
crc32_update iterates the input via 'for byte in chunk' where chunk is a BytesView. That desugars through BytesView::iter + Iter::next, which allocates an iterator closure and pays a virtual dispatch per byte. A gzip_roundtrip callgrind profile attributes ~16% of total instructions to that single loop (BytesView::iter 9.80% + Iter::next 6.07%), even though the loop body is a couple of arithmetic ops and a table lookup. Pull out the backing Bytes + start offset + length once and index the raw Bytes directly. Bytes[i] is intrinsic and inlined. gzip_roundtrip bench (native, 3-run median): baseline: 178 ms patched : 162 ms (-9.0%)
Collaborator
|
For builtin, array-like data structure, |
Collaborator
|
It seems that |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
crc32_updateiterates the input viafor byte in chunkwherechunk : BytesView. That desugars throughBytesView::iter+Iter::next, which allocates an iterator closure and pays a virtual dispatch per byte. In agzip_roundtripcallgrind profile that single loop accounts for ~16% of total instructions (BytesView::iter 9.80% + Iter::next 6.07%), even though the loop body is a couple of arithmetic ops and a lookup.Switching to a direct index over the backing
Bytesremoves both the closure allocation and the iterator dispatch. The intrinsicBytes[i]is inlined.Benchmark
Scenario:
bench-async/cmd/gzip_roundtrip/main.mbt— 3 iter × 1000 chunks × 1024-byte payload, encode → pipe → decode. Linux x86_64, native build, 3-run median wall time:Callgrind self-time delta:
BytesView::iter(closure)Iter::nextcrc32_updateNet: -9.85% on this hot path, which lines up with the -9% wall-time drop.
Test results
Socket / TLS / HTTP / fs / process / websocket tests require a network interface this sandbox lacks; this patch does not touch any code on that path.