feat(arrow-ipc): add sans-IO stream encoder #10277
Open
Phoenix500526 wants to merge 1 commit into
Introduce StreamEncoder for IPC streaming without requiring a std::io::Write sink. The encoder owns stream state, emits ordered Buffer chunks, and preserves the low-copy path for uncompressed record batch body buffers.

Add byte-for-byte compatibility tests against StreamWriter for normal batches, empty streams, and dictionary batches.

Closes apache#7812

Signed-off-by: Jiawei Zhao <Phoenix500526@163.com>
Which issue does this PR close?
Closes apache#7812.
Rationale for this change
StreamWriter currently requires a std::io::Write sink, which is awkward for async or chunk-oriented destinations such as object stores. This PR adds a sans-IO IPC stream encoder so callers can encode Arrow IPC stream data into ordered Buffer chunks and send those chunks through their own IO layer.
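To make the sans-IO idea concrete, here is a minimal sketch of the pattern in plain Rust. Everything in it is hypothetical: `ToyEncoder`, `send_all`, the `TOY1` header, and the length-prefixed framing are illustrative stand-ins, not the `StreamEncoder` API added by this PR and not the Arrow IPC wire format. The point is only the shape: the encoder holds stream state and returns ordered chunks, and the caller decides how those chunks reach the destination.

```rust
// Hypothetical toy types for illustration only. This is NOT the arrow_ipc API,
// and the framing below is NOT the Arrow IPC wire format.

/// A stateful encoder that never performs IO; it only returns byte chunks.
pub struct ToyEncoder {
    header_written: bool,
    finished: bool,
}

impl ToyEncoder {
    pub fn new() -> Self {
        ToyEncoder { header_written: false, finished: false }
    }

    /// Emit the stream header once, before any other chunk.
    fn header_if_needed(&mut self, chunks: &mut Vec<Vec<u8>>) {
        if !self.header_written {
            chunks.push(b"TOY1".to_vec());
            self.header_written = true;
        }
    }

    /// Encode one payload and return the chunks in the order they must be sent.
    pub fn encode(&mut self, payload: &[u8]) -> Vec<Vec<u8>> {
        assert!(!self.finished, "encode called after finish");
        let mut chunks = Vec::new();
        self.header_if_needed(&mut chunks);
        // Little-endian length prefix, then the payload bytes.
        chunks.push((payload.len() as u32).to_le_bytes().to_vec());
        chunks.push(payload.to_vec());
        chunks
    }

    /// Close the stream and return the trailing end-of-stream chunk(s).
    pub fn finish(&mut self) -> Vec<Vec<u8>> {
        let mut chunks = Vec::new();
        self.header_if_needed(&mut chunks);
        chunks.push(u32::MAX.to_le_bytes().to_vec()); // toy end-of-stream marker
        self.finished = true;
        chunks
    }
}

/// Caller-owned IO layer. Here it appends to a Vec<u8>; in practice it could be
/// an async upload or any other chunk-oriented destination.
pub fn send_all(sink: &mut Vec<u8>, chunks: Vec<Vec<u8>>) {
    for chunk in chunks {
        sink.extend_from_slice(&chunk);
    }
}
```

Because the toy encoder never sees the sink, the same encoding logic can serve blocking writers, async uploads, or in-memory buffers without change.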
What changes are included in this PR?
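This PR adds StreamEncoder, a stateful IPC stream encoder that owns stream state, emits ordered Buffer chunks, and preserves the low-copy path for uncompressed record batch body buffers.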
Are these changes tested?
Yes.
Added tests compare StreamEncoder output byte-for-byte with StreamWriter output for:
a normal record batch stream
an empty stream
a stream containing dictionary batches
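
The byte-for-byte comparison can be sketched as follows. This is an assumption-laden illustration rather than the PR's test code: `write_toy_stream`, `encode_toy_stream`, and `outputs_match` are hypothetical names, and the toy length-prefixed framing stands in for the real IPC format. The sketch shows only the testing strategy: run the same input through a `std::io::Write`-based writer and through a chunk-emitting encoder, concatenate the chunks, and require identical bytes.

```rust
use std::io::{self, Write};

// Hypothetical stand-ins for a Write-based writer and a sans-IO encoder.
// Neither function is part of arrow_ipc; the framing is a toy format.

/// Reference path: write a toy length-prefixed stream into an io::Write sink.
pub fn write_toy_stream<W: Write>(mut sink: W, payloads: &[&[u8]]) -> io::Result<()> {
    for &p in payloads {
        sink.write_all(&(p.len() as u32).to_le_bytes())?;
        sink.write_all(p)?;
    }
    sink.write_all(&u32::MAX.to_le_bytes())?; // toy end-of-stream marker
    Ok(())
}

/// Sans-IO path: produce the same bytes as a list of ordered chunks.
pub fn encode_toy_stream(payloads: &[&[u8]]) -> Vec<Vec<u8>> {
    let mut chunks = Vec::new();
    for &p in payloads {
        chunks.push((p.len() as u32).to_le_bytes().to_vec());
        chunks.push(p.to_vec());
    }
    chunks.push(u32::MAX.to_le_bytes().to_vec());
    chunks
}

/// True when the concatenated chunks equal the writer's output byte-for-byte.
pub fn outputs_match(payloads: &[&[u8]]) -> bool {
    let mut written = Vec::new();
    write_toy_stream(&mut written, payloads).expect("writing to a Vec should not fail");
    let encoded: Vec<u8> = encode_toy_stream(payloads).concat();
    written == encoded
}
```

Comparing whole byte sequences, including the empty-stream case, catches ordering and framing regressions that a round-trip decode test could mask.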
Are there any user-facing changes?
Yes. This adds a new public arrow_ipc::writer::StreamEncoder API.
There are no breaking changes. Existing StreamWriter behavior is unchanged.