Commit 2ddd2be
Tom McCormick
Add comprehensive ORC batching tests demonstrating stripe size, batch size, and compression interactions.
Tests show that ORC batching is based on stripes (like Parquet row groups). A near-perfect 1:1 mapping is achievable with large stripe sizes (2-5 MB) and hard-to-compress data, giving ratios of 0.91-0.97 between stripe size and actual file size.
1 parent 71143f6 commit 2ddd2be
1 file changed
Lines changed: 1257 additions & 63 deletions