Add validated row decode benchmark#10259
Conversation
345e692 to
d972932
Compare
| .collect(); | ||
| c.bench_function(&format!("convert_rows_validated {name}"), |b| { | ||
| b.iter(|| { | ||
| hint::black_box( |
Jefffrey
left a comment
There was a problem hiding this comment.
this is a good catch; didn't realize we needed this roundabout way of enabling validation
one concern i have here is this will add this new bench for non-string types creating more benchmark noise (and i think it ignores nulls too so the null config benches also get extra noise because of this)
probably want to config this so it only runs for string types (and probably just for the 0.0 null density case) 🤔
| // back into Arrow arrays by RowConverter::convert_rows. | ||
| let parsed_rows: Vec<_> = binary_rows | ||
| .iter() | ||
| .flatten() |
There was a problem hiding this comment.
does this mean nulls are ignored in this bench?
There was a problem hiding this comment.
Good call -- I fixed that by not converting into binary first
Yeah -- it took me a while too -- I think the idea is that Rows are assumed to be valid when prodiced by a row converter. It is only when they may have come from somewhere else (e.g. a spill file) they need to be validated
|
|
thanks @alamb |

Which issue does this PR close?
Rationale for this change
The existing row format benchmark measures
convert_rowsusing rows created directly byRowConverter::convert_columns, which skips UTF-8 validation during decode.What changes are included in this PR?
This adds a
convert_rows_validatedbenchmark that decodes rows parsed throughRowParser, exercising the UTF-8 validation path. It also derivesCloneforRowsso the benchmark setup can reuse the prepared rows when converting to binary.Are these changes tested?
CI.
I also ran it locally like
Are there any user-facing changes?
No breaking changes.
Rowsnow implementsClone.