[SPARK-57735][SQL] Support nanosecond-precision timestamp types in the in-memory columnar cache#56842
viirya wants to merge 4 commits into
…e in-memory columnar cache

### What changes were proposed in this pull request?

The default in-memory columnar cache serializer (`DefaultCachedBatchSerializer`) did not support `TimestampNTZNanosType` / `TimestampLTZNanosType`. Caching a DataFrame with such a column failed at materialization with `not support type: TimestampNTZNanosType(9)`, because none of the cache's type-dispatch sites had a case for them.

This adds full support, following the fixed-width multi-field pattern already used by `CalendarInterval`. The physical value `TimestampNanosVal` is a fixed 16-byte payload (an 8-byte epochMicros plus an 8-byte word holding nanosWithinMicro), so it maps cleanly onto that pattern:

- `ColumnType`: a `TIMESTAMP_NANOS` column type (with `TIMESTAMP_NTZ_NANOS` / `TIMESTAMP_LTZ_NANOS` singletons) whose `append`/`extract` read and write the 16-byte payload, with a `MutableUnsafeRow` direct-copy fast path.
- `ColumnBuilder`, `ColumnAccessor`: builder and accessor classes and dispatch cases.
- `ColumnStats`: a `TimestampNanosColumnStats` collector (fixed size, no min/max bounds).
- `GenerateColumnAccessor`: the codegen accessor-class selection and initialization branch.

NTZ and LTZ share the same storage and differ only by physical type and row getter/setter, so the encode/decode logic is shared.

### Why are the changes needed?

Nanosecond-precision timestamp types are otherwise unsupported by the cache, so `df.cache()` on a column of these types throws. With this change such DataFrames cache and read back correctly.

### Does this PR introduce _any_ user-facing change?

Yes. Previously, caching a DataFrame containing a `TIMESTAMP_NTZ(p)` / `TIMESTAMP_LTZ(p)` column with `p` in the nanosecond range threw `not support type`. Now it caches and reads back the values, including sub-microsecond precision.

### How was this patch tested?

- `ColumnTypeSuite`: append/extract round-trip for `TIMESTAMP_NTZ_NANOS` and `TIMESTAMP_LTZ_NANOS` (random values), plus `defaultSize` checks.
- `InMemoryColumnarQuerySuite`: an end-to-end cache roundtrip for both nanos types, with the vectorized reader both on and off, covering sub-microsecond precision and null values.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code
Co-authored-by: Claude Code
    withSQLConf(SQLConf.TIMESTAMP_NANOS_TYPES_ENABLED.key -> "true") {
      Seq("TIMESTAMP_NTZ(9)", "TIMESTAMP_LTZ(9)").foreach { typeName =>
        Seq("false", "true").foreach { vectorized =>
          withSQLConf(SQLConf.CACHE_VECTORIZED_READER_ENABLED.key -> vectorized) {
The `SQLConf.CACHE_VECTORIZED_READER_ENABLED.key=true` case seems to be dead test coverage because of the following. Could you double-check, @viirya?
You're right, thanks. Nanosecond timestamps are non-primitive for the default cache, and DefaultCachedBatchSerializer.supportsColumnarOutput returns true only for the primitive types, so they always read back through the row path -- the CACHE_VECTORIZED_READER_ENABLED=true case exercised the same path as false. I've dropped the loop and test the single (row) path, with a comment noting why (same as CalendarInterval/Variant/Decimal).
…the nanos cache test

The cache test looped over CACHE_VECTORIZED_READER_ENABLED true/false, but nanosecond timestamps are non-primitive for the default cache (DefaultCachedBatchSerializer.supportsColumnarOutput returns true only for primitive types), so they always read back through the row path regardless of that flag -- the two cases exercised the same path. Test the single (row) path and document why, matching CalendarInterval/Variant/Decimal.

Co-authored-by: Claude Code
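Why the two test cases collapsed into one can be illustrated with a toy model. This is only a sketch under stated assumptions: `ColType`, `ReadPath`, and the set of "primitive" types are hypothetical stand-ins, not Spark classes, and where exactly Spark consults the vectorized-reader flag is assumed here rather than taken from the source.

```scala
// Hypothetical column kinds; not Spark's DataType hierarchy.
sealed trait ColType
case object IntCol extends ColType
case object LongCol extends ColType
case object DoubleCol extends ColType
case object DecimalCol extends ColType
case object TimestampNanosCol extends ColType

object ReadPath {
  // Assumed set of types the default cache treats as primitive.
  private val primitive: Set[ColType] = Set(IntCol, LongCol, DoubleCol)

  // Mirrors the rule stated in the thread: true only when every column is primitive.
  def supportsColumnarOutput(schema: Seq[ColType]): Boolean =
    schema.forall(primitive.contains)

  // The flag can only matter when columnar output is supported at all.
  def readPath(schema: Seq[ColType], vectorizedEnabled: Boolean): String =
    if (vectorizedEnabled && supportsColumnarOutput(schema)) "columnar" else "row"
}
```

In this model a schema containing `TimestampNanosCol` resolves to the row path for both flag values, which is why looping over the flag added no coverage.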
MaxGekk left a comment
0 blocking, 1 non-blocking, 0 nits.
Correct, complete addition that follows the fixed-width CalendarInterval cache pattern. I verified that the on-buffer 16-byte layout ([epochMicros][nanosWithinMicro→long]) is byte-identical to UnsafeRow's TimestampNanosRowValues payload, so the MutableUnsafeRow direct-copy fast path matches the slow path; every row-path dispatch site is wired; correctly, no columnar/vectorized path is added (the types are non-primitive); and the tests are meaningful.
Design / architecture (1)
- ColumnStats.scala:329: TimestampNanosColumnStats collects no min/max bounds; see inline.
Verification
Traced the MutableUnsafeRow fast path: append writes [epochMicros:8][nanosWithinMicro.toLong:8], which equals TimestampNanosRowValues.writePayload (Platform.putLong(epochMicros) then Platform.putLong(nanosWithinMicro)), so the direct 16-byte copy and the slow path (fromTrustedRowBytes / setTimestampNanosPayload) produce identical rows; the (short) narrowing on read matches readNanosWithinMicro. Endianness is consistent (both go through Platform; codegen wraps with .order(nativeOrder)).
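The layout traced above can be sketched in plain Scala with `java.nio.ByteBuffer`. This is a minimal illustration, not the PR's code: `NanosTs` and `NanosPayload` are hypothetical stand-ins for `TimestampNanosVal` and the column type, and the real implementation goes through `Platform` rather than `ByteBuffer`.

```scala
import java.nio.{ByteBuffer, ByteOrder}

// Hypothetical stand-in for TimestampNanosVal: microseconds since the epoch
// plus the nanoseconds within that microsecond, held as a Short.
final case class NanosTs(epochMicros: Long, nanosWithinMicro: Short)

object NanosPayload {
  val Size: Int = 16 // fixed width: two 8-byte words

  // Writes [epochMicros:8][nanosWithinMicro.toLong:8] at the buffer's position.
  def append(v: NanosTs, buf: ByteBuffer): Unit = {
    buf.putLong(v.epochMicros)
    buf.putLong(v.nanosWithinMicro.toLong)
  }

  // Reads the two words back; the second is narrowed to Short, mirroring the write.
  def extract(buf: ByteBuffer): NanosTs = {
    val micros = buf.getLong()
    val nanos = buf.getLong().toShort
    NanosTs(micros, nanos)
  }

  // Encodes into a fresh native-order buffer, then decodes from the same bytes.
  def roundTrip(v: NanosTs): NanosTs = {
    val buf = ByteBuffer.allocate(Size).order(ByteOrder.nativeOrder())
    append(v, buf)
    buf.flip()
    extract(buf)
  }
}
```

Because both words are written and read with the same byte order, the round trip is lossless for any `Short` value in the second word, which is the property the direct 16-byte copy relies on.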
        Array[Any](null, null, nullCount, count, sizeInBytes)
      }

    private[columnar] final class TimestampNanosColumnStats extends ColumnStats {
TimestampNanosColumnStats emits null/null for lower/upper (the CalendarInterval / IntervalColumnStats pattern), so cached nanosecond-timestamp columns get no batch-level partition pruning.
The same logical type at micro precision takes a different path: TimestampType/TimestampNTZType -> LongColumnBuilder -> LongColumnStats, which collects min/max. So a range filter (WHERE ts > '...') over a cached TIMESTAMP_NTZ(6) column skips non-matching batches, while the same filter over a cached TIMESTAMP_NTZ(9) column scans every batch.
TimestampNanosVal is Comparable (its total order is calendar order), and ordered non-primitive cache types already keep bounds — DecimalColumnStats collects Decimal min/max. So tracking upper/lower as TimestampNanosVal here (modeled on DecimalColumnStats rather than IntervalColumnStats) would preserve the pruning the micro path provides.
Not a correctness issue — the feature works. Is the bounds-less choice intentional (follow CalendarInterval), or worth collecting min/max so cached nanos timestamps prune like micro timestamps?
Good point -- collecting min/max is the right call, thanks. You're right that the bounds-less version was a regression from the micro path: TIMESTAMP_NTZ(6) prunes via LongColumnStats while TIMESTAMP_NTZ(9) scanned every batch.
Following your suggestion, TimestampNanosColumnStats now collects upper/lower as TimestampNanosVal (modeled on DecimalColumnStats rather than IntervalColumnStats), using its compareTo (which is calendar order). The pruning path is already wired for it -- TimestampNTZNanosType is an AtomicType so ExtractableLiteral extracts the literal, and PhysicalTimestampNTZNanosType defines an ordering, so the bound comparisons buildFilter generates are valid -- so cached nanos timestamps now prune like micro timestamps.
Added coverage: ColumnStatsSuite asserts the min/max bounds for both NTZ and LTZ, and PartitionBatchPruningSuite verifies a range filter over a cached nanos column reads fewer batches with in-memory partition pruning on than off (and returns the same rows as a pre-cache evaluation).
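The idea behind the bounds and the pruning they enable can be sketched as follows. This is a simplified model, not the PR's `TimestampNanosColumnStats`: `OrderedNanosTs`, `NanosStats`, and `BatchPruning` are hypothetical names, and the ordering (compare `epochMicros`, then `nanosWithinMicro`) is an assumption about what "calendar order" means for this value.

```scala
// Hypothetical stand-in for TimestampNanosVal with an assumed calendar ordering.
final case class OrderedNanosTs(epochMicros: Long, nanosWithinMicro: Short)
    extends Ordered[OrderedNanosTs] {
  def compare(that: OrderedNanosTs): Int = {
    val c = java.lang.Long.compare(epochMicros, that.epochMicros)
    if (c != 0) c else java.lang.Short.compare(nanosWithinMicro, that.nanosWithinMicro)
  }
}

// Per-batch statistics: null count, row count, and min/max over non-null values.
final class NanosStats {
  private var lower: Option[OrderedNanosTs] = None
  private var upper: Option[OrderedNanosTs] = None
  private var nullCount = 0
  private var count = 0

  def gather(v: Option[OrderedNanosTs]): Unit = {
    count += 1
    v match {
      case None => nullCount += 1
      case Some(ts) =>
        if (lower.forall(ts < _)) lower = Some(ts)
        if (upper.forall(ts > _)) upper = Some(ts)
    }
  }

  // None when the batch holds no non-null value.
  def bounds: Option[(OrderedNanosTs, OrderedNanosTs)] =
    for (l <- lower; u <- upper) yield (l, u)
}

object BatchPruning {
  def statsOf(values: Seq[Option[OrderedNanosTs]]): NanosStats = {
    val stats = new NanosStats
    values.foreach(stats.gather)
    stats
  }

  // For `ts > literal`, a batch can match only if its upper bound exceeds the literal.
  def mightContainGreaterThan(stats: NanosStats, literal: OrderedNanosTs): Boolean =
    stats.bounds.exists { case (_, upper) => upper > literal }
}
```

A batch whose upper bound does not exceed the literal can be skipped without being decoded; a collector that records no bounds gives the filter nothing to test, so every batch has to be scanned.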
…timestamps

TimestampNanosColumnStats followed the IntervalColumnStats pattern (no min/max bounds), so cached nanosecond-timestamp columns got no batch-level partition pruning -- a regression from the micro-precision path (TimestampType / TimestampNTZType -> LongColumnStats), where a range filter skips non-matching batches.

TimestampNanosVal has a total order matching calendar order, and the pruning machinery is already wired for it (the type is an AtomicType so ExtractableLiteral extracts its literals, and PhysicalTimestampNTZNanosType defines an ordering, so the bound comparisons buildFilter generates are valid). Collect upper/lower as TimestampNanosVal (modeled on DecimalColumnStats), so cached nanos timestamps prune like micro timestamps.

Tests:
- ColumnStatsSuite: min/max bound collection for both NTZ and LTZ nanos stats.
- PartitionBatchPruningSuite: a range filter over a cached nanos column reads fewer batches with in-memory partition pruning on than off, and returns the same rows as an uncached evaluation.

Co-authored-by: Claude Code
dongjoon-hyun left a comment
+1, it looks good to me.
…n the pruning test

The correctness check compared the cached + pruned read against an equivalent query built after the table was cached. The CacheManager matches by logical plan rather than DataFrame identity, so that "uncached" evaluation could itself be served from the InMemoryRelation, making the assertion compare the cache with itself. Compute the expected result before cacheTable so it cannot hit the cache.

Co-authored-by: Claude Code
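The pitfall described in this commit message can be reproduced with a toy cache keyed by plan equality. This is a sketch only: `Plan` and `ToyCacheManager` are hypothetical and much simpler than Spark's `CacheManager`, but they show why a structurally equal query built after caching is not an independent baseline.

```scala
import scala.collection.mutable

// Structural equality stands in for logical-plan matching.
final case class Plan(description: String)

final class ToyCacheManager {
  private val cached = mutable.Map.empty[Plan, Seq[Int]]
  var cacheHits: Int = 0

  // Materializes the plan's result and stores it under the plan itself.
  def cache(plan: Plan, compute: Plan => Seq[Int]): Unit =
    cached(plan) = compute(plan)

  // Any equal plan is served from the cache, however it was constructed.
  def execute(plan: Plan, compute: Plan => Seq[Int]): Seq[Int] =
    cached.get(plan) match {
      case Some(rows) =>
        cacheHits += 1
        rows
      case None =>
        compute(plan)
    }
}
```

Running the baseline query before calling `cache` yields a result that cannot have come from the cache; running an equal plan afterwards increments `cacheHits`, so comparing against it would compare the cache with itself.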
Thanks @dongjoon-hyun @MaxGekk.