
[SPARK-57735][SQL] Support nanosecond-precision timestamp types in the in-memory columnar cache #56842

Open
viirya wants to merge 4 commits into apache:master from viirya:nanos-timestamp-default-cache

viirya commented Jun 28, 2026

What changes were proposed in this pull request?

The default in-memory columnar cache serializer (DefaultCachedBatchSerializer) did not support TimestampNTZNanosType / TimestampLTZNanosType. Caching a DataFrame with such a column failed at materialization with not support type: TimestampNTZNanosType(9), because none of the cache's type-dispatch sites had a case for them.

This adds full support, following the fixed-width multi-field pattern already used by CalendarInterval. The physical value TimestampNanosVal is a fixed 16-byte payload (an 8-byte epochMicros plus an 8-byte word holding nanosWithinMicro), so it maps cleanly onto that pattern:

  • ColumnType: a TIMESTAMP_NANOS column type (with TIMESTAMP_NTZ_NANOS / TIMESTAMP_LTZ_NANOS singletons) whose append/extract read and write the 16-byte payload, with a MutableUnsafeRow direct-copy fast path.
  • ColumnBuilder, ColumnAccessor: builder and accessor classes plus dispatch cases.
  • ColumnStats: a TimestampNanosColumnStats collector (fixed size, no min/max bounds).
  • GenerateColumnAccessor: the codegen accessor-class selection and initialization branch.

TIMESTAMP_NTZ and TIMESTAMP_LTZ nanos types share the same storage and differ only by physical type and row getter/setter, so the encode/decode logic is shared between them.
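
The 16-byte layout described above can be sketched outside Spark as follows. This is a minimal illustration only: `NanosVal` and `NanosPayload` are hypothetical stand-ins, not the classes added by this PR, and a plain `java.nio.ByteBuffer` stands in for the cache's column buffer.

```scala
import java.nio.{ByteBuffer, ByteOrder}

// Hypothetical stand-in for TimestampNanosVal; field names follow the PR description.
final case class NanosVal(epochMicros: Long, nanosWithinMicro: Short)

object NanosPayload {
  // 8-byte epochMicros plus an 8-byte word holding nanosWithinMicro.
  val Size: Int = 16

  // Allocate a native-order buffer large enough for `values` payloads.
  def newBuffer(values: Int): ByteBuffer =
    ByteBuffer.allocate(values * Size).order(ByteOrder.nativeOrder())

  // Append: write both fields as 8-byte words at the buffer's current position.
  def append(v: NanosVal, buf: ByteBuffer): Unit = {
    buf.putLong(v.epochMicros)
    buf.putLong(v.nanosWithinMicro.toLong)
  }

  // Extract: read both words back and narrow the second one to a Short.
  def extract(buf: ByteBuffer): NanosVal = {
    val micros = buf.getLong()
    val nanos = buf.getLong().toShort
    NanosVal(micros, nanos)
  }
}
```

Because every value occupies exactly `Size` bytes, NTZ and LTZ columns could share this encode/decode logic and differ only in how the decoded value is handed to the row.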

Why are the changes needed?

Nanosecond-precision timestamp types are otherwise unsupported by the cache, so df.cache() on a column of these types throws. With this change such DataFrames cache and read back correctly, consistent with the microsecond TIMESTAMP_NTZ / TIMESTAMP types which the cache already supports.

Does this PR introduce any user-facing change?

Yes. Previously, caching a DataFrame containing a TIMESTAMP_NTZ(p) / TIMESTAMP_LTZ(p) column with p in the nanosecond range threw not support type. Now it caches and reads back the values, including sub-microsecond precision.

How was this patch tested?

  • ColumnTypeSuite: append/extract round-trip for TIMESTAMP_NTZ_NANOS and TIMESTAMP_LTZ_NANOS (random values), plus defaultSize checks.
  • InMemoryColumnarQuerySuite: an end-to-end cache roundtrip for both nanos types, with the vectorized reader both on and off, covering sub-microsecond precision and null values.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code

withSQLConf(SQLConf.TIMESTAMP_NANOS_TYPES_ENABLED.key -> "true") {
  Seq("TIMESTAMP_NTZ(9)", "TIMESTAMP_LTZ(9)").foreach { typeName =>
    Seq("false", "true").foreach { vectorized =>
      withSQLConf(SQLConf.CACHE_VECTORIZED_READER_ENABLED.key -> vectorized) {

The SQLConf.CACHE_VECTORIZED_READER_ENABLED.key=true case seems to be dead test coverage because of the following. Could you double-check, @viirya?

override def supportsColumnarOutput(schema: StructType): Boolean = schema.fields.forall(f =>
  f.dataType match {
    // More types can be supported, but this is to match the original implementation that
    // only supported primitive types "for ease of review"
    case BooleanType | ByteType | ShortType | IntegerType | LongType |
         FloatType | DoubleType => true
    case _ => false
  })

Member Author

You're right, thanks. Nanosecond timestamps are non-primitive for the default cache, and DefaultCachedBatchSerializer.supportsColumnarOutput returns true only for the primitive types, so they always read back through the row path -- the CACHE_VECTORIZED_READER_ENABLED=true case exercised the same path as false. I've dropped the loop and test the single (row) path, with a comment noting why (same as CalendarInterval/Variant/Decimal).
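
A toy model of why the two flag values exercised the same path. The `Toy*` types and `ReadPath` are illustrative stand-ins rather than Spark classes, and the `chosenPath` rule is an assumption about how the flag and the predicate combine, not code from the serializer.

```scala
// Toy data-type model; illustrative stand-ins, not Spark's DataType classes.
sealed trait ToyType
case object ToyInt extends ToyType
case object ToyLong extends ToyType
case object ToyDouble extends ToyType
case object ToyTimestampNanos extends ToyType

object ReadPath {
  // Same shape as the quoted predicate: only primitive types allow columnar output.
  def supportsColumnarOutput(schema: Seq[ToyType]): Boolean = schema.forall {
    case ToyInt | ToyLong | ToyDouble => true
    case _ => false
  }

  // Assumption: the vectorized flag only matters when columnar output is supported.
  def chosenPath(schema: Seq[ToyType], vectorizedEnabled: Boolean): String =
    if (vectorizedEnabled && supportsColumnarOutput(schema)) "columnar" else "row"
}
```

Under this model, a schema containing `ToyTimestampNanos` yields `"row"` for either flag value, which is why looping over the flag added no coverage.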

…the nanos cache test

The cache test looped over CACHE_VECTORIZED_READER_ENABLED true/false, but
nanosecond timestamps are non-primitive for the default cache
(DefaultCachedBatchSerializer.supportsColumnarOutput returns true only for
primitive types), so they always read back through the row path regardless of
that flag -- the two cases exercised the same path. Test the single (row) path
and document why, matching CalendarInterval/Variant/Decimal.

Co-authored-by: Claude Code

MaxGekk left a comment

0 blocking, 1 non-blocking, 0 nits.
A correct, complete addition that follows the fixed-width CalendarInterval cache pattern. I verified that the on-buffer 16-byte layout ([epochMicros][nanosWithinMicro→long]) is byte-identical to UnsafeRow's TimestampNanosRowValues payload, so the MutableUnsafeRow direct-copy fast path matches the slow path. Every row-path dispatch site is wired; correctly, no columnar/vectorized path is added (the type is non-primitive); and the tests are meaningful.

Design / architecture (1)

  • ColumnStats.scala:329: TimestampNanosColumnStats collects no min/max bounds — see inline.

Verification

Traced the MutableUnsafeRow fast path: append writes [epochMicros:8][nanosWithinMicro.toLong:8], which equals TimestampNanosRowValues.writePayload (Platform.putLong(epochMicros) then Platform.putLong(nanosWithinMicro)), so the direct 16-byte copy and the slow path (fromTrustedRowBytes / setTimestampNanosPayload) produce identical rows; the (short) narrowing on read matches readNanosWithinMicro. Endianness is consistent (both go through Platform; codegen wraps with .order(nativeOrder)).
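
The equivalence traced above can be illustrated with a standalone sketch. `FastPathCheck` is a hypothetical helper that uses `java.nio.ByteBuffer` in place of `Platform` and plain byte arrays in place of the column buffer and the UnsafeRow; it only demonstrates that copying the 16 bytes verbatim gives the same result as decoding and re-encoding the two fields when both sides use the same layout.

```scala
import java.nio.{ByteBuffer, ByteOrder}

object FastPathCheck {
  private def nativeBuffer(size: Int): ByteBuffer =
    ByteBuffer.allocate(size).order(ByteOrder.nativeOrder())

  // Column-buffer side: [epochMicros:8][nanosWithinMicro.toLong:8].
  def encode(epochMicros: Long, nanosWithinMicro: Short): Array[Byte] = {
    val buf = nativeBuffer(16)
    buf.putLong(epochMicros)
    buf.putLong(nanosWithinMicro.toLong)
    buf.array()
  }

  // Slow path: decode both fields (narrowing to Short), then write them out again.
  def slowPath(column: Array[Byte]): Array[Byte] = {
    val in = ByteBuffer.wrap(column).order(ByteOrder.nativeOrder())
    val micros = in.getLong()
    val nanos = in.getLong().toShort
    encode(micros, nanos)
  }

  // Fast path: copy the 16 bytes without interpreting them.
  def fastPath(column: Array[Byte]): Array[Byte] = {
    val out = new Array[Byte](16)
    System.arraycopy(column, 0, out, 0, 16)
    out
  }
}
```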

  Array[Any](null, null, nullCount, count, sizeInBytes)
}

private[columnar] final class TimestampNanosColumnStats extends ColumnStats {

TimestampNanosColumnStats emits null/null for lower/upper (the CalendarInterval / IntervalColumnStats pattern), so cached nanosecond-timestamp columns get no batch-level partition pruning.

The same logical type at micro precision takes a different path: TimestampType/TimestampNTZType -> LongColumnBuilder -> LongColumnStats, which collects min/max. So a range filter (WHERE ts > '...') over a cached TIMESTAMP_NTZ(6) column skips non-matching batches, while the same filter over a cached TIMESTAMP_NTZ(9) column scans every batch.

TimestampNanosVal is Comparable (its total order is calendar order), and ordered non-primitive cache types already keep bounds — DecimalColumnStats collects Decimal min/max. So tracking upper/lower as TimestampNanosVal here (modeled on DecimalColumnStats rather than IntervalColumnStats) would preserve the pruning the micro path provides.

Not a correctness issue — the feature works. Is the bounds-less choice intentional (follow CalendarInterval), or worth collecting min/max so cached nanos timestamps prune like micro timestamps?

Member Author

Good point -- collecting min/max is the right call, thanks. You're right that the bounds-less version was a regression from the micro path: TIMESTAMP_NTZ(6) prunes via LongColumnStats while TIMESTAMP_NTZ(9) scanned every batch.

Following your suggestion, TimestampNanosColumnStats now collects upper/lower as TimestampNanosVal (modeled on DecimalColumnStats rather than IntervalColumnStats), using its compareTo (which is calendar order). The pruning path is already wired for it -- TimestampNTZNanosType is an AtomicType so ExtractableLiteral extracts the literal, and PhysicalTimestampNTZNanosType defines an ordering, so the bound comparisons buildFilter generates are valid -- so cached nanos timestamps now prune like micro timestamps.

Added coverage: ColumnStatsSuite asserts the min/max bounds for both NTZ and LTZ, and PartitionBatchPruningSuite verifies a range filter over a cached nanos column reads fewer batches with in-memory partition pruning on than off (and returns the same rows as a pre-cache evaluation).
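
For readers unfamiliar with batch-level pruning, a rough standalone sketch of the idea follows. `NanosTs`, `NanosStats` and `BatchPruning` are hypothetical names; the ordering (by `epochMicros`, then `nanosWithinMicro`) is an assumption about what "calendar order" means here, and the real collector in `ColumnStats.scala` has a different interface.

```scala
// Hypothetical stand-in for TimestampNanosVal with an assumed calendar ordering.
final case class NanosTs(epochMicros: Long, nanosWithinMicro: Short) extends Ordered[NanosTs] {
  def compare(that: NanosTs): Int = {
    val c = java.lang.Long.compare(epochMicros, that.epochMicros)
    if (c != 0) c else java.lang.Short.compare(nanosWithinMicro, that.nanosWithinMicro)
  }
}

// Per-batch statistics that track lower/upper bounds alongside the counts.
final class NanosStats {
  private var lower: Option[NanosTs] = None
  private var upper: Option[NanosTs] = None
  private var nullCount = 0
  private var count = 0

  def gather(value: Option[NanosTs]): Unit = {
    count += 1
    value match {
      case None => nullCount += 1
      case Some(x) =>
        if (lower.forall(x < _)) lower = Some(x)
        if (upper.forall(x > _)) upper = Some(x)
    }
  }

  def bounds: Option[(NanosTs, NanosTs)] =
    for (l <- lower; u <- upper) yield (l, u)
}

object BatchPruning {
  // For a filter `ts > threshold`, a batch must be read only if its upper bound
  // exceeds the threshold. Without bounds the sketch conservatively reads the batch.
  def mayContainGreaterThan(stats: NanosStats, threshold: NanosTs): Boolean =
    stats.bounds match {
      case Some((_, upper)) => upper > threshold
      case None => true
    }
}
```

With bounds recorded, a batch whose upper bound does not exceed the threshold can be skipped; a bounds-less collector never allows that decision.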

…timestamps

TimestampNanosColumnStats followed the IntervalColumnStats pattern (no min/max
bounds), so cached nanosecond-timestamp columns got no batch-level partition
pruning -- a regression from the micro-precision path (TimestampType /
TimestampNTZType -> LongColumnStats), where a range filter skips non-matching
batches.

TimestampNanosVal has a total order matching calendar order, and the pruning
machinery is already wired for it (the type is an AtomicType so ExtractableLiteral
extracts its literals, and PhysicalTimestampNTZNanosType defines an ordering, so
the bound comparisons buildFilter generates are valid). Collect upper/lower as
TimestampNanosVal (modeled on DecimalColumnStats), so cached nanos timestamps
prune like micro timestamps.

Tests:
- ColumnStatsSuite: min/max bound collection for both NTZ and LTZ nanos stats.
- PartitionBatchPruningSuite: a range filter over a cached nanos column reads
  fewer batches with in-memory partition pruning on than off, and returns the
  same rows as an uncached evaluation.

Co-authored-by: Claude Code

dongjoon-hyun left a comment

+1, it looks good to me.

…n the pruning test

The correctness check compared the cached + pruned read against an equivalent
query built after the table was cached. The CacheManager matches by logical
plan rather than DataFrame identity, so that "uncached" evaluation could itself
be served from the InMemoryRelation, making the assertion compare the cache with
itself. Compute the expected result before cacheTable so it cannot hit the
cache.

Co-authored-by: Claude Code
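
The pitfall this last commit fixes can be shown with a toy cache keyed by plan equality. `Plan`, `Scan`, `Filter` and `ToyCacheManager` are illustrative only and far simpler than Spark's CacheManager; the sketch just shows that an equivalent query built after caching still resolves to the cached data.

```scala
import scala.collection.mutable

// Toy logical plan: case classes give structural equality.
sealed trait Plan
final case class Scan(table: String) extends Plan
final case class Filter(child: Plan, predicate: String) extends Plan

final class ToyCacheManager {
  private val cached = mutable.Set.empty[Plan]

  def cache(plan: Plan): Unit = cached += plan

  // A query uses the cache if it, or any plan beneath it, equals a cached plan.
  def usesCache(plan: Plan): Boolean = plan match {
    case p if cached.contains(p) => true
    case Filter(child, _) => usesCache(child)
    case _ => false
  }
}
```

In this model a freshly constructed `Filter(Scan("t"), ...)` hits the cache once `Scan("t")` has been cached, even though it is a different object, so an expected result is only independent of the cache if it is computed before caching.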

viirya commented Jun 28, 2026

Member Author

Thanks @dongjoon-hyun @MaxGekk.
