
[SPARK-57738][CONNECT] Restore fast-fail guard for nanosecond timestamp types in ArrowVectorReader#56849

Open
jubins wants to merge 2 commits into apache:master from jubins:j-SPARK-57738-arrow-vector-reader

jubins (Contributor) commented Jun 28, 2026

What is the purpose of the change

Fixes SPARK-57738 — restores the fast-fail guard for nanosecond-precision timestamp types in ArrowVectorReader, which was silently broken by SPARK-57303.

SPARK-57303 updated UpCastRule.canUpCast to return true for lossless widening within the timestamp family (e.g. TimestampType -> TimestampLTZNanosType(p)). As a side effect, the existing unsupported-type guard in ArrowVectorReader.applyDefault no longer rejects nanosecond timestamp targets — the SPARK-57303 commit message explicitly flagged this as a known follow-up item.
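
The side effect can be modelled in a few lines. This is a minimal sketch under stated assumptions, not Spark's code. `DataT`, `TsMicros`, `TsNtzMicros`, `TsLtzNanos`, `TsNtzNanos`, `StringT`, and `UpCastSketch` are hypothetical stand-ins. The widening rule only encodes what is stated above about SPARK-57303: micro-to-nano widening within the same timestamp variant is treated as lossless. The old guard rejects only pairs that fail `canUpCast`, so it no longer stops a nanosecond target.

```scala
// Hypothetical, simplified stand-ins for Spark's timestamp types (not Spark's real classes).
sealed trait DataT
case object TsMicros extends DataT                        // ~ TimestampType (LTZ, microseconds)
case object TsNtzMicros extends DataT                     // ~ TimestampNTZType (microseconds)
final case class TsLtzNanos(precision: Int) extends DataT // ~ TIMESTAMP_LTZ(p)
final case class TsNtzNanos(precision: Int) extends DataT // ~ TIMESTAMP_NTZ(p)
case object StringT extends DataT

object UpCastSketch {
  // Nanosecond precisions discussed in this PR: p in [7, 9].
  private def isNanoPrecision(p: Int): Boolean = p >= 7 && p <= 9

  // Assumed post-SPARK-57303 rule: identity, plus lossless widening from a
  // microsecond timestamp to a nanosecond timestamp of the same variant.
  def canUpCast(from: DataT, to: DataT): Boolean = (from, to) match {
    case (a, b) if a == b                                   => true
    case (TsMicros, TsLtzNanos(p)) if isNanoPrecision(p)    => true
    case (TsNtzMicros, TsNtzNanos(p)) if isNanoPrecision(p) => true
    case _                                                  => false
  }

  // The pre-existing guard only rejects pairs that cannot be up-cast,
  // so nanosecond targets now slip through to the vector match.
  def rejectedByOldGuard(source: DataT, target: DataT): Boolean =
    !canUpCast(source, target)
}
```

Under this assumed rule, `rejectedByOldGuard(TsMicros, TsLtzNanos(9))` is `false`. That is the gap this PR closes.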

Without this fix, reading a TIMESTAMP_LTZ(p) or TIMESTAMP_NTZ(p) column (p in [7, 9]) over Spark Connect silently passes the guard. It then falls through to the catch-all branch of the vector match and fails with a confusing "Unsupported Vector Type" error. With this fix, it fails fast with a clear "not yet supported" message.

Brief change log

  • sql/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala: added AnyTimestampNanoType to the import and inserted an explicit rejection guard between the canUpCast check and the vector match block
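
The ordering of the three checks can be sketched as follows. Every type, vector, and function name here is a hypothetical stand-in, not the real `ArrowVectorReader` code: `DataT`, `MicroTsVec`, `GuardSketch.applyDefault`, and the `nanoGuard` flag were invented for illustration. `isNanoTimestamp` only plays the role of matching on `AnyTimestampNanoType`. With `nanoGuard = false`, a nanosecond target passes `canUpCast` and lands in the catch-all. With the guard enabled, it fails fast.

```scala
// Hypothetical, simplified model of the guard ordering; not Spark's real API.
sealed trait DataT
case object TsMicros extends DataT                        // ~ TimestampType
case object TsNtzMicros extends DataT                     // ~ TimestampNTZType
final case class TsLtzNanos(precision: Int) extends DataT // ~ TIMESTAMP_LTZ(p)
final case class TsNtzNanos(precision: Int) extends DataT // ~ TIMESTAMP_NTZ(p)

// Stand-in for the Arrow vector the reader dispatches on.
sealed trait Vec
case object MicroTsVec extends Vec

object GuardSketch {
  // Plays the role of matching on AnyTimestampNanoType.
  def isNanoTimestamp(dt: DataT): Boolean = dt match {
    case TsLtzNanos(_) | TsNtzNanos(_) => true
    case _                             => false
  }

  // Assumed post-SPARK-57303 widening rule.
  def canUpCast(from: DataT, to: DataT): Boolean = (from, to) match {
    case (a, b) if a == b             => true
    case (TsMicros, TsLtzNanos(_))    => true
    case (TsNtzMicros, TsNtzNanos(_)) => true
    case _                            => false
  }

  // `nanoGuard = false` reproduces the behaviour before this fix.
  def applyDefault(source: DataT, target: DataT, vector: Vec, nanoGuard: Boolean = true): String = {
    // 1. Existing check: still rejects pairs that cannot be up-cast.
    if (!canUpCast(source, target))
      throw new RuntimeException(s"Reading '$target' from '$source' is not supported.")
    // 2. New check: fail fast on nanosecond targets before dispatching.
    if (nanoGuard && isNanoTimestamp(target))
      throw new RuntimeException(s"Reading '$target' values is not yet supported over Connect.")
    // 3. Dispatch on (target, vector); anything unmatched hits the catch-all.
    (target, vector) match {
      case (TsMicros, MicroTsVec)    => "TimestampReader"
      case (TsNtzMicros, MicroTsVec) => "TimestampNtzReader"
      case (t, v)                    => throw new RuntimeException(s"Unsupported Vector Type: $t, $v")
    }
  }
}
```

The check sits after `canUpCast`, so the existing error for genuinely incompatible pairs is preserved. It sits before the match, so no nanosecond target can reach the catch-all.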

Verifying this change

No pre-existing unit tests covered ArrowVectorReader directly. This PR adds
ArrowVectorReaderSuite with three cases:

  • ArrowVectorReader rejects TimestampLTZNanosType with a clear error — asserts
    that passing a TimestampLTZNanosType(9) target throws a RuntimeException
    with "not yet supported" in the message, rather than falling through to the
    generic "Unsupported Vector Type" crash
  • ArrowVectorReader rejects TimestampNTZNanosType with a clear error — same
    check for TimestampNTZNanosType(7)
  • ArrowVectorReader still succeeds for plain TimestampType — sanity-checks that
    the guard does not regress the existing supported path

Does this pull request potentially affect one of the following parts

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public/@Evolving: no — ArrowVectorReader is private[connect]
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no — the guard only fires for an unsupported type that cannot currently be produced
  • Anything that affects deployment or recovery: no
  • The S3 file system connector: no

Documentation

Does this pull request introduce a new feature? No — this is a bug fix restoring a guard that was inadvertently disabled by SPARK-57303.

Was generative AI tooling used to co-author this PR?

  • Yes — Claude Code was used as a pair-programming assistant. All code was written, understood, and verified by the author.
    Generated-by: Claude Opus 4.8

jubins added 2 commits June 28, 2026 08:28
…mp types in ArrowVectorReader

### What is the purpose of the change

Fixes SPARK-57738 — restores the fast-fail guard for nanosecond-precision timestamp
types in `ArrowVectorReader`, which was silently broken by SPARK-57303.

SPARK-57303 updated `UpCastRule.canUpCast` to return `true` for lossless widening
within the timestamp family (e.g. `TimestampType -> TimestampLTZNanosType(p)`).
As a side effect, the existing unsupported-type guard in `ArrowVectorReader.applyDefault`
no longer rejects nanosecond timestamp targets — the SPARK-57303 commit message
explicitly flagged this as a known follow-up item.

Without this fix, a request to read a `TIMESTAMP_LTZ(p)` or `TIMESTAMP_NTZ(p)`
(`p` in `[7, 9]`) column over Spark Connect silently passes the guard and then
crashes with a confusing `"Unsupported Vector Type"` error from the catch-all
branch of the `vector match`. With this fix it fails fast with a clear
`"not yet supported"` message.

### Brief change log

- `sql/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala`:
  added `AnyTimestampNanoType` to the import and inserted an explicit rejection
  guard between the `canUpCast` check and the `vector match` block

### Verifying this change

No existing unit tests cover `ArrowVectorReader` directly. The fix is a
defensive guard on an unsupported code path (nanosecond-precision timestamps
are not yet reachable over Connect in any supported workflow), so the primary
verification is:

- Manual inspection: the guard fires before the `vector match`, so no
  nanosecond type can reach the `"Unsupported Vector Type"` catch-all
- The guard will be superseded and removed when Connect support for nanosecond
  timestamps is implemented (a comment in the code notes this)

### Does this pull request potentially affect one of the following parts

- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., is any changed class annotated with `@Public`/`@Evolving`: no — `ArrowVectorReader` is `private[connect]`
- The serializers: no
- The runtime per-record code paths (performance sensitive): no — the guard only fires for an unsupported type that cannot currently be produced
- Anything that affects deployment or recovery: no
- The S3 file system connector: no

### Documentation

Does this pull request introduce a new feature? No — this is a bug fix restoring
a guard that was inadvertently disabled by SPARK-57303.

### Was generative AI tooling used to co-author this PR?

Yes — Claude Code was used as a pair-programming assistant. All code was written,
understood, and verified by the author.
Generated-by: Claude Sonnet 4.6
Run with: `build/sbt 'connect-client-jvm/testOnly *ArrowVectorReaderSuite'`