[SPARK-57556][SQL] Raise a clear error for the TIME data type in Hive SerDe interop #56850

Open
MaxGekk wants to merge 2 commits into apache:master from MaxGekk:time-hive-serde
MaxGekk commented Jun 28, 2026

What changes were proposed in this pull request?

Apache Hive has no TIME type, so TimeType has no faithful representation in Hive SerDe interop. This PR (the Option B / "clear, documented error" path from SPARK-57556) makes TimeType produce a clear AnalysisException instead of a scala.MatchError/internal error when it reaches the HiveInspectors mapping functions, and rejects it in the Hive SerDe write path:

  • HiveInspectors.toInspector(dataType), toInspector(expr) (TIME literal) and toTypeInfo now throw UNSUPPORTED_DATATYPE via a shared unsupportedHiveType helper. Previously toInspector(dataType) had no TimeType case and no default branch, so a TIME column hit a raw scala.MatchError.
  • HiveFileFormat.supportDataType rejects TimeType (recursing into nested struct/array/map/UDT types, preserving the prior default for all other types) so Hive SerDe writes raise UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE (format Hive) via FileFormatWriter.verifySchema.
  • Documented the limitation on the TIME entry in docs/sql-ref-datatypes.md.
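
The nested-type rejection described for `HiveFileFormat.supportDataType` can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the types inside `HiveWriteCheckSketch` are toy stand-ins rather than Spark's `DataType` classes, the UDT case is omitted, and only the recursion shape (reject TIME anywhere in the tree, keep the permissive default otherwise) is taken from the description above.

```scala
object HiveWriteCheckSketch {
  // Toy stand-ins for Spark's DataType hierarchy (assumption: NOT Spark's classes).
  trait DataType
  case object IntType extends DataType
  case object StringType extends DataType
  case class TimeType(precision: Int = 6) extends DataType
  case class ArrayType(elementType: DataType) extends DataType
  case class MapType(keyType: DataType, valueType: DataType) extends DataType
  case class StructField(name: String, dataType: DataType)
  case class StructType(fields: Seq[StructField]) extends DataType

  // Returns false if TIME appears anywhere in the type tree, true otherwise.
  def supportDataType(dt: DataType): Boolean = dt match {
    case _: TimeType        => false
    case ArrayType(elem)    => supportDataType(elem)
    case MapType(k, v)      => supportDataType(k) && supportDataType(v)
    case StructType(fields) => fields.forall(f => supportDataType(f.dataType))
    case _                  => true // permissive default for every other type
  }
}
```

Under these assumptions, a TIME field buried inside `array<map<string, struct<t: time>>>` is rejected just like a top-level TIME column, while schemas without TIME are unaffected.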

Why are the changes needed?

HiveInspectors had no TimeType case, so object-inspector creation and TypeInfo mapping fell through to a MatchError/internal error when a TIME column or literal reached Hive SerDe paths (for example, a TIME argument to a Hive UDF/UDAF/UDTF). This makes the behavior explicit and documented, consistent with the existing TIME rejection for Hive ORC (SPARK-51590).
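
To make the failure mode concrete, here is a small self-contained sketch of how a `match` with no `TimeType` case and no default branch surfaces as a raw `scala.MatchError`, and how an explicit case routed through a shared helper turns it into a named error. Everything in `InspectorSketch` is hypothetical: the types are toy stand-ins, `UnsupportedTypeError` stands in for Spark's `AnalysisException`, and the returned strings are placeholders for real object inspectors.

```scala
object InspectorSketch {
  // Toy stand-ins (assumption: NOT Spark's classes). The trait is deliberately
  // not sealed, so the compiler gives no exhaustiveness warning.
  trait DataType
  case object IntType extends DataType
  case object StringType extends DataType
  case object TimeType extends DataType

  // Hypothetical exception standing in for Spark's AnalysisException.
  final case class UnsupportedTypeError(typeName: String)
    extends Exception("[UNSUPPORTED_DATATYPE] Unsupported data type \"" + typeName + "\"")

  // Before: no TimeType case and no default branch, so TIME escapes
  // as scala.MatchError at runtime.
  def toInspectorBefore(dt: DataType): String = dt match {
    case IntType    => "int inspector (placeholder)"
    case StringType => "string inspector (placeholder)"
  }

  // Shared helper so every mapping function reports the same error.
  private def unsupportedHiveType(typeName: String): Nothing =
    throw UnsupportedTypeError(typeName)

  // After: TIME is matched explicitly and rejected with a clear message.
  def toInspectorAfter(dt: DataType): String = dt match {
    case IntType    => "int inspector (placeholder)"
    case StringType => "string inspector (placeholder)"
    case TimeType   => unsupportedHiveType("TIME(6)")
  }
}
```

Calling `InspectorSketch.toInspectorBefore(InspectorSketch.TimeType)` throws `scala.MatchError`, whereas `toInspectorAfter` throws the sketch's `UnsupportedTypeError`, whose message names the offending type.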

Does this PR introduce any user-facing change?

Yes. Using TIME with Hive UDFs or in a Hive SerDe write now fails with a clear error that names the unsupported TIME type, instead of a MatchError/internal error. For example, SELECT myHiveUDF(TIME'12:01:02') now reports [UNSUPPORTED_DATATYPE] Unsupported data type "TIME(6)" (wrapped by the Hive UDF resolver), and writing a TIME column through the Hive SerDe write path reports [UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE] The Hive datasource doesn't support the column ... of the type "TIME(6)".

How was this patch tested?

Added tests and ran them locally (build/sbt 'hive/testOnly *HiveInspectorSuite *HiveUDFSuite *InsertSuite'):

  • HiveInspectorSuite: toInspector(TimeType()), a TIME literal, and TimeType().toTypeInfo raise UNSUPPORTED_DATATYPE.
  • HiveUDFSuite: passing TIME'12:01:02' to a Hive GenericUDFHash fails with a message naming the unsupported TIME type.
  • InsertSuite: INSERT OVERWRITE LOCAL DIRECTORY ... STORED AS PARQUET SELECT TIME'...' (with spark.sql.hive.convertMetastoreInsertDir=false) raises UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor (Claude Opus 4.8)

MaxGekk added 2 commits June 28, 2026 17:27
… SerDe interop

### What changes were proposed in this pull request?

Apache Hive has no TIME type, so `TimeType` has no faithful representation in
Hive SerDe interop. This PR makes `TimeType` produce a clear `AnalysisException`
instead of a `scala.MatchError` or an internal error when it reaches the
`HiveInspectors` mapping functions, and rejects it in the Hive SerDe write path:

- `HiveInspectors.toInspector(dataType)`, `toInspector(expr)` (TIME literal) and
  `toTypeInfo` now throw `UNSUPPORTED_DATATYPE` via a shared helper.
- `HiveFileFormat.supportDataType` rejects `TimeType` (recursing into nested
  types) so Hive SerDe writes raise `UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE`.

### Why are the changes needed?

`HiveInspectors` had no `TimeType` case, so object-inspector creation and
TypeInfo mapping fell through to a `MatchError`/internal error when a TIME
column or literal reached Hive SerDe paths (e.g. a Hive UDF argument). This
makes the behavior explicit and documented, consistent with the existing TIME
rejection for Hive ORC (SPARK-51590).

### Does this PR introduce any user-facing change?

Yes. Using TIME with Hive UDFs or Hive SerDe writes now fails with a clear
error message naming the unsupported TIME type rather than a MatchError/internal
error.

### How was this patch tested?

Added tests in `HiveInspectorSuite`, `HiveUDFSuite` and `InsertSuite`, and
documented the limitation in `docs/sql-ref-datatypes.md`.