[SPARK-57741][PYTHON] Add timestamp_nanos to PySpark public API#56852

Open
jubins wants to merge 1 commit into apache:master from jubins:j-SPARK-57741-add-timestamp-nanos

@jubins jubins commented Jun 28, 2026

Contributor

What is the purpose of the change

Fixes SPARK-57741, a follow-up to SPARK-57526. Adds timestamp_nanos to the
PySpark API (pyspark.sql.functions and PySpark Connect), completing the
nanosecond round-trip pair in Python.

timestamp_nanos(e) converts a nanoseconds-since-epoch integer to a
TIMESTAMP_LTZ(9) value. It is the inverse of unix_nanos (SPARK-57579).
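
The round-trip relationship can be illustrated without Spark. The following is a minimal pure-Python sketch of the arithmetic only, not Spark's implementation. The helper names nanos_to_timestamp and timestamp_to_nanos are hypothetical, and because Python's datetime stops at microsecond precision, the sub-microsecond remainder is carried as a separate integer.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
NANOS_PER_MICRO = 1_000


def nanos_to_timestamp(nanos: Optional[int]) -> Optional[Tuple[datetime, int]]:
    """Hypothetical helper: epoch nanoseconds -> (UTC datetime, leftover nanos)."""
    if nanos is None:
        return None  # NULL in, NULL out
    # divmod floors, so negative (pre-epoch) values keep a remainder in 0..999.
    micros, sub_micro_nanos = divmod(nanos, NANOS_PER_MICRO)
    return EPOCH + timedelta(microseconds=micros), sub_micro_nanos


def timestamp_to_nanos(ts: Optional[Tuple[datetime, int]]) -> Optional[int]:
    """Hypothetical inverse: (UTC datetime, leftover nanos) -> epoch nanoseconds."""
    if ts is None:
        return None
    moment, sub_micro_nanos = ts
    # Dividing two timedeltas with // yields an exact integer microsecond count.
    micros = (moment - EPOCH) // timedelta(microseconds=1)
    return micros * NANOS_PER_MICRO + sub_micro_nanos


original = 1_700_000_000_123_456_789
round_tripped = timestamp_to_nanos(nanos_to_timestamp(original))
```

Under these assumptions round_tripped equals original, which is the property the unix_nanos / timestamp_nanos pair is meant to provide at the SQL level.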

The SQL function and Scala API were added in SPARK-57526, but Python support
was explicitly deferred and tracked as a follow-up via expected_missing_in_py:

expected_missing_in_py = {
    "timestamp_nanos"
}  # SPARK-57526: PySpark support tracked as a follow-up

The round-trip pair is now complete

Function          Before this PR   After this PR
unix_nanos        present          present
timestamp_nanos   missing          added

Brief change log

  • python/pyspark/sql/functions/builtin.py
    Added timestamp_nanos(col) after timestamp_micros, decorated with @_try_remote_functions, with full docstring:

    • versionadded:: 4.3.0
    • parameters + return type
    • See Also links
    • two doctests (valid nanosecond value + NULL input)
  • python/pyspark/sql/connect/functions/builtin.py
    Added Connect-side wrapper for timestamp_nanos, inheriting docstring from main module and following the same pattern as timestamp_micros

  • python/pyspark/sql/functions/__init__.py
    Exported timestamp_nanos in alphabetical order between timestamp_millis and timestamp_seconds

  • python/docs/source/reference/pyspark.sql/functions.rst
    Added timestamp_nanos entry between timestamp_millis and timestamp_seconds

  • python/pyspark/sql/tests/test_functions.py
    Removed "timestamp_nanos" from expected_missing_in_py (set is now empty)
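
The docstring inheritance mentioned for the Connect-side wrapper can be sketched as follows. This is a self-contained illustration with stand-in functions, not the code from this PR: classic_timestamp_nanos and connect_timestamp_nanos are hypothetical names, the tuple return values replace the real Column expressions, and the docstring text is a placeholder.

```python
def classic_timestamp_nanos(col):
    """Placeholder docstring standing in for the full one in builtin.py.

    .. versionadded:: 4.3.0
    """
    # Stand-in body: the real function would build a Column expression.
    return ("classic", "timestamp_nanos", col)


def connect_timestamp_nanos(col):
    # Stand-in body: deliberately written without its own docstring.
    return ("connect", "timestamp_nanos", col)


# Reuse the classic docstring so both entry points share one source of
# documentation and cannot drift apart.
connect_timestamp_nanos.__doc__ = classic_timestamp_nanos.__doc__
```

With this arrangement the docstring is maintained in one place, and help() on either function shows the same text.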


Verifying this change

Covered by the existing parity test in FunctionsTestsMixin:

  • test_function_parity previously allowlisted timestamp_nanos as an expected gap
  • Removing it from expected_missing_in_py ensures the test will now fail if
    timestamp_nanos is ever missing from the Python API again
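
The allowlist mechanism described above can be sketched with plain sets. This is a simplified illustration, assuming the parity test compares function names; parity_gaps and the small name sets are hypothetical and are not taken from test_functions.py.

```python
from typing import Set, Tuple


def parity_gaps(
    jvm_functions: Set[str],
    py_functions: Set[str],
    expected_missing_in_py: Set[str],
) -> Tuple[Set[str], Set[str]]:
    """Return (unexpected gaps, stale allowlist entries); both empty means parity holds."""
    missing_in_py = jvm_functions - py_functions
    unexpected = missing_in_py - expected_missing_in_py
    stale = expected_missing_in_py - missing_in_py
    return unexpected, stale


# Hypothetical, heavily truncated function name sets.
jvm = {"timestamp_micros", "timestamp_millis", "timestamp_nanos"}
py_before = {"timestamp_micros", "timestamp_millis"}
py_after = py_before | {"timestamp_nanos"}

# Before this PR: the gap exists but is allowlisted, so nothing is reported.
before = parity_gaps(jvm, py_before, {"timestamp_nanos"})
# After this PR: the function exists and the allowlist is empty.
after = parity_gaps(jvm, py_after, set())
# A later regression with the empty allowlist is reported as an unexpected gap.
regression = parity_gaps(jvm, py_before, set())
```

In this sketch, before and after both come back empty, while regression reports timestamp_nanos, which is the failure the emptied allowlist is meant to surface.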

The two doctests in the timestamp_nanos docstring verify:

  • A nanosecond integer input returns the correct TIMESTAMP_LTZ(9) value
  • A NULL input returns NULL
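
The shape of those two doctests (one valid value, one NULL) can be sketched with a stand-in function and the standard doctest module. This does not reproduce the doctests added in builtin.py: split_epoch_nanos and run_docstring_examples are hypothetical names, and None stands in for SQL NULL.

```python
import doctest


def split_epoch_nanos(nanos):
    """Hypothetical stand-in: split epoch nanoseconds into (seconds, nanos).

    >>> split_epoch_nanos(1_500_000_000)
    (1, 500000000)
    >>> split_epoch_nanos(None) is None
    True
    """
    if nanos is None:
        return None  # mirrors NULL propagation
    return divmod(nanos, 1_000_000_000)


def run_docstring_examples(func):
    """Run the doctests embedded in func's docstring and return the results."""
    parser = doctest.DocTestParser()
    # The globals dict must expose the function under the name the examples use.
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)  # TestResults(failed=..., attempted=...)


results = run_docstring_examples(split_epoch_nanos)
```

Here results.attempted counts the two examples and results.failed stays at zero as long as the docstring output matches the function's behavior.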

Does this pull request potentially affect one of the following parts

  • Dependencies (adds or upgrades dependency): No
  • Public API (@Public / @Evolving): Yes — new public PySpark function
  • Serializers: No
  • Runtime per-record code paths (performance sensitive): No — Python wrapper only; JVM expression unchanged
  • Deployment or recovery: No
  • S3 file system connector: No

Documentation

  • Introduces a new feature: Yes
  • New API: pyspark.sql.functions.timestamp_nanos
  • Documented via:
    • Inline docstring (parameters, return type, See Also links)
    • Doctests in builtin.py

Was generative AI tooling used to co-author this PR?

  • Yes — Claude Code was used as a pair-programming assistant.
    All code was written, understood, and verified by the author.

Generated-by: Claude Opus 4.8
