[SPARK-57737][SQL] Cache zone offset per task in date_trunc by Licht-T · Pull Request #56848 · apache/spark

Licht-T · 2026-06-28T15:22:03Z

What changes were proposed in this pull request?

date_trunc (TruncTimestamp) resolves the session zone offset for each row via ZoneRules.getOffset(Instant) -- a binary search over the zone's transition array -- and for non-fixed-offset zones it does so twice per row (the input instant and the candidate truncated instant used by the DST-equality guard from SPARK-56663 / SPARK-56769).

This PR adds a per-task ZoneOffsetCache that memoizes the resolved offset over the half-open epoch-second interval [lo, hi) on which it is provably constant, derived from the surrounding zone transitions (nextTransition / previousTransition, anchored on an interior point to avoid an off-by-one when an instant sits exactly on a transition). A lookup inside the cached interval reduces to two comparisons instead of a binary search.
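The interval memoization described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: the class name `ZoneOffsetCacheSketch`, the method `offsetSecondsAt`, and the `misses` counter are hypothetical, and the interior anchor is assumed to be "one second past the instant". It relies only on the documented behaviour that `ZoneRules.nextTransition` is strictly after the given instant and `ZoneRules.previousTransition` is strictly before it.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.zone.ZoneOffsetTransition;
import java.time.zone.ZoneRules;

/** Hypothetical sketch of a single-entry, per-task zone offset cache. */
final class ZoneOffsetCacheSketch {
    private final ZoneRules rules;
    private long lo = 0L;       // inclusive lower bound (epoch seconds)
    private long hi = 0L;       // exclusive upper bound; lo == hi means "empty"
    private int offsetSeconds;  // offset valid for every instant in [lo, hi)
    long misses = 0L;           // illustration only: counts slow-path lookups

    ZoneOffsetCacheSketch(ZoneId zone) {
        this.rules = zone.getRules();
    }

    int offsetSecondsAt(long epochSecond) {
        // Fast path: two comparisons, no search over the transition array.
        if (epochSecond >= lo && epochSecond < hi) {
            return offsetSeconds;
        }
        misses++;
        Instant instant = Instant.ofEpochSecond(epochSecond);
        offsetSeconds = rules.getOffset(instant).getTotalSeconds();

        // nextTransition is strictly after the instant, so it is the first
        // second at which the offset may differ: the exclusive upper bound.
        ZoneOffsetTransition next = rules.nextTransition(instant);
        hi = (next == null) ? Long.MAX_VALUE : next.toEpochSecond();

        // previousTransition is strictly before its argument. Querying it at
        // the instant itself would skip a transition that falls exactly on
        // the instant, so anchor one second later (assumed interior point).
        ZoneOffsetTransition prev =
            rules.previousTransition(Instant.ofEpochSecond(epochSecond + 1));
        lo = (prev == null) ? Long.MIN_VALUE : prev.toEpochSecond();
        return offsetSeconds;
    }
}
```

With a zone such as America/Los_Angeles, consecutive lookups that stay between two DST transitions take the slow path once and the fast path afterwards. A single mutable entry like this is not thread-safe, which fits the per-task scope described above.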

Why are the changes needed?

The session time zone is constant for a query and a zone's offset is piecewise-constant between DST/historical transitions, so consecutive rows almost always fall in the same constant-offset window (analytic data is typically temporally clustered -- time series, date-partitioned tables, post-sort). Repeating the transition-array binary search on every row is redundant work on the hot path.

DateTimeBenchmark Truncation, whole-stage codegen on, session zone America/Los_Angeles, OpenJDK 17 on a 12th Gen Intel i7-1260P, ns/row (lower is better):

level	without cache	with cache	speedup
date_trunc YEAR	98.2	56.8	1.73x
date_trunc QUARTER	109.3	71.7	1.52x
date_trunc MONTH	90.8	53.7	1.69x
date_trunc WEEK	77.8	40.6	1.92x
date_trunc DAY	64.8	33.0	1.96x
date_trunc SECOND (control)	28.7	27.7	~1.0x

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing DateTimeUtilsSuite and DateExpressionsSuite pass.

Was this patch authored or co-authored using generative AI tooling?

Yes, co-authored with Claude Code.

Licht-T and others added 4 commits June 29, 2026 00:13

[SPARK-57737][SQL] Cache zone offset per task in date_trunc

e89f572

Benchmark results for org.apache.spark.sql.execution.benchmark.DateTi…

79cd536

…meBenchmark (JDK 21, Scala 2.13, split 1 of 1)

Benchmark results for org.apache.spark.sql.execution.benchmark.DateTi…

326285c

…meBenchmark (JDK 25, Scala 2.13, split 1 of 1)

Benchmark results for org.apache.spark.sql.execution.benchmark.DateTi…

b4fc463

…meBenchmark (JDK 17, Scala 2.13, split 1 of 1)

Licht-T wants to merge 4 commits into apache:master from Licht-T:date-trunc-zone-offset-cache
