[SPARK-57737][SQL] Cache zone offset per task in date_trunc #56848

Open
Licht-T wants to merge 4 commits into apache:master from Licht-T:date-trunc-zone-offset-cache

Licht-T (Contributor) commented Jun 28, 2026
What changes were proposed in this pull request?

date_trunc (TruncTimestamp) resolves the session zone offset for every row via ZoneRules.getOffset(Instant), which is a binary search over the zone's transition array. For non-fixed-offset zones it does so twice per row: once for the input instant and once for the candidate truncated instant used by the DST-equality guard from SPARK-56663 / SPARK-56769.

This PR adds a per-task ZoneOffsetCache that memoizes the resolved offset over the half-open epoch-second interval [lo, hi) on which it is provably constant. The interval is derived from the surrounding zone transitions (nextTransition / previousTransition) and is anchored on an interior point to avoid an off-by-one when an instant sits exactly on a transition. A lookup inside the cached interval reduces to two comparisons instead of a binary search.
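For readers who want to see the idea in isolation, here is a minimal Java sketch of such an interval cache. It is not the code in this PR: the class name `ZoneOffsetCacheSketch`, its fields, and the `misses` counter are hypothetical and exist only for illustration. It assumes zone transitions fall on whole epoch seconds and relies on `ZoneRules.nextTransition` returning the transition strictly after an instant and `ZoneRules.previousTransition` returning the one strictly before it, so probing `previousTransition` at `epochSecond + 1` is one possible way to anchor on an interior point.

```java
import java.time.Instant;
import java.time.zone.ZoneOffsetTransition;
import java.time.zone.ZoneRules;

/** Hypothetical sketch of a per-task offset cache; not the class added by this PR. */
final class ZoneOffsetCacheSketch {
    private final ZoneRules rules;
    // Cached half-open window [lo, hi) in epoch seconds; starts out empty.
    private long lo = Long.MAX_VALUE;
    private long hi = Long.MIN_VALUE;
    private int offsetSeconds;
    int misses; // slow-path lookups, kept only to make the behavior observable

    ZoneOffsetCacheSketch(ZoneRules rules) {
        this.rules = rules;
    }

    /** Fast path: two comparisons when the instant lies inside the cached window. */
    int offsetSecondsAt(long epochSecond) {
        if (epochSecond >= lo && epochSecond < hi) {
            return offsetSeconds;
        }
        return refill(epochSecond);
    }

    /** Slow path: recompute the window around epochSecond from the zone rules. */
    private int refill(long epochSecond) {
        misses++;
        Instant at = Instant.ofEpochSecond(epochSecond);
        // Strictly after `at`, so hi > epochSecond.
        ZoneOffsetTransition next = rules.nextTransition(at);
        // Strictly before `at + 1s`, so a transition exactly at `at` becomes lo.
        ZoneOffsetTransition prev = rules.previousTransition(at.plusSeconds(1));
        lo = (prev == null) ? Long.MIN_VALUE : prev.toEpochSecond();
        hi = (next == null) ? Long.MAX_VALUE : next.toEpochSecond();
        offsetSeconds = rules.getOffset(at).getTotalSeconds();
        return offsetSeconds;
    }
}
```

In this sketch a fixed-offset zone has no transitions, so the first lookup widens the window to the whole `long` range and every later lookup takes the fast path.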

Why are the changes needed?

The session time zone is constant for a query, and a zone's offset is piecewise-constant between DST/historical transitions, so consecutive rows almost always fall in the same constant-offset window (analytic data is typically temporally clustered: time series, date-partitioned tables, post-sort output). Repeating the transition-array binary search on every row is redundant work on the hot path.

DateTimeBenchmark Truncation, whole-stage codegen on, session zone America/Los_Angeles, OpenJDK 17 on a 12th Gen Intel i7-1260P, ns/row (lower is better):

| level | without cache (ns/row) | with cache (ns/row) | speedup |
| --- | --- | --- | --- |
| date_trunc YEAR | 98.2 | 56.8 | 1.73x |
| date_trunc QUARTER | 109.3 | 71.7 | 1.52x |
| date_trunc MONTH | 90.8 | 53.7 | 1.69x |
| date_trunc WEEK | 77.8 | 40.6 | 1.92x |
| date_trunc DAY | 64.8 | 33.0 | 1.96x |
| date_trunc SECOND (control) | 28.7 | 27.7 | ~1.0x |

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing DateTimeUtilsSuite and DateExpressionsSuite pass.

Was this patch authored or co-authored using generative AI tooling?

Yes, co-authored with Claude Code.
