-
Notifications
You must be signed in to change notification settings - Fork 17.6k
fix: Apply timezone offset to convert local time boundaries to UTC #37014
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
5e3504a
903fa87
2d56e89
25f1e7f
9f9e4ee
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -65,6 +65,7 @@ | |||||||||||||||||||||
| import markdown as md | ||||||||||||||||||||||
| import nh3 | ||||||||||||||||||||||
| import pandas as pd | ||||||||||||||||||||||
| import pytz | ||||||||||||||||||||||
| import sqlalchemy as sa | ||||||||||||||||||||||
| from cryptography.hazmat.backends import default_backend | ||||||||||||||||||||||
| from cryptography.x509 import Certificate, load_pem_x509_certificate | ||||||||||||||||||||||
|
|
@@ -1935,6 +1936,7 @@ class DateColumn: | |||||||||||||||||||||
| timestamp_format: str | None = None | ||||||||||||||||||||||
| offset: int | None = None | ||||||||||||||||||||||
| time_shift: str | None = None | ||||||||||||||||||||||
| timezone: str | None = None # IANA timezone name | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| def __hash__(self) -> int: | ||||||||||||||||||||||
| return hash(self.col_label) | ||||||||||||||||||||||
|
|
@@ -1948,11 +1950,13 @@ def get_legacy_time_column( | |||||||||||||||||||||
| timestamp_format: str | None, | ||||||||||||||||||||||
| offset: int | None, | ||||||||||||||||||||||
| time_shift: str | None, | ||||||||||||||||||||||
| timezone: str | None = None, | ||||||||||||||||||||||
| ) -> DateColumn: | ||||||||||||||||||||||
| return cls( | ||||||||||||||||||||||
| timestamp_format=timestamp_format, | ||||||||||||||||||||||
| offset=offset, | ||||||||||||||||||||||
| time_shift=time_shift, | ||||||||||||||||||||||
| timezone=timezone, | ||||||||||||||||||||||
| col_label=DTTM_ALIAS, | ||||||||||||||||||||||
| ) | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
|
|
@@ -2037,8 +2041,28 @@ def normalize_dttm_col( | |||||||||||||||||||||
|
|
||||||||||||||||||||||
| _process_datetime_column(df, _col) | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| if _col.offset: | ||||||||||||||||||||||
| if _col.timezone: | ||||||||||||||||||||||
| try: | ||||||||||||||||||||||
| tz = pytz.timezone(_col.timezone) | ||||||||||||||||||||||
| # Data is stored in UTC, convert to the dataset's configured timezone | ||||||||||||||||||||||
| # First make the datetime UTC-aware, then convert to target timezone | ||||||||||||||||||||||
| series = df[_col.col_label] | ||||||||||||||||||||||
| if not series.empty and series.notna().any(): | ||||||||||||||||||||||
| # Convert UTC to target timezone | ||||||||||||||||||||||
| df[_col.col_label] = ( | ||||||||||||||||||||||
| series.dt.tz_localize("UTC") | ||||||||||||||||||||||
| .dt.tz_convert(tz) | ||||||||||||||||||||||
| .dt.tz_localize(None) # Remove timezone info for display | ||||||||||||||||||||||
| ) | ||||||||||||||||||||||
| except pytz.UnknownTimeZoneError: | ||||||||||||||||||||||
| logging.warning( | ||||||||||||||||||||||
| "Unknown timezone '%s', falling back to offset", _col.timezone | ||||||||||||||||||||||
| ) | ||||||||||||||||||||||
| if _col.offset: | ||||||||||||||||||||||
| df[_col.col_label] += timedelta(hours=_col.offset) | ||||||||||||||||||||||
| elif _col.offset: | ||||||||||||||||||||||
| df[_col.col_label] += timedelta(hours=_col.offset) | ||||||||||||||||||||||
|
Comment on lines
+2061
to
2064
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Suggestion: Offset handling is incorrect: the code adds the configured Severity Level: Critical 🚨- ❌ Dashboard time filters produce incorrect UTC boundaries.
- ❌ Queries to /data return wrong time ranges.
- ⚠️ Charts and aggregations misaligned by timezone offset.
- ⚠️ Dataset timezone/offset settings produce incorrect results.
Suggested change
Steps of Reproduction ✅1. Open a Python REPL or test and import normalize_dttm_col from superset/utils/core.py
(function begins at file line ~1934 where `def normalize_dttm_col(` is defined).
2. Construct a DataFrame with a datetime column named "__timestamp", e.g. in code at
runtime by:
- df = pd.DataFrame({"__timestamp": pd.to_datetime(["2026-01-09 00:00:00"])})
3. Create a DateColumn instance with offset set (example using class defined around lines
~1847 in superset/utils/core.py):
- from superset.utils.core import DateColumn
- col = DateColumn(col_label="__timestamp", offset=6)
4. Call normalize_dttm_col(df, (col,)) (function at superset/utils/core.py: ~1934).
Observe resulting df["__timestamp"] has been increased by +6 hours (current code does +=
timedelta), whereas the PR description and intended behavior expect the local midnight
(UTC+6) to be converted to UTC by subtracting 6 hours. Also observe that setting offset=0
(DateColumn(offset=0)) will be ignored by current code because `if _col.offset:` treats 0
as falsy, so no adjustment is applied.Prompt for AI Agent 🤖This is a comment left during a code review.
**Path:** superset/utils/core.py
**Line:** 1974:1977
**Comment:**
*Logic Error: Offset handling is incorrect: the code adds the configured `offset` hours to the datetime (using `+= timedelta(hours=_col.offset)`) which moves times in the wrong direction when converting local boundaries to UTC; additionally, the code treats offset `0` as falsy so a valid zero offset is ignored. Change the logic to subtract the offset (and check `is not None`) so offsets of `0` are handled.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. |
||||||||||||||||||||||
|
|
||||||||||||||||||||||
| if _col.time_shift is not None: | ||||||||||||||||||||||
| df[_col.col_label] += parse_human_timedelta(_col.time_shift) | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
|
|
||||||||||||||||||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -324,6 +324,7 @@ def get_df(self, query_obj: QueryObjectDict | None = None) -> pd.DataFrame: | |
| timestamp_format=timestamp_format, | ||
| offset=self.datasource.offset, | ||
| time_shift=self.form_data.get("time_shift"), | ||
| timezone=self.datasource.get_dataset_timezone(), | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Missing test for timezone feature
The added Code Review Run #5607e8 Should Bito avoid suggestions like this for future reviews? (Manage Rules)
|
||
| ) | ||
| ] | ||
| ), | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since Superset targets Python ≥3.10, replace the third-party
pytzimport withfrom zoneinfo import ZoneInfo, ZoneInfoNotFoundError, useZoneInfo(_col.timezone)instead ofpytz.timezone, and catchZoneInfoNotFoundErrorrather thanpytz.UnknownTimeZoneError.Code Review Run #5607e8
Should Bito avoid suggestions like this for future reviews? (Manage Rules)