17 changes: 17 additions & 0 deletions superset/commands/report/execute.py
@@ -506,6 +506,20 @@ def _get_pdf(self) -> bytes:

        return pdf

    def _ensure_utf8_bom(self, csv_data: bytes, encoding: str) -> bytes:
        """
        Ensure CSV bytes contain UTF-8 BOM when encoding is utf-8-sig.
        Avoid double BOM.
        """
        if not csv_data:
            return csv_data

        enc = (encoding or "").lower().replace("_", "-")
        if enc in ("utf-8-sig", "utf8-sig"):

Copilot AI Jan 21, 2026

The encoding normalization at line 447 replaces underscores with hyphens, converting both "utf_8_sig" and "utf-8-sig" to "utf-8-sig". Therefore, checking for "utf8-sig" (without any separator) in line 448 is unnecessary after this normalization.

Consider simplifying to just check for "utf-8-sig":

enc = (encoding or "").lower().replace("_", "-")
if enc == "utf-8-sig":

Alternatively, if you want to handle "utf8sig" (no separator), you could check for both "utf-8-sig" and "utf8sig", but the current check for "utf8-sig" (with hyphen but no separators elsewhere) won't match any real input after normalization.

Suggested change
- if enc in ("utf-8-sig", "utf8-sig"):
+ if enc == "utf-8-sig":
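
If spelling variants are the concern, one alternative to string replacement is to let Python's codec registry canonicalize the name. The following is a minimal sketch, not what the PR does; `canonical_encoding` and `wants_utf8_bom` are hypothetical helper names introduced only for illustration.

```python
import codecs
from typing import Optional


def canonical_encoding(name: Optional[str]) -> Optional[str]:
    """Return the codec registry's canonical name, or None if unknown.

    Hypothetical helper: relies on codecs.lookup() to resolve case and
    separator variants instead of normalizing the string by hand.
    """
    if not name:
        return None
    try:
        return codecs.lookup(name).name
    except LookupError:
        # Unknown or misspelled encoding: treat it as "no BOM wanted".
        return None


def wants_utf8_bom(name: Optional[str]) -> bool:
    """True only when the configured encoding resolves to utf-8-sig."""
    return canonical_encoding(name) == "utf-8-sig"
```

With this approach, any spelling that Python itself accepts for the codec is handled the same way, and an unrecognized value simply results in no BOM being added.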

Copilot uses AI. Check for mistakes.
            if not csv_data.startswith(b"\xef\xbb\xbf"):
                return b"\xef\xbb\xbf" + csv_data
        return csv_data
Comment on lines +509 to +521

Copilot AI Jan 21, 2026

The new method _ensure_utf8_bom lacks test coverage. Given that this addresses a specific encoding issue that could lead to data corruption (garbled characters in Excel), it would be valuable to add a test that verifies:

  1. When CSV_EXPORT.encoding is "utf-8-sig", the returned CSV bytes start with the UTF-8 BOM (b"\xef\xbb\xbf")
  2. When a BOM already exists, it's not duplicated
  3. When the encoding is not utf-8-sig, no BOM is added

Consider adding a test in tests/integration_tests/reports/commands_tests.py similar to test_email_chart_report_schedule_with_csv but with assertions on the BOM presence.
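
The three cases listed above could be sketched as plain unit tests along the following lines. This is an assumption-laden illustration: `ensure_utf8_bom` is a standalone copy of the method's logic (so it runs without Superset's app context or report fixtures), and the test names are hypothetical, not existing tests in the repository.

```python
import codecs


def ensure_utf8_bom(csv_data: bytes, encoding: str) -> bytes:
    """Standalone copy of the PR's helper logic, for illustration only."""
    if not csv_data:
        return csv_data
    enc = (encoding or "").lower().replace("_", "-")
    if enc in ("utf-8-sig", "utf8-sig"):
        # codecs.BOM_UTF8 is the same three bytes as b"\xef\xbb\xbf".
        if not csv_data.startswith(codecs.BOM_UTF8):
            return codecs.BOM_UTF8 + csv_data
    return csv_data


def test_bom_added_for_utf8_sig() -> None:
    result = ensure_utf8_bom(b"a,b\n1,2\n", "utf-8-sig")
    assert result.startswith(b"\xef\xbb\xbf")


def test_existing_bom_not_duplicated() -> None:
    data = codecs.BOM_UTF8 + b"a,b\n1,2\n"
    # Input that already carries a BOM must come back unchanged.
    assert ensure_utf8_bom(data, "utf-8-sig") == data


def test_no_bom_for_other_encodings() -> None:
    data = b"a,b\n1,2\n"
    assert ensure_utf8_bom(data, "utf-8") == data
    # Empty payloads are returned untouched regardless of encoding.
    assert ensure_utf8_bom(b"", "utf-8-sig") == b""
```

An integration test would additionally need to set `CSV_EXPORT` in the app config and inspect the attachment bytes; that wiring depends on the existing fixtures and is left out here.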

    def _get_csv_data(self) -> bytes:
        start_time = datetime.utcnow()
        url = self._get_url(result_format=ChartDataResultFormat.CSV)
@@ -550,6 +564,9 @@ def _get_csv_data(self) -> bytes:
) from ex
        if not csv_data:
            raise ReportScheduleCsvFailedError()

        encoding = app.config.get("CSV_EXPORT", {}).get("encoding", "utf-8")
        csv_data = self._ensure_utf8_bom(csv_data, encoding)
        return csv_data

    def _get_embedded_data(self) -> pd.DataFrame: