Skip to content

fix(reports): ensure CSV attachments use UTF-8 BOM when encoding is utf-8-sig#37245

Open
yayunn wants to merge 2 commits into
apache:masterfrom
yayunn:fix/report-utf8-bom
Open

fix(reports): ensure CSV attachments use UTF-8 BOM when encoding is utf-8-sig#37245
yayunn wants to merge 2 commits into
apache:masterfrom
yayunn:fix/report-utf8-bom

Conversation

@yayunn

@yayunn yayunn commented Jan 19, 2026

Copy link
Copy Markdown

Summary

This PR fixes an issue where CSV attachments generated by Reports are missing the UTF-8 BOM, even when CSV_EXPORT.encoding is set to utf-8-sig.

This causes non-ASCII characters (e.g. CJK) to appear garbled in Excel and some email clients.

Problem

  • SQL Lab CSV exports correctly include the BOM
  • Report email CSV attachments do not include the BOM
  • This leads to inconsistent behavior between export paths

Solution

  • Ensure that CSV bytes returned by the Reports execution pipeline contain a UTF-8 BOM when CSV_EXPORT.encoding is utf-8-sig
  • Scope the fix only to Reports export path

Screenshots / Logs

Before:

  • CSV bytes start with: e5 94 ae ... (no BOM)

After:

  • CSV bytes start with: ef bb bf e5 94 ae ... (correct UTF-8 BOM)

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in [SIP-59] Proposal for Database migration standards #13351)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@codeant-ai-for-open-source

Copy link
Copy Markdown
Contributor

CodeAnt AI is reviewing your PR.


Thanks for using CodeAnt! 🎉

We're free for open-source projects. if you're enjoying it, help us grow by sharing.

Share on X ·
Reddit ·
LinkedIn

@bito-code-review

bito-code-review Bot commented Jan 19, 2026

Copy link
Copy Markdown
Contributor

Code Review Agent Run #61ca6c

Actionable Suggestions - 0
Review Details
  • Files reviewed - 1 · Commit Range: 880edd7..880edd7
    • superset/commands/report/execute.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

@dosubot dosubot Bot added alert-reports Namespace | Anything related to the Alert & Reports feature data:csv Related to import/export of CSVs labels Jan 19, 2026
@codeant-ai-for-open-source

Copy link
Copy Markdown
Contributor

CodeAnt AI finished reviewing your PR.

@netlify

netlify Bot commented Jan 19, 2026

Copy link
Copy Markdown

Deploy Preview for superset-docs-preview ready!

Name Link
🔨 Latest commit 880edd7
🔍 Latest deploy log https://app.netlify.com/projects/superset-docs-preview/deploys/696e05448ec78c00082617a2
😎 Deploy Preview https://deploy-preview-37245--superset-docs-preview.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a bug where CSV attachments generated by the Reports feature are missing the UTF-8 Byte Order Mark (BOM), even when CSV_EXPORT.encoding is configured as utf-8-sig. This causes non-ASCII characters (particularly CJK characters) to appear garbled in Excel and some email clients.

Changes:

  • Adds a new _ensure_utf8_bom() method to detect and prepend UTF-8 BOM when needed
  • Integrates BOM handling into the _get_csv_data() method in the report execution pipeline
  • Ensures consistency between SQL Lab CSV exports and Report CSV exports

Comment on lines +439 to +451
def _ensure_utf8_bom(self, csv_data: bytes, encoding: str) -> bytes:
"""
Ensure CSV bytes contain UTF-8 BOM when encoding is utf-8-sig.
Avoid double BOM.
"""
if not csv_data:
return csv_data

enc = (encoding or "").lower().replace("_", "-")
if enc in ("utf-8-sig", "utf8-sig"):
if not csv_data.startswith(b"\xef\xbb\xbf"):
return b"\xef\xbb\xbf" + csv_data
return csv_data

Copilot AI Jan 21, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new method _ensure_utf8_bom lacks test coverage. Given that this addresses a specific encoding issue that could lead to data corruption (garbled characters in Excel), it would be valuable to add a test that verifies:

  1. When CSV_EXPORT.encoding is "utf-8-sig", the returned CSV bytes start with the UTF-8 BOM (b"\xef\xbb\xbf")
  2. When a BOM already exists, it's not duplicated
  3. When the encoding is not utf-8-sig, no BOM is added

Consider adding a test in tests/integration_tests/reports/commands_tests.py similar to test_email_chart_report_schedule_with_csv but with assertions on the BOM presence.

Copilot uses AI. Check for mistakes.
return csv_data

enc = (encoding or "").lower().replace("_", "-")
if enc in ("utf-8-sig", "utf8-sig"):

Copilot AI Jan 21, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The encoding normalization at line 447 replaces underscores with hyphens, converting both "utf_8_sig" and "utf-8-sig" to "utf-8-sig". Therefore, checking for "utf8-sig" (without any separator) in line 448 is unnecessary after this normalization.

Consider simplifying to just check for "utf-8-sig":

enc = (encoding or "").lower().replace("_", "-")
if enc == "utf-8-sig":

Alternatively, if you want to handle "utf8sig" (no separator), you could check for both "utf-8-sig" and "utf8sig", but the current check for "utf8-sig" (with hyphen but no separators elsewhere) won't match any real input after normalization.

Suggested change
if enc in ("utf-8-sig", "utf8-sig"):
if enc == "utf-8-sig":

Copilot uses AI. Check for mistakes.
@codecov

codecov Bot commented Jan 21, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 10.00000% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 56.18%. Comparing base (b05fe48) to head (30570e7).

Files with missing lines Patch % Lines
superset/commands/report/execute.py 10.00% 9 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (b05fe48) and HEAD (30570e7). Click for more details.

HEAD has 93 uploads less than BASE
Flag BASE (b05fe48) HEAD (30570e7)
python 67 3
presto 11 1
hive 11 1
unit 10 1
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #37245      +/-   ##
==========================================
- Coverage   64.28%   56.18%   -8.11%     
==========================================
  Files        2659     2659              
  Lines      144304   144314      +10     
  Branches    33260    33263       +3     
==========================================
- Hits        92773    81087   -11686     
- Misses      49902    62504   +12602     
+ Partials     1629      723     -906     
Flag Coverage Δ
hive 39.43% <10.00%> (-0.01%) ⬇️
mysql ?
postgres ?
presto 41.01% <10.00%> (-0.01%) ⬇️
python 42.31% <10.00%> (-17.39%) ⬇️
sqlite ?
unit 100.00% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@rusackas

Copy link
Copy Markdown
Member

Thanks @yayunn, let's see if we can get this across the finish line. CI is currently red: the Python-Integration suites (test-mysql/test-postgres/test-sqlite) are failing, likely an existing report/CSV test now seeing the BOM. I'm updating the branch now, but if we can get those green, and a small unit test asserting the BOM is prepended only for utf-8-sig, we'll take a fresh look.

@bito-code-review

bito-code-review Bot commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

Code Review Agent Run #0dbd2c

Actionable Suggestions - 0
Review Details
  • Files reviewed - 1 · Commit Range: 880edd7..30570e7
    • superset/commands/report/execute.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

alert-reports Namespace | Anything related to the Alert & Reports feature data:csv Related to import/export of CSVs size/S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants