Skip to content

fix(network): validate target hostname in outbound requests#39301

Open
sha174n wants to merge 75 commits into
apache:masterfrom
sha174n:fix/dataset-import-url-validation
Open

fix(network): validate target hostname in outbound requests#39301
sha174n wants to merge 75 commits into
apache:masterfrom
sha174n:fix/dataset-import-url-validation

Conversation

@sha174n

@sha174n sha174n commented Apr 12, 2026

Copy link
Copy Markdown
Contributor

SUMMARY

Extends the hostname allowlist check introduced for dataset import URLs to cover two additional outbound HTTP paths from the Superset backend: the report webhook dispatcher and the Impala engine spec's cancel-query call.

Changes:

  • superset/utils/network.py: adds is_safe_host() helper that resolves a hostname and returns True only when all resolved addresses fall outside loopback, private, link-local, and shared address ranges.
  • superset/commands/dataset/importers/v1/utils.py: applies the host check when loading data from a configured URL; also validates redirect targets before following them.
  • superset/config.py: adds three opt-out flags (DATASET_IMPORT_ALLOW_INTERNAL_DATA_URLS, ALERT_REPORTS_WEBHOOK_ALLOW_INTERNAL_HOSTS, IMPALA_CANCEL_QUERY_ALLOW_INTERNAL_HOSTS, all default False) for deployments where the respective target legitimately points to internal hosts.
  • superset/db_engine_specs/impala.py: validates the database host before the requests.post(...) call in cancel_query(); gated by the Impala opt-out flag.
  • superset/reports/notifications/webhook.py: validates the webhook target host before dispatching; gated by the webhook opt-out flag.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A, no UI changes.

TESTING INSTRUCTIONS

pytest tests/unit_tests/utils/test_network.py
pytest tests/unit_tests/datasets/commands/importers/v1/import_test.py
pytest tests/unit_tests/db_engine_specs/test_impala.py
pytest tests/unit_tests/reports/notifications/

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration
  • Introduces new feature or API
  • Removes existing feature or API

@dosubot dosubot Bot added change:backend Requires changing the backend risk:breaking-change Issues or PRs that will introduce breaking changes labels Apr 12, 2026
Comment thread tests/unit_tests/datasets/commands/importers/v1/import_test.py Outdated
Comment thread tests/unit_tests/datasets/commands/importers/v1/import_test.py Outdated
@codecov

codecov Bot commented Apr 12, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 20.83333% with 57 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.28%. Comparing base (74845ea) to head (313334c).

Files with missing lines Patch % Lines
superset/commands/dataset/importers/v1/utils.py 21.42% 22 Missing ⚠️
superset/utils/network.py 15.78% 16 Missing ⚠️
superset/reports/notifications/webhook.py 13.33% 13 Missing ⚠️
superset/db_engine_specs/impala.py 14.28% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #39301      +/-   ##
==========================================
- Coverage   64.30%   64.28%   -0.03%     
==========================================
  Files        2657     2657              
  Lines      144060   144128      +68     
  Branches    33216    33232      +16     
==========================================
+ Hits        92641    92651      +10     
- Misses      49797    49853      +56     
- Partials     1622     1624       +2     
Flag Coverage Δ
hive 39.42% <20.83%> (-0.02%) ⬇️
mysql 58.15% <20.83%> (-0.04%) ⬇️
postgres 58.22% <20.83%> (-0.05%) ⬇️
presto 41.01% <20.83%> (-0.02%) ⬇️
python 59.69% <20.83%> (-0.05%) ⬇️
sqlite 57.84% <20.83%> (-0.04%) ⬇️
unit 100.00% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Comment thread superset/utils/network.py
Comment thread superset/commands/dataset/importers/v1/utils.py
Comment thread superset/utils/network.py
Comment thread superset/utils/network.py
@sha174n sha174n force-pushed the fix/dataset-import-url-validation branch from f5d39c1 to 669096a Compare April 12, 2026 12:11
- Add is_safe_host() utility in network.py for hostname validation
- Update validate_data_uri() to validate resolved hostname after allowlist match
- Add DATASET_IMPORT_ALLOW_INTERNAL_DATA_URLS config flag for on-premises deployments
- Add unit tests for is_safe_host() and validate_data_uri()

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@sha174n sha174n force-pushed the fix/dataset-import-url-validation branch from 669096a to ec17725 Compare April 12, 2026 12:13
@sha174n sha174n changed the title fix(security): block SSRF in dataset import URL validation fix(dataset-import): validate hostname in import URL allowlist check Apr 12, 2026
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@bito-code-review bito-code-review Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Agent Run #78c147

Actionable Suggestions - 1
  • tests/unit_tests/utils/test_network.py - 1
    • Parametrize argument should be tuple not string · Line 24-28
Review Details
  • Files reviewed - 5 · Commit Range: f5d39c1..f5d39c1
    • superset/commands/dataset/importers/v1/utils.py
    • superset/config.py
    • superset/utils/network.py
    • tests/unit_tests/datasets/commands/importers/v1/import_test.py
    • tests/unit_tests/utils/test_network.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

Comment thread tests/unit_tests/utils/test_network.py
sha174n and others added 5 commits April 12, 2026 13:20
…handling

- Add 100.64.0.0/10 (RFC 6598 shared address space) to blocked ranges
- Unwrap IPv4-mapped IPv6 addresses before network membership checks
- Fix return type annotations on new test functions
- Add tests for CGNAT range and IPv4-mapped IPv6 bypass cases

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…redirects

- Add _ValidatingRedirectHandler to re-run URL validation on each redirect
- Replace urlopen() with build_opener(_ValidatingRedirectHandler) in load_data()
- Add test for redirect handler blocking disallowed redirect targets

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… change

Update test mocks to match the request.build_opener() call pattern
introduced in the previous commit, and apply ruff formatting fixes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…onnectors and webhooks

Extend host validation to database connector setup and webhook
notification dispatch, using the same hostname check introduced for
dataset import URL validation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@sha174n sha174n changed the title fix(dataset-import): validate hostname in import URL allowlist check fix(network): validate target hostname in outbound requests Apr 12, 2026
sha174n and others added 3 commits April 12, 2026 16:15
Comment thread superset/commands/dataset/importers/v1/utils.py Outdated
Comment thread superset/commands/dataset/importers/v1/utils.py
Comment thread superset/utils/network.py Outdated
sha174n and others added 5 commits April 26, 2026 22:06
Prevent authenticated users from reading arbitrary host files via the dataset
import API by restricting file:// URIs to the bundled examples folder. Previously
any file:// path was accepted unconditionally; now paths outside the examples
directory raise DatasetForbiddenDataURI.

- Update validate_data_uri() to resolve and compare against get_examples_folder()
- Update tests: replace file_scheme_always_allowed with two focused tests that
  verify in-examples URIs pass and out-of-examples URIs are blocked
- Add type annotations to test_validate_data_uri parameter list

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@rusackas

rusackas commented Jun 1, 2026

Copy link
Copy Markdown
Member

Looks like CI just needs a little love, then we can merge this!

@sha174n

sha174n commented Jun 2, 2026

Copy link
Copy Markdown
Contributor Author

Looks like CI just needs a little love, then we can merge this!

Fixed now, thanks!

@rusackas

rusackas commented Jun 3, 2026

Copy link
Copy Markdown
Member

Sending some comments as a DM...

@rusackas rusackas left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The webhook and dataset-import SSRF guards look right, but the Databend change is a blocker for me: db_engine_specs/databend.py swaps is_hostname_valid for is_safe_host, which rejects all private/RFC-1918/loopback IPs as a hard error. A Databend server on an internal network is a normal deployment, and a DB-connection host isn't an SSRF sink the way an outbound fetch is — this blocks saving/testing valid internal connections with no opt-out (the dataset-import path got DATASET_IMPORT_ALLOW_INTERNAL_DATA_URLS; this didn't). It's also inconsistent with clickhouse/databricks/couchbase, which still use is_hostname_valid. Could we revert the databend spec to is_hostname_valid (or gate it behind a flag)? The webhook/import guards look good to go.

sha174n added 2 commits June 3, 2026 19:21
…d opt-out flag

Three follow-ups to review feedback on this PR:

- Revert is_safe_host swap in db_engine_specs/databend.py. Operator-
  configured database hosts on internal networks are trusted under the
  current threat model; is_safe_host's private-IP rejection breaks
  legitimate connections to internal DBs. Restore is_hostname_valid.

- Add ALERT_REPORTS_WEBHOOK_ALLOW_INTERNAL_HOSTS opt-out flag
  (defaults to False) parallel to DATASET_IMPORT_ALLOW_INTERNAL_DATA_URLS.
  Internal webhook targets (chatops bridges, internal automation,
  on-premises Mattermost/Rocket.Chat) are a common deployment shape;
  the prior PR hard-blocked them without an escape hatch. UPDATING.md
  describes the new behavior and the opt-out.

- Drop the test_impala.py edits (localhost -> impala.example.com)
  that had no matching production code change.
@pull-request-size pull-request-size Bot added size/XL and removed size/L labels Jun 3, 2026
The existing tests use impala.example.com which does not resolve in
network-isolated CI, so is_safe_host fails closed and the cancel call
is refused. The happy-path tests are not there to exercise the safety
check (dedicated block/allow tests cover that), so mock is_safe_host
to True and assert the post URL construction independently of DNS.
@bito-code-review

bito-code-review Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Code Review Agent Run #eaf526

Actionable Suggestions - 0
Review Details
  • Files reviewed - 5 · Commit Range: 9cd1469..fbe4927
    • superset/config.py
    • superset/db_engine_specs/databend.py
    • superset/reports/notifications/webhook.py
    • tests/unit_tests/db_engine_specs/test_impala.py
    • superset/db_engine_specs/impala.py
  • Files skipped - 2
    • superset-frontend/packages/superset-ui-core/package.json - Reason: Filter setting
    • UPDATING.md - Reason: Filter setting
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

…t-url-validation

# Conflicts:
#	UPDATING.md
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

change:backend Requires changing the backend risk:breaking-change Issues or PRs that will introduce breaking changes size/XL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants