fix(network): validate target hostname in outbound requests#39301
fix(network): validate target hostname in outbound requests#39301sha174n wants to merge 75 commits into
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #39301 +/- ##
==========================================
- Coverage 64.30% 64.28% -0.03%
==========================================
Files 2657 2657
Lines 144060 144128 +68
Branches 33216 33232 +16
==========================================
+ Hits 92641 92651 +10
- Misses 49797 49853 +56
- Partials 1622 1624 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
f5d39c1 to
669096a
Compare
- Add is_safe_host() utility in network.py for hostname validation - Update validate_data_uri() to validate resolved hostname after allowlist match - Add DATASET_IMPORT_ALLOW_INTERNAL_DATA_URLS config flag for on-premises deployments - Add unit tests for is_safe_host() and validate_data_uri() Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
669096a to
ec17725
Compare
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review Agent Run #78c147
Actionable Suggestions - 1
-
tests/unit_tests/utils/test_network.py - 1
- Parametrize argument should be tuple not string · Line 24-28
Review Details
-
Files reviewed - 5 · Commit Range:
f5d39c1..f5d39c1- superset/commands/dataset/importers/v1/utils.py
- superset/config.py
- superset/utils/network.py
- tests/unit_tests/datasets/commands/importers/v1/import_test.py
- tests/unit_tests/utils/test_network.py
-
Files skipped - 0
-
Tools
- Whispers (Secret Scanner) - ✔︎ Successful
- Detect-secrets (Secret Scanner) - ✔︎ Successful
- MyPy (Static Code Analysis) - ✔︎ Successful
- Astral Ruff (Static Code Analysis) - ✔︎ Successful
Bito Usage Guide
Commands
Type the following command in the pull request comment and save the comment.
-
/review- Manually triggers a full AI review. -
/pause- Pauses automatic reviews on this pull request. -
/resume- Resumes automatic reviews. -
/resolve- Marks all Bito-posted review comments as resolved. -
/abort- Cancels all in-progress reviews.
Refer to the documentation for additional commands.
Configuration
This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.
Documentation & Help
…handling - Add 100.64.0.0/10 (RFC 6598 shared address space) to blocked ranges - Unwrap IPv4-mapped IPv6 addresses before network membership checks - Fix return type annotations on new test functions - Add tests for CGNAT range and IPv4-mapped IPv6 bypass cases Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…redirects - Add _ValidatingRedirectHandler to re-run URL validation on each redirect - Replace urlopen() with build_opener(_ValidatingRedirectHandler) in load_data() - Add test for redirect handler blocking disallowed redirect targets Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… change Update test mocks to match the request.build_opener() call pattern introduced in the previous commit, and apply ruff formatting fixes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…onnectors and webhooks Extend host validation to database connector setup and webhook notification dispatch, using the same hostname check introduced for dataset import URL validation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Prevent authenticated users from reading arbitrary host files via the dataset import API by restricting file:// URIs to the bundled examples folder. Previously any file:// path was accepted unconditionally; now paths outside the examples directory raise DatasetForbiddenDataURI. - Update validate_data_uri() to resolve and compare against get_examples_folder() - Update tests: replace file_scheme_always_allowed with two focused tests that verify in-examples URIs pass and out-of-examples URIs are blocked - Add type annotations to test_validate_data_uri parameter list Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Looks like CI just needs a little love, then we can merge this! |
Fixed now, thanks! |
|
Sending some comments as a DM... |
rusackas
left a comment
There was a problem hiding this comment.
The webhook and dataset-import SSRF guards look right, but the Databend change is a blocker for me: db_engine_specs/databend.py swaps is_hostname_valid for is_safe_host, which rejects all private/RFC-1918/loopback IPs as a hard error. A Databend server on an internal network is a normal deployment, and a DB-connection host isn't an SSRF sink the way an outbound fetch is — this blocks saving/testing valid internal connections with no opt-out (the dataset-import path got DATASET_IMPORT_ALLOW_INTERNAL_DATA_URLS; this didn't). It's also inconsistent with clickhouse/databricks/couchbase, which still use is_hostname_valid. Could we revert the databend spec to is_hostname_valid (or gate it behind a flag)? The webhook/import guards look good to go.
…d opt-out flag Three follow-ups to review feedback on this PR: - Revert is_safe_host swap in db_engine_specs/databend.py. Operator- configured database hosts on internal networks are trusted under the current threat model; is_safe_host's private-IP rejection breaks legitimate connections to internal DBs. Restore is_hostname_valid. - Add ALERT_REPORTS_WEBHOOK_ALLOW_INTERNAL_HOSTS opt-out flag (defaults to False) parallel to DATASET_IMPORT_ALLOW_INTERNAL_DATA_URLS. Internal webhook targets (chatops bridges, internal automation, on-premises Mattermost/Rocket.Chat) are a common deployment shape; the prior PR hard-blocked them without an escape hatch. UPDATING.md describes the new behavior and the opt-out. - Drop the test_impala.py edits (localhost -> impala.example.com) that had no matching production code change.
The existing tests use impala.example.com which does not resolve in network-isolated CI, so is_safe_host fails closed and the cancel call is refused. The happy-path tests are not there to exercise the safety check (dedicated block/allow tests cover that), so mock is_safe_host to True and assert the post URL construction independently of DNS.
Code Review Agent Run #eaf526Actionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
…t-url-validation # Conflicts: # UPDATING.md
SUMMARY
Extends the hostname allowlist check introduced for dataset import URLs to cover two additional outbound HTTP paths from the Superset backend: the report webhook dispatcher and the Impala engine spec's cancel-query call.
Changes:
superset/utils/network.py: addsis_safe_host()helper that resolves a hostname and returnsTrueonly when all resolved addresses fall outside loopback, private, link-local, and shared address ranges.superset/commands/dataset/importers/v1/utils.py: applies the host check when loading data from a configured URL; also validates redirect targets before following them.superset/config.py: adds three opt-out flags (DATASET_IMPORT_ALLOW_INTERNAL_DATA_URLS,ALERT_REPORTS_WEBHOOK_ALLOW_INTERNAL_HOSTS,IMPALA_CANCEL_QUERY_ALLOW_INTERNAL_HOSTS, all defaultFalse) for deployments where the respective target legitimately points to internal hosts.superset/db_engine_specs/impala.py: validates the database host before therequests.post(...)call incancel_query(); gated by the Impala opt-out flag.superset/reports/notifications/webhook.py: validates the webhook target host before dispatching; gated by the webhook opt-out flag.BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
N/A, no UI changes.
TESTING INSTRUCTIONS
ADDITIONAL INFORMATION