fix(security): Add input validation to cancel_query_id to prevent injection#36722
fix(security): Add input validation to cancel_query_id to prevent injection#36722ColeMurray wants to merge 4 commits into
Conversation
|
CodeAnt AI is reviewing your PR. Thanks for using CodeAnt! 🎉We're free for open-source projects. if you're enjoying it, help us grow by sharing. Share on X · |
Code Review Agent Run #80cdeeActionable Suggestions - 0Additional Suggestions - 3
Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
Nitpicks 🔍
|
…ection Add defense-in-depth validation for cancel_query_id parameter in all database engine specs that use string interpolation in SQL/command execution. While cancel_query_id typically comes from trusted database sources (e.g., CONNECTION_ID()), validation ensures safety even if the data source is compromised. Changes: - Add validate_cancel_query_id() static method to BaseEngineSpec - Add input validation to cancel_query() in: - MySQLEngineSpec (numeric validation) - SingleStoreSpec (space-separated numerics) - PostgresEngineSpec (numeric validation) - RedshiftEngineSpec (numeric validation) - SnowflakeEngineSpec (numeric validation) - TrinoEngineSpec (alphanumeric with underscores) - ImpalaEngineSpec (hex format with colon) - OcientEngineSpec (alphanumeric with dashes) - Add comprehensive test suite for validation Security: This addresses potential SQL injection and URL injection vulnerabilities in cancel_query implementations by ensuring cancel_query_id matches expected format before use. Signed-off-by: ColeMurray <cole@waclaude.com>
b7d8564 to
e9f4c21
Compare
|
|
||
| try: | ||
| impala_host = query.database.url_object.host | ||
| url = f"http://{impala_host}:25000/cancel_query?query_id={cancel_query_id}" |
There was a problem hiding this comment.
Suggestion: The code assumes query.database.url_object.host is present; if impala_host is None or empty the constructed URL will be invalid (e.g., "http://None:25000") and may lead to unexpected requests or errors, so check the host and return False early when missing. [null pointer]
Severity Level: Minor
| url = f"http://{impala_host}:25000/cancel_query?query_id={cancel_query_id}" | |
| if not impala_host: | |
| return False |
Why it matters? ⭐
Verifying impala_host before building the URL avoids attempting an HTTP call to an invalid
host (e.g. "None") and prevents unnecessary DNS resolution/network errors. The current try/except
would catch those failures, but an explicit guard is clearer and avoids making a doomed request.
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** superset/db_engine_specs/impala.py
**Line:** 200:200
**Comment:**
*Null Pointer: The code assumes `query.database.url_object.host` is present; if `impala_host` is None or empty the constructed URL will be invalid (e.g., "http://None:25000") and may lead to unexpected requests or errors, so check the host and return False early when missing.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.| # Validate cancel_query_id to prevent SQL injection | ||
| # SingleStore format: "CONNECTION_ID AGGREGATOR_ID" (two space-separated integers) | ||
| if not cls.validate_cancel_query_id(cancel_query_id, r"^\d+(\s+\d+)?$"): | ||
| return False |
There was a problem hiding this comment.
Suggestion: When validation fails the method returns False silently; log a warning with the invalid cancel_query_id to aid debugging and alert operators rather than failing quietly. [possible bug]
Severity Level: Critical 🚨
| return False | |
| logger.warning("cancel_query called with invalid cancel_query_id: %r", cancel_query_id) |
Why it matters? ⭐
Returning False silently on validation failure makes debugging harder. Emitting a logger.warning with the invalid value (non-sensitive) improves observability and helps operators triage issues; this is a non-invasive, useful change.
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** superset/db_engine_specs/singlestore.py
**Line:** 547:547
**Comment:**
*Possible Bug: When validation fails the method returns False silently; log a warning with the invalid `cancel_query_id` to aid debugging and alert operators rather than failing quietly.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.| @patch("sqlalchemy.engine.Engine.connect") | ||
| def test_cancel_query_valid_id(self, engine_mock: Mock) -> None: | ||
| """Test that valid MySQL connection ID works""" | ||
| from superset.db_engine_specs.mysql import MySQLEngineSpec | ||
| from superset.models.sql_lab import Query | ||
|
|
||
| query = Query() | ||
| cursor_mock = engine_mock.return_value.__enter__.return_value |
There was a problem hiding this comment.
Suggestion: Tests patching "sqlalchemy.engine.Engine.connect" but then derive a cursor via the patch's return value; this is unnecessary and fragile for unit tests that directly call engine-spec cancel_query implementations. Create an explicit Mock cursor instead and remove the patch decorator/parameter to avoid coupling tests to SQLAlchemy internals and to ensure the cursor contract is explicit. [possible bug]
Severity Level: Critical 🚨
| @patch("sqlalchemy.engine.Engine.connect") | |
| def test_cancel_query_valid_id(self, engine_mock: Mock) -> None: | |
| """Test that valid MySQL connection ID works""" | |
| from superset.db_engine_specs.mysql import MySQLEngineSpec | |
| from superset.models.sql_lab import Query | |
| query = Query() | |
| cursor_mock = engine_mock.return_value.__enter__.return_value | |
| def test_cancel_query_valid_id(self) -> None: | |
| """Test that valid MySQL connection ID works""" | |
| from superset.db_engine_specs.mysql import MySQLEngineSpec | |
| from superset.models.sql_lab import Query | |
| query = Query() | |
| cursor_mock = Mock() | |
| cursor_mock.execute = Mock() |
Why it matters? ⭐
The current test unnecessarily ties itself to SQLAlchemy internals by patching
Engine.connect only to extract a cursor mock. Replacing that with an explicit
Mock cursor makes the test clearer and less brittle without changing behavior.
The improved code is executable and simplifies intent; it's a valid test robustness improvement.
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** tests/unit_tests/db_engine_specs/test_cancel_query_validation.py
**Line:** 96:103
**Comment:**
*Possible Bug: Tests patching "sqlalchemy.engine.Engine.connect" but then derive a cursor via the patch's return value; this is unnecessary and fragile for unit tests that directly call engine-spec `cancel_query` implementations. Create an explicit `Mock` cursor instead and remove the patch decorator/parameter to avoid coupling tests to SQLAlchemy internals and to ensure the cursor contract is explicit.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.| @@ -381,7 +381,13 @@ def handle_cursor(cls, cursor: Any, query: Query) -> None: | |||
| def cancel_query(cls, cursor: Any, query: Query, cancel_query_id: str) -> bool: | |||
| with OcientEngineSpec.query_id_mapping_lock: | |||
There was a problem hiding this comment.
Suggestion: The code holds query_id_mapping_lock while calling cursor.execute(...); this performs I/O while a global lock is held, which can cause contention or deadlocks. Fetch and validate the mapping under the lock, release the lock before executing the CANCEL command, then reacquire the lock to remove the mapping (use a safe removal like pop). [race condition]
Severity Level: Minor
| """ | ||
| # Validate cancel_query_id to prevent SQL injection | ||
| # PostgreSQL pg_backend_pid() returns an integer | ||
| if not cls.validate_cancel_query_id(cancel_query_id, r"^\d+$"): |
There was a problem hiding this comment.
Suggestion: cancel_query_id may not be an integer type; converting it to int when executing the parameterized query can raise ValueError if the value isn't numeric — validate and convert to an integer first and only call execute if conversion succeeds. [type error]
Severity Level: Minor
|
CodeAnt AI finished reviewing your PR. |
- Change re.match to re.fullmatch for more explicit matching - Tighten Impala regex to require exactly 16 hex chars per side - Add null check for impala_host before making HTTP request - Update tests to reflect stricter Impala format validation - Add test for null host case
There was a problem hiding this comment.
Pull request overview
This PR adds defense-in-depth input validation for the cancel_query_id parameter across multiple database engine specifications to prevent potential SQL/command injection vulnerabilities. A reusable validate_cancel_query_id() static method is added to BaseEngineSpec and applied consistently across eight database engines that use string interpolation in their cancel operations.
Key Changes:
- Implemented centralized validation logic in
BaseEngineSpecusingre.fullmatch()for strict pattern matching - Added format-specific validation to MySQL, SingleStore, PostgreSQL, Redshift, Snowflake, Trino, Impala, and Ocient engine specs
- Added comprehensive test suite with 23 test cases covering valid inputs, SQL injection payloads, and edge cases
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
superset/db_engine_specs/base.py |
Adds reusable validate_cancel_query_id() static method using re.fullmatch() for pattern validation |
superset/db_engine_specs/mysql.py |
Validates MySQL connection IDs as numeric before KILL CONNECTION command |
superset/db_engine_specs/singlestore.py |
Validates space-separated numeric IDs (connection + aggregator) for SingleStore |
superset/db_engine_specs/postgres.py |
Validates PostgreSQL PIDs as numeric before query termination |
superset/db_engine_specs/redshift.py |
Validates Redshift PIDs as numeric before query cancellation |
superset/db_engine_specs/snowflake.py |
Validates alphanumeric session IDs for Snowflake's cancel operation |
superset/db_engine_specs/trino.py |
Validates Trino query IDs (alphanumeric with underscores) before cancel call |
superset/db_engine_specs/impala.py |
Validates hex-formatted query IDs and adds null host check for HTTP-based cancellation |
superset/db_engine_specs/ocient.py |
Validates query IDs from internal mapping before cancel command |
tests/unit_tests/db_engine_specs/test_cancel_query_validation.py |
Comprehensive test suite covering validation behavior, injection attempts, and edge cases |
| cursor.execute(f"CANCEL {OcientEngineSpec.query_id_mapping[query.id]}") | ||
| ocient_query_id = OcientEngineSpec.query_id_mapping[query.id] | ||
| # Validate query ID to prevent SQL injection (defense-in-depth) | ||
| # Ocient query IDs are typically numeric |
There was a problem hiding this comment.
The comment states "Ocient query IDs are typically numeric" but the validation pattern r"^[\w\-]+$" allows alphanumeric characters, underscores, and dashes. This inconsistency is confusing. The pattern should either be updated to match the comment (if query IDs are truly numeric: r"^\d+$"), or the comment should be updated to accurately describe the allowed format (e.g., "alphanumeric with underscores and dashes").
| # Ocient query IDs are typically numeric | |
| # Ocient query IDs are validated as alphanumeric and may include | |
| # underscores and dashes (see validation pattern below). |
|
|
||
| from unittest.mock import Mock, patch | ||
|
|
||
| import pytest |
There was a problem hiding this comment.
The pytest import on line 27 is unused. It should be removed to keep the imports clean.
| import pytest |
| if cancel_query_id is None: | ||
| return False | ||
| return bool(re.fullmatch(pattern, str(cancel_query_id))) |
There was a problem hiding this comment.
The regex patterns are inconsistent in their use of anchors. Some patterns include ^ and $ anchors (e.g., MySQL, PostgreSQL, Redshift, Snowflake, Trino, SingleStore, Ocient) while others don't (e.g., Impala). Since validate_cancel_query_id uses re.fullmatch() which implicitly anchors the pattern, the explicit anchors are redundant. For consistency and clarity, either:
- Remove anchors from all patterns (recommended, since fullmatch handles this), or
- Add anchors to the Impala pattern
Recommended: Remove ^ and $ from all patterns since fullmatch already ensures the entire string matches.
| None, query, "abc123def4567890:789abc123def4567" | ||
| ) | ||
| assert result is False | ||
| requests_mock.assert_not_called() |
There was a problem hiding this comment.
The Ocient engine spec has been modified to add input validation for cancel_query_id, but there are no corresponding unit tests added for OcientEngineSpec. The PR description mentions "22 test cases" but Ocient is missing from the test suite. Tests should be added to verify that:
- Valid Ocient query IDs pass validation
- SQL injection payloads are blocked
- Invalid formats are rejected
The test class should follow the same pattern as the other engine specs (e.g., TestMySQLCancelQueryValidation, TestImpalaCancelQueryValidation, etc.).
|
Thanks @ColeMurray. Sorry your PR description seems to have been overwritten by the bot back when it was new. Anyway, there are a LOT of bot comments to take into consideration on this thread, but it looks like a solid defense-in-depth hardening with good test coverage across the engine specs. If you can resolve some of the issues the bots point out, if relevant, I think we can get this one to land soon. Sorry it didn't get more attention sooner. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #36722 +/- ##
==========================================
- Coverage 64.28% 55.60% -8.69%
==========================================
Files 2659 2659
Lines 144304 144331 +27
Branches 33260 33270 +10
==========================================
- Hits 92773 80257 -12516
- Misses 49902 63351 +13449
+ Partials 1629 723 -906
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
- Wrap over-long comment lines in impala/singlestore/trino specs (E501) - ruff-format the test file and drop the unused `pytest` import - Anchor the Impala regex (^...$) for consistency with the other specs (re.fullmatch already anchors, so behavior is unchanged) - Correct the Ocient comment to match its actual `[\w-]` pattern - Add the missing Ocient cancel_query validation tests Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
User description
Summary
This PR adds defense-in-depth input validation for the
cancel_query_idparameter in all database engine specs that use string interpolation in SQL/command execution.While
cancel_query_idtypically comes from trusted database sources (e.g.,CONNECTION_ID()), adding validation ensures safety even if the data source is compromised or if future code changes introduce user-controlled input.Changes
validate_cancel_query_id()static method toBaseEngineSpecfor reusable validationcancel_query()in:Security Impact
This addresses potential SQL injection vulnerabilities in
cancel_queryimplementations by ensuringcancel_query_idmatches expected format before use in:KILL CONNECTION {id}(MySQL, SingleStore)SELECT SYSTEM$CANCEL_ALL_QUERIES({id})(Snowflake)SELECT pg_terminate_backend(pid) WHERE pid='{id}'(PostgreSQL)SELECT pg_cancel_backend(procpid) WHERE procpid='{id}'(Redshift)CALL system.runtime.kill_query(query_id => '{id}', ...)(Trino)query_id={id}(Impala)CANCEL {id}(Ocient)Test Plan
validate_cancel_query_id()base methodRelated Issues
This is a defense-in-depth security improvement. While there is no currently known exploitable path to inject malicious
cancel_query_idvalues (as they come from database functions), this validation provides protection against:CodeAnt-AI Description
Validate cancel_query_id to block injection in DB cancel operations
What Changed
Impact
✅ Fewer SQL/URL injection attempts succeeding during query cancellation✅ Safer remote cancel requests (invalid IDs are rejected before execution)✅ Clearer cancellation failures when identifiers are malformed💡 Usage Guide
Checking Your Pull Request
Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.
Talking to CodeAnt AI
Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:
This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.
Example
Preserve Org Learnings with CodeAnt
You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:
This helps CodeAnt AI learn and adapt to your team's coding style and standards.
Example
Retrigger review
Ask CodeAnt AI to review the PR again, by typing:
Check Your Repository Health
To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.