Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 21 additions & 0 deletions superset/db_engine_specs/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -2477,6 +2477,27 @@ def get_cancel_query_id( # pylint: disable=unused-argument

return None

@staticmethod
def validate_cancel_query_id(
cancel_query_id: str | None,
pattern: str = r"^\d+$",
) -> bool:
"""
Validate that a cancel_query_id matches expected format.

This is a defense-in-depth measure to prevent SQL injection in cancel_query
implementations that use string interpolation. While cancel_query_id typically
comes from trusted database sources (e.g., CONNECTION_ID()), validation ensures
safety even if the data source is compromised.

:param cancel_query_id: The query identifier to validate
:param pattern: Regex pattern to match (default: numeric only)
:return: True if valid, False otherwise
"""
if cancel_query_id is None:
return False
return bool(re.fullmatch(pattern, str(cancel_query_id)))
Comment on lines +2497 to +2499

Copilot AI Jan 6, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regex patterns are inconsistent in their use of anchors. Some patterns include ^ and $ anchors (e.g., MySQL, PostgreSQL, Redshift, Snowflake, Trino, SingleStore, Ocient) while others don't (e.g., Impala). Since validate_cancel_query_id uses re.fullmatch() which implicitly anchors the pattern, the explicit anchors are redundant. For consistency and clarity, either:

  1. Remove anchors from all patterns (recommended, since fullmatch handles this), or
  2. Add anchors to the Impala pattern

Recommended: Remove ^ and $ from all patterns since fullmatch already ensures the entire string matches.

Copilot uses AI. Check for mistakes.

@classmethod
def cancel_query( # pylint: disable=unused-argument
cls,
Expand Down
11 changes: 10 additions & 1 deletion superset/db_engine_specs/impala.py
Original file line number Diff line number Diff line change
Expand Up @@ -204,11 +204,20 @@ def cancel_query(cls, cursor: Any, query: Query, cancel_query_id: str) -> bool:

:param cursor: New cursor instance to the db of the query
:param query: Query instance
:param cancel_query_id: impala db not need
:param cancel_query_id: Impala query ID in format "hex:hex"
:return: True if query cancelled successfully, False otherwise
"""
# Validate cancel_query_id to prevent URL injection
# Impala query IDs are in "hex:hex" form (16 hex chars per side)
if not cls.validate_cancel_query_id(
cancel_query_id, r"^[A-Fa-f0-9]{16}:[A-Fa-f0-9]{16}$"
):
return False

try:
impala_host = query.database.url_object.host
if not impala_host:
return False
url = f"http://{impala_host}:25000/cancel_query?query_id={cancel_query_id}"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: The code assumes query.database.url_object.host is present; if impala_host is None or empty the constructed URL will be invalid (e.g., "http://None:25000") and may lead to unexpected requests or errors, so check the host and return False early when missing. [null pointer]

Severity Level: Minor ⚠️

Suggested change
url = f"http://{impala_host}:25000/cancel_query?query_id={cancel_query_id}"
if not impala_host:
return False
Why it matters? ⭐

Verifying impala_host before building the URL avoids attempting an HTTP call to an invalid
host (e.g. "None") and prevents unnecessary DNS resolution/network errors. The current try/except
would catch those failures, but an explicit guard is clearer and avoids making a doomed request.

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** superset/db_engine_specs/impala.py
**Line:** 200:200
**Comment:**
	*Null Pointer: The code assumes `query.database.url_object.host` is present; if `impala_host` is None or empty the constructed URL will be invalid (e.g., "http://None:25000") and may lead to unexpected requests or errors, so check the host and return False early when missing.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

response = requests.post(url, timeout=3)
except Exception: # pylint: disable=broad-except
Expand Down
5 changes: 5 additions & 0 deletions superset/db_engine_specs/mysql.py
Original file line number Diff line number Diff line change
Expand Up @@ -457,6 +457,11 @@ def cancel_query(cls, cursor: Any, query: Query, cancel_query_id: str) -> bool:
:param cancel_query_id: MySQL Connection ID
:return: True if query cancelled successfully, False otherwise
"""
# Validate cancel_query_id to prevent SQL injection
# MySQL CONNECTION_ID() returns an unsigned integer
if not cls.validate_cancel_query_id(cancel_query_id, r"^\d+$"):
Comment thread
ColeMurray marked this conversation as resolved.
return False

try:
cursor.execute(f"KILL CONNECTION {cancel_query_id}")
except Exception: # pylint: disable=broad-except
Expand Down
8 changes: 7 additions & 1 deletion superset/db_engine_specs/ocient.py
Original file line number Diff line number Diff line change
Expand Up @@ -392,7 +392,13 @@ def handle_cursor(cls, cursor: Any, query: Query) -> None:
def cancel_query(cls, cursor: Any, query: Query, cancel_query_id: str) -> bool:
with OcientEngineSpec.query_id_mapping_lock:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: The code holds query_id_mapping_lock while calling cursor.execute(...); this performs I/O while a global lock is held, which can cause contention or deadlocks. Fetch and validate the mapping under the lock, release the lock before executing the CANCEL command, then reacquire the lock to remove the mapping (use a safe removal like pop). [race condition]

Severity Level: Minor ⚠️

if query.id in OcientEngineSpec.query_id_mapping:
cursor.execute(f"CANCEL {OcientEngineSpec.query_id_mapping[query.id]}")
ocient_query_id = OcientEngineSpec.query_id_mapping[query.id]
# Validate query ID to prevent SQL injection (defense-in-depth)
# Ocient query IDs are alphanumeric with underscores and dashes
if not cls.validate_cancel_query_id(str(ocient_query_id), r"^[\w\-]+$"):
return False

cursor.execute(f"CANCEL {ocient_query_id}")
# Query has been cancelled, so we can safely remove the cursor from
# the cache
del OcientEngineSpec.query_id_mapping[query.id]
Expand Down
5 changes: 5 additions & 0 deletions superset/db_engine_specs/postgres.py
Original file line number Diff line number Diff line change
Expand Up @@ -862,6 +862,11 @@ def cancel_query(cls, cursor: Any, query: Query, cancel_query_id: str) -> bool:
:param cancel_query_id: Postgres PID
:return: True if query cancelled successfully, False otherwise
"""
# Validate cancel_query_id to prevent SQL injection
# PostgreSQL pg_backend_pid() returns an integer
if not cls.validate_cancel_query_id(cancel_query_id, r"^\d+$"):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: cancel_query_id may not be an integer type; converting it to int when executing the parameterized query can raise ValueError if the value isn't numeric — validate and convert to an integer first and only call execute if conversion succeeds. [type error]

Severity Level: Minor ⚠️

return False

try:
cursor.execute(
"SELECT pg_terminate_backend(pid) " # noqa: S608
Expand Down
5 changes: 5 additions & 0 deletions superset/db_engine_specs/redshift.py
Original file line number Diff line number Diff line change
Expand Up @@ -346,6 +346,11 @@ def cancel_query(cls, cursor: Any, query: Query, cancel_query_id: str) -> bool:
:param cancel_query_id: Redshift PID
:return: True if query cancelled successfully, False otherwise
"""
# Validate cancel_query_id to prevent SQL injection
# Redshift pg_backend_pid() returns an integer
if not cls.validate_cancel_query_id(cancel_query_id, r"^\d+$"):
return False
Comment thread
ColeMurray marked this conversation as resolved.

try:
logger.info("Killing Redshift PID:%s", str(cancel_query_id))
cursor.execute(
Expand Down
5 changes: 5 additions & 0 deletions superset/db_engine_specs/singlestore.py
Original file line number Diff line number Diff line change
Expand Up @@ -577,6 +577,11 @@ def cancel_query(cls, cursor: Any, query: Query, cancel_query_id: str) -> bool:
:param cancel_query_id: SingleStore connection ID and aggregator ID
:return: True if query cancelled successfully, False otherwise
"""
# Validate cancel_query_id to prevent SQL injection
# SingleStore: "CONNECTION_ID AGGREGATOR_ID" (two space-separated ints)
if not cls.validate_cancel_query_id(cancel_query_id, r"^\d+(\s+\d+)?$"):
return False

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: When validation fails the method returns False silently; log a warning with the invalid cancel_query_id to aid debugging and alert operators rather than failing quietly. [possible bug]

Severity Level: Critical 🚨

Suggested change
return False
logger.warning("cancel_query called with invalid cancel_query_id: %r", cancel_query_id)
Why it matters? ⭐

Returning False silently on validation failure makes debugging harder. Emitting a logger.warning with the invalid value (non-sensitive) improves observability and helps operators triage issues; this is a non-invasive, useful change.

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** superset/db_engine_specs/singlestore.py
**Line:** 547:547
**Comment:**
	*Possible Bug: When validation fails the method returns False silently; log a warning with the invalid `cancel_query_id` to aid debugging and alert operators rather than failing quietly.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.


try:
cursor.execute(f"KILL CONNECTION {cancel_query_id}")
except Exception: # pylint: disable=broad-except
Expand Down
5 changes: 5 additions & 0 deletions superset/db_engine_specs/snowflake.py
Original file line number Diff line number Diff line change
Expand Up @@ -334,6 +334,11 @@ def cancel_query(cls, cursor: Any, query: Query, cancel_query_id: str) -> bool:
:param cancel_query_id: Snowflake Session ID
:return: True if query cancelled successfully, False otherwise
"""
# Validate cancel_query_id to prevent SQL injection
# Snowflake CURRENT_SESSION() returns an alphanumeric VARCHAR session ID
if not cls.validate_cancel_query_id(cancel_query_id, r"^[a-zA-Z0-9]+$"):
return False

try:
cursor.execute(f"SELECT SYSTEM$CANCEL_ALL_QUERIES({cancel_query_id})")
except Exception: # pylint: disable=broad-except
Expand Down
6 changes: 6 additions & 0 deletions superset/db_engine_specs/trino.py
Original file line number Diff line number Diff line change
Expand Up @@ -429,6 +429,12 @@ def cancel_query(cls, cursor: Cursor, query: Query, cancel_query_id: str) -> boo
:param cancel_query_id: Trino `queryId`
:return: True if query cancelled successfully, False otherwise
"""
# Validate cancel_query_id to prevent SQL injection
# Trino query IDs look like yyyymmdd_hhmmss_nnnnn_xxxxx
# (alphanumeric with underscores)
if not cls.validate_cancel_query_id(cancel_query_id, r"^[a-zA-Z0-9_]+$"):
return False

try:
cursor.execute(
f"CALL system.runtime.kill_query(query_id => '{cancel_query_id}',"
Expand Down
Loading
Loading