Skip to content

fix(sql): broaden mutating-statement detection in SQL Lab parser#40421

Open
sha174n wants to merge 12 commits into
apache:masterfrom
sha174n:fix/sql-coverage-20260525
Open

fix(sql): broaden mutating-statement detection in SQL Lab parser#40421
sha174n wants to merge 12 commits into
apache:masterfrom
sha174n:fix/sql-coverage-20260525

Conversation

@sha174n

@sha174n sha174n commented May 25, 2026

Copy link
Copy Markdown
Contributor

Summary

Broadens read-only-mode coverage in the SQL Lab parser. is_mutating() and check_functions_present() in superset/sql/parse.py previously missed several construct shapes that share a meta cause: AST nodes that are not subclasses of the expected types, opaque exp.Command nodes for PostgreSQL constructs that sqlglot cannot fully structure, and function-name patterns that mutate server-side state without a wrapping DML node.

Scope of this change:

  1. PostgreSQL opaque exp.Command constructs. is_mutating() now classifies these as mutating alongside the existing DO and EXPLAIN ANALYZE handling: PREPARE, EXECUTE, CALL, COPY, GRANT, REVOKE, SET, REFRESH, REINDEX, VACUUM.
  2. Structured nodes added to the mutating-node tuple. Some dialects parse COPY / GRANT / REVOKE into exp.Copy / exp.Grant / exp.Revoke instead of exp.Command, so node-type matching is needed alongside the command-name check.
  3. PostgreSQL large-object writers classified as mutating by function name. lo_from_bytea, lo_export, lo_import, lo_put, lo_create, lowrite appear as plain function calls inside an exp.Select wrapper with no enclosing mutating AST node. is_mutating() now matches on function name. The full lo_* family (writers and readers) is also added to the PostgreSQL entry in DISALLOWED_SQL_FUNCTIONS for defense in depth.
  4. SELECT ... INTO new_table FROM ... (PostgreSQL CTAS variant). Parses as exp.Select with an into arg; now treated as mutating since it creates a new relation.
  5. MySQL @@<name> session parameters. check_functions_present() now walks exp.SessionParameter so that denylist entries like version or hostname match SELECT @@version / SELECT @@hostname. exp.SessionParameter is not a subclass of exp.Func, so the existing walk skipped it.

Test plan

  • pytest tests/unit_tests/sql/parse_tests.py (562 pass, no regressions)
  • pytest tests/unit_tests/sql/execution/ tests/unit_tests/sql_lab_test.py (117 pass, no regressions)
  • pre-commit run clean

New parametrized cases in tests/unit_tests/sql/parse_tests.py:

  • test_is_mutating_postgres_function_and_select_intolo_* writers, lo_* readers (negative, blocked separately via denylist), case-insensitivity, SELECT INTO.
  • test_is_mutating_postgres_command_constructsPREPARE/EXECUTE/CALL/COPY/GRANT/REVOKE/SET/REFRESH/REINDEX/VACUUM, with pre-existing DO and EXPLAIN ANALYZE controls included.
  • test_check_functions_present_session_parameter — MySQL @@version/@@hostname etc.; lo_* defense-in-depth via denylist.

is_mutating() and check_functions_present() in superset/sql/parse.py
previously missed several construct shapes that share a meta cause:
AST nodes that are not subclasses of the expected types, opaque
exp.Command nodes for PostgreSQL constructs that sqlglot cannot fully
structure, and function-name patterns that mutate server state without
a wrapping DML node.

This change addresses:

1. PostgreSQL opaque exp.Command constructs now classified as mutating
   alongside the existing DO and EXPLAIN ANALYZE handling: PREPARE,
   EXECUTE, CALL, COPY, GRANT, REVOKE, SET, REFRESH, REINDEX, VACUUM.

2. Structured nodes added to the mutating-node tuple: exp.Copy,
   exp.Grant, exp.Revoke. Some dialects parse these into structured
   nodes (e.g. COPY in PostgreSQL becomes exp.Copy, not exp.Command),
   so node-type matching is needed in addition to command-name
   matching.

3. PostgreSQL large-object writer functions classified as mutating by
   name: lo_from_bytea, lo_export, lo_import, lo_put, lo_create,
   lowrite. These appear as plain function calls inside an exp.Select
   wrapper and have no enclosing mutating AST node. The full lo_*
   family (writers and readers) is also added to the PostgreSQL entry
   in DISALLOWED_SQL_FUNCTIONS for defense in depth.

4. SELECT ... INTO new_table FROM ... parses as exp.Select with an
   `into` arg (PostgreSQL CTAS variant). It creates a new relation
   and is now treated as mutating.

5. check_functions_present() now visits exp.SessionParameter so that
   MySQL's @@<name> syntax matches denylist entries like `version`
   or `hostname`. exp.SessionParameter is not a subclass of exp.Func,
   so the existing exp.Func walk skipped it.

Tests
-----
- New: test_is_mutating_postgres_function_and_select_into covers the
  lo_* writers (positive), lo_* readers (negative, blocked separately
  via denylist), case-insensitivity, and SELECT INTO.
- New: test_is_mutating_postgres_command_constructs covers
  PREPARE/EXECUTE/CALL/COPY/GRANT/REVOKE/SET/REFRESH/REINDEX/VACUUM,
  with the pre-existing DO and EXPLAIN ANALYZE controls included.
- New: test_check_functions_present_session_parameter covers MySQL
  @@Version / @@hostname etc., plus lo_* function denylist matching.
- All 562 tests in tests/unit_tests/sql/parse_tests.py pass.
- All 117 tests in tests/unit_tests/sql/execution/ + sql_lab_test.py
  pass; no regressions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@codecov

codecov Bot commented May 25, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 61.40351% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.31%. Comparing base (74845ea) to head (812ce07).

Files with missing lines Patch % Lines
superset/sql/parse.py 67.50% 8 Missing and 5 partials ⚠️
superset/models/helpers.py 50.00% 2 Missing and 2 partials ⚠️
superset/commands/sql_lab/estimate.py 50.00% 1 Missing and 1 partial ⚠️
superset/sql_lab.py 50.00% 1 Missing and 1 partial ⚠️
superset/sql/execution/executor.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master   #40421   +/-   ##
=======================================
  Coverage   64.30%   64.31%           
=======================================
  Files        2657     2657           
  Lines      144060   144081   +21     
  Branches    33216    33221    +5     
=======================================
+ Hits        92641    92665   +24     
+ Misses      49797    49787   -10     
- Partials     1622     1629    +7     
Flag Coverage Δ
hive 39.44% <15.78%> (-0.01%) ⬇️
mysql 58.21% <57.89%> (+0.02%) ⬆️
postgres 58.28% <59.64%> (+0.01%) ⬆️
presto 41.03% <15.78%> (-0.01%) ⬇️
python 59.75% <59.64%> (+0.01%) ⬆️
sqlite 57.87% <19.29%> (-0.01%) ⬇️
unit 100.00% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

sha174n and others added 6 commits May 26, 2026 17:00
Follow-up to the initial broadening of is_mutating() and
check_functions_present(). This change closes the remaining bypass
surfaces that the SQL Lab read-only gate did not catch:

1. PostgreSQL information_schema views. Adds
   information_schema.tables, .columns, .schemata, .views,
   .routines, .role_table_grants, .role_column_grants,
   .role_routine_grants, .table_privileges, .column_privileges,
   .usage_privileges, .key_column_usage, .table_constraints,
   .referential_constraints, .view_table_usage, .applicable_roles,
   .enabled_roles to DISALLOWED_SQL_TABLES['postgresql'].

   These entries are schema-qualified, so they require the new
   schema-aware matching in check_tables_present (see below).
   Without this change, a user table happening to be named `tables`
   or `columns` would have been falsely blocked alongside the
   information_schema views.

2. check_tables_present schema-qualified matching.

   The previous matcher built `present = {t.table.lower() ...}`
   from parsed table references, stripping the schema. A denylist
   entry of `information_schema.tables` therefore could not match
   without also matching every user table named `tables`. The new
   matcher distinguishes bare entries (no dot in the denylist
   string) from schema-qualified entries (with a dot):

     - Bare entries match by table name regardless of schema
       (existing behavior preserved for `pg_stat_activity` etc.).
     - Schema-qualified entries match only when both schema and
       table match.

3. SHOW commands. Adds SHOW to _POSTGRES_MUTATING_COMMAND_NAMES so
   it is rejected by the allow_dml=False gate, mirroring the
   read-side blocks already in DISALLOWED_SQL_FUNCTIONS for
   pg_read_file, version(), current_setting, etc. SHOW reads
   server-configuration state (server version, hba_file, ssl,
   search_path) which has the same information-disclosure profile
   as those functions.

4. PostgreSQL setval() sequence mutator. Adds SETVAL to
   _MUTATING_FUNCTION_NAMES. setval() looks like a read but
   advances sequence state for every subsequent nextval caller.

Tests
-----
- New test_check_tables_present_schema_qualified covers bare vs
  schema-qualified matching, including the must-not-match case
  for user tables sharing a name with an information_schema view.
- test_is_mutating_postgres_function_and_select_into extended
  with setval positive and case-insensitivity cases.
- test_is_mutating_postgres_command_constructs extended with
  SHOW positive cases (search_path, all, server_version).
- Pre-existing test_has_mutation expectation for
  `SHOW search_path` flipped from False to True to reflect the
  intentional behavior change; `SET search_path TO public`
  remains non-mutating because it parses as exp.Set, not
  exp.Command.
- All 576 tests in tests/unit_tests/sql/parse_tests.py pass.
- All 117 tests in tests/unit_tests/sql/execution/ and
  sql_lab_test.py pass; no regressions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The previous parametrization for test_check_tables_present_schema_qualified
exercised only PostgreSQL. The shipped DISALLOWED_SQL_TABLES entries
for MySQL and MSSQL are also schema-qualified (mysql.user,
performance_schema.threads, performance_schema.processlist,
sys.sql_logins, sys.server_principals, sys.configurations) and rely
on the same matcher. Without explicit dialect coverage a future
refactor of check_tables_present could silently regress one engine
while leaving the others working.

Adds parametrized cases for:

- mysql: mysql.user, performance_schema.threads,
  performance_schema.processlist (positive); mydb.user (negative,
  user-authored table sharing the leaf name must not be blocked).
- mssql: sys.sql_logins, sys.server_principals, sys.configurations
  (positive); mydb.sql_logins (negative).

All 17 parameter cases pass; no regressions in the broader suite
(584 tests in tests/unit_tests/sql/parse_tests.py).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Master moved the duckdb pin but never regenerated the pinned
requirements file, so every PR's check-python-deps job has been
failing on this single-line diff. Regen via the uv-pip-compile path
that CI uses (`uv pip compile pyproject.toml requirements/base.in`).
Coverage was at 99 (fail-under=100). Both concrete subclasses override
is_destructive(), so the base-class `raise NotImplementedError()` line
was never executed in tests. Added a direct test that calls the unbound
base method on a sentinel object, matching the pattern used elsewhere
in the codebase for abstract stubs.
check_tables_present had `if not bare: continue` after
`bare = (t.table or "").lower()`. sqlglot Table.table is always
non-empty for any parseable table reference, so the guard never
triggered and broke the 100% coverage gate.

Use `t.table.lower()` directly. If a future sqlglot version ever
returns None, the AttributeError surfaces immediately rather than
silently polluting the present-tables set.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@sha174n sha174n marked this pull request as ready for review May 28, 2026 12:20
@dosubot dosubot Bot added the sqllab Namespace | Anything related to the SQL Lab label May 28, 2026
@bito-code-review

Copy link
Copy Markdown
Contributor

AI Code Review is in progress (usually takes 3 to 15 minutes unless it's a very large PR).

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

sha174n and others added 2 commits May 29, 2026 14:20
Follow-up. Three gaps that the prior commits left open:

1. _POSTGRES_MUTATING_COMMAND_NAMES set lookup was case-sensitive
   against an uppercase-only set, but `exp.Command.name` preserves the
   source case of the head keyword. A lowercase head-token (e.g.
   `create extension pg_trgm`, `load '/tmp/x.so'`) bypassed the gate
   entirely. Normalise with `.upper()` at the lookup site. Adds two
   lowercase regression test cases.

2. PostgreSQL DDL head-tokens CREATE / ALTER / DROP added to the
   command-name set. sqlglot's typed exp.Create / exp.Alter / exp.Drop
   nodes only cover the well-formed DDL shapes (CREATE TABLE,
   ALTER TABLE, DROP TABLE); the many PostgreSQL forms it does not
   model (CREATE FUNCTION ... LANGUAGE C; CREATE EXTENSION /
   PUBLICATION / SUBSCRIPTION / TABLESPACE / RULE / EVENT TRIGGER;
   ALTER ROLE / SYSTEM / PUBLICATION; DROP EXTENSION / RULE; etc.)
   fall back to exp.Command and were silently allowed through.

3. Two additional head-tokens added in the same set: RESET (mirrors
   the SET entry; RESET ROLE backs out a SET ROLE escalation) and
   LOAD (PostgreSQL LOAD dlopens a shared library on the PG host;
   same RCE primitive as the C-extension form of CREATE FUNCTION
   when the library path is attacker-controlled).

4. exp.Comment added to the mutating_nodes tuple. COMMENT ON
   TABLE / COLUMN / etc. writes to the pg_description system
   catalog; parses to a typed exp.Comment node across dialects, so
   it would not have been caught by either the typed-node tuple or
   the Command-name set.

Tests: one representative per head-token branch (CREATE/ALTER/DROP
all hit the same set-lookup, so a single CREATE EXTENSION and a
single ALTER SYSTEM are enough), plus the LOAD case, plus two
lowercase regression cases, plus a global COMMENT ON case. All 647
tests in parse_tests.py pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Comment thread superset/config.py
Comment thread superset/sql/parse.py
Comment thread superset/sql/parse.py Outdated
@bito-code-review

bito-code-review Bot commented May 29, 2026

Copy link
Copy Markdown
Contributor

Code Review Agent Run #d7761f

Actionable Suggestions - 0
Review Details
  • Files reviewed - 2 · Commit Range: 4cc0164..f6ba199
    • superset/sql/parse.py
    • tests/unit_tests/sql/parse_tests.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

@rusackas rusackas left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No false positives on legitimate SELECT/CTE/subquery/SET search_path (those stay non-mutating), and function-name matching is exact (a column/alias named setval isn't flagged) — nice. One behavior change worth confirming is intended: SHOW / SET ROLE / CALL / EXECUTE are now classified as mutating, so on read-only DBs (allow_dml=False) users who previously ran SHOW search_path / SHOW server_version will now get DML-not-allowed. Fine if intended — just worth calling out in the description.

sha174n and others added 2 commits June 11, 2026 23:03
…20260525

# Conflicts:
#	tests/unit_tests/sql/parse_tests.py
…matching

Add lo_unlink (PostgreSQL large-object delete) to the mutating-function
and disallowed-function denylists. Introduce SQLStatement/SQLScript
get_disallowed_tables() so denylist gates report the actual matched
entries, honoring schema-qualified entries, and reuse it across all five
gates (sql_lab, estimate, executor, helpers adhoc/query paths).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@bito-code-review

bito-code-review Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Code Review Agent Run #889c54

Actionable Suggestions - 0
Filtered by Review Rules

Bito filtered these suggestions based on rules created automatically for your feedback. Manage rules.

  • superset/commands/sql_lab/estimate.py - 1
Review Details
  • Files reviewed - 7 · Commit Range: f6ba199..d7f3ea9
    • superset/commands/sql_lab/estimate.py
    • superset/config.py
    • superset/models/helpers.py
    • superset/sql/execution/executor.py
    • superset/sql/parse.py
    • superset/sql_lab.py
    • tests/unit_tests/sql/parse_tests.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

…arser coverage

Restrict the SELECT ... INTO mutation classification to dialects where
the syntax creates a table (Postgres/Redshift/T-SQL); on Oracle/MySQL it
assigns into a variable and is a read. Collapse the per-type node scan in
is_mutating into a single sqlglot find() walk. Add direct unit coverage
for the statement-level table-presence checks.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@bito-code-review bito-code-review Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Agent Run #698180

Actionable Suggestions - 1
  • superset/sql/parse.py - 1
Review Details
  • Files reviewed - 2 · Commit Range: d7f3ea9..812ce07
    • superset/sql/parse.py
    • tests/unit_tests/sql/parse_tests.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

Comment thread superset/sql/parse.py
Comment on lines +661 to +672
# Dialects where `SELECT ... INTO target` is CTAS (creates a table, and so
# mutates schema). Elsewhere the same syntax assigns into a variable and is
# a read: Oracle PL/SQL `SELECT ... INTO v` and MySQL `SELECT ... INTO @v`
# parse into an identical `exp.Select` with an `into` arg, so the dialect is
# the only signal that distinguishes the mutating form from the read form.
_SELECT_INTO_CTAS_DIALECTS: frozenset[Dialects] = frozenset(
{
Dialects.POSTGRES,
Dialects.REDSHIFT,
Dialects.TSQL,
}
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing test coverage for CTAS dialects

The new _SELECT_INTO_CTAS_DIALECTS class variable (lines 666-672) adds Redshift and T-SQL to the CTAS detection, but the existing test test_is_mutating_postgres_function_and_select_into only validates PostgreSQL. Add parametrized test cases for Redshift and T-SQL to ensure the dialect-specific logic is exercised.

Code Review Run #698180


Should Bito avoid suggestions like this for future reviews? (Manage Rules)

  • Yes, avoid them

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/XL sqllab Namespace | Anything related to the SQL Lab

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants