fix(sql): broaden mutating-statement detection in SQL Lab parser#40421
fix(sql): broaden mutating-statement detection in SQL Lab parser#40421sha174n wants to merge 12 commits into
Conversation
is_mutating() and check_functions_present() in superset/sql/parse.py previously missed several construct shapes that share a meta cause: AST nodes that are not subclasses of the expected types, opaque exp.Command nodes for PostgreSQL constructs that sqlglot cannot fully structure, and function-name patterns that mutate server state without a wrapping DML node. This change addresses: 1. PostgreSQL opaque exp.Command constructs now classified as mutating alongside the existing DO and EXPLAIN ANALYZE handling: PREPARE, EXECUTE, CALL, COPY, GRANT, REVOKE, SET, REFRESH, REINDEX, VACUUM. 2. Structured nodes added to the mutating-node tuple: exp.Copy, exp.Grant, exp.Revoke. Some dialects parse these into structured nodes (e.g. COPY in PostgreSQL becomes exp.Copy, not exp.Command), so node-type matching is needed in addition to command-name matching. 3. PostgreSQL large-object writer functions classified as mutating by name: lo_from_bytea, lo_export, lo_import, lo_put, lo_create, lowrite. These appear as plain function calls inside an exp.Select wrapper and have no enclosing mutating AST node. The full lo_* family (writers and readers) is also added to the PostgreSQL entry in DISALLOWED_SQL_FUNCTIONS for defense in depth. 4. SELECT ... INTO new_table FROM ... parses as exp.Select with an `into` arg (PostgreSQL CTAS variant). It creates a new relation and is now treated as mutating. 5. check_functions_present() now visits exp.SessionParameter so that MySQL's @@<name> syntax matches denylist entries like `version` or `hostname`. exp.SessionParameter is not a subclass of exp.Func, so the existing exp.Func walk skipped it. Tests ----- - New: test_is_mutating_postgres_function_and_select_into covers the lo_* writers (positive), lo_* readers (negative, blocked separately via denylist), case-insensitivity, and SELECT INTO. - New: test_is_mutating_postgres_command_constructs covers PREPARE/EXECUTE/CALL/COPY/GRANT/REVOKE/SET/REFRESH/REINDEX/VACUUM, with the pre-existing DO and EXPLAIN ANALYZE controls included. - New: test_check_functions_present_session_parameter covers MySQL @@Version / @@hostname etc., plus lo_* function denylist matching. - All 562 tests in tests/unit_tests/sql/parse_tests.py pass. - All 117 tests in tests/unit_tests/sql/execution/ + sql_lab_test.py pass; no regressions. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #40421 +/- ##
=======================================
Coverage 64.30% 64.31%
=======================================
Files 2657 2657
Lines 144060 144081 +21
Branches 33216 33221 +5
=======================================
+ Hits 92641 92665 +24
+ Misses 49797 49787 -10
- Partials 1622 1629 +7
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
Follow-up to the initial broadening of is_mutating() and
check_functions_present(). This change closes the remaining bypass
surfaces that the SQL Lab read-only gate did not catch:
1. PostgreSQL information_schema views. Adds
information_schema.tables, .columns, .schemata, .views,
.routines, .role_table_grants, .role_column_grants,
.role_routine_grants, .table_privileges, .column_privileges,
.usage_privileges, .key_column_usage, .table_constraints,
.referential_constraints, .view_table_usage, .applicable_roles,
.enabled_roles to DISALLOWED_SQL_TABLES['postgresql'].
These entries are schema-qualified, so they require the new
schema-aware matching in check_tables_present (see below).
Without this change, a user table happening to be named `tables`
or `columns` would have been falsely blocked alongside the
information_schema views.
2. check_tables_present schema-qualified matching.
The previous matcher built `present = {t.table.lower() ...}`
from parsed table references, stripping the schema. A denylist
entry of `information_schema.tables` therefore could not match
without also matching every user table named `tables`. The new
matcher distinguishes bare entries (no dot in the denylist
string) from schema-qualified entries (with a dot):
- Bare entries match by table name regardless of schema
(existing behavior preserved for `pg_stat_activity` etc.).
- Schema-qualified entries match only when both schema and
table match.
3. SHOW commands. Adds SHOW to _POSTGRES_MUTATING_COMMAND_NAMES so
it is rejected by the allow_dml=False gate, mirroring the
read-side blocks already in DISALLOWED_SQL_FUNCTIONS for
pg_read_file, version(), current_setting, etc. SHOW reads
server-configuration state (server version, hba_file, ssl,
search_path) which has the same information-disclosure profile
as those functions.
4. PostgreSQL setval() sequence mutator. Adds SETVAL to
_MUTATING_FUNCTION_NAMES. setval() looks like a read but
advances sequence state for every subsequent nextval caller.
Tests
-----
- New test_check_tables_present_schema_qualified covers bare vs
schema-qualified matching, including the must-not-match case
for user tables sharing a name with an information_schema view.
- test_is_mutating_postgres_function_and_select_into extended
with setval positive and case-insensitivity cases.
- test_is_mutating_postgres_command_constructs extended with
SHOW positive cases (search_path, all, server_version).
- Pre-existing test_has_mutation expectation for
`SHOW search_path` flipped from False to True to reflect the
intentional behavior change; `SET search_path TO public`
remains non-mutating because it parses as exp.Set, not
exp.Command.
- All 576 tests in tests/unit_tests/sql/parse_tests.py pass.
- All 117 tests in tests/unit_tests/sql/execution/ and
sql_lab_test.py pass; no regressions.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The previous parametrization for test_check_tables_present_schema_qualified exercised only PostgreSQL. The shipped DISALLOWED_SQL_TABLES entries for MySQL and MSSQL are also schema-qualified (mysql.user, performance_schema.threads, performance_schema.processlist, sys.sql_logins, sys.server_principals, sys.configurations) and rely on the same matcher. Without explicit dialect coverage a future refactor of check_tables_present could silently regress one engine while leaving the others working. Adds parametrized cases for: - mysql: mysql.user, performance_schema.threads, performance_schema.processlist (positive); mydb.user (negative, user-authored table sharing the leaf name must not be blocked). - mssql: sys.sql_logins, sys.server_principals, sys.configurations (positive); mydb.sql_logins (negative). All 17 parameter cases pass; no regressions in the broader suite (584 tests in tests/unit_tests/sql/parse_tests.py). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Master moved the duckdb pin but never regenerated the pinned requirements file, so every PR's check-python-deps job has been failing on this single-line diff. Regen via the uv-pip-compile path that CI uses (`uv pip compile pyproject.toml requirements/base.in`).
Coverage was at 99 (fail-under=100). Both concrete subclasses override is_destructive(), so the base-class `raise NotImplementedError()` line was never executed in tests. Added a direct test that calls the unbound base method on a sentinel object, matching the pattern used elsewhere in the codebase for abstract stubs.
check_tables_present had `if not bare: continue` after `bare = (t.table or "").lower()`. sqlglot Table.table is always non-empty for any parseable table reference, so the guard never triggered and broke the 100% coverage gate. Use `t.table.lower()` directly. If a future sqlglot version ever returns None, the AttributeError surfaces immediately rather than silently polluting the present-tables set. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
AI Code Review is in progress (usually takes 3 to 15 minutes unless it's a very large PR). Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
Follow-up. Three gaps that the prior commits left open: 1. _POSTGRES_MUTATING_COMMAND_NAMES set lookup was case-sensitive against an uppercase-only set, but `exp.Command.name` preserves the source case of the head keyword. A lowercase head-token (e.g. `create extension pg_trgm`, `load '/tmp/x.so'`) bypassed the gate entirely. Normalise with `.upper()` at the lookup site. Adds two lowercase regression test cases. 2. PostgreSQL DDL head-tokens CREATE / ALTER / DROP added to the command-name set. sqlglot's typed exp.Create / exp.Alter / exp.Drop nodes only cover the well-formed DDL shapes (CREATE TABLE, ALTER TABLE, DROP TABLE); the many PostgreSQL forms it does not model (CREATE FUNCTION ... LANGUAGE C; CREATE EXTENSION / PUBLICATION / SUBSCRIPTION / TABLESPACE / RULE / EVENT TRIGGER; ALTER ROLE / SYSTEM / PUBLICATION; DROP EXTENSION / RULE; etc.) fall back to exp.Command and were silently allowed through. 3. Two additional head-tokens added in the same set: RESET (mirrors the SET entry; RESET ROLE backs out a SET ROLE escalation) and LOAD (PostgreSQL LOAD dlopens a shared library on the PG host; same RCE primitive as the C-extension form of CREATE FUNCTION when the library path is attacker-controlled). 4. exp.Comment added to the mutating_nodes tuple. COMMENT ON TABLE / COLUMN / etc. writes to the pg_description system catalog; parses to a typed exp.Comment node across dialects, so it would not have been caught by either the typed-node tuple or the Command-name set. Tests: one representative per head-token branch (CREATE/ALTER/DROP all hit the same set-lookup, so a single CREATE EXTENSION and a single ALTER SYSTEM are enough), plus the LOAD case, plus two lowercase regression cases, plus a global COMMENT ON case. All 647 tests in parse_tests.py pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Code Review Agent Run #d7761fActionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
rusackas
left a comment
There was a problem hiding this comment.
No false positives on legitimate SELECT/CTE/subquery/SET search_path (those stay non-mutating), and function-name matching is exact (a column/alias named setval isn't flagged) — nice. One behavior change worth confirming is intended: SHOW / SET ROLE / CALL / EXECUTE are now classified as mutating, so on read-only DBs (allow_dml=False) users who previously ran SHOW search_path / SHOW server_version will now get DML-not-allowed. Fine if intended — just worth calling out in the description.
…20260525 # Conflicts: # tests/unit_tests/sql/parse_tests.py
…matching Add lo_unlink (PostgreSQL large-object delete) to the mutating-function and disallowed-function denylists. Introduce SQLStatement/SQLScript get_disallowed_tables() so denylist gates report the actual matched entries, honoring schema-qualified entries, and reuse it across all five gates (sql_lab, estimate, executor, helpers adhoc/query paths). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Code Review Agent Run #889c54Actionable Suggestions - 0Filtered by Review RulesBito filtered these suggestions based on rules created automatically for your feedback. Manage rules.
Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
…arser coverage Restrict the SELECT ... INTO mutation classification to dialects where the syntax creates a table (Postgres/Redshift/T-SQL); on Oracle/MySQL it assigns into a variable and is a read. Collapse the per-type node scan in is_mutating into a single sqlglot find() walk. Add direct unit coverage for the statement-level table-presence checks. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review Agent Run #698180
Actionable Suggestions - 1
-
superset/sql/parse.py - 1
- Missing test coverage for CTAS dialects · Line 661-672
Review Details
-
Files reviewed - 2 · Commit Range:
d7f3ea9..812ce07- superset/sql/parse.py
- tests/unit_tests/sql/parse_tests.py
-
Files skipped - 0
-
Tools
- Whispers (Secret Scanner) - ✔︎ Successful
- Detect-secrets (Secret Scanner) - ✔︎ Successful
- MyPy (Static Code Analysis) - ✔︎ Successful
- Astral Ruff (Static Code Analysis) - ✔︎ Successful
Bito Usage Guide
Commands
Type the following command in the pull request comment and save the comment.
-
/review- Manually triggers a full AI review. -
/pause- Pauses automatic reviews on this pull request. -
/resume- Resumes automatic reviews. -
/resolve- Marks all Bito-posted review comments as resolved. -
/abort- Cancels all in-progress reviews.
Refer to the documentation for additional commands.
Configuration
This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.
Documentation & Help
| # Dialects where `SELECT ... INTO target` is CTAS (creates a table, and so | ||
| # mutates schema). Elsewhere the same syntax assigns into a variable and is | ||
| # a read: Oracle PL/SQL `SELECT ... INTO v` and MySQL `SELECT ... INTO @v` | ||
| # parse into an identical `exp.Select` with an `into` arg, so the dialect is | ||
| # the only signal that distinguishes the mutating form from the read form. | ||
| _SELECT_INTO_CTAS_DIALECTS: frozenset[Dialects] = frozenset( | ||
| { | ||
| Dialects.POSTGRES, | ||
| Dialects.REDSHIFT, | ||
| Dialects.TSQL, | ||
| } | ||
| ) |
There was a problem hiding this comment.
The new _SELECT_INTO_CTAS_DIALECTS class variable (lines 666-672) adds Redshift and T-SQL to the CTAS detection, but the existing test test_is_mutating_postgres_function_and_select_into only validates PostgreSQL. Add parametrized test cases for Redshift and T-SQL to ensure the dialect-specific logic is exercised.
Code Review Run #698180
Should Bito avoid suggestions like this for future reviews? (Manage Rules)
- Yes, avoid them
Summary
Broadens read-only-mode coverage in the SQL Lab parser.
is_mutating()andcheck_functions_present()insuperset/sql/parse.pypreviously missed several construct shapes that share a meta cause: AST nodes that are not subclasses of the expected types, opaqueexp.Commandnodes for PostgreSQL constructs that sqlglot cannot fully structure, and function-name patterns that mutate server-side state without a wrapping DML node.Scope of this change:
exp.Commandconstructs.is_mutating()now classifies these as mutating alongside the existingDOandEXPLAIN ANALYZEhandling:PREPARE,EXECUTE,CALL,COPY,GRANT,REVOKE,SET,REFRESH,REINDEX,VACUUM.COPY/GRANT/REVOKEintoexp.Copy/exp.Grant/exp.Revokeinstead ofexp.Command, so node-type matching is needed alongside the command-name check.lo_from_bytea,lo_export,lo_import,lo_put,lo_create,lowriteappear as plain function calls inside anexp.Selectwrapper with no enclosing mutating AST node.is_mutating()now matches on function name. The fulllo_*family (writers and readers) is also added to the PostgreSQL entry inDISALLOWED_SQL_FUNCTIONSfor defense in depth.SELECT ... INTO new_table FROM ...(PostgreSQL CTAS variant). Parses asexp.Selectwith anintoarg; now treated as mutating since it creates a new relation.@@<name>session parameters.check_functions_present()now walksexp.SessionParameterso that denylist entries likeversionorhostnamematchSELECT @@version/SELECT @@hostname.exp.SessionParameteris not a subclass ofexp.Func, so the existing walk skipped it.Test plan
pytest tests/unit_tests/sql/parse_tests.py(562 pass, no regressions)pytest tests/unit_tests/sql/execution/ tests/unit_tests/sql_lab_test.py(117 pass, no regressions)pre-commit runcleanNew parametrized cases in
tests/unit_tests/sql/parse_tests.py:test_is_mutating_postgres_function_and_select_into—lo_*writers,lo_*readers (negative, blocked separately via denylist), case-insensitivity,SELECT INTO.test_is_mutating_postgres_command_constructs—PREPARE/EXECUTE/CALL/COPY/GRANT/REVOKE/SET/REFRESH/REINDEX/VACUUM, with pre-existingDOandEXPLAIN ANALYZEcontrols included.test_check_functions_present_session_parameter— MySQL@@version/@@hostnameetc.;lo_*defense-in-depth via denylist.