-
Notifications
You must be signed in to change notification settings - Fork 17.6k
fix(sql): broaden mutating-statement detection in SQL Lab parser #40421
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
f17b157
f6c7659
8146ae1
8e904fc
ef69fcd
b209d80
4cc0164
5d9ba2c
f6ba199
bd70af4
d7f3ea9
812ce07
1bf6672
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1830,6 +1830,20 @@ def engine_context_manager( # pylint: disable=unused-argument | |
| "pg_read_file", | ||
| "pg_ls_dir", | ||
| "pg_read_binary_file", | ||
| # PostgreSQL large-object functions: writers can plant arbitrary | ||
| # bytes on the server filesystem (lo_export, lo_from_bytea, lowrite, | ||
| # lo_put, lo_create, lo_import), readers can pull bytes back out | ||
| # (lo_get, loread), and lo_unlink deletes large objects outright. | ||
| # Defense-in-depth on top of is_mutating()'s function-name check. | ||
| "lo_from_bytea", | ||
| "lo_export", | ||
| "lo_import", | ||
| "lo_put", | ||
| "lo_create", | ||
| "lowrite", | ||
| "lo_get", | ||
| "loread", | ||
| "lo_unlink", | ||
| # XML functions that can execute SQL | ||
| "database_to_xml", | ||
| "database_to_xmlschema", | ||
|
|
@@ -1920,6 +1934,29 @@ def engine_context_manager( # pylint: disable=unused-argument | |
| "pg_stat_replication", | ||
| "pg_stat_wal_receiver", | ||
| "pg_user", | ||
| # The SQL-standard `information_schema` views expose table / | ||
| # column / privilege / view-definition metadata across the entire | ||
| # database role the connection user can see. Entries are | ||
| # schema-qualified so `check_tables_present` only matches when the | ||
| # query actually references `information_schema.<view>`, not any | ||
| # user table that happens to share a name. | ||
| "information_schema.tables", | ||
| "information_schema.columns", | ||
| "information_schema.schemata", | ||
| "information_schema.views", | ||
| "information_schema.routines", | ||
| "information_schema.role_table_grants", | ||
| "information_schema.role_column_grants", | ||
| "information_schema.role_routine_grants", | ||
| "information_schema.table_privileges", | ||
| "information_schema.column_privileges", | ||
| "information_schema.usage_privileges", | ||
| "information_schema.key_column_usage", | ||
| "information_schema.table_constraints", | ||
| "information_schema.referential_constraints", | ||
| "information_schema.view_table_usage", | ||
| "information_schema.applicable_roles", | ||
| "information_schema.enabled_roles", | ||
|
Comment on lines
+1937
to
+1959
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Suggestion: The new PostgreSQL denylist entries are only schema-qualified ( Severity Level: Critical 🚨- ❌ SQL Lab can read sensitive information_schema views unblocked.
- ⚠️ Cost estimate path reuses same denylist and misses access.Steps of Reproduction ✅1. Configure a PostgreSQL database in Superset and open SQL Lab so queries execute via
`execute_sql_statements` in `superset/sql_lab.py:391-80`. Select the same PostgreSQL
database and choose the `information_schema` schema from the schema dropdown so that the
`schema` argument passed into the database engine is `"information_schema"`.
2. When SQL Lab opens a connection for that database, `Database.get_sqla_engine` in
`superset/models/core.py:480-59` calls `PostgresEngineSpec.get_prequeries` at
`superset/db_engine_specs/postgres.py:717-29`, which returns `['set search_path =
"information_schema"']`. This prequery is executed on each new DBAPI connection (see the
`run_prequeries` connect hook at `superset/models/core.py:488-32`), so at runtime
unqualified table names like `tables` resolve to `information_schema.tables`.
3. Submit the SQL text `SELECT * FROM tables;` in SQL Lab with the `information_schema`
schema selected. `execute_sql_statements` builds a `SQLScript` at `superset/sql_lab.py:31`
and then loads the PostgreSQL denylist from `DISALLOWED_SQL_TABLES["postgresql"]` defined
in `superset/config.py:1915-46`, which includes only schema-qualified entries such as
`"information_schema.tables"` and `"information_schema.columns"`. It then calls
`parsed_script.get_disallowed_tables(disallowed_tables)` at `superset/sql_lab.py:42-50`.
4. Inside `SQLStatement.get_disallowed_tables` and `check_tables_present` in
`superset/sql/parse.py:988-137`, the parser iterates `self.tables`, which is built by
`_extract_tables_from_statement` using sqlglot `exp.Table` nodes at
`superset/sql/parse.py:760-25` and `extract_tables_from_statement` at
`superset/sql/parse.py:92-115`. For the query `SELECT * FROM tables`, each table node has
`table='tables'` and no `schema`, so `present_bare = {'tables'}` and `present_qualified`
does not contain `"information_schema.tables"`. Because the denylist entries for these
views are only schema-qualified (e.g. `"information_schema.tables"` at
`superset/config.py:1929`), the matching logic in `get_disallowed_tables` (`needle`
containing a dot must be in `present_qualified`) fails to match, `found_tables` is empty,
and `SupersetDisallowedSQLTableException` is never raised in `execute_sql_statements`. The
query then executes against `information_schema.tables` due to the `search_path` prequery,
allowing the user to retrieve metadata from views that were intended to be blocked by the
schema-qualified denylist entries.Fix in Cursor | Fix in VSCode Claude (Use Cmd/Ctrl + Click for best experience) Prompt for AI Agent 🤖This is a comment left during a code review.
**Path:** superset/config.py
**Line:** 1937:1959
**Comment:**
*Security: The new PostgreSQL denylist entries are only schema-qualified (`information_schema.<view>`), but table matching logic cannot resolve runtime default schema/search_path. This allows bypassing the blocklist via unqualified references like `SELECT * FROM tables` when `information_schema` is in scope (for example via configured/default schema or session search path), exposing metadata that this change is meant to protect. Add corresponding bare names for sensitive `information_schema` views (or enforce schema-resolution-aware checks) so unqualified access is also blocked.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix |
||
| }, | ||
| "mysql": { | ||
| "mysql.user", | ||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.