fix(sql): detect set operations and nested selects in subquery check#38452
fix(sql): detect set operations and nested selects in subquery check#38452sha174n wants to merge 2 commits into
Conversation
2f27633 to
9c335ba
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #38452 +/- ##
==========================================
- Coverage 64.30% 64.30% -0.01%
==========================================
Files 2657 2657
Lines 144060 144060
Branches 33216 33216
==========================================
- Hits 92641 92636 -5
- Misses 49797 49800 +3
- Partials 1622 1624 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
9c335ba to
37472a1
Compare
319fae3 to
4d1c717
Compare
4d1c717 to
8dce70d
Compare
There was a problem hiding this comment.
Pull request overview
This PR aims to fix subquery detection in SQL parsing, specifically to ensure that subqueries within RLS filter clauses (raw SQL predicates) are correctly detected even when sqlglot's scope traversal returns empty results for non-SELECT root expressions. The changes refactor has_subquery() to use a recursive AST walk and add a new validate_rls_clause() helper, along with new regression tests.
Changes:
has_subquery()inSQLStatementis refactored to useexp.Expression.walk()instead of relying solely onexp.Subquerydetection, with additional matching forexp.Union,exp.Except,exp.Intersect,exp.Update,exp.Delete,exp.Insert, andexp.Merge.- A new
validate_rls_clause()function is added tosuperset/models/helpers.py, and module-level imports ofsuperset.sql.parsesymbols are replaced with local function-scoped imports. - New test cases are added for
extract_tables_from_sqlwith raw SQL expressions (predicates containing subqueries).
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
superset/sql/parse.py |
Refactors has_subquery() to use recursive walk for broader subquery detection |
superset/models/helpers.py |
Adds validate_rls_clause() (uncalled), restructures imports to be function-local |
tests/unit_tests/sql/parse_tests.py |
Adds test_extract_tables_raw_expression tests for raw SQL predicate table extraction |
8dce70d to
319fae3
Compare
319fae3 to
fb3b628
Compare
330819e to
4222d37
Compare
64b4ea7 to
728ee29
Compare
Code Review Agent Run #62a291Actionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
Code Review Agent Run #8587b2Actionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
Code Review Agent Run #dad4efActionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
Code Review Agent Run #c3d774Actionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
dpgaspar
left a comment
There was a problem hiding this comment.
This is looking great.
Final comments:
- MERGE not covered in DML table extraction - extract_tables_from_statement adds explicit handling for exp.Update, exp.Delete, and exp.Insert but not exp.Merge. A MERGE INTO protected_table USING ...
would have an empty table set and bypass the DML blocking check. The is_mutating() method correctly detects MERGE, but statement.tables would be empty so get_predicates_for_table would never be called
for it. - exp.Copy not in is_mutating command check - The PR adds MERGE, UPSERT, REPLACE, COPY, TRUNCATE to the exp.Command name check, but COPY can also parse as exp.Copy (a dedicated AST node), which isn't
in the mutating_nodes tuple. - test_rls_allows_mutation_on_unprotected_table is a false-positive test - It catches all non-SupersetSecurityException exceptions and passes, meaning it would pass even if the code crashes before
reaching the RLS check. This test doesn't actually verify the unprotected path works correctly. - _process_rls_clause could return empty parens - If _process_select_expression returns an empty string (which is falsy), the if not processed check would fall through to the template_processor path.
But if it returns None (as mocked in tests), the fallback works. Worth verifying the actual return values.
Code Review Agent Run #a340c3Actionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
Code Review Agent Run #6493e5Actionable Suggestions - 0Filtered by Review RulesBito filtered these suggestions based on rules created automatically for your feedback. Manage rules.
Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
Code Review Agent Run #8c3b83Actionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
Thanks for the detailed review!
|
Code Review Agent Run #0e6a49Actionable Suggestions - 0Additional Suggestions - 1
Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
Code Review Agent Run #016f42Actionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
Code Review Agent Run #91ba4aActionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
Code Review Agent Run #fefba7Actionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
There was a problem hiding this comment.
Code Review Agent Run #406d33
Actionable Suggestions - 1
-
superset/connectors/sqla/models.py - 1
- Logic Error in Fallback Condition · Line 821-824
Additional Suggestions - 1
-
superset/sql/parse.py - 1
-
CWE-710: Improper Coding Standards · Line 950-950Adding # pragma: no cover to this security-critical line masks a potential testing gap in subquery detection logic used for preventing RLS bypass in untrusted adhoc SQL. This path should be covered by tests to verify correct behavior. ([CWE-710](https://cwe.mitre.org/data/definitions/710.html))
Code suggestion
--- superset/sql/parse.py @@ -949,1 +949,1 @@ - return True # pragma: no cover + return True
-
Review Details
-
Files reviewed - 3 · Commit Range:
e29b36e..6fb7ee4- superset/sql/parse.py
- tests/unit_tests/sql/parse_tests.py
- superset/connectors/sqla/models.py
-
Files skipped - 0
-
Tools
- Whispers (Secret Scanner) - ✔︎ Successful
- Detect-secrets (Secret Scanner) - ✔︎ Successful
- MyPy (Static Code Analysis) - ✔︎ Successful
- Astral Ruff (Static Code Analysis) - ✔︎ Successful
Bito Usage Guide
Commands
Type the following command in the pull request comment and save the comment.
-
/review- Manually triggers a full AI review. -
/pause- Pauses automatic reviews on this pull request. -
/resume- Resumes automatic reviews. -
/resolve- Marks all Bito-posted review comments as resolved. -
/abort- Cancels all in-progress reviews.
Refer to the documentation for additional commands.
Configuration
This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.
Documentation & Help
Code Review Agent Run #67f806Actionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
Code Review Agent Run #2b21d4Actionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
rusackas
left a comment
There was a problem hiding this comment.
The predicate-validation core looks solid — name='x', 1=1, IN, casts, function calls, OR, SIMILAR TO, and Jinja clauses all pass unchanged, and set-ops/multi-statement/subquery-without-flag are correctly blocked. Two intended-but-impactful behavior changes worth confirming don't regress prod: (1) get_from_clause now fails closed (raises) when RLS can't be applied to a virtual dataset's SQL, instead of best-effort log-and-continue — a virtual dataset with dialect-specific SQL sqlglot can't round-trip will now hard-fail; (2) DML on RLS-protected tables is blocked under RLS_IN_SQLLAB. Both are the right direction — just verify no legitimate virtual datasets regress.
has_subquery() flagged exp.Subquery and nested SELECTs only when the top-level node was itself a SELECT, so a top-level set operation (UNION/INTERSECT/EXCEPT) matched neither branch. It now also flags top-level set operations (reusing the existing is_set_operation()) and nested SELECTs under any top-level node, such as a parenthesised expression. This keeps validate_adhoc_subquery() consistent for adhoc SQL and chart extras filter clauses. Adds parser cases for UNION/INTERSECT/EXCEPT and parenthesis-nested SELECTs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Code Review Agent Run #2f8ed2Actionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
SUMMARY
SQLStatement.has_subquery()is the gatevalidate_adhoc_subquery()uses to reject sub-queries in adhoc SQL and in chartextrasfilter clauses. It previously flagged anexp.Subquery, or a nestedSELECTonly when the top-level node was itself aSELECT. A top-level set operation (UNION/INTERSECT/EXCEPT) is neither, so it was not flagged, and a nestedSELECTunder a different top-level node (for example a parenthesised expression) was missed.This makes
has_subquery()also return True for top-level set operations (reusing the existingis_set_operation()) and for a nestedSELECTunder any top-level node, so adhoc SQL andextrasfilter clauses are checked consistently.TESTING INSTRUCTIONS
New parser cases cover
UNION/INTERSECT/EXCEPTand parenthesis-nestedSELECTs. Existing cases (including quoted identifiers with spaces) continue to pass, so legitimate column expressions are not misclassified.ADDITIONAL INFORMATION
🤖 Generated with Claude Code