Skip to content

Extend V2 SQL parser with IN/EXISTS subqueries for unified query path#5448

Open
dai-chen wants to merge 1 commit into
opensearch-project:mainfrom
dai-chen:feature/sql-v2-subquery
Open

Extend V2 SQL parser with IN/EXISTS subqueries for unified query path#5448
dai-chen wants to merge 1 commit into
opensearch-project:mainfrom
dai-chen:feature/sql-v2-subquery

Conversation

@dai-chen
Copy link
Copy Markdown
Collaborator

@dai-chen dai-chen commented May 15, 2026

Description

Extend the V2 SQL ANTLR grammar and add ExtendedAstExpressionBuilder to support IN and EXISTS subqueries in the unified Calcite-based query path. The base AstExpressionBuilder continues to throw SyntaxCheckException for these statements, preserving the legacy engine fallback in production.

Notes

  • Fix Alias handling in CalciteRelNodeVisitor here to unblock tests using SELECT with specific fields
  • EXISTS subquery only works for PartiQL nested fields in the legacy engine, thus no IT added here

Planned PRs

Related Issues

Part of #5248

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@dai-chen dai-chen self-assigned this May 15, 2026
@dai-chen dai-chen added enhancement New feature or request SQL labels May 15, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 15, 2026

PR Reviewer Guide 🔍

(Review updated until commit bd3074d)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Incomplete Fix

The Alias handling fix only adds the alias to expandedFields but does not add it to addedFields. This means if the same alias appears multiple times in the project list, it will be added to expandedFields repeatedly instead of being deduplicated. The fix should include 'addedFields.add(alias.getName())' or similar logic to track that this alias has been processed.

case Alias alias -> {
  expandedFields.add(rexVisitor.analyze(alias, context));
}
Initialization Order

The constructor calls 'createExpressionBuilder()' before the 'query' field is fully initialized in the superclass chain. If a subclass overrides 'createExpressionBuilder()' and that override accesses 'this.query', it will see null. This can cause NullPointerException if the expression builder needs the query string during construction.

public AstBuilder(String query) {
  this.query = query;
  this.expressionBuilder = createExpressionBuilder();
}

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 15, 2026

PR Code Suggestions ✨

Latest suggestions up to bd3074d

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Verify IN subquery value mapping

The visit(ctx.predicate()) call processes the left-hand side expression, but
inSubquery expects values as varargs. If ctx.predicate() returns a complex
expression or multiple values, this may not correctly represent the IN clause
semantics. Verify that the predicate context correctly maps to the expected value
list for the IN operation.

api/src/main/java/org/opensearch/sql/api/parser/SqlV2QueryParser.java [92-96]

 @Override
 public UnresolvedExpression visitInSubqueryPredicate(InSubqueryPredicateContext ctx) {
   UnresolvedPlan subquery = ExtendedAstBuilder.this.visit(ctx.querySpecification());
-  UnresolvedExpression inExpr = inSubquery(subquery, visit(ctx.predicate()));
+  UnresolvedExpression leftExpr = visit(ctx.predicate());
+  UnresolvedExpression inExpr = inSubquery(subquery, leftExpr);
   return (ctx.NOT() != null) ? new Not(inExpr) : inExpr;
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a potential semantic issue where visit(ctx.predicate()) processes the left-hand side expression for the IN clause. The concern about whether this correctly maps to the expected value list is valid and worth verification, though the improved code doesn't substantially change the logic beyond adding an intermediate variable for clarity.

Medium
Prevent duplicate alias field expansion

Adding Alias handling in expandProjectFields may cause duplicate field expansion if
aliases are already processed elsewhere in the projection logic. Ensure that this
doesn't conflict with existing alias resolution or cause fields to be added multiple
times to expandedFields.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [544-546]

 case Alias alias -> {
-  expandedFields.add(rexVisitor.analyze(alias, context));
+  if (addedFields.add(alias.getName())) {
+    expandedFields.add(rexVisitor.analyze(alias, context));
+  }
 }
Suggestion importance[1-10]: 6

__

Why: The suggestion raises a valid concern about potential duplicate field expansion when handling Alias cases. Adding a check using addedFields.add(alias.getName()) before adding to expandedFields could prevent duplicates, though it's unclear if this is actually a problem in the current implementation without more context about how aliases are processed elsewhere.

Low

Previous suggestions

Suggestions up to commit 62a146b
CategorySuggestion                                                                                                                                    Impact
Possible issue
Handle NOT IN subquery case

The visitInSubqueryPredicate method doesn't handle the NOT IN case. The grammar
allows NOT? IN, but the implementation always creates a positive IN subquery. Check
ctx.NOT() and negate the result accordingly to support NOT IN subqueries.

api/src/main/java/org/opensearch/sql/api/parser/SqlV2QueryParser.java [90-94]

-private class ExtendedAstExpressionBuilder extends AstExpressionBuilder {
+@Override
+public UnresolvedExpression visitInSubqueryPredicate(InSubqueryPredicateContext ctx) {
+  UnresolvedPlan subquery = ExtendedAstBuilder.this.visit(ctx.querySpecification());
+  UnresolvedExpression inExpr = inSubquery(subquery, visit(ctx.predicate()));
+  return ctx.NOT() != null ? not(inExpr) : inExpr;
+}
 
-  @Override
-  public UnresolvedExpression visitInSubqueryPredicate(InSubqueryPredicateContext ctx) {
-    UnresolvedPlan subquery = ExtendedAstBuilder.this.visit(ctx.querySpecification());
-    return inSubquery(subquery, visit(ctx.predicate()));
-  }
-
Suggestion importance[1-10]: 8

__

Why: This is a critical correctness issue. The grammar allows NOT? IN for subqueries, but the implementation doesn't handle the NOT case, which would cause incorrect query results when NOT IN is used with subqueries.

Medium
Suggestions up to commit cc181cd
CategorySuggestion                                                                                                                                    Impact
General
Track alias in added fields

Adding Alias handling in the project expansion logic may cause duplicate field names
or incorrect field references if the alias is not properly tracked in addedFields.
Consider adding the alias name to addedFields to prevent duplicates and ensure
consistent field tracking across the expansion logic.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [544-546]

 case Alias alias -> {
+  addedFields.add(alias.getName());
   expandedFields.add(rexVisitor.analyze(alias, context));
 }
Suggestion importance[1-10]: 6

__

Why: The suggestion addresses a potential issue where Alias fields might not be tracked in addedFields, which could lead to duplicates or inconsistent field tracking. Adding alias.getName() to addedFields would maintain consistency with how other field types are handled in the same method.

Low
Verify single predicate wrapping

The InSubquery constructor expects a list of expressions, but only a single
predicate is being passed. This may cause issues if the IN clause should support
multiple values on the left side (e.g., (col1, col2) IN (SELECT ...)). Verify that
wrapping a single predicate in a list is the intended behavior for all IN subquery
cases.

api/src/main/java/org/opensearch/sql/api/parser/SqlV2QueryParser.java [88-91]

 public UnresolvedExpression visitInSubqueryPredicate(InSubqueryPredicateContext ctx) {
   UnresolvedPlan subquery = ExtendedAstBuilder.this.visit(ctx.querySpecification());
-  return new InSubquery(List.of(visit(ctx.predicate())), subquery);
+  UnresolvedExpression predicate = visit(ctx.predicate());
+  return new InSubquery(List.of(predicate), subquery);
 }
Suggestion importance[1-10]: 5

__

Why: The suggestion correctly identifies that a single predicate is wrapped in a list for InSubquery. However, the 'improved_code' is functionally identical to the 'existing_code', only extracting the predicate to a variable. The concern about multi-column IN clauses is valid but requires verification rather than a code change, limiting the impact.

Low

@dai-chen dai-chen force-pushed the feature/sql-v2-subquery branch from cc181cd to 62a146b Compare May 15, 2026 22:19
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 62a146b

Add grammar rules (inSubqueryPredicate, existsSubqueryExpressionAtom)
and wire them through ExtendedAstExpressionBuilder to produce InSubquery
and ExistsSubquery AST nodes for the Calcite-based unified query path.

Base AstExpressionBuilder throws SyntaxCheckException to preserve legacy
engine fallback. AstBuilder now uses createExpressionBuilder() factory
method to allow subclass customization.

Also add Alias handling in CalciteRelNodeVisitor.expandProjectFields
required for any non-SELECT * query in the unified path.

Signed-off-by: Chen Dai <daichen@amazon.com>
@dai-chen dai-chen force-pushed the feature/sql-v2-subquery branch from 62a146b to bd3074d Compare May 16, 2026 04:45
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit bd3074d

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request SQL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant