Skip to content

MDEV-38591: Implement MEMBER OF and NOT MEMBER OF operators#5278

Draft
kjarir wants to merge 6 commits into
MariaDB:mainfrom
kjarir:bb-38591-member-of-grammar
Draft

MDEV-38591: Implement MEMBER OF and NOT MEMBER OF operators#5278
kjarir wants to merge 6 commits into
MariaDB:mainfrom
kjarir:bb-38591-member-of-grammar

Conversation

@kjarir

@kjarir kjarir commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

This PR adds native support for the SQL-standard value MEMBER OF (json_doc) and value NOT MEMBER OF (json_doc) containment predicates. It builds on MDEV-13645 (JSON_QUOTE numeric/NULL fixes) which is included in this branch.

Rather than reimplementing a JSON search engine, Item_func_member_of composes existing items internally. It holds Item_func_json_quote and Item_func_json_contains as child items. In fix_length_and_dec(), if the candidate operand is not already JSON-typed, it gets wrapped in json_quote_item. The actual containment check is then delegated to an internal json_contains_item(json_doc, candidate). AST traversal methods (walk(), transform(), compile(), propagate_equal_fields(), update_used_tables()) explicitly expose these hidden child items to the optimizer, following the Item_in_optimizer precedent.

negation:
Item_func_member_of derives from Item_func_opt_neg, the same pattern used by Item_func_between and Item_func_in, rather than wrapping in a generic Item_func_not. NULL propagation is handled directly in val_bool() via negated ? !res : res on non-NULL outcomes. print() emits value not member of (json_doc) to preserve surface syntax in optimizer traces and EXPLAIN EXTENDED.

json validation and error attribution:
val_bool() performs explicit JSON validation before delegating to json_contains. A persistent json_engine_t je_val member with a stack buffer initialized at prepare-time runs a full-document scan to catch trailing garbage that a simple json_read_value() would miss. Syntax errors are intercepted and reported via report_json_error_ex() attributed to "member of" with correct 0-based argument indices. This prevents warning misattribution to "json_contains" or wrong argument numbers leaking through to the user.

Parser:
MEMBER_SYM is registered alphabetically in lex.h and declared via %token. It is added to keyword_sp_var_and_label to keep it non-reserved, resolving prior reserved keyword conflicts with MySQL. Grammar rules cover both forms:

predicate MEMBER_SYM OF_SYM '(' expr ')'
predicate not MEMBER_SYM OF_SYM '(' expr ')' %prec MEMBER_SYM

A state-by-state conflict analysis confirms zero new conflicts introduced. Parser conflict count stays at 71, no %expect bump needed.

Test coverage:
New tests in mysql-test/suite/json/t/member_of.test cover scalar member checks, type-strict matches, correct NULL propagation under negation, malformed JSON warning attribution, prepared statement re-execution, nested container checks, table column cross-joins, and EXPLAIN EXTENDED round-trip print verification.

Out of scope: Multi-valued index support (MySQL WL#8955/WL#8763) is not part of this implementation.

Please Note:

This branch contains a merge commit (9f17b9a1af9 merging bb-13645-json_quote-numeric into bb-38591-member-of-grammar). I am happy to rebase and squash this branch chain into a clean linear history if preferred by reviewers.

kjarir added 6 commits June 24, 2026 15:58
JSON_QUOTE currently returns SQL NULL for any non-string argument,
including integers, decimals, and SQL NULL itself. This is incorrect:

  JSON_QUOTE(17)   -> should produce 17  (unquoted JSON number)
  JSON_QUOTE(NULL) -> should produce null (JSON null literal as string)

Fix Item_func_json_quote::val_str() to branch on result_type():

  - NULL input (args[0]->null_value): write the 4-character literal
    'null' into str, clear null_value=0, and return str. The function
    never returns SQL NULL for a SQL NULL argument.

  - STRING_RESULT: existing quote+escape logic, unchanged. Use the
    returned pointer from val_str() (not &tmp_s) to handle const-item
    optimisations where the returned pointer may differ from the buffer.

  - INT_RESULT / REAL_RESULT / DECIMAL_RESULT: copy the text
    representation unquoted and unescaped into str. The engine's
    val_str() already produces valid JSON number text.

  - default (ROW_RESULT, etc.): preserve original NULL-returning
    fallback for unrecognised types.

fix_length_and_dec(): add comment explaining why set_maybe_null() is
no longer needed - since SQL NULL now maps to the JSON null string,
the function always produces a non-NULL STRING result (the only
failure path is OOM in the string allocation itself).

mysql-test/main/func_json.test: add MDEV-13645 test cases covering
integer, decimal, float, NULL (with isnull() verify), string
regression, and JSON_CONTAINS composition.
Adds the SQL standard "value MEMBER OF (json_doc)" predicate.

This supersedes the placeholder val_bool() from the prior grammar-only
commit on this branch, which always returned false.

Design: composition over reimplementation.  fix_length_and_dec()
wraps the candidate in an internal Item_func_json_quote when it is not
already JSON-typed, then builds an internal Item_func_json_contains
(json_doc, candidate) that performs the actual containment check.
val_bool() pre-validates both operands with a full json_engine_t scan
and then delegates to json_contains_item.

The persistent je_val engine, wired once in fix_length_and_dec() via
mem_root_dynamic_array_init(), serves two purposes:
  (1) errors are attributed to "member of" at the correct argument
      index rather than to "json_contains"; and
  (2) a full-document scan catches malformed trailing content that
      json_contains would miss if it found an early match.

walk/transform/compile/propagate_equal_fields/update_used_tables are
overridden to expose the hidden helper items to the optimizer,
following the Item_in_optimizer precedent for items that maintain
child Items outside the normal args[] array.

Test coverage in mysql-test/suite/json/t/member_of.test:
  - scalar member checks and type-strict non-matches
  - SQL NULL propagation for candidate and container
  - malformed JSON error attribution (argument 1 and 2)
  - prepared-statement re-execution (fix_length_and_dec lifecycle)
  - JSON-typed candidate passthrough via is_json_type() branch
  - nested array/object containment via JSON_COMPACT()
  - table-column evaluation with cross-join

NOTE: this patch depends on the JSON_QUOTE single-argument bug fix
from MDEV-13645 being merged before this lands.
Implement the NOT MEMBER OF operator syntax by integrating negation directly into the existing Item_func_member_of class, matching the established pattern used for BETWEEN and IN operators.

- Base Class Refactor: Switch Item_func_member_of's base class from Item_bool_func to Item_func_opt_neg.
- Item_func_opt_neg: Introduce a new 2-argument constructor to accommodate the Item_func_member_of signature.
- Grammar Rules: Add a 'predicate not MEMBER_SYM OF_SYM' production in sql_yacc.yy using the %prec MEMBER_SYM annotation. A rigorous state-by-state conflict comparison confirms that this addition introduces zero new shift/reduce conflicts, keeping the parser conflict count exactly at 71.
- Print Logic: Modify Item_func_member_of::print() to prepend " not" before " member of" when the item's negated flag is set, matching the printed surface syntax.
- Execution: Update Item_func_member_of::val_bool() to negate and return the boolean result only on non-NULL evaluations, ensuring NULL propagation behavior remains correct and unaffected.
- Testing: Add a comprehensive test section to member_of.test to verify correct output values, type-strict comparisons, NULL propagation, warning attribution, and EXPLAIN EXTENDED round-trip syntax.
Remove stale ER_INCORRECT_TYPE error echo annotations and update inline comments in json_no_table.test to reflect that JSON_QUOTE now accepts NULL and numeric values. Re-record json_no_table.result.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements the SQL standard MEMBER OF and NOT MEMBER OF operators (MDEV-38591) and updates JSON_QUOTE to support numeric and NULL arguments (MDEV-13645). The review feedback highlights several critical and high-severity issues: a potential double-free or use-after-free vulnerability in shallow_copy due to raw pointer copying, type safety issues from using specific subclass pointers for helper items instead of Item*, double evaluation of non-deterministic arguments in val_bool(), and the need to sync helper arguments during optimization in update_used_tables(). Additionally, minor improvements are suggested to value-initialize the je_val struct and clean up C-style casts.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread sql/item_jsonfunc.h
Comment on lines +1186 to +1187
Item *shallow_copy(THD *thd) const override
{ return get_item_copy<Item_func_member_of>(thd, this); }

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-critical critical

Performing a raw memcpy via get_item_copy copies the internal pointers json_quote_item and json_contains_item, as well as the je_val.stack dynamic array. This leads to severe bugs where the cloned item shares the exact same internal helper items and the same dynamic array buffer as the original. If either item is evaluated or resized, it can lead to double-free, use-after-free, or memory corruption. We must reset these pointers and zero-initialize je_val in the clone so they are safely re-created during the clone's fix_fields phase.

  Item *shallow_copy(THD *thd) const override
  {
    Item_func_member_of *copy= get_item_copy<Item_func_member_of>(thd, this);
    if (copy)
    {
      copy->json_quote_item= NULL;
      copy->json_contains_item= NULL;
      memset(&copy->je_val, 0, sizeof(copy->je_val));
    }
    return copy;
  }

Comment thread sql/item_jsonfunc.h
Comment on lines +1164 to +1165
Item_func_json_quote *json_quote_item;
Item_func_json_contains *json_contains_item;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Declaring json_quote_item and json_contains_item as specific subclass pointers (Item_func_json_quote* and Item_func_json_contains*) is unsafe and leads to undefined behavior. During fix_fields or constant folding, these items can be simplified and replaced by other item types (such as Item_const or Item_string). If that happens, the pointers will be overwritten with objects of a different class hierarchy, violating type safety and potentially causing crashes when virtual methods are called. Declaring them as Item* is completely type-safe and avoids the need for dangerous C-style casts in fix_length_and_dec.

  Item *json_quote_item;
  Item *json_contains_item;

Comment thread sql/item_jsonfunc.cc
Comment on lines +6586 to +6601
else
{
(void) args[0]->val_str(&tmp_candidate);
if (args[0]->null_value)
{
null_value= 1;
return false;
}
}

// 4. Delegate the containment check to the composed JSON_CONTAINS item
bool res= json_contains_item->val_bool();
null_value= json_contains_item->null_value;
if (null_value)
return false;
return negated ? !res : res;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

When is_json_type(args[0]) is false, evaluating args[0]->val_str() in step 3 solely to check for SQL NULL results in double evaluation of args[0] (once here, and once inside json_contains_item->val_bool()). If args[0] is a non-deterministic subquery or stored function (e.g., containing RAND()), this can lead to inconsistent results and significant performance overhead. Instead, we can skip evaluating args[0] in step 3 and simply check args[0]->null_value after json_contains_item->val_bool() has executed.

  // 4. Delegate the containment check to the composed JSON_CONTAINS item
  bool res= json_contains_item->val_bool();
  if (args[0]->null_value)
  {
    null_value= 1;
    return false;
  }
  null_value= json_contains_item->null_value;
  if (null_value)
    return false;
  return negated ? !res : res;

Comment thread sql/item_jsonfunc.cc
Comment on lines +6732 to +6739
void Item_func_member_of::update_used_tables()
{
Item_bool_func::update_used_tables();
if (json_quote_item)
json_quote_item->update_used_tables();
if (json_contains_item)
json_contains_item->update_used_tables();
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

During optimization, args[0] and args[1] can be transformed, replaced, or simplified. To ensure that the internal helper items (json_quote_item and json_contains_item) always evaluate the correct, optimized arguments, we should explicitly sync their arguments to the current args[0] and args[1] of Item_func_member_of before updating their used tables.

void Item_func_member_of::update_used_tables()
{
  Item_bool_func::update_used_tables();
  if (json_quote_item)
  {
    json_quote_item->arguments()[0]= args[0];
    json_quote_item->update_used_tables();
  }
  if (json_contains_item)
  {
    json_contains_item->arguments()[0]= args[1];
    json_contains_item->arguments()[1]= json_quote_item ? json_quote_item : args[0];
    json_contains_item->update_used_tables();
  }
}

Comment thread sql/item_jsonfunc.h
Comment on lines +1173 to +1175
Item_func_member_of(THD *thd, Item *a, Item *b):
Item_func_opt_neg(thd, a, b), json_quote_item(NULL), json_contains_item(NULL)
{}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The je_val member (of type json_engine_t) is a POD struct and is left uninitialized in the constructor. It is highly recommended to value-initialize it to zero to prevent garbage values in its fields before fix_length_and_dec is called.

  Item_func_member_of(THD *thd, Item *a, Item *b):
    Item_func_opt_neg(thd, a, b), json_quote_item(NULL), json_contains_item(NULL),
    je_val{}
    {}

Comment thread sql/item_jsonfunc.cc
Comment on lines +6630 to +6643
json_quote_item= new (thd->mem_root) Item_func_json_quote(thd, args[0]);
if (!json_quote_item)
return true;
if (json_quote_item->fix_fields_if_needed(thd, (Item **)&json_quote_item))
return true;
if (contains_args.push_back(json_quote_item, thd->mem_root))
return true;
}

json_contains_item= new (thd->mem_root) Item_func_json_contains(thd, contains_args);
if (!json_contains_item)
return true;
if (json_contains_item->fix_fields_if_needed(thd, (Item **)&json_contains_item))
return true;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

With json_quote_item and json_contains_item declared as Item*, we can completely remove the dangerous and undefined (Item**) casts, making the code clean and type-safe.

    json_quote_item= new (thd->mem_root) Item_func_json_quote(thd, args[0]);
    if (!json_quote_item)
      return true;
    if (json_quote_item->fix_fields_if_needed(thd, &json_quote_item))
      return true;
    if (contains_args.push_back(json_quote_item, thd->mem_root))
      return true;
  }

  json_contains_item= new (thd->mem_root) Item_func_json_contains(thd, contains_args);
  if (!json_contains_item)
    return true;
  if (json_contains_item->fix_fields_if_needed(thd, &json_contains_item))
    return true;

@grooverdan grooverdan added GSoC External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements. labels Jun 25, 2026
@grooverdan grooverdan self-assigned this Jun 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements. GSoC

Development

Successfully merging this pull request may close these issues.

2 participants