MDEV-38591: Implement MEMBER OF and NOT MEMBER OF operators#5278
MDEV-38591: Implement MEMBER OF and NOT MEMBER OF operators#5278kjarir wants to merge 6 commits into
Conversation
JSON_QUOTE currently returns SQL NULL for any non-string argument,
including integers, decimals, and SQL NULL itself. This is incorrect:
JSON_QUOTE(17) -> should produce 17 (unquoted JSON number)
JSON_QUOTE(NULL) -> should produce null (JSON null literal as string)
Fix Item_func_json_quote::val_str() to branch on result_type():
- NULL input (args[0]->null_value): write the 4-character literal
'null' into str, clear null_value=0, and return str. The function
never returns SQL NULL for a SQL NULL argument.
- STRING_RESULT: existing quote+escape logic, unchanged. Use the
returned pointer from val_str() (not &tmp_s) to handle const-item
optimisations where the returned pointer may differ from the buffer.
- INT_RESULT / REAL_RESULT / DECIMAL_RESULT: copy the text
representation unquoted and unescaped into str. The engine's
val_str() already produces valid JSON number text.
- default (ROW_RESULT, etc.): preserve original NULL-returning
fallback for unrecognised types.
fix_length_and_dec(): add comment explaining why set_maybe_null() is
no longer needed - since SQL NULL now maps to the JSON null string,
the function always produces a non-NULL STRING result (the only
failure path is OOM in the string allocation itself).
mysql-test/main/func_json.test: add MDEV-13645 test cases covering
integer, decimal, float, NULL (with isnull() verify), string
regression, and JSON_CONTAINS composition.
Adds the SQL standard "value MEMBER OF (json_doc)" predicate.
This supersedes the placeholder val_bool() from the prior grammar-only
commit on this branch, which always returned false.
Design: composition over reimplementation. fix_length_and_dec()
wraps the candidate in an internal Item_func_json_quote when it is not
already JSON-typed, then builds an internal Item_func_json_contains
(json_doc, candidate) that performs the actual containment check.
val_bool() pre-validates both operands with a full json_engine_t scan
and then delegates to json_contains_item.
The persistent je_val engine, wired once in fix_length_and_dec() via
mem_root_dynamic_array_init(), serves two purposes:
(1) errors are attributed to "member of" at the correct argument
index rather than to "json_contains"; and
(2) a full-document scan catches malformed trailing content that
json_contains would miss if it found an early match.
walk/transform/compile/propagate_equal_fields/update_used_tables are
overridden to expose the hidden helper items to the optimizer,
following the Item_in_optimizer precedent for items that maintain
child Items outside the normal args[] array.
Test coverage in mysql-test/suite/json/t/member_of.test:
- scalar member checks and type-strict non-matches
- SQL NULL propagation for candidate and container
- malformed JSON error attribution (argument 1 and 2)
- prepared-statement re-execution (fix_length_and_dec lifecycle)
- JSON-typed candidate passthrough via is_json_type() branch
- nested array/object containment via JSON_COMPACT()
- table-column evaluation with cross-join
NOTE: this patch depends on the JSON_QUOTE single-argument bug fix
from MDEV-13645 being merged before this lands.
Implement the NOT MEMBER OF operator syntax by integrating negation directly into the existing Item_func_member_of class, matching the established pattern used for BETWEEN and IN operators. - Base Class Refactor: Switch Item_func_member_of's base class from Item_bool_func to Item_func_opt_neg. - Item_func_opt_neg: Introduce a new 2-argument constructor to accommodate the Item_func_member_of signature. - Grammar Rules: Add a 'predicate not MEMBER_SYM OF_SYM' production in sql_yacc.yy using the %prec MEMBER_SYM annotation. A rigorous state-by-state conflict comparison confirms that this addition introduces zero new shift/reduce conflicts, keeping the parser conflict count exactly at 71. - Print Logic: Modify Item_func_member_of::print() to prepend " not" before " member of" when the item's negated flag is set, matching the printed surface syntax. - Execution: Update Item_func_member_of::val_bool() to negate and return the boolean result only on non-NULL evaluations, ensuring NULL propagation behavior remains correct and unaffected. - Testing: Add a comprehensive test section to member_of.test to verify correct output values, type-strict comparisons, NULL propagation, warning attribution, and EXPLAIN EXTENDED round-trip syntax.
Remove stale ER_INCORRECT_TYPE error echo annotations and update inline comments in json_no_table.test to reflect that JSON_QUOTE now accepts NULL and numeric values. Re-record json_no_table.result.
There was a problem hiding this comment.
Code Review
This pull request implements the SQL standard MEMBER OF and NOT MEMBER OF operators (MDEV-38591) and updates JSON_QUOTE to support numeric and NULL arguments (MDEV-13645). The review feedback highlights several critical and high-severity issues: a potential double-free or use-after-free vulnerability in shallow_copy due to raw pointer copying, type safety issues from using specific subclass pointers for helper items instead of Item*, double evaluation of non-deterministic arguments in val_bool(), and the need to sync helper arguments during optimization in update_used_tables(). Additionally, minor improvements are suggested to value-initialize the je_val struct and clean up C-style casts.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| Item *shallow_copy(THD *thd) const override | ||
| { return get_item_copy<Item_func_member_of>(thd, this); } |
There was a problem hiding this comment.
Performing a raw memcpy via get_item_copy copies the internal pointers json_quote_item and json_contains_item, as well as the je_val.stack dynamic array. This leads to severe bugs where the cloned item shares the exact same internal helper items and the same dynamic array buffer as the original. If either item is evaluated or resized, it can lead to double-free, use-after-free, or memory corruption. We must reset these pointers and zero-initialize je_val in the clone so they are safely re-created during the clone's fix_fields phase.
Item *shallow_copy(THD *thd) const override
{
Item_func_member_of *copy= get_item_copy<Item_func_member_of>(thd, this);
if (copy)
{
copy->json_quote_item= NULL;
copy->json_contains_item= NULL;
memset(©->je_val, 0, sizeof(copy->je_val));
}
return copy;
}| Item_func_json_quote *json_quote_item; | ||
| Item_func_json_contains *json_contains_item; |
There was a problem hiding this comment.
Declaring json_quote_item and json_contains_item as specific subclass pointers (Item_func_json_quote* and Item_func_json_contains*) is unsafe and leads to undefined behavior. During fix_fields or constant folding, these items can be simplified and replaced by other item types (such as Item_const or Item_string). If that happens, the pointers will be overwritten with objects of a different class hierarchy, violating type safety and potentially causing crashes when virtual methods are called. Declaring them as Item* is completely type-safe and avoids the need for dangerous C-style casts in fix_length_and_dec.
Item *json_quote_item;
Item *json_contains_item;| else | ||
| { | ||
| (void) args[0]->val_str(&tmp_candidate); | ||
| if (args[0]->null_value) | ||
| { | ||
| null_value= 1; | ||
| return false; | ||
| } | ||
| } | ||
|
|
||
| // 4. Delegate the containment check to the composed JSON_CONTAINS item | ||
| bool res= json_contains_item->val_bool(); | ||
| null_value= json_contains_item->null_value; | ||
| if (null_value) | ||
| return false; | ||
| return negated ? !res : res; |
There was a problem hiding this comment.
When is_json_type(args[0]) is false, evaluating args[0]->val_str() in step 3 solely to check for SQL NULL results in double evaluation of args[0] (once here, and once inside json_contains_item->val_bool()). If args[0] is a non-deterministic subquery or stored function (e.g., containing RAND()), this can lead to inconsistent results and significant performance overhead. Instead, we can skip evaluating args[0] in step 3 and simply check args[0]->null_value after json_contains_item->val_bool() has executed.
// 4. Delegate the containment check to the composed JSON_CONTAINS item
bool res= json_contains_item->val_bool();
if (args[0]->null_value)
{
null_value= 1;
return false;
}
null_value= json_contains_item->null_value;
if (null_value)
return false;
return negated ? !res : res;| void Item_func_member_of::update_used_tables() | ||
| { | ||
| Item_bool_func::update_used_tables(); | ||
| if (json_quote_item) | ||
| json_quote_item->update_used_tables(); | ||
| if (json_contains_item) | ||
| json_contains_item->update_used_tables(); | ||
| } |
There was a problem hiding this comment.
During optimization, args[0] and args[1] can be transformed, replaced, or simplified. To ensure that the internal helper items (json_quote_item and json_contains_item) always evaluate the correct, optimized arguments, we should explicitly sync their arguments to the current args[0] and args[1] of Item_func_member_of before updating their used tables.
void Item_func_member_of::update_used_tables()
{
Item_bool_func::update_used_tables();
if (json_quote_item)
{
json_quote_item->arguments()[0]= args[0];
json_quote_item->update_used_tables();
}
if (json_contains_item)
{
json_contains_item->arguments()[0]= args[1];
json_contains_item->arguments()[1]= json_quote_item ? json_quote_item : args[0];
json_contains_item->update_used_tables();
}
}| Item_func_member_of(THD *thd, Item *a, Item *b): | ||
| Item_func_opt_neg(thd, a, b), json_quote_item(NULL), json_contains_item(NULL) | ||
| {} |
There was a problem hiding this comment.
The je_val member (of type json_engine_t) is a POD struct and is left uninitialized in the constructor. It is highly recommended to value-initialize it to zero to prevent garbage values in its fields before fix_length_and_dec is called.
Item_func_member_of(THD *thd, Item *a, Item *b):
Item_func_opt_neg(thd, a, b), json_quote_item(NULL), json_contains_item(NULL),
je_val{}
{}| json_quote_item= new (thd->mem_root) Item_func_json_quote(thd, args[0]); | ||
| if (!json_quote_item) | ||
| return true; | ||
| if (json_quote_item->fix_fields_if_needed(thd, (Item **)&json_quote_item)) | ||
| return true; | ||
| if (contains_args.push_back(json_quote_item, thd->mem_root)) | ||
| return true; | ||
| } | ||
|
|
||
| json_contains_item= new (thd->mem_root) Item_func_json_contains(thd, contains_args); | ||
| if (!json_contains_item) | ||
| return true; | ||
| if (json_contains_item->fix_fields_if_needed(thd, (Item **)&json_contains_item)) | ||
| return true; |
There was a problem hiding this comment.
With json_quote_item and json_contains_item declared as Item*, we can completely remove the dangerous and undefined (Item**) casts, making the code clean and type-safe.
json_quote_item= new (thd->mem_root) Item_func_json_quote(thd, args[0]);
if (!json_quote_item)
return true;
if (json_quote_item->fix_fields_if_needed(thd, &json_quote_item))
return true;
if (contains_args.push_back(json_quote_item, thd->mem_root))
return true;
}
json_contains_item= new (thd->mem_root) Item_func_json_contains(thd, contains_args);
if (!json_contains_item)
return true;
if (json_contains_item->fix_fields_if_needed(thd, &json_contains_item))
return true;
This PR adds native support for the SQL-standard
value MEMBER OF (json_doc)andvalue NOT MEMBER OF (json_doc)containment predicates. It builds on MDEV-13645 (JSON_QUOTE numeric/NULL fixes) which is included in this branch.Rather than reimplementing a JSON search engine,
Item_func_member_ofcomposes existing items internally. It holdsItem_func_json_quoteandItem_func_json_containsas child items. Infix_length_and_dec(), if the candidate operand is not already JSON-typed, it gets wrapped injson_quote_item. The actual containment check is then delegated to an internaljson_contains_item(json_doc, candidate). AST traversal methods (walk(),transform(),compile(),propagate_equal_fields(),update_used_tables()) explicitly expose these hidden child items to the optimizer, following theItem_in_optimizerprecedent.negation:
Item_func_member_ofderives fromItem_func_opt_neg, the same pattern used byItem_func_betweenandItem_func_in, rather than wrapping in a genericItem_func_not. NULL propagation is handled directly inval_bool()vianegated ? !res : reson non-NULL outcomes.print()emitsvalue not member of (json_doc)to preserve surface syntax in optimizer traces andEXPLAIN EXTENDED.json validation and error attribution:
val_bool()performs explicit JSON validation before delegating tojson_contains. A persistentjson_engine_t je_valmember with a stack buffer initialized at prepare-time runs a full-document scan to catch trailing garbage that a simplejson_read_value()would miss. Syntax errors are intercepted and reported viareport_json_error_ex()attributed to"member of"with correct 0-based argument indices. This prevents warning misattribution to"json_contains"or wrong argument numbers leaking through to the user.Parser:
MEMBER_SYMis registered alphabetically inlex.hand declared via%token. It is added tokeyword_sp_var_and_labelto keep it non-reserved, resolving prior reserved keyword conflicts with MySQL. Grammar rules cover both forms:A state-by-state conflict analysis confirms zero new conflicts introduced. Parser conflict count stays at 71, no
%expectbump needed.Test coverage:
New tests in
mysql-test/suite/json/t/member_of.testcover scalar member checks, type-strict matches, correct NULL propagation under negation, malformed JSON warning attribution, prepared statement re-execution, nested container checks, table column cross-joins, andEXPLAIN EXTENDEDround-trip print verification.Out of scope: Multi-valued index support (MySQL WL#8955/WL#8763) is not part of this implementation.
Please Note: