Skip to content

MDEV-38970: Streaming window functions step 1#5267

Draft
OmarGamal10 wants to merge 9 commits into
MariaDB:mainfrom
OmarGamal10:mdev-38970
Draft

MDEV-38970: Streaming window functions step 1#5267
OmarGamal10 wants to merge 9 commits into
MariaDB:mainfrom
OmarGamal10:mdev-38970

Conversation

@OmarGamal10

@OmarGamal10 OmarGamal10 commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

This PR is the initial implementation of the streaming window functions path as part of my GSOC project.
It's still under development. I'll update this accordingly during development.

To first highlight what's not there yet and what's not working

  • Tests have not yet been added, testing was done manually until this point.
  • Specifically tests for more complex scenarios (expressions, subqueries, etc) I did not try beyond the existing ones in the suite.
  • I know sometimes GROUP BY does not materialize, but I skipped this case for now and abort streaming if a GROUP BY exists in the main query.

Note: Many comments are there just for myself, and some namings are to be changed.

What's been added for now:

  • A Window_funcs_sort_streaming object, the naming comes from the fact that our criteria can be defined as having only one Window_funcs_sort object only, hence the sort in the name. Contains:
    • A setup function for setting up trackers, cursor managers and setting phase to computation (like how it's done in compute_window_func)
    • process_row method, intended to iterate the window functions and run for the current row.
  • have_streaming_window_funcs(), a preparation time function to check the criteria for streaming, it checks:
    • All window functions have a compatible ordering criteria
    • No aggregation functions exist within any window function (rank() over(max(a)))
    • Frame is default (unbounded preceding) or current row only.
    • Whether the window function sort order is compatible with the main query ORDER BY, and which is longer.
  • We run have_streaming_window_funcs() in preparation time to know if streaming CAN be satisfied (anything else needing a temp table falls back to materialization)
  • A window_funcs_streaming_step is added to the JOIN_TAB, analog to window_funcs_step, responsible for setting up window functions and holding the state for the trackers and cursors across rows. (it points to a Window_funcs_sort_streaming object)
  • end_compute_window_func() is attached to the last real table (last table is always real in this case), which calls process_row on the current row in the join loop, and sends it to the client.

Next step:

  • Manual testing shows expected results, for row_number(), rank() and dense_rank() work as expected, they stream when ordering is compatible, reuse the main query ordering if applicable and work for indexed and non indexed orderings.

As I said, this is initial, it has most of what we want to implement for streaming, but needs some cleaning up and some reviewing, as I reused much of the logic.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for streaming window functions, allowing certain functions like row number, rank, and dense rank to be computed on the fly without materializing into a temporary table. The code review feedback highlights several critical issues, including potential null pointer dereferences (crashes) due to missing checks on partition_list and order_list, a logic bug in compare_order_lists when handling trailing constants, and an ignored return value in cursor setup. Additionally, the reviewer suggests caching the THD pointer to avoid expensive thread-local lookups in the performance-critical per-row execution path, and resolving an inconsistency in how default versus explicit frames are handled.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread sql/sql_window.cc
Comment on lines +554 to +593
static int compare_order_lists(ORDER *list1, int spec_number1, ORDER *list2,
int spec_number2)
{
if (!list1 && !list2)
return CMP_EQ;
if (!list1)
return CMP_LT_C;
if (!list2)
return CMP_GT_C;
ORDER *elem1= list1;
ORDER *elem2= list2;
for (; elem1 && elem2; elem1= elem1->next, elem2= elem2->next)
{
int cmp;
// remove all constants as we don't need them for comparision
while (elem1 && ((*elem1->item)->real_item())->const_item())
{
elem1= elem1->next;
continue;
}

while (elem2 && ((*elem2->item)->real_item())->const_item())
{
elem2= elem2->next;
continue;
}

if (!elem1 || !elem2)
break;

if ((cmp=
compare_order_elements(elem1, spec_number1, elem2, spec_number2)))
return cmp;
}
if (elem1)
return CMP_GT_C;
if (elem2)
return CMP_LT_C;
return CMP_EQ;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current implementation of compare_order_lists has a bug where trailing constants in either list are not skipped. Because the loop condition is elem1 && elem2, the loop terminates as soon as one list runs out of elements. Any remaining trailing constants in the other list are not skipped, causing the function to incorrectly return CMP_GT_C or CMP_LT_C instead of CMP_EQ. We should restructure the loop to skip constants first before performing any length or element comparisons.

static int compare_order_lists(ORDER *list1, int spec_number1, ORDER *list2,
                               int spec_number2)
{
  ORDER *elem1= list1;
  ORDER *elem2= list2;
  while (true)
  {
    while (elem1 && ((*elem1->item)->real_item())->const_item())
      elem1= elem1->next;
    while (elem2 && ((*elem2->item)->real_item())->const_item())
      elem2= elem2->next;

    if (!elem1 && !elem2)
      return CMP_EQ;
    if (!elem1)
      return CMP_LT_C;
    if (!elem2)
      return CMP_GT_C;

    int cmp;
    if ((cmp= compare_order_elements(elem1, spec_number1, elem2, spec_number2)))
      return cmp;

    elem1= elem1->next;
    elem2= elem2->next;
  }
}

Comment thread sql/sql_window.cc
Comment on lines +855 to +866
static inline bool check_argument_list_aggregation(Window_spec *win_spec)
{
for (ORDER *o= win_spec->partition_list->first; o; o= o->next)
if ((*o->item)->with_sum_func())
return true;

for (ORDER *o= win_spec->order_list->first; o; o= o->next)
if ((*o->item)->with_sum_func())
return true;

return false;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

partition_list and order_list can be NULL if the window specification does not contain PARTITION BY or ORDER BY clauses. Accessing win_spec->partition_list->first or win_spec->order_list->first without a null check will cause a server crash (segmentation fault). We must add null checks before accessing these lists.

static inline bool check_argument_list_aggregation(Window_spec *win_spec)
{
  if (win_spec->partition_list)
  { 
    for (ORDER *o= win_spec->partition_list->first; o; o= o->next)
      if ((*o->item)->with_sum_func())
        return true;
  }

  if (win_spec->order_list)
  { 
    for (ORDER *o= win_spec->order_list->first; o; o= o->next)
      if ((*o->item)->with_sum_func())
        return true;
  }

  return false;
}

Comment thread sql/sql_window.cc Outdated
Comment on lines +899 to +900
int win_func_order_elements=
spec->partition_list->elements + spec->order_list->elements;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Similar to the crash vector in check_argument_list_aggregation, spec->partition_list and spec->order_list can be NULL. Accessing spec->partition_list->elements or spec->order_list->elements directly will cause a crash. We should safely handle NULL pointers here.

    int win_func_order_elements=
        (spec->partition_list ? spec->partition_list->elements : 0) +
        (spec->order_list ? spec->order_list->elements : 0);

Comment thread sql/sql_window.cc
Comment on lines +912 to +915
longest_wf_order= concat_order_lists(
thd->mem_root,
win_func_with_longest_order->window_spec->partition_list->first,
win_func_with_longest_order->window_spec->order_list->first);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Similar to the previous crash vectors, partition_list and order_list can be NULL. We must check if they are non-NULL before accessing their first member to avoid a null pointer dereference.

  longest_wf_order= concat_order_lists(
      thd->mem_root,
      win_func_with_longest_order->window_spec->partition_list ?
          win_func_with_longest_order->window_spec->partition_list->first : NULL,
      win_func_with_longest_order->window_spec->order_list ?
          win_func_with_longest_order->window_spec->order_list->first : NULL);

Comment thread sql/sql_window.cc
Comment thread sql/sql_window.cc Outdated
Comment thread sql/sql_window.cc
Comment thread sql/sql_window.cc

/* Check if we found any error in the window function while adding values
through cursors. */
if (unlikely(current_thd->is_error() || current_thd->is_killed()))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Use the cached thd pointer instead of calling current_thd in this performance-critical loop.

    if (unlikely(thd->is_error() || thd->is_killed()))
References
  1. Avoid calling thread-local lookups like current_thd or thd_to_trx in performance-critical paths (such as per-record operations or frequent lookups). Instead, pass down or cache the transaction pointer (trx_t) to reduce overhead.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, this makes sense as it's called per row. I copied the execution loop (with the comments), so I left it like it is. Will change it now.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth changing in the materialization path too

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not touch the existing code at this point, at least this should be done with a separate commit.

Comment thread sql/sql_window.h
@Olernov Olernov added the GSoC label Jun 23, 2026
@Olernov Olernov self-requested a review June 23, 2026 09:38
@gkodinov gkodinov added the External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements. label Jun 23, 2026
Comment thread sql/sql_select.cc
// we don't even need to pass the row to the window function, because the
// add() functions read from the TABLE::record[0] directly, as we did
// NOT call split_sum_func(), so we still point to base table
(join_tab - 1)->window_funcs_streaming_step->process_row();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

process_rows() return value is ignored: potential error or KILLed query status is swallowed, add a check please

Comment thread sql/sql_window.cc
while ((win_func= it++))
{
Group_bound_tracker *tracker=
new Group_bound_tracker(thd, win_func->window_spec->partition_list);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This memory leaks, needs to be deallocated in the destructor or some cleanup method. As an option, you can allocate it on the query arena like new (thd->mem_root) Group_bound_tracker then it be will freed automatically. But it's better to follow the same pattern other window functions classes do.

Comment thread sql/sql_window.cc
// i would skip this now
List<Cursor_manager> cursor_managers;
if (get_window_functions_required_cursors(thd, window_functions,
&cursor_managers))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cursor_managers are not deallocated, other window functions do it via delete_elements()

@mariadb-OlegSmirnov

mariadb-OlegSmirnov commented Jun 28, 2026

Copy link
Copy Markdown
Contributor

win_streaming.test does not prove correctness of results. Would be nice to see the same query executed in both variants: materialized vs streaming so results are comparable (maybe we can dismiss the streaming by adding SQL_BUFFER_RESULT after SELECT?).
As I mentioned earlier, do we really need EXPLAIN FORMAT=JSON everywhere or EXPLAIN EXTENDED is enough?
I also suggest using the pattern

let $q = SELECT ...;
eval $q;
eval explain format=json $q;

in tests to avoid query text duplication.

Comment thread sql/sql_window.cc

/* Check if we found any error in the window function while adding values
through cursors. */
if (unlikely(current_thd->is_error() || current_thd->is_killed()))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, current_thd is still here despite commit 227812d claims to cache THD

Comment thread sql/sql_window.cc

static
void order_window_funcs_by_window_specs(List<Item_window_func> *win_func_list)
static void

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change of formatting unrelated to our work, please roll it back.

Comment thread sql/sql_window.cc
{
if (win_func_list->elements == 0)
return;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

Comment thread sql/sql_window.cc
{
curr->marker= (MARKER_SORTORDER_CHANGE |
MARKER_PARTITION_CHANGE |
curr->marker= (MARKER_SORTORDER_CHANGE | MARKER_PARTITION_CHANGE |

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

Comment thread sql/sql_window.cc
while ((win_func= it++))
{
Group_bound_tracker *tracker=
new Group_bound_tracker(thd, win_func->window_spec->partition_list);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tracker needs to be checked for successful allocation

Comment thread sql/sql_window.cc
find_longest_compatible_order(List_iterator_fast<Item_window_func> &it)
{
int longest_order_elements= -1;
Item_window_func *longest, *win_func;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest initializing longest to nullptr and then check if it has been set in the loop below.

@OmarGamal10

Copy link
Copy Markdown
Contributor Author

win_streaming.test does not prove correctness of results. Would be nice to see the same query executed in both variants: materialized vs streaming so results are comparable (maybe we can dismiss the streaming by adding SQL_BUFFER_RESULT after SELECT?). As I mentioned earlier, do we really need EXPLAIN FORMAT=JSON everywhere or EXPLAIN EXTENDED is enough? I also suggest using the pattern

let $q = SELECT ...;
eval $q;
eval explain format=json $q;

in tests to avoid query text duplication.

I thought about this actually (running queries side by side) but thought I would need an optimizer switch just for that, I will try SQL_BUFFER_RESULT

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements. GSoC

Development

Successfully merging this pull request may close these issues.

4 participants