Skip to content

feat: implement retract_batch for array_agg sliding window support#22015

Open
SubhamSinghal wants to merge 1 commit intoapache:mainfrom
SubhamSinghal:array-agg-retract-batch
Open

feat: implement retract_batch for array_agg sliding window support#22015
SubhamSinghal wants to merge 1 commit intoapache:mainfrom
SubhamSinghal:array-agg-retract-batch

Conversation

@SubhamSinghal
Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Closes #21957.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Yes. Using UT

Are there any user-facing changes?

Yes. array_agg now works with bounded/sliding window frames. Queries that previously errored now succeed:

SELECT array_agg(x) OVER (ORDER BY t ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) FROM t;

No breaking API changes

@github-actions github-actions Bot added sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels May 5, 2026
Copy link
Copy Markdown
Contributor

@kosiew kosiew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SubhamSinghal
Thanks for the work here. I found one issue that looks like it can affect sliding window correctness for array_agg(... IGNORE NULLS)


let val = &values[0];
let mut to_retract = if self.ignore_nulls {
val.len() - val.null_count()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think retract_batch should use the same null definition as update_batch here.

update_batch filters IGNORE NULLS using val.logical_nulls(), but this path counts non-null rows with val.len() - val.null_count(). For arrays where logical nullability differs from physical nullability, such as dictionary or run arrays whose values contain logical nulls, the accumulator can store fewer rows than retract_batch later removes.

That means a sliding array_agg(... IGNORE NULLS) window can over-retract and accidentally drop following values from the frame. Could we switch this to the same null semantics, for example val.len() - val.logical_null_count(), and add a regression test with an input type where logical nulls differ from null_count()?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

functions Changes to functions implementation sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support array_agg as a sliding window aggregate by implementing retract_batch

2 participants