
fix(dataplane): fix batch replay oom#2669

Open
olamilekan000 wants to merge 1 commit into main from fix/batch-replay-oom-dos

olamilekan000 (Contributor) commented Jun 17, 2026

This change fixes a batch replay OOM risk in which the handler defaulted to
a page size of 2,000,000,000 (2B) and loaded all matching events into memory in a single request.
It caps the page size at 1000, paginates through results in
BatchReplayEventService, and applies pagination middleware to the batch replay
routes.


Note

Medium Risk
Changes event replay throughput and behavior on mid-run failures (partial replays may occur); memory risk is reduced but large replays still enqueue many queue jobs in one request.

Overview
Fixes batch replay OOM by stopping the handler from forcing an enormous single-page fetch (~2B events) and loading everything into memory at once.

NormalizeBatchReplayPageable (page size 1000, forward direction, cursors reset from the start) is applied in BatchReplayEvents and BatchReplayEventService, which now loops through LoadEventsPaged until there is no next page. Dashboard list-view cursors are ignored so replay starts from the full filtered set, not a UI page.
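The paging loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `Event`, `Page`, `PageLoader`, `sliceLoader`, and `replayAll` are hypothetical names, the cursor is modelled as a plain offset string, and the real `LoadEventsPaged` signature is not shown in this conversation.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// Event and Page are hypothetical stand-ins for the repository's own types.
type Event struct{ ID string }

type Page struct {
	Events     []Event
	NextCursor string
	HasNext    bool
}

// maxPerPage mirrors the 1000-event cap mentioned in the PR description.
const maxPerPage = 1000

// PageLoader is an assumed, simplified shape for a paged event fetch.
type PageLoader func(cursor string, perPage int) (Page, error)

// replayAll starts from the first page and keeps fetching forward until the
// loader reports no next page, so at most maxPerPage events are held at once.
// It returns the replay counts so far even when a later fetch fails.
func replayAll(load PageLoader, replay func(Event) error) (int, int, error) {
	succeeded, failed := 0, 0
	cursor := "" // assumption: an empty cursor means "start of the result set"
	for {
		page, err := load(cursor, maxPerPage)
		if err != nil {
			return succeeded, failed, fmt.Errorf("fetch page at cursor %q: %w", cursor, err)
		}
		for _, ev := range page.Events {
			if err := replay(ev); err != nil {
				failed++
				continue
			}
			succeeded++
		}
		if !page.HasNext {
			return succeeded, failed, nil
		}
		cursor = page.NextCursor
	}
}

// sliceLoader is an in-memory loader used only for this demonstration.
// Its cursor is the decimal offset of the next unread event.
func sliceLoader(all []Event) PageLoader {
	return func(cursor string, perPage int) (Page, error) {
		start := 0
		if cursor != "" {
			n, err := strconv.Atoi(cursor)
			if err != nil || n < 0 || n > len(all) {
				return Page{}, errors.New("bad cursor")
			}
			start = n
		}
		end := start + perPage
		if end > len(all) {
			end = len(all)
		}
		return Page{
			Events:     all[start:end],
			NextCursor: strconv.Itoa(end),
			HasNext:    end < len(all),
		}, nil
	}
}

func main() {
	events := make([]Event, 2500)
	ok, failed, err := replayAll(sliceLoader(events), func(Event) error { return nil })
	fmt.Println(ok, failed, err)
}
```

With 2500 matching events, this sketch fetches three pages (1000, 1000, 500) rather than one unbounded page.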

If a fetch fails after some replays, the API returns 500 with success/failure counts and an incomplete message instead of a generic error only.
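One way the partial-failure response could be shaped is sketched below. The payload fields, the message wording, and the names `ReplayOutcome`, `buildOutcome`, and `errPageFetch` are assumptions made for illustration; the PR summary only states that a 500 is returned with success/failure counts and an incomplete message.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
)

// ReplayOutcome is a hypothetical response payload; field names are assumed.
type ReplayOutcome struct {
	Message   string `json:"message"`
	Successes int    `json:"successes"`
	Failures  int    `json:"failures"`
}

// buildOutcome maps the result of a paged replay run to a status code and body.
// When a page fetch failed mid-run, the counts gathered so far are still
// reported so the caller can see that the replay was only partially applied.
func buildOutcome(successes, failures int, fetchErr error) (int, ReplayOutcome) {
	if fetchErr != nil {
		return http.StatusInternalServerError, ReplayOutcome{
			Message:   fmt.Sprintf("batch replay incomplete: %v", fetchErr),
			Successes: successes,
			Failures:  failures,
		}
	}
	return http.StatusOK, ReplayOutcome{
		Message:   "batch replay complete",
		Successes: successes,
		Failures:  failures,
	}
}

// errPageFetch is a demo error standing in for a failed page fetch.
var errPageFetch = errors.New("page fetch failed")

func main() {
	status, out := buildOutcome(1000, 3, errPageFetch)
	body, _ := json.Marshal(out)
	fmt.Println(status, string(body))
}
```

Returning the counts alongside the 500 lets a client distinguish "nothing was replayed" from "the first pages were replayed and a later fetch failed", which matters because already-enqueued replays are not rolled back.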

Reviewed by Cursor Bugbot for commit ef2a027.

olamilekan000 force-pushed the fix/batch-replay-oom-dos branch from 91e8f6c to 5c00519 on June 17, 2026 20:58

cursor (Bot) left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 5c00519.

Comment thread services/batch_replay_event.go Outdated
Comment thread services/batch_replay_event.go Outdated
olamilekan000 force-pushed the fix/batch-replay-oom-dos branch from 5c00519 to c11a9a0 on June 17, 2026 21:10
Comment thread api/api.go Outdated
  eventRouter.With(handler.RequireEnabledProject(), handler.RequireEnabledOrganisation()).Post("/broadcast", handler.CreateBroadcastEvent)
  eventRouter.With(handler.RequireEnabledProject(), handler.RequireEnabledOrganisation()).Post("/dynamic", handler.CreateDynamicEvent)
- eventRouter.With(handler.RequireEnabledProject(), handler.RequireEnabledOrganisation()).Post("/batchreplay", handler.BatchReplayEvents)
+ eventRouter.With(handler.RequireEnabledProject(), handler.RequireEnabledOrganisation(), middleware.Pagination).Post("/batchreplay", handler.BatchReplayEvents)

Collaborator

This changes batch replay semantics when the caller includes list pagination params. The dashboard filter type already carries next_page_cursor, prev_page_cursor, and direction, and batchReplayEvent() sends the saved queryParams to /events/batchreplay. With this middleware, the replay starts from that cursor while /countbatchreplayevents still counts the full filter set, so the confirmation count can say N events but only the current page/window is replayed.

For a bulk replay endpoint, we should ignore caller cursors and use pagination only internally: start from the first cursor, force direction=next, cap perPage, and loop until done. Alternatively, do not attach middleware.Pagination to this route and let BatchReplayEventService build its own internal pageable.
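The first suggestion (ignore caller cursors, force the direction, cap the page size) might look something like the sketch below. The `Pageable` struct, the `Direction` constants, `firstCursor`, and `normalizeForBatchReplay` are hypothetical names for illustration only; the project's actual pageable type and its notion of a "first cursor" may differ.

```go
package main

import "fmt"

// Direction and Pageable are simplified, assumed stand-ins for the
// project's pagination types.
type Direction string

const (
	Next Direction = "next"
	Prev Direction = "prev"
)

type Pageable struct {
	PerPage    int
	Direction  Direction
	NextCursor string
	PrevCursor string
}

const (
	batchReplayPerPage = 1000 // cap taken from the PR description
	firstCursor        = ""   // assumption: empty means "start of the set"
)

// normalizeForBatchReplay deliberately discards everything the caller sent.
// A bulk replay must cover the full filtered set, so list-view cursors,
// direction, and page size from the dashboard are not trusted.
func normalizeForBatchReplay(_ Pageable) Pageable {
	return Pageable{
		PerPage:    batchReplayPerPage,
		Direction:  Next,
		NextCursor: firstCursor,
		PrevCursor: "",
	}
}

func main() {
	// Simulates a dashboard request that carries saved list-view state.
	fromCaller := Pageable{
		PerPage:    2000000000,
		Direction:  Prev,
		NextCursor: "cursor-from-page-3",
		PrevCursor: "cursor-from-page-1",
	}
	fmt.Printf("%+v\n", normalizeForBatchReplay(fromCaller))
}
```

Because the caller's values never reach the query, the set that is replayed stays aligned with what /countbatchreplayevents counts for the same filter.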

olamilekan000 force-pushed the fix/batch-replay-oom-dos branch from c11a9a0 to ef2a027 on June 19, 2026 09:16
