claimDueSchedule() scales poorly with many schedules #13

Hey! We're running about 950 schedules in production and hit a problem where midnight cron jobs fire hours late. We traced it back to `claimDueSchedule()`.

The current implementation calls `SMEMBERS` to get all schedule IDs, then loops over them client-side, issuing one `EVAL` per ID. Each of those calls is a separate Redis round-trip:

```
SMEMBERS schedules::index          → [id1, id2, …, idN]
EVAL CLAIM_SCHEDULE_SCRIPT id1     → nil
EVAL CLAIM_SCHEDULE_SCRIPT id2     → nil
…
EVAL CLAIM_SCHEDULE_SCRIPT idK     → claimed!
```

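With ~950 schedules this gets slow quickly. `#dispatchDueSchedules()` calls `claimDueSchedule()` in a loop until it returns `null`, and every call starts a fresh scan from the top of the index, so you end up with O(N) round-trips per claim × M claims, i.e. O(N × M) round-trips per dispatch pass.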
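To make the cost concrete, here is a small in-memory model of that pattern. It is not the project's code: `claimClientSide`, `claimServerSide` and `dispatchAllDue` are hypothetical, it models a single worker (our test below used concurrency 5), and it assumes `SMEMBERS` returns IDs in the same order on every call and that a claimed schedule is no longer due on the next call. Only claim-side round-trips are counted.

```typescript
// Hypothetical model of the round-trip pattern, not the project's code.
// Assumptions: SMEMBERS returns IDs in a stable order, the claim script
// returns nil for schedules that are not due, and a claimed schedule's next
// run time moves into the future before the next claim attempt.

interface ModelSchedule {
  id: string;
  nextRunAt: number; // epoch ms
}

type ClaimFn = (index: ModelSchedule[], now: number, trips: { n: number }) => ModelSchedule | null;

const DAY_MS = 24 * 60 * 60 * 1000;

// Current approach: SMEMBERS, then one EVAL per ID until one succeeds.
const claimClientSide: ClaimFn = (index, now, trips) => {
  trips.n += 1; // SMEMBERS schedules::index
  for (const s of index) {
    trips.n += 1; // EVAL CLAIM_SCHEDULE_SCRIPT <id>
    if (s.nextRunAt <= now) {
      s.nextRunAt = now + DAY_MS; // e.g. the next midnight
      return s;
    }
  }
  return null;
};

// Proposed approach: the same scan, but inside a single EVAL.
const claimServerSide: ClaimFn = (index, now, trips) => {
  trips.n += 1; // one EVAL; the loop below runs inside Redis
  for (const s of index) {
    if (s.nextRunAt <= now) {
      s.nextRunAt = now + DAY_MS;
      return s;
    }
  }
  return null;
};

// Mimics #dispatchDueSchedules(): keep claiming until the claim returns null.
function dispatchAllDue(n: number, claim: ClaimFn): { claimed: number; roundTrips: number } {
  const now = Date.UTC(2024, 0, 1);
  const index = Array.from({ length: n }, (_, i) => ({ id: `id${i + 1}`, nextRunAt: now }));
  const trips = { n: 0 };
  let claimed = 0;
  while (claim(index, now, trips) !== null) claimed += 1;
  return { claimed, roundTrips: trips.n };
}

console.log(dispatchAllDue(10, claimClientSide)); // { claimed: 10, roundTrips: 76 }
console.log(dispatchAllDue(10, claimServerSide)); // { claimed: 10, roundTrips: 11 }
```

Under those assumptions, with N schedules all due at once the client-side loop costs N(N+1)/2 + 2N + 1 round-trips per dispatch pass (453,626 for N = 950), versus N + 1 `EVAL`s for the server-side scan.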

We tested this by loading ~950 schedules into a local Redis instance, all set to become due at the same moment. Before starting the worker, we snapshotted every due schedule's target `next_run_at`. Then we ran the worker (concurrency 5) and waited for all schedules to be claimed. After that we read each schedule's `last_run_at` from Redis and compared it to the snapshotted target — giving us the actual drift per schedule and the total wall time from first to last claim.
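The drift and wall-time metrics in the table below boil down to a computation like this one. It is a sketch, assuming `next_run_at` and `last_run_at` are epoch milliseconds, that `last_run_at` is stamped at claim time, and that the snapshot and post-run reads were collected into maps keyed by schedule ID; `driftStats` and its field names are hypothetical.

```typescript
// Hypothetical helper mirroring the measurement described above.
// Drift = observed last_run_at minus the snapshotted target next_run_at.

interface DriftStats {
  count: number; // schedules that were actually claimed
  minMs: number;
  avgMs: number;
  maxMs: number;
  wallTimeMs: number; // first claim to last claim (uses last_run_at as the claim time)
}

function driftStats(targetNextRunAt: Map<string, number>, observedLastRunAt: Map<string, number>): DriftStats {
  const drifts: number[] = [];
  const claimTimes: number[] = [];
  targetNextRunAt.forEach((target, id) => {
    const lastRunAt = observedLastRunAt.get(id);
    if (lastRunAt === undefined) return; // never claimed: excluded from the stats
    drifts.push(lastRunAt - target);
    claimTimes.push(lastRunAt);
  });
  if (drifts.length === 0) throw new Error("no schedules were claimed");
  const total = drifts.reduce((sum, d) => sum + d, 0);
  return {
    count: drifts.length,
    minMs: Math.min(...drifts),
    avgMs: total / drifts.length,
    maxMs: Math.max(...drifts),
    wallTimeMs: Math.max(...claimTimes) - Math.min(...claimTimes),
  };
}

// Toy input: three schedules with the same target, claimed at different times.
const targets = new Map([["a", 1000], ["b", 1000], ["c", 1000]]);
const observed = new Map([["a", 1500], ["b", 3000], ["c", 2000]]);
console.log(driftStats(targets, observed));
```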

We ran this twice: once with the current code, and once with a modified version that replaces the per-ID client-side loop with a single Lua script doing the full `SMEMBERS` + `HGETALL` iteration inside Redis, so one `EVAL` per claim instead of N.
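A single-`EVAL` claim could look roughly like the sketch below. Several details are assumptions, not the project's schema: each schedule lives in a hash at `${prefix}${id}` (the prefix is hypothetical), it stores `next_run_at` / `last_run_at` as epoch ms, and "claimable" means `next_run_at <= now` and `last_run_at < next_run_at`. It uses `HMGET` on two fields instead of a full `HGETALL` to keep the Lua short. `EvalClient` is a minimal structural type; ioredis exposes an `eval(script, numKeys, ...keysAndArgs)` method of roughly this shape.

```typescript
// Hypothetical single-EVAL claim. The script scans the index set, claims the
// first due schedule by stamping last_run_at = now, and returns its ID (or nil).
// It touches hash keys not passed in KEYS: fine on a single node, not cluster-safe.
const CLAIM_FIRST_DUE_SCRIPT = `
local now = tonumber(ARGV[1])
for _, id in ipairs(redis.call('SMEMBERS', KEYS[1])) do
  local key = ARGV[2] .. id
  local f = redis.call('HMGET', key, 'next_run_at', 'last_run_at')
  local nextRunAt = tonumber(f[1])
  local lastRunAt = tonumber(f[2]) or 0
  if nextRunAt and nextRunAt <= now and lastRunAt < nextRunAt then
    redis.call('HSET', key, 'last_run_at', ARGV[1])
    return id
  end
end
return false
`;

// Minimal client shape (assumption): only the eval call used here.
interface EvalClient {
  eval(script: string, numKeys: number, ...args: (string | number)[]): Promise<unknown>;
}

async function claimFirstDue(client: EvalClient, now: number, prefix = "schedules::"): Promise<string | null> {
  // KEYS[1] = index set, ARGV[1] = now (ms), ARGV[2] = hypothetical hash-key prefix
  const result = await client.eval(CLAIM_FIRST_DUE_SCRIPT, 1, "schedules::index", String(now), prefix);
  return typeof result === "string" ? result : null; // Lua false becomes a nil reply
}
```

Because Redis runs a script atomically, two workers cannot both claim the same ID, and stamping `last_run_at` keeps the schedule from being re-claimed before its `next_run_at` is rewritten. The trade-off is that Redis serves no other commands while the scan runs.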

| Metric | Before | After (server-side Lua) |
|---|---|---|
| Schedules due | 927 | 979 |
| Wall time (first → last claim) | 21.8s | 3.7s |
| Drift from target (min) | 3.8s | 839ms |
| Drift from target (avg) | 12.8s | 2.5s |
| Drift from target (max) | 25.6s | 4.6s |

These numbers are from localhost, with effectively no network latency. In production, where each round-trip to Redis has real latency, the gap would be wider, since the old code does up to N+1 round-trips per claim versus 2.
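A rough way to see why latency widens the gap is to multiply round-trips per claim by the network round-trip time (RTT). The sketch uses the worst case for the current loop (N + 1, when the claimable schedule is scanned last) and 2 for the proposed one; `networkCostPerClaimMs` is a hypothetical helper and the RTT values are illustrative assumptions, not measurements from our setup.

```typescript
// Network cost per claim = round-trips × RTT.
// Round-trip counts follow the issue: worst case N + 1 for the current loop,
// 2 (EVAL + HSET) for the proposed one. RTT values are assumptions.
function networkCostPerClaimMs(roundTrips: number, rttMs: number): number {
  return roundTrips * rttMs;
}

const SCHEDULES = 950;
for (const rttMs of [0.2, 1, 5]) {
  const before = networkCostPerClaimMs(SCHEDULES + 1, rttMs);
  const after = networkCostPerClaimMs(2, rttMs);
  console.log(`RTT ${rttMs} ms: ~${before.toFixed(1)} ms before vs ~${after.toFixed(1)} ms after, per claim`);
}
```

The dispatch loop pays this cost once per claim, so the per-claim difference is multiplied by the number of due schedules.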

The method signature doesn't change. Cron `nextRunAt` recalculation still happens in JS (it needs `cron-parser`), so there's one `HSET` after the claim: 2 round-trips per claim in total instead of N+1.
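The two-round-trip flow could look roughly like this. `claimAndReschedule`, the `${prefix}${id}` hash naming and the `nextOccurrence` callback are hypothetical: the callback stands in for the `cron-parser` call (its API is not shown here), and to really stay at two round-trips the schedule's cron expression would have to come back from the claim script (for example from the `HGETALL` it already does) rather than from an extra read. `ScheduleClient` only models the two calls used; ioredis exposes `eval` and `hset` methods of roughly this shape.

```typescript
// Minimal client shape (assumption): just the two commands this flow needs.
interface ScheduleClient {
  eval(script: string, numKeys: number, ...args: (string | number)[]): Promise<unknown>;
  hset(key: string, field: string, value: string | number): Promise<unknown>;
}

async function claimAndReschedule(
  client: ScheduleClient,
  claimScript: string, // a server-side scan-and-claim script returning an ID or nil
  now: Date,
  nextOccurrence: (id: string, after: Date) => Date, // hypothetical wrapper around cron-parser
  prefix = "schedules::", // hypothetical hash-key prefix
): Promise<string | null> {
  // Round-trip 1: scan and claim atomically inside Redis.
  const claimed = await client.eval(claimScript, 1, "schedules::index", String(now.getTime()), prefix);
  if (typeof claimed !== "string") return null; // nil reply: nothing is due
  // Cron math stays in JS.
  const next = nextOccurrence(claimed, now);
  // Round-trip 2: persist the recalculated next_run_at.
  await client.hset(`${prefix}${claimed}`, "next_run_at", String(next.getTime()));
  return claimed;
}
```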

Happy to open a PR if you're interested.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

claimDueSchedule() scales poorly with many schedules #13

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Metric	Before	After (server-side Lua)
Schedules due	927	979
Wall time (first → last claim)	21.8s	3.7s
Drift from target (min)	3.8s	839ms
Drift from target (avg)	12.8s	2.5s
Drift from target (max)	25.6s	4.6s

Uh oh!

claimDueSchedule() scales poorly with many schedules #13

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions