items/async: fail fast on Redis outage (closes #647) by julietshen · Pull Request #653 · roostorg/coop

julietshen · 2026-05-29T22:21:41Z

Closes #647.

Summary

POST /api/v1/items/async used to return 202 even when Redis was unreachable. ioredis's default enableOfflineQueue: true buffers commands client-side, so queue.addBulk resolved cleanly while Redis was down and the DataLoader-batched write reported success. The submission survived as long as the API process stayed up and Redis came back — but the buffer is in-process, so a simultaneous Redis outage + API restart silently dropped every submission acknowledged with 202.

Fix

Three small changes:

server/iocContainer/index.ts — extract a makeRedis(extraOptions) helper so the existing IORedis connection wiring isn't copy-pasted, then add a second factory IORedisEnqueueNoBuffer that builds the same client with enableOfflineQueue: false. Wire itemSubmissionQueueBulkWrite to the new client. The shared IORedis keeps its default buffer for BullMQ workers (which legitimately need to ride through transient reconnects). Register the new client in closeSharedResourcesForShutdown.
server/routes/items/submitItems.ts — tighten the response flow. On bulkWriteResponse.error, short-circuit with 5xx and return. Before this fix the handler sent 500 and then fell through to a second res.status(202).end() call. Express swallowed the second one as "headers already sent" so clients got 500 in practice, but the code claimed both intents.

Verification

Verified locally with a standalone smoke test against the dev docker compose stack:

initial PING: PONG
stopping redis...
PING-while-down REJECTED: Stream isn't writeable and enableOfflineQueue options is false (GOOD)
starting redis...
PING-after-restore: PONG

The DLQ path (itemSubmissionRetryQueueBulkWrite) stays on the shared IORedis — it's worker-internal and buffering through brief reconnects is fine there.

What changed in observable behaviour

Redis up: identical to before (202 on success).
Redis briefly unreachable (stopped or paused): API now returns 5xx, telling the client to retry. Previously returned 202 against a buffer.
Redis intermittent (very brief disconnect during a single addBulk call): the new client rejects rather than buffering. Adopters who relied on the implicit buffering to ride through ms-scale blips will see more 5xx, but the trade is "honest 5xx" vs "false 202 + possible loss on restart." This is the intended contract per items/async returns 202 while Redis is unreachable (narrow data-loss on API restart) #647.

Follow-ups

Phase 1 of the outage integ suite (Add outage integration test: HMA + Redis (phase 1 of #343) #646) currently pins the pre-fix lossy contract for the items/async path. Once both Add outage integration test: HMA + Redis (phase 1 of #343) #646 and this PR land, the Redis scenarios in outage.integ.test.ts flip from "lands after recovery" to "5xx + no Scylla row." I'll do that follow-up in a rebase of Add outage integration test: HMA + Redis (phase 1 of #343) #646 after this merges.

Test plan

Standalone smoke test confirms enableOfflineQueue: false client rejects when Redis is stopped and recovers cleanly when restarted
items-submission.integ.test.ts (the happy-path integ test) still passes — confirms the new wiring doesn't regress submissions when Redis is healthy
Reviewer: confirm the DLQ path staying on the shared IORedis is the right call (rationale: worker-internal, transient reconnects acceptable)

🤖 Generated with Claude Code

`POST /api/v1/items/async` used to return 202 even when Redis was unreachable. ioredis's default `enableOfflineQueue: true` buffers commands client-side, so `queue.addBulk` resolved cleanly while Redis was down and the DataLoader-batched write reported success. The submission survived as long as the API process stayed up and Redis came back — but the buffer is in-process, so a simultaneous Redis outage + API restart silently dropped every submission that had been acknowledged with 202. Fix: - Add a dedicated `IORedisEnqueueNoBuffer` client built with `enableOfflineQueue: false`. Same Redis as the shared `IORedis`, same connection wiring extracted into a `makeRedis(extraOptions)` helper to avoid copy-pasting. - Wire `itemSubmissionQueueBulkWrite` to the new client. The retry/ DLQ path stays on the shared `IORedis` — it's worker-internal and buffering is fine there. - Tighten the handler in `submitItems.ts`: on `bulkWriteResponse.error` short-circuit with 5xx and `return`. Previously it sent 500 *and* then fell through to a second `res.status(202).end()` call. Express swallowed the second one as "headers already sent" so the client got 500 in practice, but the code claimed both intents. - Register the new client in `closeSharedResourcesForShutdown` so its connection is closed on shutdown. With this change, `queue.addBulk` rejects immediately when Redis is unreachable ("Stream isn't writeable and enableOfflineQueue options is false") — verified locally against a stopped Redis container — and the existing 5xx branch fires. The shared `IORedis` keeps its default buffer for the BullMQ worker side, which legitimately needs to ride through transient reconnects. Follow-ups: - Phase 1 of the outage integ suite (#646) currently pins the pre-fix lossy contract for items/async. Once both #646 and this PR land, the Redis scenarios in `outage.integ.test.ts` flip from "lands after recovery" to "5xx + no Scylla row". Filed as part of #646's review. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-29T22:24:20Z

Warning

Review limit reached

@julietshen, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 34 minutes and 23 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 606802cc-9a32-45f8-a4d3-84c318dca0fc

📥 Commits

Reviewing files that changed from the base of the PR and between b98d11b and 8b75963.

📒 Files selected for processing (2)

server/iocContainer/index.ts
server/routes/items/submitItems.ts

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix-redis-offline-queue-data-loss

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

julietshen requested review from cassidyjames, dom-notion, juanmrad and vinaysrao1 as code owners May 29, 2026 22:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

items/async: fail fast on Redis outage (closes #647)#653

items/async: fail fast on Redis outage (closes #647)#653
julietshen wants to merge 1 commit into
mainfrom
fix-redis-offline-queue-data-loss

julietshen commented May 29, 2026

Uh oh!

coderabbitai Bot commented May 29, 2026

Review limit reached

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

julietshen commented May 29, 2026

Summary

Fix

Verification

What changed in observable behaviour

Follow-ups

Test plan

Uh oh!

coderabbitai Bot commented May 29, 2026

Review limit reached

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant