[superlog] Downgrade agent stream cleanup Redis timeouts from ERROR to WARN#499
superlog-app[bot] wants to merge 7 commits
…-markdown-evals [codex] Improve Slack insight digest markdown
Greptile Summary
Downgrades three …
Confidence Score: 5/5
Safe to merge: the change only touches post-response background cleanup paths and leaves all user-visible error handling intact.
The replaced captureError calls were already falling through to the no-request-logger branch, which logs err.message via log.error. The new log.warn calls carry the same fields, so no diagnostic information is dropped. DB persistence errors and all user-facing error paths are correctly left untouched. The one nuance is that the storage-writer IIFE's outer .catch now also logs stream-reader errors at WARN, not only Redis timeouts, but this is a low-probability path.
No files require special attention: the single changed file has well-understood boundaries between user-visible and background operations.
Important Files Changed
Sequence Diagram
%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Client
    participant AgentRoute as Agent Route
    participant Redis
    participant DB
    AgentRoute->>Client: stream forClient (response sent)
    par Background storage writer
        AgentRoute->>Redis: appendStreamChunk(streamKey, chunk)
        note over AgentRoute,Redis: Redis timeout → log.warn (was captureError/log.error)
        AgentRoute->>Redis: markStreamDone(streamKey)
        note over AgentRoute,Redis: Redis timeout → log.warn (was captureError/log.error)
    and onFinish callback
        AgentRoute->>Redis: clearActiveStream(streamScope, chatId, streamId)
        note over AgentRoute,Redis: Redis timeout → log.warn (was captureError/log.error)
        AgentRoute->>DB: upsert agentChats (messages, title)
        note over AgentRoute,DB: DB error → captureError/log.error (unchanged)
    end
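The onFinish branch of the diagram could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the in-memory `log`, the `bestEffortCleanup` helper, the stubbed `clearActiveStream`, the `event` field key, the `agent-route` service name, and the `agent_chat_db_error` event name are all assumptions made so the example runs on its own.

```typescript
// All names here are illustrative stand-ins, not the project's real modules.
type LogFields = Record<string, string>;

// In-memory logger so the sketch runs without any dependencies.
const entries: Array<{ level: "warn" | "error"; fields: LogFields }> = [];
const log = {
  warn: (fields: LogFields): void => { entries.push({ level: "warn", fields }); },
  error: (fields: LogFields): void => { entries.push({ level: "error", fields }); },
};

function errorMessage(err: unknown): string {
  return err instanceof Error ? err.message : String(err);
}

// Runs a best-effort cleanup step. A failure is logged at WARN and swallowed,
// because the response has already been sent to the client.
async function bestEffortCleanup(
  step: () => Promise<void>,
  fields: LogFields,
): Promise<boolean> {
  try {
    await step();
    return true;
  } catch (err) {
    log.warn({ ...fields, error_message: errorMessage(err) });
    return false;
  }
}

// Hypothetical stub that simulates a Redis command timing out.
async function clearActiveStream(): Promise<void> {
  throw new Error("Redis command timed out");
}

async function onFinish(chatId: string, persist: () => Promise<void>): Promise<void> {
  // Cleanup of the reconnect buffer: WARN on failure.
  await bestEffortCleanup(clearActiveStream, {
    service: "agent-route",
    event: "agent_stream_cleanup_error",
    agent_chat_id: chatId,
  });
  try {
    await persist();
  } catch (err) {
    // Persistence failures stay at ERROR (hypothetical event name).
    log.error({
      service: "agent-route",
      event: "agent_chat_db_error",
      agent_chat_id: chatId,
      error_message: errorMessage(err),
    });
  }
}
```

Calling `onFinish` with a failing `persist` callback records one WARN entry for the Redis timeout and one ERROR entry for the database failure, which mirrors the split between the two note types in the diagram.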
Summary
Transient Redis timeouts during agent chat stream cleanup (specifically clearActiveStream, markStreamDone, and appendStreamChunk) were being logged at ERROR level, generating false-positive incidents. The primary AI response is already sent to the client before these background operations run; they only affect the Redis stream buffer used for client reconnects.
When a Redis command times out in the onFinish callback or in the background storage-writer task, captureError has no active request logger context and falls through to log.error() unconditionally. This is the correct path for errors that affect the user-visible response, but these three catch blocks are pure cleanup/buffering side-effects.
Replaces the three captureError calls in post-response cleanup paths with log.warn() (already imported). The stream replay/reconnect path remains the only potentially degraded feature (stale active-stream key for up to 1 hour on timeout), but the chat response itself is unaffected. An alternative approach would be to add a severity option to captureError so callers can choose warn vs error per call-site; that may be worth doing as a follow-up if more similar patterns appear in other routes.
Incident on Superlog
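The follow-up alternative mentioned in the summary, a severity option on captureError, might look something like the sketch below. The real signature of captureError is not shown in this PR, so the `options` parameter, the `RequestLogger` interface, the `activeRequestLogger` variable, and the in-memory `log` are all assumptions for illustration only.

```typescript
type Severity = "warn" | "error";
type LogFields = Record<string, string>;

// In-memory logger so the sketch is self-contained.
const emitted: Array<{ level: Severity; fields: LogFields }> = [];
const log = {
  warn: (fields: LogFields): void => { emitted.push({ level: "warn", fields }); },
  error: (fields: LogFields): void => { emitted.push({ level: "error", fields }); },
};

// Hypothetical stand-in for the per-request logging context. It is undefined
// in background work that runs after the response has been sent.
interface RequestLogger {
  recordError(err: Error, fields: LogFields): void;
}
let activeRequestLogger: RequestLogger | undefined;

// Assumed shape: with a request logger, the error is attached to the request;
// without one, the call falls through to the plain logger at the chosen
// severity, defaulting to "error" to match the behaviour described above.
function captureError(
  err: unknown,
  fields: LogFields,
  options: { severity?: Severity } = {},
): void {
  const error = err instanceof Error ? err : new Error(String(err));
  if (activeRequestLogger) {
    activeRequestLogger.recordError(error, fields);
    return;
  }
  const severity = options.severity ?? "error";
  log[severity]({ ...fields, error_message: error.message });
}
```

Under these assumptions, a cleanup call site would pass `{ severity: "warn" }` while every existing caller keeps the ERROR default, so the change would be opt-in per call-site.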
Summary by cubic
Downgraded logging of transient Redis timeouts during agent chat stream cleanup from ERROR to WARN to prevent false-positive incidents. Chat responses are unaffected; only the reconnect buffer may be briefly stale.
Replaced captureError(...) with log.warn(...) in post-response cleanup and background storage-writer paths (clearActiveStream, markStreamDone).
… (service, error_message, agent_stream_cleanup_error/agent_stream_persist_error, agent_chat_id) for observability.
… onFinish/background tasks.
Written for commit 4b9ef10. Summary will update on new commits.