Skip to content

[superlog] Downgrade agent stream cleanup Redis timeouts from ERROR to WARN#499

Open
superlog-app[bot] wants to merge 7 commits into
stagingfrom
superlog/warn-agent-stream-cleanup-errors
Open

[superlog] Downgrade agent stream cleanup Redis timeouts from ERROR to WARN#499
superlog-app[bot] wants to merge 7 commits into
stagingfrom
superlog/warn-agent-stream-cleanup-errors

Conversation

@superlog-app

@superlog-app superlog-app Bot commented Jun 27, 2026

Copy link
Copy Markdown

Summary

Transient Redis timeouts during agent chat stream cleanup (specifically clearActiveStream, markStreamDone, and appendStreamChunk) were being logged at ERROR level, generating false-positive incidents. The primary AI response is already sent to the client before these background operations run — they only affect the Redis stream buffer used for client reconnects.

When a Redis command times out in the onFinish callback or in the background storage-writer task, captureError has no active request logger context and falls through to log.error() unconditionally. This is the correct path for errors that affect the user-visible response, but these three catch blocks are pure cleanup/buffering side-effects.

Replaces the three captureError calls in post-response cleanup paths with log.warn() (already imported). The stream replay/reconnect path remains the only potentially degraded feature (stale active-stream key for up to 1 hour on timeout), but the chat response itself is unaffected. An alternative approach would be to add a severity option to captureError so callers can choose warn vs error per call-site — that may be worth doing as a follow-up if more similar patterns appear in other routes.

Incident on Superlog


Was this PR helpful? Leave feedback — goes straight to the Superlog team.


Summary by cubic

Downgraded logging of transient Redis timeouts during agent chat stream cleanup from ERROR to WARN to prevent false-positive incidents. Chat responses are unaffected; only the reconnect buffer may be briefly stale.

  • Bug Fixes
    • Replaced captureError(...) with log.warn(...) in post-response cleanup and background storage-writer paths (clearActiveStream, markStreamDone).
    • Preserved structured fields (service, error_message, agent_stream_cleanup_error/agent_stream_persist_error, agent_chat_id) for observability.
    • Eliminates noisy ERRORs triggered without a request logger context in onFinish/background tasks.

Written for commit 4b9ef10. Summary will update on new commits.

Review in cubic

@vercel

vercel Bot commented Jun 27, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
databuddy-status Ready Ready Preview, Comment Jun 27, 2026 5:52am
2 Skipped Deployments
Project Deployment Actions Updated (UTC)
dashboard Skipped Skipped Jun 27, 2026 5:52am
documentation Skipped Skipped Jun 27, 2026 5:52am

@unkey-deploy

unkey-deploy Bot commented Jun 27, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Unkey Deploy

Name Status Preview Inspect Updated (UTC)
api (preview) Ready Visit Preview Inspect Jun 27, 2026 5:52am

@greptile-apps

greptile-apps Bot commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

Downgrades three captureError calls in post-response Redis cleanup paths (clearActiveStream, markStreamDone, and the storage-writer IIFE catch) to log.warn, eliminating false-positive error alerts for transient timeouts that don't affect the already-delivered chat response.

  • The old captureError calls were falling through to log.error() at module scope because no active request logger exists in the onFinish callback or in the background storage-writer task — the new log.warn calls carry the same fields (service, error_message, context keys) without changing what is captured.
  • Database persistence errors (agent_persist_error) are correctly left at ERROR level via captureError, so genuinely user-visible failures still alert.

Confidence Score: 5/5

Safe to merge — the change only touches post-response background cleanup paths and leaves all user-visible error handling intact.

The replaced captureError calls were already falling through to the no-request-logger branch, which logs err.message via log.error. The new log.warn calls carry the same fields, so no diagnostic information is dropped. DB persistence errors and all user-facing error paths are correctly left untouched. The one nuance is that the storage-writer IIFE's outer .catch now suppresses stream-reader errors at WARN in addition to Redis timeouts, but this is a low-probability path.

No files require special attention — the single changed file has well-understood boundaries between user-visible and background operations.

Important Files Changed

Filename Overview
apps/api/src/routes/agent.ts Three captureError calls in background Redis cleanup paths replaced with log.warn; DB persistence error handling and all user-visible error paths are unchanged.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Client
    participant AgentRoute as Agent Route
    participant Redis
    participant DB

    AgentRoute->>Client: stream forClient (response sent)
    
    par Background storage writer
        AgentRoute->>Redis: appendStreamChunk(streamKey, chunk)
        note over AgentRoute,Redis: Redis timeout → log.warn (was captureError/log.error)
        AgentRoute->>Redis: markStreamDone(streamKey)
        note over AgentRoute,Redis: Redis timeout → log.warn (was captureError/log.error)
    and onFinish callback
        AgentRoute->>Redis: clearActiveStream(streamScope, chatId, streamId)
        note over AgentRoute,Redis: Redis timeout → log.warn (was captureError/log.error)
        AgentRoute->>DB: upsert agentChats (messages, title)
        note over AgentRoute,DB: DB error → captureError/log.error (unchanged)
    end
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant Client
    participant AgentRoute as Agent Route
    participant Redis
    participant DB

    AgentRoute->>Client: stream forClient (response sent)
    
    par Background storage writer
        AgentRoute->>Redis: appendStreamChunk(streamKey, chunk)
        note over AgentRoute,Redis: Redis timeout → log.warn (was captureError/log.error)
        AgentRoute->>Redis: markStreamDone(streamKey)
        note over AgentRoute,Redis: Redis timeout → log.warn (was captureError/log.error)
    and onFinish callback
        AgentRoute->>Redis: clearActiveStream(streamScope, chatId, streamId)
        note over AgentRoute,Redis: Redis timeout → log.warn (was captureError/log.error)
        AgentRoute->>DB: upsert agentChats (messages, title)
        note over AgentRoute,DB: DB error → captureError/log.error (unchanged)
    end
Loading

Comments Outside Diff (1)

  1. apps/api/src/routes/agent.ts, line 1134-1147 (link)

    P2 IIFE .catch silences non-Redis stream errors at WARN

    The .catch((storageError) => …) block catches everything that escapes the storage-writer IIFE, not just Redis operations. This includes reader.read() failures on the teed ReadableStream, which are unusual and typically signal a programming error rather than a transient Redis timeout. If the ReadableStream itself errors (e.g. due to a bug in createAgentUsageInjector), the symptom would now only appear as a WARN with no stack, making it hard to distinguish from the expected Redis timeout noise the PR is trying to suppress.

Reviews (1): Last reviewed commit: "[superlog] Downgrade agent stream cleanu..." | Re-trigger Greptile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant