feat(observability): limit wirelogs stream duration#651
Conversation
Wirelogs (Cilium Hubble flows) had no timeout anywhere, so a forgotten tab could hold an SSE connection open indefinitely and degrade the browser. Cap the duration with a server-enforced hard limit plus graduated UI warnings. - Backend /wirelogs/stream enforces a hard cap (default 15m, configurable via openchoreo.observability.wirelogs.streamTimeoutSeconds): it sends a `meta` frame advertising the cap and a `timeout` frame before closing so the UI can label the stop precisely - useWirelogsStream parses the new meta/timeout frames and exposes startedAt, hardTimeoutMs and closedReason - The wirelogs view shows soft-warning dialogs at ~1/3 and ~2/3 of the cap (Keep streaming / Stop) and a toast when the server ends the stream; Start resumes a fresh session - Add backend, hook and page tests; document the option in app-config and the local example; changeset Signed-off-by: Akila-I <akila.99g@gmail.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
🚧 Files skipped from review as they are similar to previous changes (3)
📝 WalkthroughWalkthroughAdds a configurable hard timeout (default 900 s) for the ChangesWirelogs SSE Hard Timeout
Sequence Diagram(s)sequenceDiagram
participant Browser as Browser (SSE client)
participant Page as ObservabilityWirelogsPage
participant Hook as useWirelogsStream
participant Backend as /wirelogs/stream handler
participant Hubble as Hubble upstream
Browser->>Page: clicks Start stream
Page->>Hook: start()
Hook->>Backend: GET /wirelogs/stream
Backend->>Hubble: openStream(abortSignal)
Backend->>Hook: SSE meta {hardTimeoutMs}
Hook->>Hook: set hardTimeoutMs, startedAt
loop streaming data
Hubble-->>Backend: chunk
Backend->>Hook: SSE data frame
Hook->>Page: flows updated
Page->>Page: WirelogsStreamTimeoutDialog checks elapsed
alt elapsed >= hardTimeoutMs/3 or 2/3
Page-->>Browser: show warning dialog
end
end
alt Hard timeout fires on backend
Backend->>Hook: SSE timeout frame {hardTimeoutMs}
Backend->>Hook: end response
Hook->>Hook: closedReason = 'timeout', status = 'closed'
Hook->>Page: closedReason = 'timeout'
Page->>Browser: alertApi.post info toast
else User clicks Stop
Page->>Hook: stop()
Hook->>Hook: closedReason = 'user'
Hook->>Backend: abort request
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
|
@CodeRabbit review |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
plugins/openchoreo-backend/src/router.ts (1)
2270-2281: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winPreserve timeout semantics when abort occurs before upstream open completes.
If
openStream(...)rejects due the hard timeout abort, this branch currently emitsevent: error, so the client records the wrong close reason. It can also try to write to a disconnected response. HandleabortController.signal.abortedseparately, emitevent: timeoutonly whentimedOut, and otherwise just end quietly for client-close aborts.Suggested fix
} catch (err) { clearTimeout(timeoutHandle); res.off('close', onClientClose); + if (abortController.signal.aborted) { + if (timedOut && !res.writableEnded && !res.destroyed) { + try { + res.write( + `event: timeout\ndata: ${JSON.stringify({ + hardTimeoutMs: wirelogsStreamTimeoutMs, + message: `Wirelogs stream stopped after ${Math.round( + wirelogsStreamTimeoutMs / 60000, + )} minutes`, + })}\n\n`, + ); + } catch { + // ignore + } + } + res.end(); + return; + } logger.error( `Failed to open wirelogs upstream: ${(err as Error).message}`, );🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugins/openchoreo-backend/src/router.ts` around lines 2270 - 2281, The current error handler in the upstream connection block treats all openStream rejections identically by emitting an error event, but it needs to distinguish between different failure types. Add a check for abortController.signal.aborted to determine if the rejection was due to an abort. When an abort occurs, check the timedOut flag to emit event: timeout only when a hard timeout triggered the abort, and silently end the response for client-close aborts. Only emit the event: error response when the failure is not an abort, preserving proper timeout semantics for the client.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@plugins/openchoreo-backend/src/plugin.ts`:
- Around line 81-88: The wirelogsStreamTimeoutMs variable assignment retrieves
the streamTimeoutSeconds configuration value and converts it to milliseconds,
but does not validate that the value is within acceptable bounds. Add validation
logic after retrieving the configuration value with config.getOptionalNumber for
the 'openchoreo.observability.wirelogs.streamTimeoutSeconds' key to ensure the
value is positive and not excessively large, then clamp or reject invalid values
before applying the multiplication by 1000 to convert to milliseconds. This
ensures the router logic always receives a sane timeout value.
In
`@plugins/openchoreo-observability/src/components/Wirelogs/useWirelogsStream.ts`:
- Around line 118-121: The hardTimeoutMs state variable is not being reset when
a new stream session starts, unlike other session-scoped state variables such as
status, error, closedReason, and startedAt that are cleared in the
initialization block. This can cause stale timeout values from a previous
session to persist. Add a reset call for hardTimeoutMs (such as
setHardTimeoutMs(null)) in the same block where setStatus, setError,
setClosedReason, and setStartedAt are being called to ensure all session state
is properly cleared at the start of each new stream.
---
Outside diff comments:
In `@plugins/openchoreo-backend/src/router.ts`:
- Around line 2270-2281: The current error handler in the upstream connection
block treats all openStream rejections identically by emitting an error event,
but it needs to distinguish between different failure types. Add a check for
abortController.signal.aborted to determine if the rejection was due to an
abort. When an abort occurs, check the timedOut flag to emit event: timeout only
when a hard timeout triggered the abort, and silently end the response for
client-close aborts. Only emit the event: error response when the failure is not
an abort, preserving proper timeout semantics for the client.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8657de68-d948-4cde-8227-e20c5ca4ca83
📒 Files selected for processing (12)
.changeset/wirelogs-stream-timeout.mdapp-config.local.yaml.exampleapp-config.yamlplugins/openchoreo-backend/src/plugin.tsplugins/openchoreo-backend/src/router.test.tsplugins/openchoreo-backend/src/router.tsplugins/openchoreo-observability/src/components/Wirelogs/ObservabilityWirelogsPage.test.tsxplugins/openchoreo-observability/src/components/Wirelogs/ObservabilityWirelogsPage.tsxplugins/openchoreo-observability/src/components/Wirelogs/WirelogsStreamTimeoutDialog.tsxplugins/openchoreo-observability/src/components/Wirelogs/types.tsplugins/openchoreo-observability/src/components/Wirelogs/useWirelogsStream.test.tsplugins/openchoreo-observability/src/components/Wirelogs/useWirelogsStream.ts
Addresses review feedback on the wirelogs stream timeout. - Validate/clamp openchoreo.observability.wirelogs.streamTimeoutSeconds: reject non-positive values (would abort streams instantly) and cap at setTimeout's max delay before converting to ms - Reset hardTimeoutMs at the start of each stream session so the soft-timeout dialog can't reuse a stale cap from a previous stream - Handle an abort that fires before the upstream opens: emit a `timeout` frame for the hard cap (or end quietly on client disconnect) instead of a misleading "Failed to open" error frame - Add a router test for the pre-open hard-timeout path Refs openchoreo#651 Signed-off-by: Akila-I <akila.99g@gmail.com>
| # the timeout flow quickly during local testing. | ||
| # observability: | ||
| # wirelogs: | ||
| # streamTimeoutSeconds: 900 |
There was a problem hiding this comment.
Shall parametrize this config so that users can edit the values.yaml and configure a custom timeout value during deployment time.
Also update the app-config.production.yaml as well.
Purpose
Wirelogs (Cilium Hubble flows) stream over SSE and had no timeout anywhere — the openchoreo-api wirelogs handler disables the server write deadline, and the gateway TrafficPolicy for the route is
request: 0s/streamIdle: 24h. A wirelogs tab left open therefore holds an SSE connection indefinitely, consuming CPU/network/an open connection and eventually degrading the browser. This PR bounds a stream's lifetime with a server-enforced hard cap plus graduated UI warnings, so a forgotten stream is cleaned up and the user is told why.Goals
Approach
plugins/openchoreo-backend)./wirelogs/streamcomposes the client-disconnect abort with a hard timeout (default 15 min, configurable viaopenchoreo.observability.wirelogs.streamTimeoutSeconds). It opens the stream with ametaSSE frame advertisinghardTimeoutMs, and on hitting the cap sends atimeoutSSE frame before closing — so the cause is unambiguous to the client. Client-disconnect detection moved tores.on('close').plugins/openchoreo-observability).useWirelogsStreamparses the newmeta/timeoutframes and exposesstartedAt,hardTimeoutMsandclosedReason. A newWirelogsStreamTimeoutDialogshows confirmation dialogs at ~1/3 and ~2/3 of the cap (= ~5 and ~10 min at the default, auto-scaling for testing) with Keep streaming / Stop, and the page posts a toast when the server ends the stream. The existing Start stream button resumes a fresh session.request: 0s/streamIdle: 24h), so the backend proxy is the right and sufficient enforcement point for the Backstage path.Verified end-to-end against a live cluster to confirm the warning dialogs, the auto-stop, the toast, and the backend log line.
Screenshots:

User stories
Release note
Wirelogs streams are now bounded by a configurable server-side hard timeout (default 15 minutes) with graduated in-UI warnings and a clear notice when the server ends a stream, preventing forgotten streams from holding connections open indefinitely.
Documentation
Documented the new
openchoreo.observability.wirelogs.streamTimeoutSecondsoption inapp-config.yamlandapp-config.local.yaml.example. No external docs changes required.Training
N/A
Certification
N/A
🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Documentation
openchoreo.observability.wirelogs.streamTimeoutSecondsto the example configuration.Tests