Skip to content

feat(observability): limit wirelogs stream duration#651

Open
akila-i wants to merge 2 commits into
openchoreo:mainfrom
akila-i:wirelogs-stream-timeout
Open

feat(observability): limit wirelogs stream duration#651
akila-i wants to merge 2 commits into
openchoreo:mainfrom
akila-i:wirelogs-stream-timeout

Conversation

@akila-i

@akila-i akila-i commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Purpose

Wirelogs (Cilium Hubble flows) stream over SSE and had no timeout anywhere — the openchoreo-api wirelogs handler disables the server write deadline, and the gateway TrafficPolicy for the route is request: 0s / streamIdle: 24h. A wirelogs tab left open therefore holds an SSE connection indefinitely, consuming CPU/network/an open connection and eventually degrading the browser. This PR bounds a stream's lifetime with a server-enforced hard cap plus graduated UI warnings, so a forgotten stream is cleaned up and the user is told why.

Goals

  • Stop a single wirelogs stream from running unbounded.
  • Enforce the limit server-side (reliable even if the tab is backgrounded), and make it configurable.
  • Warn the user before the cap so they can stop early or knowingly continue, and clearly explain when the server ends the stream.

Approach

  • Backend hard cap (plugins/openchoreo-backend). /wirelogs/stream composes the client-disconnect abort with a hard timeout (default 15 min, configurable via openchoreo.observability.wirelogs.streamTimeoutSeconds). It opens the stream with a meta SSE frame advertising hardTimeoutMs, and on hitting the cap sends a timeout SSE frame before closing — so the cause is unambiguous to the client. Client-disconnect detection moved to res.on('close').
  • Frontend (plugins/openchoreo-observability). useWirelogsStream parses the new meta/timeout frames and exposes startedAt, hardTimeoutMs and closedReason. A new WirelogsStreamTimeoutDialog shows confirmation dialogs at ~1/3 and ~2/3 of the cap (= ~5 and ~10 min at the default, auto-scaling for testing) with Keep streaming / Stop, and the page posts a toast when the server ends the stream. The existing Start stream button resumes a fresh session.
  • No gateway change needed in this repo. The OpenChoreo gateway already exempts the wirelogs route from request timeouts (request: 0s / streamIdle: 24h), so the backend proxy is the right and sufficient enforcement point for the Backstage path.

Verified end-to-end against a live cluster to confirm the warning dialogs, the auto-stop, the toast, and the backend log line.

Screenshots:
Screenshot 2026-06-23 at 11 32 42

Screenshot 2026-06-23 at 11 53 07 Screenshot 2026-06-23 at 11 42 10 Screenshot 2026-06-23 at 11 42 39

User stories

  • As a developer debugging traffic, I'm warned that my wirelogs stream has been running a while and can stop it or continue — and if I walk away, it's cleaned up automatically instead of running forever.
  • As a platform operator, I can tune (or effectively disable) the cap per environment via a single config key.

Release note

Wirelogs streams are now bounded by a configurable server-side hard timeout (default 15 minutes) with graduated in-UI warnings and a clear notice when the server ends a stream, preventing forgotten streams from holding connections open indefinitely.

Documentation

Documented the new openchoreo.observability.wirelogs.streamTimeoutSeconds option in app-config.yaml and app-config.local.yaml.example. No external docs changes required.

Training

N/A

Certification

N/A


🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

New Features

  • Wirelogs stream now enforces a configurable hard timeout (default: 15 minutes) and communicates the limit to the UI via SSE.
  • Added graduated timeout warning dialogs at 1/3 and 2/3 of the timeout to prompt users to stop early or continue.

Bug Fixes

  • When the server ends the stream due to the hard cap, the UI now shows an informational toast and resumes with a fresh session.

Documentation

  • Added openchoreo.observability.wirelogs.streamTimeoutSeconds to the example configuration.

Tests

  • Expanded backend and UI tests to cover hard-timeout behavior and closure reasons.

Wirelogs (Cilium Hubble flows) had no timeout anywhere, so a forgotten
tab could hold an SSE connection open indefinitely and degrade the
browser. Cap the duration with a server-enforced hard limit plus
graduated UI warnings.

- Backend /wirelogs/stream enforces a hard cap (default 15m,
  configurable via openchoreo.observability.wirelogs.streamTimeoutSeconds):
  it sends a `meta` frame advertising the cap and a `timeout` frame
  before closing so the UI can label the stop precisely
- useWirelogsStream parses the new meta/timeout frames and exposes
  startedAt, hardTimeoutMs and closedReason
- The wirelogs view shows soft-warning dialogs at ~1/3 and ~2/3 of the
  cap (Keep streaming / Stop) and a toast when the server ends the
  stream; Start resumes a fresh session
- Add backend, hook and page tests; document the option in app-config
  and the local example; changeset

Signed-off-by: Akila-I <akila.99g@gmail.com>
@coderabbitai

coderabbitai Bot commented Jun 23, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7b7fef2f-c0c6-4e27-b48d-c728a755bb2c

📥 Commits

Reviewing files that changed from the base of the PR and between e5d27a1 and a3dda00.

📒 Files selected for processing (4)
  • plugins/openchoreo-backend/src/plugin.ts
  • plugins/openchoreo-backend/src/router.test.ts
  • plugins/openchoreo-backend/src/router.ts
  • plugins/openchoreo-observability/src/components/Wirelogs/useWirelogsStream.ts
🚧 Files skipped from review as they are similar to previous changes (3)
  • plugins/openchoreo-backend/src/plugin.ts
  • plugins/openchoreo-backend/src/router.ts
  • plugins/openchoreo-observability/src/components/Wirelogs/useWirelogsStream.ts

📝 Walkthrough

Walkthrough

Adds a configurable hard timeout (default 900 s) for the /wirelogs/stream SSE backend proxy. The backend enforces the cap via AbortController, emits an initial meta SSE frame advertising the limit, and sends a timeout SSE frame before closing. The frontend hook tracks closedReason, hardTimeoutMs, and startedAt; a new WirelogsStreamTimeoutDialog shows graduated warnings at ⅓ and ⅔ of the cap; the page posts a toast on server-triggered close.

Changes

Wirelogs SSE Hard Timeout

Layer / File(s) Summary
Shared type contract and app configuration
plugins/openchoreo-observability/src/components/Wirelogs/types.ts, app-config.yaml, app-config.local.yaml.example, .changeset/wirelogs-stream-timeout.md
Exports WirelogStreamClosedReason union type; adds commented wirelogs.streamTimeoutSeconds config key in both config files; records patch changeset.
Backend config reading and router parameter wiring
plugins/openchoreo-backend/src/plugin.ts, plugins/openchoreo-backend/src/router.ts
Reads openchoreo.observability.wirelogs.streamTimeoutSeconds (default 900 s), converts to ms, and passes wirelogsStreamTimeoutMs into createRouter; adds the parameter to createRouter's type signature.
Backend SSE handler: AbortController, meta frame, timeout frame
plugins/openchoreo-backend/src/router.ts, plugins/openchoreo-backend/src/router.test.ts
Rewrites /wirelogs/stream to use a single AbortController for client-close and hard-timeout, writes an initial meta SSE frame, unblocks backpressure waits on abort, writes a timeout SSE frame when the timer fires, and clears timers/listeners on all error paths. Tests assert the meta frame and a new timeout-abort test scenario.
Frontend hook: frame parsing and lifecycle state
plugins/openchoreo-observability/src/components/Wirelogs/useWirelogsStream.ts, plugins/openchoreo-observability/src/components/Wirelogs/useWirelogsStream.test.ts
Adds startedAt, hardTimeoutMs, and closedReason to UseWirelogsStreamResult; introduces markClosedReason; parses new meta/timeout SSE frames; records terminal reasons (user, ended, error, timeout). Tests cover each new frame type and reason.
WirelogsStreamTimeoutDialog component
plugins/openchoreo-observability/src/components/Wirelogs/WirelogsStreamTimeoutDialog.tsx
New component with WirelogsStreamTimeoutDialogProps, humanizeDuration helper, per-second clock, two graduated warning stages at ⅓ and ⅔ of hardTimeoutMs, per-session dismissal state, and a MUI dialog with "Stop stream" / "Keep streaming" actions.
Page integration: toast, dialog rendering, and page tests
plugins/openchoreo-observability/src/components/Wirelogs/ObservabilityWirelogsPage.tsx, plugins/openchoreo-observability/src/components/Wirelogs/ObservabilityWirelogsPage.test.tsx
Wires alertApi, adds a useEffect to post an info toast when closedReason === 'timeout', renders WirelogsStreamTimeoutDialog; tests assert toast is posted on timeout close and not on user stop.

Sequence Diagram(s)

sequenceDiagram
  participant Browser as Browser (SSE client)
  participant Page as ObservabilityWirelogsPage
  participant Hook as useWirelogsStream
  participant Backend as /wirelogs/stream handler
  participant Hubble as Hubble upstream

  Browser->>Page: clicks Start stream
  Page->>Hook: start()
  Hook->>Backend: GET /wirelogs/stream
  Backend->>Hubble: openStream(abortSignal)
  Backend->>Hook: SSE meta {hardTimeoutMs}
  Hook->>Hook: set hardTimeoutMs, startedAt
  loop streaming data
    Hubble-->>Backend: chunk
    Backend->>Hook: SSE data frame
    Hook->>Page: flows updated
    Page->>Page: WirelogsStreamTimeoutDialog checks elapsed
    alt elapsed >= hardTimeoutMs/3 or 2/3
      Page-->>Browser: show warning dialog
    end
  end
  alt Hard timeout fires on backend
    Backend->>Hook: SSE timeout frame {hardTimeoutMs}
    Backend->>Hook: end response
    Hook->>Hook: closedReason = 'timeout', status = 'closed'
    Hook->>Page: closedReason = 'timeout'
    Page->>Browser: alertApi.post info toast
  else User clicks Stop
    Page->>Hook: stop()
    Hook->>Hook: closedReason = 'user'
    Hook->>Backend: abort request
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • openchoreo/backstage-plugins#604: Introduced the /wirelogs/stream SSE endpoint and useWirelogsStream hook that this PR extends with hard-timeout enforcement, meta/timeout frames, and the timeout dialog.

Suggested reviewers

  • sameerajayasoma
  • stefinie123
  • Mirage20
  • kaviththiranga

Poem

🐇 Hoppity-hop, the stream won't run forever,
A timeout cap keeps the wirelogs clever.
meta says "fifteen minutes, that's your lot!"
A timeout frame when the clock has stopped.
The dialog warns at one-third, then two-thirds too —
Start fresh, little rabbit, the stream resets for you! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat(observability): limit wirelogs stream duration' clearly and specifically summarizes the main change—adding a configurable timeout limit to wirelogs streams.
Description check ✅ Passed The pull request description comprehensively covers all major required sections including Purpose, Goals, Approach, User stories, Release note, and Documentation, with clear explanations and evidence of end-to-end verification.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@codecov

codecov Bot commented Jun 23, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 51.85185% with 52 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...omponents/Wirelogs/WirelogsStreamTimeoutDialog.tsx 0.00% 37 Missing and 3 partials ⚠️
plugins/openchoreo-backend/src/plugin.ts 0.00% 6 Missing ⚠️
plugins/openchoreo-backend/src/router.ts 76.00% 6 Missing ⚠️

📢 Thoughts on this report? Let us know!

@akila-i

akila-i commented Jun 23, 2026

Copy link
Copy Markdown
Contributor Author

@CodeRabbit review

@coderabbitai

coderabbitai Bot commented Jun 23, 2026

Copy link
Copy Markdown
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
plugins/openchoreo-backend/src/router.ts (1)

2270-2281: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Preserve timeout semantics when abort occurs before upstream open completes.

If openStream(...) rejects due the hard timeout abort, this branch currently emits event: error, so the client records the wrong close reason. It can also try to write to a disconnected response. Handle abortController.signal.aborted separately, emit event: timeout only when timedOut, and otherwise just end quietly for client-close aborts.

Suggested fix
     } catch (err) {
       clearTimeout(timeoutHandle);
       res.off('close', onClientClose);
+      if (abortController.signal.aborted) {
+        if (timedOut && !res.writableEnded && !res.destroyed) {
+          try {
+            res.write(
+              `event: timeout\ndata: ${JSON.stringify({
+                hardTimeoutMs: wirelogsStreamTimeoutMs,
+                message: `Wirelogs stream stopped after ${Math.round(
+                  wirelogsStreamTimeoutMs / 60000,
+                )} minutes`,
+              })}\n\n`,
+            );
+          } catch {
+            // ignore
+          }
+        }
+        res.end();
+        return;
+      }
       logger.error(
         `Failed to open wirelogs upstream: ${(err as Error).message}`,
       );
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugins/openchoreo-backend/src/router.ts` around lines 2270 - 2281, The
current error handler in the upstream connection block treats all openStream
rejections identically by emitting an error event, but it needs to distinguish
between different failure types. Add a check for abortController.signal.aborted
to determine if the rejection was due to an abort. When an abort occurs, check
the timedOut flag to emit event: timeout only when a hard timeout triggered the
abort, and silently end the response for client-close aborts. Only emit the
event: error response when the failure is not an abort, preserving proper
timeout semantics for the client.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@plugins/openchoreo-backend/src/plugin.ts`:
- Around line 81-88: The wirelogsStreamTimeoutMs variable assignment retrieves
the streamTimeoutSeconds configuration value and converts it to milliseconds,
but does not validate that the value is within acceptable bounds. Add validation
logic after retrieving the configuration value with config.getOptionalNumber for
the 'openchoreo.observability.wirelogs.streamTimeoutSeconds' key to ensure the
value is positive and not excessively large, then clamp or reject invalid values
before applying the multiplication by 1000 to convert to milliseconds. This
ensures the router logic always receives a sane timeout value.

In
`@plugins/openchoreo-observability/src/components/Wirelogs/useWirelogsStream.ts`:
- Around line 118-121: The hardTimeoutMs state variable is not being reset when
a new stream session starts, unlike other session-scoped state variables such as
status, error, closedReason, and startedAt that are cleared in the
initialization block. This can cause stale timeout values from a previous
session to persist. Add a reset call for hardTimeoutMs (such as
setHardTimeoutMs(null)) in the same block where setStatus, setError,
setClosedReason, and setStartedAt are being called to ensure all session state
is properly cleared at the start of each new stream.

---

Outside diff comments:
In `@plugins/openchoreo-backend/src/router.ts`:
- Around line 2270-2281: The current error handler in the upstream connection
block treats all openStream rejections identically by emitting an error event,
but it needs to distinguish between different failure types. Add a check for
abortController.signal.aborted to determine if the rejection was due to an
abort. When an abort occurs, check the timedOut flag to emit event: timeout only
when a hard timeout triggered the abort, and silently end the response for
client-close aborts. Only emit the event: error response when the failure is not
an abort, preserving proper timeout semantics for the client.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8657de68-d948-4cde-8227-e20c5ca4ca83

📥 Commits

Reviewing files that changed from the base of the PR and between 2f45e83 and e5d27a1.

📒 Files selected for processing (12)
  • .changeset/wirelogs-stream-timeout.md
  • app-config.local.yaml.example
  • app-config.yaml
  • plugins/openchoreo-backend/src/plugin.ts
  • plugins/openchoreo-backend/src/router.test.ts
  • plugins/openchoreo-backend/src/router.ts
  • plugins/openchoreo-observability/src/components/Wirelogs/ObservabilityWirelogsPage.test.tsx
  • plugins/openchoreo-observability/src/components/Wirelogs/ObservabilityWirelogsPage.tsx
  • plugins/openchoreo-observability/src/components/Wirelogs/WirelogsStreamTimeoutDialog.tsx
  • plugins/openchoreo-observability/src/components/Wirelogs/types.ts
  • plugins/openchoreo-observability/src/components/Wirelogs/useWirelogsStream.test.ts
  • plugins/openchoreo-observability/src/components/Wirelogs/useWirelogsStream.ts

Comment thread plugins/openchoreo-backend/src/plugin.ts
Addresses review feedback on the wirelogs stream timeout.

- Validate/clamp openchoreo.observability.wirelogs.streamTimeoutSeconds:
  reject non-positive values (would abort streams instantly) and cap at
  setTimeout's max delay before converting to ms
- Reset hardTimeoutMs at the start of each stream session so the
  soft-timeout dialog can't reuse a stale cap from a previous stream
- Handle an abort that fires before the upstream opens: emit a `timeout`
  frame for the hard cap (or end quietly on client disconnect) instead
  of a misleading "Failed to open" error frame
- Add a router test for the pre-open hard-timeout path

Refs openchoreo#651

Signed-off-by: Akila-I <akila.99g@gmail.com>
@akila-i akila-i marked this pull request as ready for review June 23, 2026 08:35
Comment thread app-config.yaml
# the timeout flow quickly during local testing.
# observability:
# wirelogs:
# streamTimeoutSeconds: 900

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall parametrize this config so that users can edit the values.yaml and configure a custom timeout value during deployment time.
Also update the app-config.production.yaml as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants