Skip to content

fix(websocket): add server-side heartbeat to prevent proxy idle disconnects#770

Open
cvrysanek wants to merge 1 commit into
siteboon:mainfrom
cvrysanek:fix/websocket-heartbeat
Open

fix(websocket): add server-side heartbeat to prevent proxy idle disconnects#770
cvrysanek wants to merge 1 commit into
siteboon:mainfrom
cvrysanek:fix/websocket-heartbeat

Conversation

@cvrysanek
Copy link
Copy Markdown

@cvrysanek cvrysanek commented May 18, 2026

Summary

Why

The ws library does not heartbeat by default — it's an opt-in pattern documented in the ws README. Without app-level pings, an idle WebSocket has no protocol traffic and is reaped by the first reverse proxy in the path. CloudCLI users behind Cloudflare Tunnel see this constantly: /shell, /ws, and /plugin-ws/* all flap. Issue #769 has the full server-log trace from a live instance.

What changed

server/modules/websocket/services/websocket-server.service.ts — one block added inside the existing wss.on('connection', ...) handler, before the route dispatch:

const HEARTBEAT_INTERVAL_MS = 30_000;
const heartbeat = setInterval(() => {
  if (ws.readyState === ws.OPEN) {
    try { ws.ping(); } catch { /* socket may have closed concurrently */ }
  }
}, HEARTBEAT_INTERVAL_MS);
const stopHeartbeat = () => clearInterval(heartbeat);
ws.on('close', stopHeartbeat);
ws.on('error', stopHeartbeat);

Placed at the gateway so every route (/shell, /ws, /plugin-ws/*) is covered without touching individual handlers.

Why 30s

  • Comfortably under the strictest common idle timeout in the wild (AWS ALB 60s).
  • 1 frame every 30s × small client count is negligible bandwidth/CPU.
  • Matches the de-facto default used by Socket.IO, ASP.NET WebSockets, and the ws README example.

Risk / compatibility

  • Pure additive change. No behavior change for any handler.
  • Browsers automatically pong on ping per RFC 6455 — no client-side change needed.
  • Existing close / error semantics preserved; heartbeat cleanup is purely defensive.

What was validated locally

The equivalent patch was hot-applied to the compiled dist-server JS on a running CloudCLI 1.32.0 instance behind Cloudflare Tunnel (the same instance whose logs are in #769):

  • Service restart succeeded; cloudcli.service reports active (running); :3001 listening.
  • Patch sentinel present in the loaded websocket-server.service.js; web-terminal plugin came up cleanly.

I have not measured idle-duration behavior over a long window yet — would appreciate maintainer review or CI runs to confirm reconnect frequency drops on a known-bad proxy environment before merge.

Test plan

  • Existing test suite passes in CI.
  • Manual: deploy behind Cloudflare Tunnel (or any proxy with WS idle timeout ≤ 100s), open shell + chat, leave idle ≥ 5 minutes, confirm no reconnect entries in server log.
  • Manual: open multiple concurrent sessions, kill them, confirm no leaked intervals (process memory stable).

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced WebSocket connection stability by implementing automatic connection maintenance to prevent unexpected disconnections.

Review Change Stack

…isconnects

The WebSocket gateway never sent ping frames, so any reverse proxy with
an idle timeout (Cloudflare Tunnel ~100s, AWS ALB 60s, nginx 60s, etc.)
would silently tear down /shell, /ws and /plugin-ws/* connections after
the idle window. The UI reconnects automatically but users see a
"Connecting to shell" toast every 1–3 minutes during normal use and any
in-flight PTY/chat traffic can race the reconnect.

Schedule a 30s ws.ping() per connection at the gateway level, cleared on
close/error. ping/pong counts as protocol activity for all proxies that
implement WebSocket correctly, so this single change covers every
deployment topology without per-proxy tuning.

Fixes siteboon#769
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 18, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 62b1d22a-8ddc-4b47-9268-0b080f73e184

📥 Commits

Reviewing files that changed from the base of the PR and between 10f721c and 3084fc9.

📒 Files selected for processing (1)
  • server/modules/websocket/services/websocket-server.service.ts

📝 Walkthrough

Walkthrough

The PR adds an application-level WebSocket heartbeat to the gateway connection handler. Each new connection now starts a 30-second ping interval that keeps the socket alive across reverse-proxy idle timeouts, clears the interval on close/error, and uses try/catch to tolerate concurrent socket closure failures.

Changes

WebSocket Heartbeat

Layer / File(s) Summary
Connection-level heartbeat with 30s ping interval
server/modules/websocket/services/websocket-server.service.ts
On each WebSocket connection, a 30-second interval is created that pings the socket when open, safely ignores ping failures, and clears the interval when the connection closes or errors.

Suggested reviewers

  • blackmammoth

Poem

🐰 A ping in the night, every thirty seconds true,
Keeps CloudCLI's threads from the proxy's cruel timeout brew,
No more reconnects when the silence grows long,
Your WebSocket heartbeat beats steady and strong. 💙

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: adding a server-side WebSocket heartbeat to prevent proxy idle disconnects, which is exactly what the code implements.
Linked Issues check ✅ Passed The PR implements all coding objectives from issue #769: adds a 30s ws.ping() interval per connection at the gateway level, clears it on close/error, and applies to all WebSocket routes without modification.
Out of Scope Changes check ✅ Passed All changes are narrowly scoped to the websocket-server.service.ts file adding the heartbeat mechanism, directly addressing the linked issue with no extraneous modifications.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

WebSocket connections drop every ~100s behind Cloudflare/proxies — server sends no ping frames

1 participant