security_infra: optional WebSocket listener for browser-based devices #41

Open
atsyplikhin wants to merge 3 commits into arm:feat/registry-pagination from atsyplikhin:feat/nats-websocket-listener

@atsyplikhin
Collaborator

Summary

Adds an opt-in --enable-websocket flag to setup_deployment.sh so deployments can serve browser-based NATS clients alongside the existing TCP listener. Off by default; existing deployments are untouched.

  • security_infra/setup_deployment.sh — new flags: --enable-websocket, --websocket-port, --websocket-allowed-origins, --websocket-tls-cert, --websocket-tls-key. The TLS pair is both-or-neither.
  • infra/docker-compose-nats-websocket.yml — compose override that exposes the WS port. Bound to 127.0.0.1:8443 by default; bind interface overridable via DC_NATS_WS_BIND.
  • security_infra/README.md — new "Browser-based devices (WebSocket)" section with reverse-proxy sketch (Caddy) and an nsc recipe for narrow-scoping shared browser credentials.
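
The both-or-neither rule for the TLS pair can be sketched as a small check run after argument parsing. This is a minimal illustration, not the code in setup_deployment.sh: the function name `validate_ws_tls_pair` and the error wording are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: name and messages are illustrative only.
# Succeeds when both cert and key are given, or when neither is.
validate_ws_tls_pair() {
    local cert="$1" key="$2"
    if [ -n "$cert" ] && [ -z "$key" ]; then
        echo "error: --websocket-tls-cert requires --websocket-tls-key" >&2
        return 1
    fi
    if [ -z "$cert" ] && [ -n "$key" ]; then
        echo "error: --websocket-tls-key requires --websocket-tls-cert" >&2
        return 1
    fi
    return 0
}
```

Called as `validate_ws_tls_pair "$cert_arg" "$key_arg" || exit 1` (both variable names hypothetical), it rejects a half-specified pair before any config is written.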

Operator-mode JWT auth applies identically to WS and TCP clients — this PR adds a transport, not a new auth path.

Motivation

Browsers can't speak NATS over raw TCP — they need WebSocket. There's no way to do that today without hand-editing the generated config (which setup_deployment.sh rewrites on the next bootstrap). I hit this building an audience-phone demo on top of an existing portal tenant; the changes are small and useful generally.

Security defaults

  • WS off by default. No existing config changes shape.
  • Listener is plain WS. Intended to be fronted by a TLS-terminating reverse proxy (Caddy, nginx, ...). The override binds to 127.0.0.1 so the unencrypted port can't be reached from the network without the proxy.
  • Native TLS supported. Pass --websocket-tls-cert + --websocket-tls-key to have NATS terminate TLS itself, in which case DC_NATS_WS_BIND=0.0.0.0:8443 is safe.
  • Same-origin by default. --websocket-allowed-origins is empty by default, which keeps nats-server's same_origin: true behavior. Operators only override it when a reverse proxy rewrites Host headers.

Test plan

  • bash -n setup_deployment.sh — syntactically clean
  • ./setup_deployment.sh --help — usage prints the new flags + a security note
  • TLS pair validation: passing only one of cert/key exits with a clear error
  • Appender output validates with nats-server -t in all three modes:
    • no_tls + allowed_origins list
    • native TLS (tls { cert_file; key_file })
    • defaults (no_tls, same-origin)
  • docker compose -f infra/docker-compose-multitenant-nats.yml -f infra/docker-compose-nats-websocket.yml config resolves cleanly; port 8443 is bound to 127.0.0.1 by default and DC_NATS_WS_BIND=... overrides the host_ip
  • Running deployment driven by these changes (browser device joining via wss://, terminated by Caddy in front of NATS) verified end-to-end

🤖 Generated with Claude Code

atsyplikhin and others added 2 commits May 29, 2026 01:58
Adds an opt-in `--enable-websocket` flag to setup_deployment.sh that
appends a `websocket {}` block to the generated NATS config. Default
behavior is unchanged -- existing deployments are untouched.

* setup_deployment.sh: --enable-websocket, --websocket-port,
  --websocket-allowed-origins, --websocket-tls-cert/--websocket-tls-key.
  TLS pair is both-or-neither; without TLS args, the listener is plain
  WS and intended to be fronted by a reverse proxy that terminates TLS.

* infra/docker-compose-nats-websocket.yml: compose override that
  exposes the WS port. Binds to 127.0.0.1 by default; the bind interface
  is overridable via DC_NATS_WS_BIND, but the README warns loudly against
  exposing plain WS publicly.

* security_infra/README.md: "Browser-based devices (WebSocket)" section
  covering the deployment shape, the reverse-proxy sketch (Caddy), and
  scoping shared browser credentials with nsc (token-prefix wildcards).

Operator-mode JWT auth applies identically to WS and TCP clients --
this adds a transport, not a new auth path. Validated by generating
configs in all three modes (no-TLS + allowed-origins, native TLS,
defaults) and running `nats-server -t` against each.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…idation)

Three small follow-ups to the initial commit, all in service of avoiding
silent footguns when the operator strays from the default settings.

1. Port pair: --websocket-port writes the listener port into the NATS
   config; the compose override maps the host->container port via
   DC_NATS_WS_PORT (default 8443). If the operator passes
   --websocket-port 9000 without also setting DC_NATS_WS_PORT=9000,
   NATS listens on 9000 inside the container but compose maps to 8443
   and the listener is silently unreachable.

   * Documented DC_NATS_WS_PORT in usage text, in the override file's
     own comment header, and in security_infra/README.md.
   * At "WebSocket listener enabled" the script now prints the exact
     `DC_NATS_WS_PORT=<port> docker compose ...` invocation including
     the port value the operator passed, so the two can't drift.

2. Empty tokens in --websocket-allowed-origins: "a.com,,b.com" or a
   trailing comma previously produced an empty quoted entry in
   allowed_origins. Trim each token, skip empty ones, and skip emitting
   the line entirely if the input collapses to nothing after trimming.

3. Numeric validation: --websocket-port and --nats-port now require a
   numeric value. A typo (--websocket-port 84as3) is caught at the
   arg-parse stage with a clear error instead of flowing into the
   config and only surfacing at `nats-server -t`.
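
The numeric validation in item 3 could look roughly like the sketch below. The function name `is_valid_port` is hypothetical, and the 1-65535 range check goes beyond what the commit message describes (which only mentions requiring a numeric value); it is included as an assumption about what a port check would reasonably cover.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not taken from setup_deployment.sh.
# Succeeds only for a decimal integer in the TCP port range 1-65535.
is_valid_port() {
    case "$1" in
        # reject empty input and anything containing a non-digit
        ''|*[!0-9]*) return 1 ;;
    esac
    # Cap the length first so the arithmetic below cannot overflow;
    # 10# forces base 10 so leading zeros are not read as octal.
    [ "${#1}" -le 5 ] && [ "$((10#$1))" -ge 1 ] && [ "$((10#$1))" -le 65535 ]
}
```

With this in place, a value such as `84as3` fails during argument parsing, for example via `is_valid_port "$2" || { echo "error: --websocket-port must be numeric" >&2; exit 1; }`.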

Verified: all three fixes in isolation, plus a compose-merge with
DC_NATS_WS_PORT=9000 DC_NATS_WS_BIND=127.0.0.1:9443 correctly produces
"target: 9000, published: 9443, host_ip: 127.0.0.1".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
manage_tenants.sh regenerate_nats_config() (used by create / add-device /
reload-nats) reran `nsc generate config` and re-appended only listen +
http_port, silently dropping every other directive below the
"# Device Connect additions" marker -- including the websocket {} block
this PR adds and the max_payload tuning. In production this took the
browser WebSocket listener offline whenever a device was added: phones
got a 502 on wss://.../nats and never registered.

Capture the existing additions tail before regeneration and restore it
afterward, falling back to the default listen/http_port only when no
prior block exists. This keeps the WebSocket listener (and any other
appended server config) alive across routine tenant operations.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
echo " # port is exposed to the network."
echo " no_tls: true"
fi
if [ -n "$WS_ALLOWED_ORIGINS" ]; then
Collaborator

The documented default says an empty --websocket-allowed-origins keeps same-origin behavior, but this branch emits no same_origin: true. In NATS, same_origin defaults to false, so an empty allowed-origins list leaves the WebSocket listener open to any Origin rather than same-origin only. Please emit same_origin: true when no non-empty origins survive parsing, and only emit allowed_origins when the operator explicitly opts into cross-origin access.
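
One way to implement the requested behavior is sketched below. It is an illustration only: the function name `emit_ws_origin_lines` and the two-space indentation are assumptions, and the real appender in setup_deployment.sh may be structured differently. The function trims each comma-separated token, drops empty ones, and falls back to `same_origin: true` when nothing survives.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the origin-related line of the websocket {} block.
# $1 is the raw comma-separated value of --websocket-allowed-origins.
emit_ws_origin_lines() {
    local rest="$1," token list=""
    while [ -n "$rest" ]; do
        token="${rest%%,*}"   # text up to the first comma
        rest="${rest#*,}"     # remainder after that comma
        # trim leading, then trailing, whitespace
        token="${token#"${token%%[![:space:]]*}"}"
        token="${token%"${token##*[![:space:]]}"}"
        if [ -z "$token" ]; then
            continue          # skip empty entries such as "a.com,,b.com"
        fi
        list="${list:+$list, }\"$token\""
    done
    if [ -z "$list" ]; then
        # no explicit origins: restrict the listener to same-origin requests
        echo "  same_origin: true"
    else
        echo "  allowed_origins: [$list]"
    fi
}
```

For example, `emit_ws_origin_lines "https://a.com,, https://b.com ,"` prints `  allowed_origins: ["https://a.com", "https://b.com"]`, while an empty or all-whitespace argument prints `  same_origin: true`.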

if [ -f "${output}" ] && grep -q '^# Device Connect additions' "${output}"; then
    additions=$(sed -n '/^# Device Connect additions/,$p' "${output}")
fi
nsc generate config --mem-resolver --config-file "${output}" 2>/dev/null
Collaborator

This still regenerates directly over an existing ${output}. setup_deployment.sh already has to rm -f first because newer nsc refuses to overwrite an existing --config-file; this path will hit the same failure before the preserved additions are appended back. Please generate to a temp file and then append additions + move it into place, or remove the old output before generation after capturing the tail.
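
The temp-file variant could be sketched as follows. This is not the code in manage_tenants.sh: the `generate_base_config` wrapper, the temp-directory naming, and the fallback `listen`/`http_port` values (the stock NATS ports 4222 and 8222) are assumptions made for illustration.

```bash
#!/usr/bin/env bash
MARKER='# Device Connect additions'

# Wraps the nsc call from the snippet above so it can be swapped out in tests.
generate_base_config() {
    nsc generate config --mem-resolver --config-file "$1" 2>/dev/null
}

# Hypothetical rewrite of regenerate_nats_config(): $1 is the config path.
regenerate_nats_config() {
    local output="$1" additions="" tmpdir tmp
    # Capture everything from the marker to end of file, if present.
    if [ -f "$output" ] && grep -q "^${MARKER}" "$output"; then
        additions=$(sed -n "/^${MARKER}/,\$p" "$output")
    fi
    # Generate into a fresh directory next to the target, so nsc never sees
    # an existing --config-file and the final mv stays on one filesystem.
    tmpdir=$(mktemp -d "$(dirname "$output")/.nats-config.XXXXXX") || return 1
    tmp="${tmpdir}/nats.conf"
    if ! generate_base_config "$tmp"; then
        rm -rf "$tmpdir"
        return 1
    fi
    if [ -n "$additions" ]; then
        printf '%s\n' "$additions" >> "$tmp"
    else
        # Assumed defaults; the real script's fallback values may differ.
        printf '%s\nlisten: 0.0.0.0:4222\nhttp_port: 8222\n' "$MARKER" >> "$tmp"
    fi
    mv "$tmp" "$output" && rmdir "$tmpdir"
}
```

Because the old file is replaced only after generation and the append both succeed, a failing `nsc` run leaves the existing config, including the websocket {} block, in place.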


2 participants