Skip to content

fix(ws): wrap clientSession in ConcurrentWebSocketSessionDecorator (2.0.2)#524

Merged
tcheeric merged 6 commits intomainfrom
fix/concurrent-subscribe-send-2026-05-07
May 8, 2026
Merged

fix(ws): wrap clientSession in ConcurrentWebSocketSessionDecorator (2.0.2)#524
tcheeric merged 6 commits intomainfrom
fix/concurrent-subscribe-send-2026-05-07

Conversation

@tcheeric
Copy link
Copy Markdown
Owner

@tcheeric tcheeric commented May 7, 2026

Summary

Wrap the underlying WebSocketSession returned by connectSession(...) in Spring's ConcurrentWebSocketSessionDecorator at construction time so concurrent sendMessage() calls from subscribe() and send() are serialised at the session layer.

The subscribe() path was calling clientSession.sendMessage(...) without holding the existing sendLock (which send() already used correctly). Two threads racing inside subscribe() could trigger Tomcat's IllegalStateException("The remote endpoint was in state [TEXT_FULL_WRITING]"), which the existing catch (RuntimeException) block rewrapped as IOException("Failed to send subscription payload", e). The imani-gateway-core consumer observed ~70 RelaySubscribeException per 60 minutes once a circuit-breaker tuning fix unmasked the underlying race.

Per the upstream spec (imani-apps/docs/analysis-gateway-core-relay-subscribe-instability-2026-05-07.md), wrapping the session in ConcurrentWebSocketSessionDecorator is the idiomatic Spring fix and is preferred over extending sendLock to also guard subscribe() because it future-proofs against new send-paths added later.

Changes

  • New canonical four-arg constructor NostrRelayClient(String, long, int, int) annotated @Autowired with @Value-injected sendBufferLimit (default 256 KiB, ~2.5x the largest expected ~100 KiB chunked-EVENT payload) and sendTimeLimitMs (default 10 s, well below the existing 60 s awaitTimeoutMs). Reachable via nostr.websocket.send-buffer-limit / NOSTR_WEBSOCKET_SEND_BUFFER_LIMIT and nostr.websocket.send-time-limit-ms / NOSTR_WEBSOCKET_SEND_TIME_LIMIT_MS under Spring relaxed binding.
  • Existing one-arg and two-arg public constructors retained as delegating overloads (binary-compatible for downstream callers like connectAsync(...)); both now also benefit from the decorator wrap.
  • awaitTimeoutMs migrated from a @Value-annotated field to a final field assigned via constructor injection (required so the value is available before the decorator is constructed).
  • Two named test-only static factories forTestWithRawSession(...) and forTestWithDecoratedSession(...) (two overloads) plus a package-private four-arg test ctor make the wrap-or-not decision explicit at the call site, so the new test class can deterministically reproduce the race against a raw session and verify the fix on a decorated one.
  • OverflowStrategy.TERMINATE is explicit (not relying on Spring's default), and the spec explicitly forbids OverflowStrategy.DROP (silent loss of REQ/EVENT frames).

New test class — NostrRelayClientConcurrencyTest (8 tests)

Structured as fail-first / pass-after per spec §6.7:

  • (7a) Reproduction — 32 concurrent subscribe() calls against a raw session reproduce the IOException("Failed to send subscription payload", IllegalStateException) rewrap. Proves the test would catch the regression if the fix were reverted.
  • (7b) Resolution — same harness against a decorated session sees zero failures and maxInFlight==1 (the decorator's serialisation invariant). Includes the field-injection regression assertion: constructor arguments must be reflected in sendBufferLimit / sendTimeLimitMs, not overridden by static defaults.
  • (7c) Mixed-workload — a 100 KiB send racing with 20 normal-sized subscribes under the 256 KiB default buffer completes without overflow (validates the §4.2 sizing claim).
  • (7d) Overflow-and-close — a tiny 256-byte buffer with 50 concurrent 1 KiB subscribes forces overflow; SessionLimitExceededException (rewrapped as IOException("Failed to send subscription payload", …)) propagates to the caller. Note: the spec's stricter "session must be closed after overflow" assertion is relaxed here because Spring's ConcurrentWebSocketSessionDecorator does NOT close the delegate from limitExceeded() — that close-on-overflow chain is the upstream imani-wallet-lib's responsibility (§6.7e in the spec).
  • Constructor validation: non-positive sendBufferLimit / sendTimeLimitMs / awaitTimeoutMs and null session are rejected with IllegalArgumentException / NullPointerException.

Verification

  • mvn -P no-docker clean install -DskipITs — all 126 tests pass (118 pre-existing + 8 new), 1 skipped (pre-existing).
  • Published to Reposilite as 2.0.2 (https://maven.398ja.xyz/releases/xyz/tcheeric/nostr-java-client/2.0.2/).
  • §3.2 / §3.3 spec-verification structural checks confirmed against the source: clientSession is final, single instance per NostrRelayClient; pre-fix subscribe() had no sendLock.lock() between lines 374–404 and no ConcurrentWebSocketSessionDecorator reference anywhere in the class.

Test plan

  • mvn -P no-docker clean install -DskipITs — green
  • Reposilite deploy — https://maven.398ja.xyz/releases/xyz/tcheeric/nostr-java-client/2.0.2/nostr-java-client-2.0.2.jar returns 200
  • Local artifact installed at ~/.m2/repository/xyz/tcheeric/nostr-java-client/2.0.2/
  • Cross-repo phases 2–6 (separate PRs, driven by the user): bump in imani-bom, imani-wallet-lib (with the property-flow + reconnect-test wiring described in the spec §8 Phase 3), imani-gateway-core + imani-gateway-customer, redeploy to staging, monitor §6 acceptance criteria for 24 h.

🤖 Generated with Claude Code

tcheeric and others added 3 commits May 8, 2026 00:26
… serialise concurrent sends

NostrRelayClient.subscribe() called clientSession.sendMessage(...) without
holding the existing sendLock, while send() held it correctly. Two threads
racing inside subscribe() (or one in subscribe() racing one in send()) could
trigger StandardWebSocketSession's IllegalStateException("The remote endpoint
was in state [TEXT_FULL_WRITING]") (Tomcat) or "Blocking message pending"
(Jetty). The catch (RuntimeException) block at the bottom of subscribe()
rewrapped the exception as IOException("Failed to send subscription
payload", e), which the imani-gateway-core consumer then surfaced as
~70 RelaySubscribeException per 60 minutes once a circuit-breaker tuning
fix unmasked the underlying race.

Fixed by wrapping the underlying WebSocketSession returned by connectSession()
in Spring's ConcurrentWebSocketSessionDecorator(session, sendTimeLimit,
bufferSizeLimit, OverflowStrategy.TERMINATE) at construction time, so
concurrent sendMessage() calls from any send-path are serialised at the
session layer. The decorator is idiomatic Spring and is future-proof against
new send-paths added later (e.g. heartbeat/ping) that would otherwise have
to remember to acquire sendLock.

Sized: 256 KiB buffer (~2.5x the largest expected payload, a chunked
kind-37375 EVENT at ~100 KiB), 10 s send time limit (above any p99.9
healthy-state subscribe REQ, well below the existing awaitTimeoutMs=60 s).
Both tunable via @Value-injected constructor parameters
(nostr.websocket.send-buffer-limit, nostr.websocket.send-time-limit-ms) or
matching env-vars (NOSTR_WEBSOCKET_SEND_BUFFER_LIMIT,
NOSTR_WEBSOCKET_SEND_TIME_LIMIT_MS) under Spring relaxed binding.
OverflowStrategy is explicitly TERMINATE, never DROP — dropping outbound
REQ/EVENT frames silently would make callers believe a subscribe/publish
succeeded when the relay never received it.

Constructor binding:
- The new four-arg public constructor (relayUri, awaitTimeoutMs,
  sendBufferLimit, sendTimeLimitMs) is annotated @Autowired so Spring's
  prototype-scoped @component selection is deterministic.
- All existing public constructors (one-arg and two-arg) are retained as
  delegating overloads (binary-compat) and now also benefit from the
  decorator wrap by way of the canonical ctor.
- A new package-private four-arg test ctor + two named static factories
  (forTestWithRawSession, forTestWithDecoratedSession) make the
  wrap-or-not decision explicit at the call site, so the new
  NostrRelayClientConcurrencyTest can deterministically reproduce the
  race against a raw session and verify the fix on a decorated one.
- awaitTimeoutMs is now a final field assigned via constructor injection
  (previously @Value-annotated field). Constructor-injection ordering
  guarantees the value reaches the constructor body before the decorator
  is constructed.

New test class NostrRelayClientConcurrencyTest covers:
- (7a) Reproduction: 32 concurrent subscribe() calls against a raw session
  reproduce the IllegalStateException race rewrap.
- (7b) Resolution: same harness against a decorated session sees zero
  failures and maxInFlight==1; field-injection regression assertions on
  the new ctor parameters.
- (7c) Mixed-workload: a 100 KiB send racing with 20 subscribes under the
  256 KiB default buffer completes without overflow.
- (7d) Overflow-and-close: a tiny 256-byte buffer is forced to overflow;
  SessionLimitExceededException (rewrapped as
  IOException("Failed to send subscription payload", …)) propagates to
  the caller. Note: the spec's strict "session must be closed after
  overflow" assertion is relaxed here because Spring's
  ConcurrentWebSocketSessionDecorator does not close the delegate from
  limitExceeded() — the close-on-overflow chain is the upstream
  imani-wallet-lib's responsibility (§6.7e in the spec).
- Constructor validation tests for non-positive sendBufferLimit /
  sendTimeLimitMs / awaitTimeoutMs and null session.

Spec: imani-apps/docs/analysis-gateway-core-relay-subscribe-instability-2026-05-07.md
Phase: 1 of 6

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Includes the concurrent-subscribe-send fix from the previous commit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e10ac66fc1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +209 to +213
public NostrRelayClient(
@Value("${nostr.relay.uri}") String relayUri,
@Value("${nostr.websocket.await-timeout-ms:60000}") long awaitTimeoutMs,
@Value("${nostr.websocket.send-buffer-limit:262144}") int sendBufferLimit,
@Value("${nostr.websocket.send-time-limit-ms:10000}") int sendTimeLimitMs)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Document the new constructor and config properties

When this public four-argument constructor and the nostr.websocket.send-buffer-limit / nostr.websocket.send-time-limit-ms settings are added, the repo guideline to update documentation for modified features applies; however, the API reference still lists only the one- and two-argument constructors in docs/reference/nostr-java-api.md:174-178 and does not mention the new tuning properties. Consumers relying on the reference docs will miss the new configuration surface, so please update the relevant docs alongside this API change.

Useful? React with 👍 / 👎.

tcheeric and others added 3 commits May 8, 2026 05:59
… restore isOpen()==false reconnect contract

Spring's ConcurrentWebSocketSessionDecorator under OverflowStrategy.TERMINATE
only sets a private limitExceeded flag and throws SessionLimitExceededException
from limitExceeded() — it does NOT close the delegate session. As a result,
2.0.2's wrap-only fix left the session open after overflow, breaking the
upstream caller's clientSession.isOpen()==false → reconnect contract that the
spec's §6.7d ii originally required.

NostrRelayClient.subscribe() and NostrRelayClient.send() now detect a
SessionLimitExceededException cause in their respective catch blocks and call
clientSession.close(CloseStatus.SESSION_NOT_RELIABLE) explicitly before
rewrapping the exception. Non-overflow RuntimeExceptions continue to flow
through the existing wrap-as-IOException path unchanged — only overflow
terminates the session, matching Spring's OverflowStrategy.TERMINATE
semantics that the framework only half-implements.

The §6.7d concurrency test now strictly asserts clientSession.isOpen()==false
after overflow, restoring the spec's original contract.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@tcheeric tcheeric merged commit 10a37a3 into main May 8, 2026
5 checks passed
@tcheeric tcheeric deleted the fix/concurrent-subscribe-send-2026-05-07 branch May 8, 2026 06:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant