XIP-83: Mutable subscription streams with liveness #139
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
3. The node MUST process `add`/`remove` deltas that arrive **after** the initial request, mutating
   the live subscription **without** terminating or reopening the stream. Removed topics MUST stop
   being delivered; added topics MUST follow rule (2).
🟡 Medium XIPs/xip-83-mutable-subscription-streams.md:148
The `add`/`remove` protocol at lines 148-150 lacks an acknowledgement or ordering barrier, so the take-effect time of a delta is undefined. With concurrent delivery, the server may already have `t1` messages buffered when it processes `remove:[t1]`, making it impossible to satisfy test case 5's requirement that the client "MUST stop receiving `t1` messages." Implementations will diverge: some will leak post-remove messages, others will drop in-flight ones, producing nondeterministic cross-node behavior.
- 4. The node MUST process `add`/`remove` deltas that arrive **after** the initial request, mutating
- the live subscription **without** terminating or reopening the stream. Removed topics MUST stop
- being delivered; added topics MUST follow rule (2).
+ 4. The node MUST process `add`/`remove` deltas that arrive **after** the initial request, mutating
+ the live subscription **without** terminating or reopening the stream. The node MUST send a
+ `StatusUpdate{ SUBSCRIPTION_UPDATED }` acknowledging each processed delta before delivering any
+ messages under the new subscription state. Removed topics MUST stop being delivered after this
+ acknowledgement; added topics MUST follow rule (2).
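One way to read the suggested barrier, as a minimal sketch rather than the node's implementation: a toy in-memory model in which every processed delta enqueues a `SUBSCRIPTION_UPDATED` frame ahead of any message sent under the new state. A client can then treat a `t1` message seen before the ack as in-flight and one seen after it as a violation. The `Subscription` class, the tuple frame shape, and `leaked_after_ack` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Subscription:
    """Toy model of one stream: a topic set plus an ordered outbox of frames."""
    topics: set = field(default_factory=set)
    outbox: list = field(default_factory=list)

    def publish(self, topic, msg_id):
        # Only topics in the current set are enqueued for delivery.
        if topic in self.topics:
            self.outbox.append(("MESSAGE", topic, msg_id))

    def apply_delta(self, add=(), remove=()):
        # Mutate the live topic set in place, then enqueue the ack. The outbox is
        # ordered, so the ack precedes every message sent under the new state.
        self.topics |= set(add)
        self.topics -= set(remove)
        self.outbox.append(("SUBSCRIPTION_UPDATED", tuple(add), tuple(remove)))


def leaked_after_ack(frames):
    """Return messages that arrive for a topic after the ack that removed it."""
    removed, leaks = set(), []
    for frame in frames:
        if frame[0] == "SUBSCRIPTION_UPDATED":
            _, added, dropped = frame
            removed -= set(added)    # a re-added topic is live again
            removed |= set(dropped)
        elif frame[1] in removed:
            leaks.append(frame)
    return leaks


sub = Subscription()
sub.apply_delta(add=["t1", "t2"])
sub.publish("t1", 1)             # enqueued: t1 is subscribed
sub.apply_delta(remove=["t1"])
sub.publish("t1", 2)             # not enqueued: the remove was already processed
sub.publish("t2", 3)
```

In this model the take-effect point of a delta is the position of its ack in the frame sequence, which is the property the comment says the current text leaves undefined.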
   a bounded interval. The interval is server-controlled and RECOMMENDED to be **≤ 30 seconds**. The
   idle timer MUST reset whenever any `Messages` or other frame is delivered, so heartbeats add **no
   per-message overhead** and impose **no per-topic broadcast** — they are a property of the
   connection, not of any conversation.
🟡 Medium XIPs/xip-83-mutable-subscription-streams.md:152
The `keepalive_interval_ms` field is `OPTIONAL` in the protobuf, and the server heartbeat interval is only *recommended* to be `≤ 30 seconds`, not required. A compliant server could send heartbeats every 60 seconds while omitting the field, yet clients following line 167 assume a 30-second default and would declare the stream dead at 60–90 seconds, triggering infinite false reconnects. Consider requiring servers to either (a) always include `keepalive_interval_ms` when heartbeats are sent, or (b) mandate a hard upper bound on the heartbeat interval so the default assumption holds.
-4. The node MUST emit a `StatusUpdate{ WAITING }` heartbeat whenever no other frame has been sent for
- a bounded interval. The interval is server-controlled and RECOMMENDED to be **≤ 30 seconds**. The
- idle timer MUST reset whenever any `Messages` or other frame is delivered, so heartbeats add **no
- per-message overhead** and impose **no per-topic broadcast** — they are a property of the
- connection, not of any conversation.
+4. The node MUST emit a `StatusUpdate{ WAITING }` heartbeat whenever no other frame has been sent for
+ a bounded interval. The interval MUST be **≤ 30 seconds**, and the node MUST include
+ `keepalive_interval_ms` in every `StatusUpdate` so clients can derive accurate watchdog thresholds.
+ The idle timer MUST reset whenever any `Messages` or other frame is delivered, so heartbeats add
+ **no per-message overhead** and impose **no per-topic broadcast** — they are a property of the
+ connection, not of any conversation.
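The failure mode described above can be illustrated with a small sketch of a client-side watchdog threshold. The 30-second default comes from the comment; the multiplier of 2 and the function names are assumptions made for illustration, not values taken from the XIP or from libxmtp.

```python
DEFAULT_KEEPALIVE_MS = 30_000  # default the client assumes when the field is absent
STALE_MULTIPLIER = 2           # assumption: tolerate one missed heartbeat


def watchdog_threshold_ms(keepalive_interval_ms=None):
    """Idle time after which the client treats the stream as dead."""
    # A missing field and a zero value are both treated as "not advertised".
    interval = keepalive_interval_ms or DEFAULT_KEEPALIVE_MS
    return interval * STALE_MULTIPLIER


def is_stale(idle_ms, keepalive_interval_ms=None):
    # idle_ms is the time since the last frame of any kind (Messages or StatusUpdate).
    return idle_ms > watchdog_threshold_ms(keepalive_interval_ms)


# A server that heartbeats every 60 s leaves the stream idle for about 61 s
# (interval plus some network delay) just before each heartbeat lands.
print(is_stale(61_000))          # True: field omitted, client assumes 30 s
print(is_stale(61_000, 60_000))  # False: field advertised, threshold is 120 s
```

Under these assumptions the healthy stream is declared dead only when the interval is not advertised, which is why the suggestion pairs a hard upper bound with a mandatory `keepalive_interval_ms`.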
2. For each `TopicFilter` in `add`, the node MUST deliver messages with id greater than
   `last_seen_id` (or from the live edge if `last_seen_id == 0`), performing catch-up from history
   then transitioning to live delivery, and MUST NOT deliver an id at or below a cursor it has
   already advanced past for that topic on this stream (no duplicates across catch-up/live).
🟡 Medium XIPs/xip-83-mutable-subscription-streams.md:147
Server requirement 2 forbids redelivering IDs below the highest cursor ever seen for a topic on the same stream. After a `remove` and re-`add`, the server cannot honor a lower `last_seen_id` in the new `TopicFilter`, so clients requesting replay from an older durable checkpoint silently miss messages in that window.
-and MUST NOT deliver an id at or below a cursor it has already advanced past for that topic on this stream (no duplicates across catch-up/live).
+and MUST NOT deliver an id at or below a cursor it has already advanced past for that topic on this stream *unless* the topic was removed and is being re-added with an explicit `last_seen_id`, in which case the cursor is reset to the requested value (allowing intentional replay).
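The difference between the rule as written and the suggested wording can be sketched with a toy per-stream cursor table. `TopicCursors`, its `reset_on_readd` flag, and `replay` are hypothetical names; the sketch treats every `last_seen_id` as explicit and leaves out the live-edge case (`last_seen_id == 0`) and catch-up from history.

```python
class TopicCursors:
    """Per-stream, per-topic delivery cursors (toy model)."""

    def __init__(self, reset_on_readd):
        self.reset_on_readd = reset_on_readd
        self.cursor = {}      # topic -> highest id delivered or skipped past
        self.active = set()   # topics currently subscribed

    def add(self, topic, last_seen_id):
        previous = self.cursor.get(topic)
        readd = previous is not None and topic not in self.active
        if previous is None or (readd and self.reset_on_readd):
            # First add, or the suggested rule: honor the requested checkpoint.
            self.cursor[topic] = last_seen_id
        else:
            # Rule as written: never move below a cursor already passed.
            self.cursor[topic] = max(previous, last_seen_id)
        self.active.add(topic)

    def remove(self, topic):
        # The cursor is kept after removal, which is what blocks a later replay.
        self.active.discard(topic)

    def deliver(self, topic, msg_id):
        """Return True and advance the cursor if msg_id should be sent."""
        if topic not in self.active or msg_id <= self.cursor[topic]:
            return False
        self.cursor[topic] = msg_id
        return True


def replay(reset_on_readd):
    cursors = TopicCursors(reset_on_readd)
    cursors.add("t1", last_seen_id=10)
    for msg_id in (11, 12, 13):
        cursors.deliver("t1", msg_id)
    cursors.remove("t1")
    cursors.add("t1", last_seen_id=11)  # re-add from an older durable checkpoint
    return [m for m in (11, 12, 13, 14) if cursors.deliver("t1", m)]


print(replay(reset_on_readd=False))  # [14]
print(replay(reset_on_readd=True))   # [12, 13, 14]
```

With the rule as written, ids 12 and 13 are never redelivered after the re-add even though the client asked to resume from 11; with the reset, the requested checkpoint is honored.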
Approvability
Verdict: Needs human review
3 blocking correctness issues found. This XIP specification document is owned by @jhaaaa, not the PR author, and there are 3 unresolved review comments identifying substantive protocol ambiguities (delta acknowledgment ordering, keepalive interval bounds, cursor reset semantics) that warrant owner review before merging.
Love this direction. Would be a huge unlock for Herald and also better for all mobile apps. Main concern is that we have an even-more-different code path for the browser and everything else, and that browser issues can go unnoticed. Not a blocker, just an unfortunate side effect. Also means that our client streaming implementation needs to handle both mutable and immutable streams, which makes things harder to maintain. Such is life.
Draft XIP, opened for circulation/discussion.
Summary
Defines a single bidirectional subscription RPC (`Subscribe`) on the MLS API. A client opens one long-lived stream and mutates its topic set in place (`add`/`remove` deltas) instead of tearing down and reopening on every membership change; the server delivers messages plus a periodic liveness heartbeat (`StatusUpdate{ WAITING }`) so clients can detect silent stream death that transport keepalives miss: a terminating L7 proxy answers HTTP/2 pings at the edge while the backend subscription is gone. One connection can carry the union of many topics, the enabling primitive for multi-tenant agent gateways.
Compatibility
Additive and backward-compatible: existing `SubscribeGroupMessages`/`SubscribeWelcomeMessages` are untouched, and WASM/browser clients (no bidirectional gRPC) stay on them. A client calling `Subscribe` against a node that lacks it gets `UNIMPLEMENTED` and falls back.
Status
Draft, for discussion. The client-side liveness floor, a `WatchdogStream` combinator that reconnects a stale subscription from its persisted cursor, is already implemented in libxmtp against the existing server-streaming subscriptions (xmtp/libxmtp#3718). The `Subscribe` RPC + node heartbeat handler standardized here are the remaining protocol work.
Note
Add XIP-83 specification for mutable subscription streams with liveness
Adds xip-83-mutable-subscription-streams.md, a new protocol specification describing bidirectional mutable subscription streams with application-level keepalive heartbeats. The spec covers the protocol overview, protobuf definitions, server/client requirements, rationale, backward compatibility, test cases, and security considerations. Also adds "keepalive" and "keepalives" to the cspell.json dictionary.
📊 Macroscope summarized 59dc043. 1 file reviewed, 0 issues evaluated, 0 issues filtered, 0 comments posted