Background
PR #3630 fixed MAX_PAYLOAD_EXCEEDED on the response direction of the link transport by splitting oversized chunk frames into ordered sub-chunks before the NATS hop (apps/mesh/src/nats/payload-chunking.ts, ws-gateway.publishFrame). That works because the response stream is an ordered, single-publisher, per-request reply inbox the consumer simply concatenates — no reassembly needed.
The request direction still has no guard. This is the deferred F1 follow-up referenced in the dispatcher.ts code comment.
Current behavior
createDispatcher publishes the whole request (including body) as one request frame to the shared subject links.dispatch.<userSub>:
- apps/mesh/src/links/dispatcher.ts (the try { deps.nats.publish("links.dispatch.…", …, { reply: inbox }) } catch block, ~L132–154, with the follow-up F1 comment)
If the encoded frame exceeds the NATS max_payload (default 1 MiB), nats.publish throws synchronously. As of #3630 this now fails cleanly — the dispatcher detaches the abort listener, tears down the inbox subscription, and rethrows — but the request itself is rejected. A large request body (e.g. a big file write / POST into the user sandbox) cannot be dispatched.
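The clean-failure path described above can be sketched roughly as follows. This is an illustration only, not the code in dispatcher.ts: the Publisher and Unsubscribable interfaces and the publishOrCleanUp helper are hypothetical stand-ins, and the real dispatcher may wire these pieces differently.

```typescript
// Hypothetical minimal shapes; the real dispatcher uses the NATS client types.
interface Publisher {
  publish(subject: string, data: Uint8Array, opts: { reply: string }): void;
}
interface Unsubscribable {
  unsubscribe(): void;
}

// Publish one request frame. If publish throws synchronously (for example on
// MAX_PAYLOAD_EXCEEDED), release the per-request resources and rethrow.
function publishOrCleanUp(
  nats: Publisher,
  subject: string,
  frame: Uint8Array,
  inbox: string,
  inboxSub: Unsubscribable,
  signal: AbortSignal,
  onAbort: () => void,
): void {
  try {
    nats.publish(subject, frame, { reply: inbox });
  } catch (err) {
    signal.removeEventListener("abort", onAbort); // detach the abort listener
    inboxSub.unsubscribe(); // tear down the reply-inbox subscription
    throw err; // the request is still rejected
  }
}
```

Nothing leaks on this path, but the caller still sees the error, which is why an oversized body cannot be dispatched today.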
Why the response-direction fix doesn't transfer
The request hop is not a per-request channel:
- links.dispatch.<userSub> is a shared subject: the owner pod's single onDispatchFromNats subscription (apps/mesh/src/links/ws-gateway.ts) receives the request frames for all concurrent requests for that user, from any pod.
- The reply inbox is attached per-publish via { reply: inbox }.
So we can't simply emit ordered sub-messages and concatenate. Fragmenting the request body requires a reassembly protocol keyed by reqId (fragment index/total), with the owner pod buffering fragments until complete before forwarding the assembled frame down the WS (the WS leg itself has no payload cap, so only the NATS hop needs splitting).
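A minimal sketch of the sender side of such a protocol, assuming a fragment envelope that carries reqId, index, and total. The RequestFragment shape and splitRequestFrame are hypothetical names, not existing code, and the wire encoding of the envelope is left open. maxFragmentBytes would have to sit below MAX_PUBLISH_BYTES by at least the size of the encoded envelope.

```typescript
// Hypothetical fragment envelope for one slice of an encoded request frame.
interface RequestFragment {
  reqId: string; // correlates fragments of one request on the shared subject
  index: number; // 0-based position of this fragment
  total: number; // sender-declared count; receivers must not size buffers from it
  data: Uint8Array;
}

// Split an encoded request frame into ordered fragments of at most
// maxFragmentBytes each. An empty frame still yields a single fragment.
function splitRequestFrame(
  reqId: string,
  encoded: Uint8Array,
  maxFragmentBytes: number,
): RequestFragment[] {
  if (!Number.isInteger(maxFragmentBytes) || maxFragmentBytes <= 0) {
    throw new RangeError("maxFragmentBytes must be a positive integer");
  }
  const total = Math.max(1, Math.ceil(encoded.length / maxFragmentBytes));
  const fragments: RequestFragment[] = [];
  for (let index = 0; index < total; index++) {
    const start = index * maxFragmentBytes;
    // subarray creates a view, so no payload bytes are copied here.
    const data = encoded.subarray(start, start + maxFragmentBytes);
    fragments.push({ reqId, index, total, data });
  }
  return fragments;
}
```

Each fragment would be published with the same { reply: inbox } option, so reply routing does not depend on which fragment completes the group.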
Design constraints (carried over from the #3630 plan review)
A naive in-band fragment/reassemble util was rejected during the #3630 plan review; F1 must avoid the same pitfalls:
- Untrusted total → allocation DoS: a malicious or garbled fragment count could make the receiver pre-allocate huge buffers (~GBs). Bound the per-reqId buffer by a hard byte cap, not by a sender-declared count.
- Live-group eviction corruption on a shared subject: concurrent reqIds must not interfere; evicting one in-flight group must not corrupt another.
- Lost-fragment hangs: missing fragments must time out and surface a clean error, not hang the dispatcher (which has no idle timeout after its first frame).
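One way the three constraints could fit together on the owner pod is sketched below. All names (createReassembler, ReassemblyResult, the onTimeout callback) are hypothetical, and the cap and timeout values are left to the caller. Buffers grow only with bytes actually received, each reqId has its own group and timer, and a group that never completes is dropped and reported.

```typescript
type ReassemblyResult =
  | { status: "pending" }
  | { status: "complete"; frame: Uint8Array }
  | { status: "rejected"; reason: string };

interface Group {
  parts: Map<number, Uint8Array>;
  bytes: number; // bytes received so far for this reqId
  total: number; // declared count, used only as a completion check
  timer: ReturnType<typeof setTimeout>;
}

function createReassembler(
  maxBytes: number,
  timeoutMs: number,
  onTimeout: (reqId: string) => void,
) {
  const groups = new Map<string, Group>();

  // Remove exactly one group; other in-flight reqIds are untouched.
  const drop = (reqId: string): void => {
    const group = groups.get(reqId);
    if (group) {
      clearTimeout(group.timer);
      groups.delete(reqId);
    }
  };

  const reject = (reqId: string, reason: string): ReassemblyResult => {
    drop(reqId);
    return { status: "rejected", reason };
  };

  function accept(reqId: string, index: number, total: number, data: Uint8Array): ReassemblyResult {
    const headerOk =
      Number.isInteger(total) && total >= 1 &&
      Number.isInteger(index) && index >= 0 && index < total;
    if (!headerOk) return reject(reqId, "invalid fragment header");

    let group = groups.get(reqId);
    if (!group) {
      // The timer starts at the first fragment, so a lost fragment cannot hang forever.
      const timer = setTimeout(() => {
        groups.delete(reqId);
        onTimeout(reqId);
      }, timeoutMs);
      group = { parts: new Map(), bytes: 0, total, timer };
      groups.set(reqId, group);
    }
    if (total !== group.total) return reject(reqId, "fragment total changed mid-request");
    if (group.parts.has(index)) return { status: "pending" }; // ignore duplicates
    if (group.bytes + data.length > maxBytes) return reject(reqId, "reassembly byte cap exceeded");

    group.parts.set(index, data);
    group.bytes += data.length;
    if (group.parts.size < group.total) return { status: "pending" };

    // Complete: allocate once, sized by received bytes (never above maxBytes).
    drop(reqId);
    const frame = new Uint8Array(group.bytes);
    let offset = 0;
    for (let i = 0; i < group.total; i++) {
      const part = group.parts.get(i) as Uint8Array;
      frame.set(part, offset);
      offset += part.length;
    }
    return { status: "complete", frame };
  }

  return { accept, pendingCount: () => groups.size };
}
```

This sketch does not bound the number of concurrent groups. A real implementation would likely need a global cap as well, with a policy that rejects new groups instead of evicting live ones.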
Acceptance criteria
- A request whose encoded request frame exceeds MAX_PUBLISH_BYTES is split on links.dispatch.<userSub>, reassembled at the owner pod, and forwarded intact to the daemon.
- Reassembly is bounded by a byte cap (clean error beyond it), per-reqId, with a timeout that surfaces a dispatcher error instead of hanging.
- Concurrent oversized requests for the same user don't interleave or corrupt each other.
- Reply-inbox routing is preserved end-to-end.
- Unit tests cover: split + reassemble round-trip, lost-fragment timeout, oversized-beyond-cap rejection, and concurrent-reqId isolation.
Pointers
- Request publish: apps/mesh/src/links/dispatcher.ts
- Owner-pod request intake (where reassembly would live): apps/mesh/src/links/ws-gateway.ts → onDispatchFromNats
- Response-direction splitter to mirror/share: apps/mesh/src/nats/payload-chunking.ts
Sibling follow-ups noted alongside F1: F3 (share the MAX_PUBLISH_BYTES constant between the link transport and the decopilot stream buffer), F4 (raise server max_payload as a stopgap).
Follow-up to #3630.