F1: chunk oversized link request bodies over the NATS hop

## Background

PR #3630 fixed `MAX_PAYLOAD_EXCEEDED` on the **response** direction of the link transport by splitting oversized `chunk` frames into ordered sub-chunks before the NATS hop (`apps/mesh/src/nats/payload-chunking.ts`, `ws-gateway.publishFrame`). That works because each response travels over an ordered, **single-publisher, per-request reply inbox**: the consumer simply concatenates the sub-chunks, so no reassembly protocol is needed.

The **request** direction still has no guard. This is the deferred F1 follow-up referenced in the `dispatcher.ts` code comment.

## Current behavior

`createDispatcher` publishes the whole request (including `body`) as one `request` frame to the shared subject `links.dispatch.<userSub>`:

- `apps/mesh/src/links/dispatcher.ts` (the `try { deps.nats.publish("links.dispatch.…", …, { reply: inbox }) } catch` block, ~L132–154, with the `follow-up F1` comment)

If the encoded frame exceeds the NATS `max_payload` (default 1 MiB), `nats.publish` throws synchronously. As of #3630 this now **fails cleanly** — the dispatcher detaches the abort listener, tears down the inbox subscription, and rethrows — but the request itself is rejected. A large request body (e.g. a big file write / POST into the user sandbox) cannot be dispatched.
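The clean-failure path described above can be pictured roughly as follows. This is a minimal sketch, not the code in `dispatcher.ts`: `Publisher`, `Subscription`, `AbortLike`, and `publishOrCleanUp` are hypothetical stand-ins introduced only for illustration.

```typescript
// Hypothetical minimal interfaces; the real dispatcher dependencies may differ.
interface Publisher {
  publish(subject: string, data: Uint8Array, opts: { reply: string }): void;
}
interface Subscription {
  unsubscribe(): void;
}
interface AbortLike {
  removeEventListener(type: "abort", listener: () => void): void;
}

// Publish one request frame. If the publish throws synchronously (for example
// because the frame is larger than max_payload), release everything that was
// set up for this request and rethrow so the caller sees the failure.
function publishOrCleanUp(
  nats: Publisher,
  subject: string,
  frame: Uint8Array,
  inbox: string,
  inboxSub: Subscription,
  signal: AbortLike,
  onAbort: () => void,
): void {
  try {
    nats.publish(subject, frame, { reply: inbox });
  } catch (err) {
    signal.removeEventListener("abort", onAbort); // detach the abort listener
    inboxSub.unsubscribe(); // tear down the reply-inbox subscription
    throw err; // the request is still rejected
  }
}
```

The sketch shows why the failure is clean but not sufficient: nothing leaks, yet the oversized request never reaches the owner pod.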

## Why the response-direction fix doesn't transfer

The request hop is **not** a per-request channel:

- `links.dispatch.<userSub>` is a **shared subject** — the owner pod's single `onDispatchFromNats` subscription (`apps/mesh/src/links/ws-gateway.ts`) receives the request frames for *all* concurrent requests for that user, from any pod.
- The reply inbox is attached per-publish via `{ reply: inbox }`, so a fragmented request must still carry that inbox through to the owner pod.

So we can't simply emit ordered sub-messages and concatenate: fragments of concurrent requests arrive interleaved on the shared subject, and the subscriber has no per-request channel to tell them apart. Fragmenting the request body therefore requires a **reassembly protocol** keyed by `reqId` (fragment index/total), with the owner pod buffering fragments until the set is complete and then forwarding the assembled frame down the WS. The WS leg itself has no payload cap, so only the NATS hop needs splitting.
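One possible shape for the sender side is sketched below. The `RequestFragment` envelope and `splitRequestBody` are hypothetical, as is the choice of a `last` flag in place of a declared total (which sidesteps trusting a sender-supplied count). The sketch assumes the encoded `request` frame is already available as bytes and that `maxFragmentBytes` leaves room for the envelope overhead.

```typescript
// Hypothetical fragment envelope; field names are illustrative only.
interface RequestFragment {
  kind: "request-fragment";
  reqId: string; // groups fragments that belong to the same request
  index: number; // 0-based position within the request
  last: boolean; // true only on the final fragment
  data: Uint8Array;
}

// Split an encoded request frame into ordered fragments of at most
// maxFragmentBytes each. An empty frame still yields one (empty) fragment so
// the receiver always sees a terminating fragment.
function splitRequestBody(
  reqId: string,
  encoded: Uint8Array,
  maxFragmentBytes: number,
): RequestFragment[] {
  if (!Number.isInteger(maxFragmentBytes) || maxFragmentBytes <= 0) {
    throw new RangeError("maxFragmentBytes must be a positive integer");
  }
  const fragments: RequestFragment[] = [];
  let offset = 0;
  let index = 0;
  do {
    const end = Math.min(offset + maxFragmentBytes, encoded.length);
    fragments.push({
      kind: "request-fragment",
      reqId,
      index,
      last: end >= encoded.length,
      data: encoded.subarray(offset, end), // a view, not a copy
    });
    offset = end;
    index += 1;
  } while (offset < encoded.length);
  return fragments;
}
```

Each fragment would then be published to `links.dispatch.<userSub>` with the same `{ reply: inbox }` option, so whichever fragment completes the group still carries the routing information.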

## Design constraints (carried over from the #3630 plan review)

A naive in-band fragment/reassemble util was **rejected** during the #3630 plan review; F1 must avoid the same pitfalls:

- **Untrusted `total` → allocation DoS** — a malicious/garbled fragment count could pre-allocate huge buffers (~GBs). Bound the per-`reqId` buffer by a hard byte cap, not by a sender-declared count.
- **Live-group eviction corruption** on a shared subject — concurrent `reqId`s must not interfere; evicting one in-flight group must not corrupt another.
- **Lost-fragment hangs** — missing fragments must time out and surface a clean error, not hang the dispatcher (which has no idle timeout after its first frame).
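A receiver that respects these three constraints might look like the sketch below. Everything here is hypothetical (`RequestReassembler`, its options, and the result shape are not from the codebase), and it assumes fragments for a given `reqId` arrive in order with a `last` flag on the final one.

```typescript
type ReassemblyResult =
  | { status: "pending" }
  | { status: "complete"; body: Uint8Array }
  | { status: "error"; reason: "too-large" | "out-of-order" | "busy" };

interface Group {
  chunks: Uint8Array[];
  bytes: number;
  nextIndex: number;
  timer: ReturnType<typeof setTimeout>;
}

interface ReassemblerOptions {
  maxBytes: number; // hard per-reqId byte cap; never sized from sender input
  maxGroups: number; // cap on concurrent in-flight groups
  timeoutMs: number; // idle time allowed between fragments
  onTimeout: (reqId: string) => void; // lets the caller surface an error
}

class RequestReassembler {
  private readonly groups = new Map<string, Group>();
  private readonly opts: ReassemblerOptions;

  constructor(opts: ReassemblerOptions) {
    this.opts = opts;
  }

  accept(reqId: string, index: number, last: boolean, data: Uint8Array): ReassemblyResult {
    let group = this.groups.get(reqId);
    if (!group) {
      if (index !== 0) return { status: "error", reason: "out-of-order" };
      // At capacity, reject the new request instead of evicting a live group.
      if (this.groups.size >= this.opts.maxGroups) return { status: "error", reason: "busy" };
      group = { chunks: [], bytes: 0, nextIndex: 0, timer: this.arm(reqId) };
      this.groups.set(reqId, group);
    }
    if (index !== group.nextIndex) return this.fail(reqId, "out-of-order");
    if (group.bytes + data.length > this.opts.maxBytes) return this.fail(reqId, "too-large");

    group.chunks.push(data);
    group.bytes += data.length;
    group.nextIndex += 1;
    clearTimeout(group.timer);
    if (!last) {
      group.timer = this.arm(reqId); // restart the idle timer
      return { status: "pending" };
    }
    this.groups.delete(reqId);
    // Allocate only now, sized by the bytes actually received.
    const body = new Uint8Array(group.bytes);
    let offset = 0;
    for (const chunk of group.chunks) {
      body.set(chunk, offset);
      offset += chunk.length;
    }
    return { status: "complete", body };
  }

  pendingCount(): number {
    return this.groups.size;
  }

  private arm(reqId: string): ReturnType<typeof setTimeout> {
    return setTimeout(() => {
      this.groups.delete(reqId);
      this.opts.onTimeout(reqId);
    }, this.opts.timeoutMs);
  }

  private fail(reqId: string, reason: "too-large" | "out-of-order"): ReassemblyResult {
    const group = this.groups.get(reqId);
    if (group) clearTimeout(group.timer);
    this.groups.delete(reqId);
    return { status: "error", reason };
  }
}
```

State is keyed by `reqId` and only ever removed for the group that completed, failed, or timed out, so one request's failure cannot disturb another's buffer. The sketch stores the fragment views it is handed; if the transport reuses its receive buffers, each chunk would need to be copied on arrival.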

## Acceptance criteria

- A request whose encoded `request` frame exceeds `MAX_PUBLISH_BYTES` is split on `links.dispatch.<userSub>`, reassembled at the owner pod, and forwarded intact to the daemon.
- Reassembly is bounded by a byte cap (clean error beyond it), per-`reqId`, with a timeout that surfaces a dispatcher error instead of hanging.
- Concurrent oversized requests for the same user don't interleave or corrupt each other.
- Reply-inbox routing is preserved end-to-end.
- Unit tests cover: split + reassemble round-trip, lost-fragment timeout, oversized-beyond-cap rejection, and concurrent-`reqId` isolation.

## Pointers

- Request publish: `apps/mesh/src/links/dispatcher.ts`
- Owner-pod request intake (where reassembly would live): `apps/mesh/src/links/ws-gateway.ts` → `onDispatchFromNats`
- Response-direction splitter to mirror/share: `apps/mesh/src/nats/payload-chunking.ts`

Sibling follow-ups noted alongside F1: **F3** (share the `MAX_PUBLISH_BYTES` constant between the link transport and the decopilot stream buffer), **F4** (raise server `max_payload` as a stopgap).

Follow-up to #3630.

