Add core NATS tunnel transport + optimistic link presence #3854
Open
tlgimenes wants to merge 67 commits into
80bfca6 to 7161b4d
9600332 to c05c8cf
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… gate) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…esence Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… the daemon Optimistic dispatch publishes the work item without a pre-flight liveness gate; when the daemon is offline the publish throws tunnel_no_first_frame. Previously that propagated and DBOS retried it, leaving the run stuck in_progress. Now the thread gate self-fails the run (forceFailIfInProgress) so it settles terminal — surfacing the error the way the e2e expects. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ives-overview # Conflicts: # apps/mesh/src/link-daemon/cluster-connection-pull.test.ts
…ives-overview # Conflicts: # apps/mesh/src/api/routes/decopilot/dispatch-run.ts # apps/mesh/src/settings/resolve-config.ts
…ives-overview # Conflicts: # apps/mesh/src/link-daemon/handle-local-dispatch.ts
…rts) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… + SLA retention Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… + SLA retention Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…umer Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ith status writes Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…only) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…arkers Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ns/epochs Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ector flag Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…or flag Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ives-overview # Conflicts: # apps/mesh/src/api/app.ts # apps/mesh/src/api/routes/decopilot/orphan-recovery.ts # bun.lock
…of scope) The post-merge re-added pod-heartbeat construction + projector leadership referenced POD_ID in closures that don't enclose its declaration, causing TS2304 in a clean build (a stale tsbuildinfo masked it locally). Call the module-scoped getPodId() directly at those sites. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
origin/main #3917 removed the getPodId import + POD_ID const from app.ts; the clean auto-merge applied that removal while keeping the projector/ heartbeat getPodId() call sites, leaving the name undeclared (TS2304 in the CI test-merge). Re-add the module-level import. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
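The optimistic-dispatch commit in the list above (publish without a liveness gate, then self-fail the run on `tunnel_no_first_frame`) can be sketched roughly as follows. This is only an illustration under stated assumptions: `RunStore`, `MemoryRunStore`, `TunnelError`, and `dispatchOptimistically` are hypothetical names, and the body of `forceFailIfInProgress` here is a stand-in rather than the PR's implementation.

```typescript
// Hypothetical sketch. Only the error code `tunnel_no_first_frame` and the
// name `forceFailIfInProgress` come from the commit message; the rest is assumed.
type RunStatus = "in_progress" | "completed" | "failed";

interface RunStore {
  getStatus(runId: string): RunStatus | undefined;
  setStatus(runId: string, status: RunStatus): void;
}

// In-memory store used only to make the sketch self-contained.
class MemoryRunStore implements RunStore {
  private readonly runs = new Map<string, RunStatus>();
  getStatus(runId: string): RunStatus | undefined {
    return this.runs.get(runId);
  }
  setStatus(runId: string, status: RunStatus): void {
    this.runs.set(runId, status);
  }
}

class TunnelError extends Error {
  readonly code: string;
  constructor(code: string) {
    super(code);
    this.name = "TunnelError";
    this.code = code;
  }
}

// Stand-in: moves the run to "failed" only if it is still in progress.
function forceFailIfInProgress(store: RunStore, runId: string): boolean {
  if (store.getStatus(runId) !== "in_progress") return false;
  store.setStatus(runId, "failed");
  return true;
}

// Publishes without a pre-flight liveness check. If the daemon cannot be
// reached, the run is settled as failed instead of being left in progress.
async function dispatchOptimistically(
  store: RunStore,
  runId: string,
  publish: () => Promise<void>,
): Promise<RunStatus> {
  store.setStatus(runId, "in_progress");
  try {
    await publish();
  } catch (err) {
    if (err instanceof TunnelError && err.code === "tunnel_no_first_frame") {
      forceFailIfInProgress(store, runId);
      return store.getStatus(runId) ?? "failed";
    }
    throw err; // unrelated errors still propagate
  }
  return store.getStatus(runId) ?? "in_progress";
}
```

The point of the guard in `forceFailIfInProgress` is that a run which already reached a terminal state is never overwritten by a late failure.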
What is this contribution about?
Two related changes to the Studio ↔ desktop-link path.
1. `@decocms/tunnel`: fetch-like HTTP over core NATS with streaming request/response bodies. Studio and the link daemon connect through `POST /api/links/session` using scoped, short-lived NATS credentials; JetStream is kept out of the tunnel transport.
2. Optimistic link presence: replaces the old heartbeat / KV-claim presence (which produced "CLI running but UI says offline" false negatives) with a live probe:
   - `GET /api/links/status` over the tunnel (`{ hostname, capabilities, cliVersion }`).
   - The web app polls (`LINK_CURRENT_GET` / `/api/links/me`, ~5s) to drive the desktop indicator, feature-gating, and the `sandboxProviderKind` it sends.
   - `resolveDispatchTarget` just normalizes the kind, `POST /messages` no longer 409s on an offline desktop, and a work-publish that can't reach the daemon fails the run (`forceFailIfInProgress`) instead of hanging.
   - Removed: `TunnelPresenceSubscriber`, the `studio_links` KV claim registry, `resolve-default-provider-kind`, and the `links.presence.*` publish + its credential grant. A tunnel inter-frame idle timeout replaces the claim-watch as the in-flight abort.

Design + plan: `docs/superpowers/specs/2026-06-12-link-presence-tunnel-status-design.md`.

How to Test
- `bun run fmt && bun run check && bun run lint`
- `bun test apps/mesh/src/links apps/mesh/src/link-daemon apps/mesh/src/tools/links apps/mesh/src/sandbox packages/tunnel/src`
- E2E (`apps/mesh/e2e/tests/link-tunnel.spec.ts`): indicator flips online/offline via the live probe; an offline `user-desktop` send is accepted (202) and the run settles `failed` (no 409).

Migration Notes
- Configure `tunnel.nats.publicUrl`, `tunnel.nats.publicEnabled`, `tunnel.nats.sessionTtlSeconds`, plus `NATS_OPERATOR_JWT` / `NATS_ACCOUNT_JWT` / `NATS_ACCOUNT_SIGNING_KEY` to mint daemon sessions; expose NATS websockets in the NATS subchart.
- Offline desktops no longer produce a `409` on `POST /messages`.

Review Checklist
Summary by cubic
Moves Studio↔desktop-link traffic to an HTTP-over-NATS tunnel with streaming bodies and a live status probe, and routes run IO through a durable, seq-deduped stream processed by a leader‑elected projector. Also fixes a getPodId() import/scoping regression in projector/heartbeat wiring.
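A minimal sketch of the seq-dedup idea named in the summary above, assuming the message-id format `${runId}:${fenceToken}:${seq}` and accumulators keyed by `(runId, fenceToken)` as listed in this PR's feature notes. The `Chunk`, `Accumulator`, and `Projector` types are hypothetical, and the sketch further assumes chunks for one run arrive in increasing `seq` order starting at 0; the PR's projector may differ.

```typescript
// Hypothetical types; only the id format and the (runId, fenceToken) keying
// are taken from the PR description.
type Chunk = { runId: string; fenceToken: string; seq: number; data: string };

// Builds an id in the format the PR describes for the Nats-Msg-Id header.
function msgId(c: Chunk): string {
  return `${c.runId}:${c.fenceToken}:${c.seq}`;
}

class Accumulator {
  private lastSeq = -1; // highest seq applied so far (assumes seq starts at 0)
  readonly parts: string[] = [];

  // Returns true if the chunk was applied, false if it was a replay.
  apply(c: Chunk): boolean {
    if (c.seq <= this.lastSeq) return false;
    this.lastSeq = c.seq;
    this.parts.push(c.data);
    return true;
  }
}

class Projector {
  private readonly accumulators = new Map<string, Accumulator>();

  // One accumulator per (runId, fenceToken), so a new turn with a fresh
  // fence token never mixes with chunks from an earlier turn.
  ingest(c: Chunk): boolean {
    const key = `${c.runId}:${c.fenceToken}`;
    let acc = this.accumulators.get(key);
    if (!acc) {
      acc = new Accumulator();
      this.accumulators.set(key, acc);
    }
    return acc.apply(c);
  }

  text(runId: string, fenceToken: string): string {
    const acc = this.accumulators.get(`${runId}:${fenceToken}`);
    return acc ? acc.parts.join("") : "";
  }
}
```

Broker-side deduplication by message id only covers the configured dedup window, so a consumer-side check like `apply` is one way to stay idempotent when older messages are replayed.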
New Features
- `@decocms/tunnel` (fetch over NATS) and `@nats-io/jwt`.
- `/api/links/session` mints host-scoped tunnel credentials or a token; returns `connection.urls`, `credentials`/`token`, `expiresAt`, and `tunnelHostname` (503 when disabled).
- `GET /api/links/status` over the tunnel; backs `/api/links/me` and `LINK_CURRENT_GET`; the web app polls ~5s.
- `/messages` no longer 409-gates on liveness; if a tunnel publish can't reach the daemon, the thread gate fails the run. `LINK_DISCONNECT` just sends a shutdown frame.
- `ingestRun` publishes raw chunks to `DECOPILOT_STREAMS` with `Nats-Msg-Id = ${runId}:${fenceToken}:${seq}`, seq-dedups replays, and drives hooks only; `mintRunFenceToken()` isolates turns; the projector keys accumulators by `(runId, fenceToken)`.
- `DECOPILOT_STREAMS` is file-backed with a 30-min retention SLA and a 2-min dedup window; a single-active projector (pod-heartbeat leader election) writes `threads.title` and terminal status, marks zero-part runs failed, and exports lag and poison-run metrics.

Migration
- `tunnel.nats.publicUrl`, `tunnel.nats.publicEnabled`, and `tunnel.nats.sessionTtlSeconds` (populates `NATS_PUBLIC_URL`, `NATS_TUNNEL_PUBLIC_ENABLED`, `NATS_TUNNEL_SESSION_TTL_SECONDS`).
- `NATS_OPERATOR_JWT`, `NATS_ACCOUNT_JWT`, `NATS_ACCOUNT_SIGNING_KEY`.
- `publicUrl`.

Written for commit 508740b. Summary will update on new commits.
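For the migration settings, a rough sketch of how the three environment variables might be read and how the "503 when disabled" rule for the session route could be derived from them. The environment variable names come from the summary; `TunnelNatsConfig`, `resolveTunnelConfig`, `sessionStatus`, the 300-second default, the `"true"` string convention, and treating a missing public URL as disabled are all assumptions made for illustration.

```typescript
// Hypothetical config shape; field names mirror the Helm values in the PR.
interface TunnelNatsConfig {
  publicUrl: string | null;
  publicEnabled: boolean;
  sessionTtlSeconds: number;
}

type Env = Record<string, string | undefined>;

// Assumed default; the PR does not state one.
const DEFAULT_SESSION_TTL_SECONDS = 300;

function resolveTunnelConfig(env: Env): TunnelNatsConfig {
  const rawUrl = env["NATS_PUBLIC_URL"];
  const ttl = Number.parseInt(env["NATS_TUNNEL_SESSION_TTL_SECONDS"] ?? "", 10);
  return {
    // Empty or whitespace-only values count as "not configured".
    publicUrl: rawUrl && rawUrl.trim() !== "" ? rawUrl.trim() : null,
    // Assumption: the flag is the literal string "true" when enabled.
    publicEnabled: env["NATS_TUNNEL_PUBLIC_ENABLED"] === "true",
    // Fall back to the default on missing, non-numeric, or non-positive input.
    sessionTtlSeconds:
      Number.isFinite(ttl) && ttl > 0 ? ttl : DEFAULT_SESSION_TTL_SECONDS,
  };
}

// Status the session route would answer with: 503 when the tunnel is disabled.
// Treating a missing public URL as disabled is an assumption of this sketch.
function sessionStatus(cfg: TunnelNatsConfig): 200 | 503 {
  return cfg.publicEnabled && cfg.publicUrl !== null ? 200 : 503;
}

// Expiry instant for a session minted at `nowMs` (epoch milliseconds).
function sessionExpiresAt(cfg: TunnelNatsConfig, nowMs: number): Date {
  return new Date(nowMs + cfg.sessionTtlSeconds * 1000);
}
```

Keeping the parsing in one pure function makes it easy to check a deployment's values before the daemon ever requests a session.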