Port Registrar: browser agent + Chrome extension migration + origin allowlist#2345
Draft
TalZaccai wants to merge 10 commits into
Draft
Port Registrar: browser agent + Chrome extension migration + origin allowlist#2345TalZaccai wants to merge 10 commits into
TalZaccai wants to merge 10 commits into
Conversation
PR 2 of the port-registrar series. Drops the hardcoded 8082 port for the `code` agent's WebSocket server and rewires the in-repo Coda VS Code extension to discover the live port through the agent-server's discovery channel before connecting. * `CodeAgentWebSocketServer` is now an async `start(port = 0)` factory that resolves only after the `listening` event fires, with `close()` returning a Promise. Lets us read the OS-assigned port reliably and serialize rapid disable/enable cycles under a fixed `CODE_WEBSOCKET_PORT` override. * `updateCodeContext` registers the server's actual port per session via `sessionContext.registerPort`. The shared module-scoped server is kept (refcounted), but each enabling session gets its own registration handle stored on its `agentContext`; `releaseAllForSession` is the crash-safety backstop. The registrar already supports multiple registrations per `(agent, role)` so this drops the previously considered ownership-transfer state machine. * `readiness.ts` treats port as `number | undefined` and omits the port suffix from setup-card and NOT-RUNNING messages when no override is set. `resolveCodePort` is now `resolveCodePortOverride` to make the env-var semantics explicit. * Coda's `createWebSocket` is async and uses `discoverPort()` (from the `@typeagent/agent-server-client/discovery` subpath introduced in PR 1) with a legacy 8082 fallback only when the discovery channel itself is unreachable (not when lookup returns `not-registered`). Reconnect loop now guards against the multi-interval leak in the prior `reconnectWebSocket` implementation and re-discovers the port on every retry (port can change across agent-server restarts). * Tests: 5 new integration tests for the shared-server lifecycle (single + multi-session enable / disable / rollback on bind failure). Existing 13 readiness tests updated for the new `number | undefined` port type. All 18 code-agent tests pass; full workspace build green; lint clean. Branched off and depends on PR 1 ([port-registrar foundation], talzacc/port-registrar-foundation tip `a8fd7cf7`). Does not modify SecretAgents. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…query separator, refresh stale docs This is a research project; there are no in-the-wild pre-discovery clients to support, so the back-compat fallback in coda's webSocket.ts is dead weight. Replace it with: when discovery is unreachable or the code agent isn't yet registered, return undefined and let the existing 5s reconnect loop in wsConnect.ts retry. Also fix a pre-existing bug in createWebSocket: `clientId=` was appended to the query string without a `&` separator, producing malformed URLs like `...&role=clientclientId=xyz`. Caught by code review. Stale doc references to port 8082 in commandExecutor/README.md, commandExecutor/VSCODE_CAPABILITIES.md, and shell/demo/DEMO_SETUP.md updated to describe the discovery-based flow. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Extract getKnownCodePort() helper with a comment explaining the two-phase lifecycle (live bound port after start; static env-var prediction before start). The previous inline `getSharedCodePort() ?? resolveCodePortOverride(...)` read like a fallback when it's actually 'live if known, else predicted'. - Tighten getCodeBindPort doc: explicitly call out that EADDRINUSE on the requested port is a hard failure (no silent fallback to OS-assigned), since the user explicitly asked for that port. The previous 'falling back to OS-assigned port' wording in the malformed-input branch could mislead readers into thinking bind failures also fall back, which they don't. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…est cleanup createWebSocket previously used an `async` Promise executor (`new Promise(async (resolve) => ...`). If `resolveCodeEndpoint()` rejects or `new WebSocket(endpoint)` throws synchronously (e.g. an invalid `CODE_WEBSOCKET_HOST` override), the rejection lands on the executor's implicit promise rather than the outer one — leaving createWebSocket pending forever and stalling the reconnect loop. Refactor to do the awaits at the top level and try/catch around `new WebSocket()`. codeUpdateContext.spec.ts: replace the placeholder afterEach with a real teardown that tracks every StubContext newSession() hands out and force-disables any still-enabled session. Asserts the shared server is down after cleanup so a leak surfaces loudly instead of contaminating later tests. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…nored Per @robgruen review feedback on resolveCodePortOverride: surface a debug log when the env-var override is in effect (so a user wondering why the port isn't OS-assigned can confirm), and another when an invalid value is ignored (so a typo doesn't silently fall back to OS-assigned without any diagnostic). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Migrates the browser agent off hardcoded port 8081 to use the new
PortRegistrar infrastructure (PR 3 of 5 in the port-registrar series).
Server side:
- AgentWebSocketServer.start(port=0) returns a Promise resolved on the
'listening' event so the OS-assigned port is readable synchronously
via .port; rejects on EADDRINUSE under fixed-port overrides instead
of swallowing.
- close() awaits server.close() and proactively closes all tracked
clients first so a rapid disable/enable cycle under
BROWSER_WEBSOCKET_PORT can rebind cleanly.
- New verifyClient gate uses isAllowedAgentOrigin() to reject WS
upgrades from arbitrary web pages on the same host (per design
section 4.2). Allows chrome-extension://, moz-extension://,
http(s)://localhost or 127.0.0.1, and absent Origin (Node ws clients).
Handler lifecycle:
- Replaces module-load 'new AgentWebSocketServer(8081)' with refcounted
module state. ensureSharedBrowserServer() serializes concurrent
binds; cleanupBrowserSession() is idempotent and called from BOTH
the disable path AND closeBrowserContext (previously only close
unregistered the session, leaking on disable).
- Per-session sessionContext.registerPort('default', port). Releases on
cleanup; backstop in closeSessionContext covers the skipped-disable
case.
- Reads BROWSER_WEBSOCKET_PORT for explicit pinning when debugging;
malformed values fall back to OS-assigned port with a debug log.
Code agent Origin parity:
- codeAgentWebSocketServer.ts now also gates upgrades on
isAllowedAgentOrigin() (vscode-* schemes + loopback + absent
Origin). Closes the design-section-4.2 gap PR 2 quietly skipped.
Chrome extension migration:
- websocket.ts now calls discoverPort('browser', 'default', { url:
agentServerHost }) on every connect, then dials the discovered port.
Removes the module-level settings cache that never invalidated on
agentServerHost changes.
- Replaces websocketHost setting with agentServerHost (default
ws://localhost:8999/, blank means default). Aligns with
dispatcherConnection.ts which already used this setting.
- reconnectWebSocket() is now a singleton — was creating a fresh
setInterval on every onclose, leaking timers under flapping
connectivity. New websocket test asserts only one interval is
scheduled across repeated calls.
Tests:
- agentWebSocketServer.test.ts (28) — async start/close + Origin gate
+ close-tracks-clients.
- websocket.test.ts (5) — discoverPort mock + reconnect singleton.
- code agent: 18 tests still pass.
- All 253 browser tests pass (one pre-existing flaky
contentDownloader 'invalid URL' timeout unrelated).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…r EADDRINUSE on re-enable Two PR-3 follow-up fixes surfaced during manual smoke testing: 1. View-server EADDRINUSE on re-enable updateBrowserContext(false) and closeBrowserContext clear viewProcess but didn't reset agentContext.localHostPort. On re-enable, the agent forked server.mjs with the stale port, racing the still-draining killed child (SIGTERM is async on Windows) and erroring with EADDRINUSE. Reset localHostPort to 0 in both teardown paths so the re-fork lets the OS pick a fresh port. 2. Stale "setup-required" cache after extension connects The dispatcher caches readiness at agent init time and only refreshes it after setup() or an explicit `@config agent refresh <name>`. The browser agent has no setup hook (extensions can't be launched programmatically), so users had to run the refresh command manually every time the browser came up. Adds SessionContext.notifyReadinessChanged() to the agent SDK as the push channel for state-change events, plumbs it through agent-rpc (out-of-process agents), and wires the browser agent to fire it from onClientConnected and onClientDisconnected. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
e5ac9b9 to
11719be
Compare
browser Agent + Chrome Extension Migration + Origin Allowlist…fork-failure port reset - originAllowlist (browser+code): accept '[::1]' (Node URL parser preserves brackets in hostname) so IPv6 loopback dev clients pass the Origin gate. - browserActionHandler.createViewServiceHost: reset agentContext.localHostPort to 0 in the synchronous fork catch path so a retried enable can re-register a fresh port (closes the third EADDRINUSE-recurrence path; the disable + close paths were fixed in the prior commit). - Add agentWebSocketServer.test.ts cases for http://[::1], http://[::1]:8081, https://[::1]:5173. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…swallow Two new tests in sessionContext.spec.ts: - delegates to agents.refreshReadiness with the agent's own name - swallows errors from refreshReadiness so an event-driven caller (WS onClientConnected/onClientDisconnected) cannot throw into the emitter path Covers the contract added in commit e5ac9b9 that wires browser-extension connect/disconnect events to automatic dispatcher readiness refresh, removing the need for users to run \@config agent refresh browser\ manually. Tests: 671 passed, 671 total (was 669). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…-migration # Conflicts: # ts/pnpm-lock.yaml
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
PR 2 migrated the
codeagent and proved the discovery handshake against an in-repo VS Code extension. Thebrowseragent is the second concrete migration and exercises a meaningfully different consumer shape:isomorphic-ws, not Nodews.code's per-schema model.BROWSER_WEBSOCKET_PORTescape hatch matters more for debugging thanCODE_WEBSOCKET_PORTdid.This PR also closes a design gap that PR 2 quietly skipped: per design §4.2, every per-agent listener migrated to the
PortRegistrarmust Origin-gate WebSocket upgrades to keep an OS-assigned ephemeral port from being dialed by arbitrary web pages on the same host. Both the browser AND the code agent get the gate in this PR.What this PR changes
browseragent server: asyncstart+close+ Origin gateAgentWebSocketServeris now an asyncstart(port = 0)factory rather than a constructor. It resolves only after thelisteningevent so the OS-assigned port is readable synchronously via.port, and rejects on EADDRINUSE under fixed-port overrides instead of swallowing the error in a permanenterrorlistener.close()now awaitsserver.close()and proactively closes all tracked client sockets first, so a rapiddisable → re-enablecycle underBROWSER_WEBSOCKET_PORTrebinds cleanly.The new
verifyClientcallback appliesisAllowedAgentOrigin()to every upgrade request before theconnectionevent fires. Allowed:chrome-extension://*,moz-extension://*,http(s)://localhost(:port),http(s)://127.0.0.1(:port), and absent /nullOrigin (Nodewsclients don't send one; the bridge binds loopback so this is OS-restricted). Anything else gets HTTP 403.codeagent: Origin parityCodeAgentWebSocketServergets the same gate with a code-agent-specific allowlist (vscode-webview://,vscode-file://,vscode-resource://plus loopback / absent Origin). The twooriginAllowlistfiles are deliberately duplicated rather than shared — different scheme prefixes per agent, ~30 LOC each, and a shared module would force every agent that opens a port to depend on a new common package.Handler lifecycle: refcounted shared server, idempotent cleanup
browserActionHandler.mtsno longer constructs the server at module load. It now mirrors PR 2's pattern: module-scopedsharedBrowserServer,sharedStartingPromise,sharedClosingPromise,sharedBrowserRefCount, with anensureSharedBrowserServer()that serializes concurrent enables and waits for any in-flight close before re-binding (matters under fixed-port override; withport = 0the OS would just hand out a different port).The browser agent's lifecycle differs from
code's in two ways worth flagging for reviewers:updateBrowserContextearly-returns for non-browserschemas, so refcount accounting tracksagentContext.browserSchemaEnabled(always 0 or 1 per session) rather thanenabled.size.closeBrowserContext, leaking on the disable path. This PR funnels bothupdateAgentContext(false, ...)andcloseBrowserContextthroughcleanupBrowserSession(), which is idempotent: ifbrowserSchemaEnabledis already false, it bails without touching shared state.BROWSER_WEBSOCKET_PORTis the new escape hatch (mirrorsCODE_WEBSOCKET_PORT); malformed values fall back to OS-assigned with a debug log rather than crashing.Chrome extension: discovery-then-connect
extension/serviceWorker/websocket.tsnow does the same dance the Coda extension does in PR 2:agentServerHostfromchrome.storage.sync(defaultws://localhost:8999/).discoverPort("browser", "default", { url: agentServerHost })against the agent-server's discovery channel.unreachableornot-registered, returnundefinedand let the 5s reconnect loop retry. No hardcoded fallback — same call the user made for PR 2 (this is a research project, no lingering external clients to maintain back-compat for).Three pre-existing extension bugs are fixed in the same patch since they directly block the new flow:
settingscache was loaded once and never refreshed, so a user changingagentServerHostsaw the old endpoint until the service worker restarted. Cache removed; settings re-read on everycreateWebSocket.reconnectWebSocket()scheduled a freshsetIntervalon every call, and theonclosehandler called it unconditionally, so flapping connectivity stacked timers indefinitely. Now guarded by a module-level singleton.AGENT_SERVER_DEFAULT_URL".The setting itself is renamed
websocketHost→agentServerHostto match the dispatcher path (dispatcherConnection.tsalready used this key) and reflect that the user now configures the agent-server URL, not the per-agent port.Validation
pnpm --filter browser-typeagent build— green (full agent + extension + Vite client builds).pnpm --filter agent-sdk --filter agent-rpc --filter agent-dispatcher --filter browser-typeagent build— green (post-fix rebuild across all touched packages).pnpm --filter code-agent build && pnpm --filter code-agent test— green (18/18 still pass).pnpm --filter agent-dispatcher test— 669/669 pass, includingagentReadiness.spec.jsandsessionContext.spec.js.pnpm --filter browser-typeagent test:local— 253/254 pass. The one failure is a pre-existing flake incontentDownloader.test.ts("invalid URL gracefully" timing out at 5s); the file isn't touched by this PR (last edited inbbcb8f30).agentWebSocketServer.test.ts— 28 tests. Exercises asyncstart,closeclosing tracked clients first, and the Origin allowlist (coverschrome-extension://,http://localhost:8081, absent Origin, and rejection ofhttps://evil.example.com).websocket.test.ts— 5 tests. CoversdiscoverPort-driven endpoint resolution and asserts only onesetIntervalis scheduled across repeatedreconnectWebSocket()calls (the singleton fix).pr-3-manual-tests.mdchecklist confirmed end-to-end after both fixes: extension auto-discovers the port,@browser open a new tab to bing.comdispatches without manual@config agent refresh browser, andgetWebFlowsForDomainround-trips successfully through the SW ↔ agent webAgent proxy.Reading order for reviewers
This PR is two commits (the follow-up commit is intentionally separate to keep the smoke-test fixes reviewable in isolation). Suggested reading order:
agents/browser/src/agent/{agentWebSocketServer,originAllowlist}.mts+ tests — the server-side primitives (async start/close, Origin gate). Identical structure to the PR 2CodeAgentWebSocketServer.agents/browser/src/agent/{browserActionHandler,browserActions}.mts— the lifecycle wiring. The interesting bits areensureSharedBrowserServer/cleanupBrowserSessionand the convergence of the disable + close paths. Also contains thelocalHostPort = 0reset and thenotifyReadinessChanged()calls in the WS lifecycle callbacks.agents/code/src/{codeAgentWebSocketServer,originAllowlist}.ts— Origin parity for the code agent (small).agents/browser/src/extension/serviceWorker/{websocket,storage,index}.ts+views/options.{ts,html}— the extension consumer. DemonstratesdiscoverPort()from MV3 and the three bug fixes (cache, reconnect singleton, blank-input).agentSdk/src/agentInterface.ts+dispatcher/src/execute/sessionContext.ts+agentRpc/src/{types,client,server}.ts— the newnotifyReadinessChanged()API and its in-process + RPC plumbing. Mirrors the existingreloadAgentSchema()pattern.Follow-up PRs
visualStudiohost webviewonboarding-scaffolder+ tightenoriginAllowlistonagentServer