Skip to content

Port Registrar: browser agent + Chrome extension migration + origin allowlist#2345

Draft
TalZaccai wants to merge 10 commits into
mainfrom
talzacc/browser-agent-migration
Draft

Port Registrar: browser agent + Chrome extension migration + origin allowlist#2345
TalZaccai wants to merge 10 commits into
mainfrom
talzacc/browser-agent-migration

Conversation

@TalZaccai
Copy link
Copy Markdown
Contributor

Depends on PR 2 (code agent migration). Branched off talzacc/code-and-localview-migration and must merge after it. Reuses the discoverPort() helper, the registrar SDK, and the async-start/close pattern proven there.

Why

PR 2 migrated the code agent and proved the discovery handshake against an in-repo VS Code extension. The browser agent is the second concrete migration and exercises a meaningfully different consumer shape:

  • The Chrome / Edge extension is out-of-process and distributed separately — discovery has to work from a service worker (MV3) over isomorphic-ws, not Node ws.
  • A single browser-agent server has two simultaneous client populations — the extension(s) and the Electron shell's inline browser — sharing one session id; the lifecycle / refcount accounting differs from code's per-schema model.
  • The Electron shell is the most-used host, so the BROWSER_WEBSOCKET_PORT escape hatch matters more for debugging than CODE_WEBSOCKET_PORT did.

This PR also closes a design gap that PR 2 quietly skipped: per design §4.2, every per-agent listener migrated to the PortRegistrar must Origin-gate WebSocket upgrades to keep an OS-assigned ephemeral port from being dialed by arbitrary web pages on the same host. Both the browser AND the code agent get the gate in this PR.

What this PR changes

browser agent server: async start + close + Origin gate

AgentWebSocketServer is now an async start(port = 0) factory rather than a constructor. It resolves only after the listening event so the OS-assigned port is readable synchronously via .port, and rejects on EADDRINUSE under fixed-port overrides instead of swallowing the error in a permanent error listener. close() now awaits server.close() and proactively closes all tracked client sockets first, so a rapid disable → re-enable cycle under BROWSER_WEBSOCKET_PORT rebinds cleanly.

The new verifyClient callback applies isAllowedAgentOrigin() to every upgrade request before the connection event fires. Allowed: chrome-extension://*, moz-extension://*, http(s)://localhost(:port), http(s)://127.0.0.1(:port), and absent / null Origin (Node ws clients don't send one; the bridge binds loopback so this is OS-restricted). Anything else gets HTTP 403.

code agent: Origin parity

CodeAgentWebSocketServer gets the same gate with a code-agent-specific allowlist (vscode-webview://, vscode-file://, vscode-resource:// plus loopback / absent Origin). The two originAllowlist files are deliberately duplicated rather than shared — different scheme prefixes per agent, ~30 LOC each, and a shared module would force every agent that opens a port to depend on a new common package.

Handler lifecycle: refcounted shared server, idempotent cleanup

browserActionHandler.mts no longer constructs the server at module load. It now mirrors PR 2's pattern: module-scoped sharedBrowserServer, sharedStartingPromise, sharedClosingPromise, sharedBrowserRefCount, with an ensureSharedBrowserServer() that serializes concurrent enables and waits for any in-flight close before re-binding (matters under fixed-port override; with port = 0 the OS would just hand out a different port).

The browser agent's lifecycle differs from code's in two ways worth flagging for reviewers:

  1. Per-session refcount, not per-schemaupdateBrowserContext early-returns for non-browser schemas, so refcount accounting tracks agentContext.browserSchemaEnabled (always 0 or 1 per session) rather than enabled.size.
  2. Two cleanup paths converge on one helper — the prior code only released its session registration in closeBrowserContext, leaking on the disable path. This PR funnels both updateAgentContext(false, ...) and closeBrowserContext through cleanupBrowserSession(), which is idempotent: if browserSchemaEnabled is already false, it bails without touching shared state.

BROWSER_WEBSOCKET_PORT is the new escape hatch (mirrors CODE_WEBSOCKET_PORT); malformed values fall back to OS-assigned with a debug log rather than crashing.

Chrome extension: discovery-then-connect

extension/serviceWorker/websocket.ts now does the same dance the Coda extension does in PR 2:

  1. Read agentServerHost from chrome.storage.sync (default ws://localhost:8999/).
  2. Call discoverPort("browser", "default", { url: agentServerHost }) against the agent-server's discovery channel.
  3. Construct the browser-agent URL by reusing the agent-server host and substituting the discovered port.
  4. On unreachable or not-registered, return undefined and let the 5s reconnect loop retry. No hardcoded fallback — same call the user made for PR 2 (this is a research project, no lingering external clients to maintain back-compat for).

Three pre-existing extension bugs are fixed in the same patch since they directly block the new flow:

  • Settings cache never invalidated — the module-level settings cache was loaded once and never refreshed, so a user changing agentServerHost saw the old endpoint until the service worker restarted. Cache removed; settings re-read on every createWebSocket.
  • Reconnect timer leakreconnectWebSocket() scheduled a fresh setInterval on every call, and the onclose handler called it unconditionally, so flapping connectivity stacked timers indefinitely. Now guarded by a module-level singleton.
  • Options page rejected blank input — the WebSocket-host field rejected empty strings, so users couldn't say "use the default". Now blank means "use AGENT_SERVER_DEFAULT_URL".

The setting itself is renamed websocketHostagentServerHost to match the dispatcher path (dispatcherConnection.ts already used this key) and reflect that the user now configures the agent-server URL, not the per-agent port.

Validation

  • pnpm --filter browser-typeagent build — green (full agent + extension + Vite client builds).
  • pnpm --filter agent-sdk --filter agent-rpc --filter agent-dispatcher --filter browser-typeagent build — green (post-fix rebuild across all touched packages).
  • pnpm --filter code-agent build && pnpm --filter code-agent test — green (18/18 still pass).
  • pnpm --filter agent-dispatcher test669/669 pass, including agentReadiness.spec.js and sessionContext.spec.js.
  • pnpm --filter browser-typeagent test:local253/254 pass. The one failure is a pre-existing flake in contentDownloader.test.ts ("invalid URL gracefully" timing out at 5s); the file isn't touched by this PR (last edited in bbcb8f30).
  • New / updated unit tests:
    • agentWebSocketServer.test.ts28 tests. Exercises async start, close closing tracked clients first, and the Origin allowlist (covers chrome-extension://, http://localhost:8081, absent Origin, and rejection of https://evil.example.com).
    • websocket.test.ts5 tests. Covers discoverPort-driven endpoint resolution and asserts only one setInterval is scheduled across repeated reconnectWebSocket() calls (the singleton fix).
  • Manual smoke test against the pr-3-manual-tests.md checklist confirmed end-to-end after both fixes: extension auto-discovers the port, @browser open a new tab to bing.com dispatches without manual @config agent refresh browser, and getWebFlowsForDomain round-trips successfully through the SW ↔ agent webAgent proxy.

Reading order for reviewers

This PR is two commits (the follow-up commit is intentionally separate to keep the smoke-test fixes reviewable in isolation). Suggested reading order:

  1. agents/browser/src/agent/{agentWebSocketServer,originAllowlist}.mts + tests — the server-side primitives (async start/close, Origin gate). Identical structure to the PR 2 CodeAgentWebSocketServer.
  2. agents/browser/src/agent/{browserActionHandler,browserActions}.mts — the lifecycle wiring. The interesting bits are ensureSharedBrowserServer / cleanupBrowserSession and the convergence of the disable + close paths. Also contains the localHostPort = 0 reset and the notifyReadinessChanged() calls in the WS lifecycle callbacks.
  3. agents/code/src/{codeAgentWebSocketServer,originAllowlist}.ts — Origin parity for the code agent (small).
  4. agents/browser/src/extension/serviceWorker/{websocket,storage,index}.ts + views/options.{ts,html} — the extension consumer. Demonstrates discoverPort() from MV3 and the three bug fixes (cache, reconnect singleton, blank-input).
  5. agentSdk/src/agentInterface.ts + dispatcher/src/execute/sessionContext.ts + agentRpc/src/{types,client,server}.ts — the new notifyReadinessChanged() API and its in-process + RPC plumbing. Mirrors the existing reloadAgentSchema() pattern.

Follow-up PRs

PR Scope
4 Migrate visualStudio host webview
5 Migrate onboarding-scaffolder + tighten originAllowlist on agentServer

TalZaccai and others added 4 commits May 13, 2026 13:36
PR 2 of the port-registrar series. Drops the hardcoded 8082 port for
the `code` agent's WebSocket server and rewires the in-repo Coda
VS Code extension to discover the live port through the agent-server's
discovery channel before connecting.

* `CodeAgentWebSocketServer` is now an async `start(port = 0)`
  factory that resolves only after the `listening` event fires, with
  `close()` returning a Promise. Lets us read the OS-assigned port
  reliably and serialize rapid disable/enable cycles under a fixed
  `CODE_WEBSOCKET_PORT` override.
* `updateCodeContext` registers the server's actual port per session
  via `sessionContext.registerPort`. The shared module-scoped server
  is kept (refcounted), but each enabling session gets its own
  registration handle stored on its `agentContext`;
  `releaseAllForSession` is the crash-safety backstop. The registrar
  already supports multiple registrations per `(agent, role)` so this
  drops the previously considered ownership-transfer state machine.
* `readiness.ts` treats port as `number | undefined` and omits the
  port suffix from setup-card and NOT-RUNNING messages when no override
  is set. `resolveCodePort` is now `resolveCodePortOverride` to make
  the env-var semantics explicit.
* Coda's `createWebSocket` is async and uses `discoverPort()` (from
  the `@typeagent/agent-server-client/discovery` subpath introduced
  in PR 1) with a legacy 8082 fallback only when the discovery channel
  itself is unreachable (not when lookup returns `not-registered`).
  Reconnect loop now guards against the multi-interval leak in the
  prior `reconnectWebSocket` implementation and re-discovers the
  port on every retry (port can change across agent-server restarts).
* Tests: 5 new integration tests for the shared-server lifecycle
  (single + multi-session enable / disable / rollback on bind failure).
  Existing 13 readiness tests updated for the new `number | undefined`
  port type. All 18 code-agent tests pass; full workspace build green;
  lint clean.

Branched off and depends on PR 1 ([port-registrar foundation],
talzacc/port-registrar-foundation tip `a8fd7cf7`). Does not modify
SecretAgents.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…query separator, refresh stale docs

This is a research project; there are no in-the-wild pre-discovery clients to support, so the back-compat fallback in coda's webSocket.ts is dead weight. Replace it with: when discovery is unreachable or the code agent isn't yet registered, return undefined and let the existing 5s reconnect loop in wsConnect.ts retry.

Also fix a pre-existing bug in createWebSocket: `clientId=` was appended to the query string without a `&` separator, producing malformed URLs like `...&role=clientclientId=xyz`. Caught by code review.

Stale doc references to port 8082 in commandExecutor/README.md, commandExecutor/VSCODE_CAPABILITIES.md, and shell/demo/DEMO_SETUP.md updated to describe the discovery-based flow.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Extract getKnownCodePort() helper with a comment explaining the two-phase lifecycle (live bound port after start; static env-var prediction before start). The previous inline `getSharedCodePort() ?? resolveCodePortOverride(...)` read like a fallback when it's actually 'live if known, else predicted'.

- Tighten getCodeBindPort doc: explicitly call out that EADDRINUSE on the requested port is a hard failure (no silent fallback to OS-assigned), since the user explicitly asked for that port. The previous 'falling back to OS-assigned port' wording in the malformed-input branch could mislead readers into thinking bind failures also fall back, which they don't.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…est cleanup

createWebSocket previously used an `async` Promise executor (`new Promise(async (resolve) => ...`). If `resolveCodeEndpoint()` rejects or `new WebSocket(endpoint)` throws synchronously (e.g. an invalid `CODE_WEBSOCKET_HOST` override), the rejection lands on the executor's implicit promise rather than the outer one — leaving createWebSocket pending forever and stalling the reconnect loop. Refactor to do the awaits at the top level and try/catch around `new WebSocket()`.

codeUpdateContext.spec.ts: replace the placeholder afterEach with a real teardown that tracks every StubContext newSession() hands out and force-disables any still-enabled session. Asserts the shared server is down after cleanup so a leak surfaces loudly instead of contaminating later tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
TalZaccai and others added 3 commits May 14, 2026 13:57
…nored

Per @robgruen review feedback on resolveCodePortOverride: surface a debug
log when the env-var override is in effect (so a user wondering why the
port isn't OS-assigned can confirm), and another when an invalid value is
ignored (so a typo doesn't silently fall back to OS-assigned without any
diagnostic).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Migrates the browser agent off hardcoded port 8081 to use the new

PortRegistrar infrastructure (PR 3 of 5 in the port-registrar series).

Server side:

- AgentWebSocketServer.start(port=0) returns a Promise resolved on the

  'listening' event so the OS-assigned port is readable synchronously

  via .port; rejects on EADDRINUSE under fixed-port overrides instead

  of swallowing.

- close() awaits server.close() and proactively closes all tracked

  clients first so a rapid disable/enable cycle under

  BROWSER_WEBSOCKET_PORT can rebind cleanly.

- New verifyClient gate uses isAllowedAgentOrigin() to reject WS

  upgrades from arbitrary web pages on the same host (per design

  section 4.2). Allows chrome-extension://, moz-extension://,

  http(s)://localhost or 127.0.0.1, and absent Origin (Node ws clients).

Handler lifecycle:

- Replaces module-load 'new AgentWebSocketServer(8081)' with refcounted

  module state. ensureSharedBrowserServer() serializes concurrent

  binds; cleanupBrowserSession() is idempotent and called from BOTH

  the disable path AND closeBrowserContext (previously only close

  unregistered the session, leaking on disable).

- Per-session sessionContext.registerPort('default', port). Releases on

  cleanup; backstop in closeSessionContext covers the skipped-disable

  case.

- Reads BROWSER_WEBSOCKET_PORT for explicit pinning when debugging;

  malformed values fall back to OS-assigned port with a debug log.

Code agent Origin parity:

- codeAgentWebSocketServer.ts now also gates upgrades on

  isAllowedAgentOrigin() (vscode-* schemes + loopback + absent

  Origin). Closes the design-section-4.2 gap PR 2 quietly skipped.

Chrome extension migration:

- websocket.ts now calls discoverPort('browser', 'default', { url:

  agentServerHost }) on every connect, then dials the discovered port.

  Removes the module-level settings cache that never invalidated on

  agentServerHost changes.

- Replaces websocketHost setting with agentServerHost (default

  ws://localhost:8999/, blank means default). Aligns with

  dispatcherConnection.ts which already used this setting.

- reconnectWebSocket() is now a singleton — was creating a fresh

  setInterval on every onclose, leaking timers under flapping

  connectivity. New websocket test asserts only one interval is

  scheduled across repeated calls.

Tests:

- agentWebSocketServer.test.ts (28) — async start/close + Origin gate

  + close-tracks-clients.

- websocket.test.ts (5) — discoverPort mock + reconnect singleton.

- code agent: 18 tests still pass.

- All 253 browser tests pass (one pre-existing flaky

  contentDownloader 'invalid URL' timeout unrelated).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…r EADDRINUSE on re-enable

Two PR-3 follow-up fixes surfaced during manual smoke testing:

1. View-server EADDRINUSE on re-enable
   updateBrowserContext(false) and closeBrowserContext clear viewProcess
   but didn't reset agentContext.localHostPort. On re-enable, the agent
   forked server.mjs with the stale port, racing the still-draining
   killed child (SIGTERM is async on Windows) and erroring with
   EADDRINUSE. Reset localHostPort to 0 in both teardown paths so the
   re-fork lets the OS pick a fresh port.

2. Stale "setup-required" cache after extension connects
   The dispatcher caches readiness at agent init time and only refreshes
   it after setup() or an explicit `@config agent refresh <name>`.
   The browser agent has no setup hook (extensions can't be launched
   programmatically), so users had to run the refresh command manually
   every time the browser came up. Adds SessionContext.notifyReadinessChanged()
   to the agent SDK as the push channel for state-change events, plumbs
   it through agent-rpc (out-of-process agents), and wires the browser
   agent to fire it from onClientConnected and onClientDisconnected.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@TalZaccai TalZaccai force-pushed the talzacc/browser-agent-migration branch from e5ac9b9 to 11719be Compare May 14, 2026 20:58
@TalZaccai TalZaccai changed the title Port Registrar — browser Agent + Chrome Extension Migration + Origin Allowlist Port Registrar — browser Agent + Chrome Extension Migration + Origin Allowlist May 14, 2026
@TalZaccai TalZaccai changed the title Port Registrar — browser Agent + Chrome Extension Migration + Origin Allowlist Port Registrar: browser Agent + Chrome Extension Migration + Origin Allowlist May 14, 2026
@TalZaccai TalZaccai changed the title Port Registrar: browser Agent + Chrome Extension Migration + Origin Allowlist Port Registrar: browser agent + Chrome extension migration + origin allowlist May 14, 2026
…fork-failure port reset

- originAllowlist (browser+code): accept '[::1]' (Node URL parser preserves brackets in hostname) so IPv6 loopback dev clients pass the Origin gate.

- browserActionHandler.createViewServiceHost: reset agentContext.localHostPort to 0 in the synchronous fork catch path so a retried enable can re-register a fresh port (closes the third EADDRINUSE-recurrence path; the disable + close paths were fixed in the prior commit).

- Add agentWebSocketServer.test.ts cases for http://[::1], http://[::1]:8081, https://[::1]:5173.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…swallow

Two new tests in sessionContext.spec.ts:

- delegates to agents.refreshReadiness with the agent's own name

- swallows errors from refreshReadiness so an event-driven caller (WS onClientConnected/onClientDisconnected) cannot throw into the emitter path

Covers the contract added in commit e5ac9b9 that wires browser-extension connect/disconnect events to automatic dispatcher readiness refresh, removing the need for users to run \@config agent refresh browser\ manually.

Tests: 671 passed, 671 total (was 669).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant