outfitter-dev · galligan · Jun 10, 2026 · Jun 7, 2026
diff --git a/.agents/plans/live-use-hardening/GOAL.md b/.agents/plans/live-use-hardening/GOAL.md
@@ -0,0 +1,28 @@
+# Live-use Hardening - pasteable goal
+
+```text
+/goal Work in the dispatch repo root. Implement the live-use hardening plan in .agents/plans/live-use-hardening/PLAN.md.
+
+Context: a real Trails delegation attempt exposed trust failures that unit/parity tests did not catch. Dispatch reported success for launches that had not proven the lane was alive, surfaced model/system failures only in raw `watch` events, let slash-command goal text look like a native goal, left daemon lifecycle commands outside the JSON/scriptability contract, and let hand-wired CLI projection code grow beyond what the no-drift doctrine allows.
+
+Objective: make dispatch trustworthy for live agent coordination. Document the incident, add regression tests and guardrails, tighten derived surface boundaries, fix launch/error/status semantics, make cleanup/lifecycle commands agent-safe, update docs/skills, and run local review until no P0/P1/P2 issues remain.
+
+Required outcomes:
+- A durable plan/retro records decisions, checks, review findings, and deferred work.
+- Public CLI/MCP projections are governed by explicit projection metadata or an allowlisted control-surface contract; ungoverned hand-wired per-op routes cause test failures.
+- `new`/`send` outputs distinguish accepted delivery from proof of execution.
+- `get` and list-style status surfaces expose the latest turn/error state, so obvious model/system failures are visible without raw `watch`.
+- `/goal ...` sent as message text is either rejected (or warned about) or replaced by a first-class `new --goal` path that calls the native goal API.
+- Destroy operations have explicit non-interactive confirmation support, and `up`/`down` expose JSON output.
+- Registry schema recovery is boring: `doctor` and `up` explain the problem or expose a safe migrate/repair path, with no manual DB surgery.
+- Docs, README, skills, plugin docs, schemas/help, tests, and ADR/rules are updated where behavior or doctrine changes.
+- Checks pass, including focused tests and `just check`; run local review loops and fix P2+ findings.
+
+Constraints:
+- Preserve contract-first/no-drift architecture; if a surface needs special ergonomics, make the override explicit and tested.
+- Do not touch live user Codex state in tests. Use isolated `DISPATCH_HOME`/`CODEX_HOME` for any smoke.
+- Do not merge, publish, or mutate release state unless explicitly asked.
+- If model preflight cannot be made reliable from the current App Server contract, surface the first failure clearly and record the limitation.
+
+Done only when all required outcomes are implemented or explicitly deferred with evidence, local checks pass, review P2+ is clear, and RETRO.md contains final proof.
+```
diff --git a/.agents/plans/live-use-hardening/PLAN.md b/.agents/plans/live-use-hardening/PLAN.md
@@ -0,0 +1,123 @@
+# Live-use Hardening - implementation plan
+
+One-branch hardening packet for the real-use failures found during a Trails
+delegation attempt. Goal loop: [`GOAL.md`](./GOAL.md). References:
+[`REFS.md`](./REFS.md). Execution ledger: [`RETRO.md`](./RETRO.md).
+
+## Objective
+
+Make dispatch trustworthy when an agent uses it for real coordination:
+
+- keep surface projection honest and guarded;
+- make launch results distinguish "accepted" from "alive/responded";
+- surface latest turn/model/system failures in normal status surfaces;
+- make native goals first-class instead of relying on slash-command text;
+- make daemon lifecycle and destructive cleanup scriptable;
+- make registry migration/recovery safe and obvious;
+- update docs, skills, and tests so this class of failure does not sneak
+  through again.
+
+## Incident facts
+
+The real Trails use case found these product failures:
+
+- A stale registry with schema v1 and missing tables required a manual DB backup
+  and recreation because `dispatch up` no-oped while an existing daemon was
+  still answering.
+- `dispatch up --json` failed because the flag was unsupported, even though most
+  agent-operated commands emit JSON.
+- `dispatch new --model gpt-5.5-codex --text "$goal_prompt"` passed a stale,
+  guessed model id explicitly, yet returned `sent: true` and `status: idle`
+  although no assistant work happened.
+- `/goal ...` sent as initial text did not create native goal state.
+- The unsupported-model failure was visible only through `dispatch watch`, not
+  `dispatch get`.
+- `trigger rm --json` and `archive --json` still prompted for confirmation on
+  interactive stdin.
+- The existing parity/handler tests stayed green throughout.
+
+## Root causes to address
+
+1. Projection doctrine is written down, but the CLI has bespoke route functions
+   and control commands without an enforceable manifest/allowlist.
+2. Tests prove that calls route and are accepted, not that live coordination
+   actually works.
+3. Normal state models do not persist latest turn failures or suspicious
+   completions that produced no assistant output.
+4. Goal-like message text and native App Server goals are separate mechanisms,
+   but docs/skills do not make that boundary explicit.
+5. Integration tests are intentionally out of the default gate, so real
+   semantics need cheap regression tests against fakes plus release smoke
+   guidance.
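+
+A minimal sketch of the state that root cause 3 says is missing. It assumes a
+simplified event stream (turn started, assistant output, turn failed, turn
+completed). `LaneState`, `LaneRegistry`, and the `completed_without_output`
+status are hypothetical names, not the current `registry/store.py` API:
+
+```python
+from __future__ import annotations
+
+from dataclasses import dataclass
+
+
+@dataclass
+class LaneState:
+    latest_turn_id: str | None = None
+    latest_turn_status: str | None = None  # e.g. "running", "failed", "completed"
+    latest_error: str | None = None  # persisted failure message, if any
+    assistant_items: int = 0  # assistant output seen during the latest turn
+
+
+class LaneRegistry:
+    """Hypothetical in-memory stand-in for the persisted registry."""
+
+    def __init__(self) -> None:
+        self._lanes: dict[str, LaneState] = {}
+
+    def turn_started(self, lane: str, turn_id: str) -> None:
+        # A new turn replaces the previous turn's status and error.
+        self._lanes[lane] = LaneState(
+            latest_turn_id=turn_id, latest_turn_status="running"
+        )
+
+    def assistant_output(self, lane: str) -> None:
+        self._lanes.setdefault(lane, LaneState()).assistant_items += 1
+
+    def turn_failed(self, lane: str, turn_id: str, message: str) -> None:
+        state = self._lanes.setdefault(lane, LaneState())
+        state.latest_turn_id = turn_id
+        state.latest_turn_status = "failed"
+        state.latest_error = message  # the piece that was previously dropped
+
+    def turn_completed(self, lane: str, turn_id: str) -> None:
+        state = self._lanes.setdefault(lane, LaneState())
+        state.latest_turn_id = turn_id
+        # Completing without any assistant output is suspicious, not success.
+        state.latest_turn_status = (
+            "completed" if state.assistant_items else "completed_without_output"
+        )
+
+    def get(self, lane: str) -> dict[str, str | None]:
+        state = self._lanes.get(lane, LaneState())
+        return {
+            "latest_turn_id": state.latest_turn_id,
+            "latest_turn_status": state.latest_turn_status,
+            "latest_error": state.latest_error,
+        }
+```
+
+With this state persisted, `get` can show a failed or empty first turn without
+anyone reaching for `watch`.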
+
+## Implementation chunks
+
+### Chunk 1 - regression tests and projection guardrails
+
+- Add failing tests for:
+  - destroy commands supporting an explicit non-interactive confirmation flag;
+  - `up --json` / `down --json`;
+  - `/goal` text guard or first-class `new --goal`;
+  - `TurnFailed.message` being persisted and exposed by `get`;
+  - `new` not overclaiming that a turn produced work.
+- Introduce CLI projection metadata/manifest or a strict allowlist that
+  classifies public commands as:
+  - op projection;
+  - composed op projection;
+  - surface control.
+- Add tests that fail for ungoverned public commands and mismatched schema/help
+  routes.
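+
+One possible shape for the guard, as a sketch only. `RouteEntry`, `RouteKind`,
+and `ungoverned` are hypothetical names; the real manifest (`CliRoute`,
+`cli_public_routes`) may carry more fields, such as schema/help spellings:
+
+```python
+from __future__ import annotations
+
+from dataclasses import dataclass
+from enum import Enum
+
+
+class RouteKind(Enum):
+    OP = "op projection"
+    COMPOSED = "composed op projection"
+    CONTROL = "surface control"
+
+
+@dataclass(frozen=True)
+class RouteEntry:
+    path: str  # shell grammar, e.g. "list --unmanaged"
+    kind: RouteKind
+    op: str | None  # canonical op id; None only for surface controls
+
+
+def ungoverned(
+    public_commands: list[str], manifest: list[RouteEntry], ops: set[str]
+) -> list[str]:
+    """Return one problem string per public command the manifest does not govern."""
+    routes = {route.path: route for route in manifest}
+    problems = []
+    for command in public_commands:
+        route = routes.get(command)
+        if route is None:
+            problems.append(f"{command}: not in manifest")
+        elif route.kind is RouteKind.CONTROL:
+            if route.op is not None:
+                problems.append(f"{command}: surface control must not claim an op")
+        elif route.op not in ops:
+            problems.append(f"{command}: unknown op {route.op!r}")
+    return problems
+```
+
+A parity test would then assert `ungoverned(public_commands, manifest, ops) == []`,
+so adding a command without a manifest entry fails the default gate.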
+
+### Chunk 2 - launch, goal, and status semantics
+
+- Replace or supplement `NewLane.sent` with explicit launch fields such as
+  `message_accepted`, `goal_set`, `first_turn`, and/or a structured launch
+  result. Maintain honest naming in docs/schemas.
+- Add `NewInput.goal` or equivalent. If text starts with `/goal` and no native
+  goal field is used, fail or warn clearly.
+- Persist latest turn/error state in the registry and expose it in `get`,
+  relevant list outputs, and MCP schemas.
+- Ensure model/system failures show in normal status without raw `watch`.
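+
+A hedged sketch of the launch-result and `/goal` rules above. The field names
+mirror the ones this chunk suggests; `LaunchResult`, `GoalTextError`,
+`check_initial_text`, and `launch_summary` are hypothetical helpers, not
+existing dispatch code:
+
+```python
+from __future__ import annotations
+
+from dataclasses import dataclass
+
+
+class GoalTextError(ValueError):
+    """Raised when goal-like slash text is sent as a plain message."""
+
+
+@dataclass(frozen=True)
+class LaunchResult:
+    message_accepted: bool  # App Server accepted the first message
+    goal_set: bool  # native goal API call succeeded
+    first_turn: str | None  # first turn status, if one was observed
+    last_error: str | None  # persisted App Server error, if any
+
+
+def check_initial_text(text: str | None, goal: str | None) -> None:
+    """Reject '/goal ...' message text unless a native goal is also supplied."""
+    words = (text or "").split(maxsplit=1)
+    if goal is None and words and words[0] == "/goal":
+        raise GoalTextError("'/goal' in --text is plain text; use --goal for a native goal")
+
+
+def launch_summary(result: LaunchResult) -> str:
+    """Describe a launch without claiming that accepted means alive."""
+    if result.last_error:
+        return f"failed: {result.last_error}"
+    if not result.message_accepted:
+        return "not accepted"
+    if result.first_turn is None:
+        return "accepted; no turn observed yet"
+    return f"accepted; first turn {result.first_turn}"
+```
+
+Raising is the stricter of the two options listed above; a warning variant
+would log the same message and continue.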
+
+### Chunk 3 - scriptable surfaces and registry recovery
+
+- Add `--yes`/`--no-interactive` support for destroy-intent CLI commands,
+  derived from projection rules rather than added per command.
+- Add JSON output to `up` and `down`.
+- Improve doctor diagnostics and recovery hints for versioned registries with
+  missing tables.
+- Add a safe registry migrate/repair command or lifecycle helper if it can be
+  done without broad architecture churn. At minimum, make `up`/doctor refuse
+  misleading no-op recovery and provide exact safe commands.
+- Add tests for older schema v1/v2 cases with existing lanes/triggers.
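+
+A minimal sketch of the confirmation rule and the `up` refusal. It assumes the
+CLI can tell whether a daemon is answering and which schema version the
+registry reports. `ConfirmationRequired`, `confirm_destroy`, `plan_up`, and the
+integer schema versions are hypothetical:
+
+```python
+from __future__ import annotations
+
+from typing import Callable
+
+
+class ConfirmationRequired(RuntimeError):
+    """Raised when a destroy-intent command cannot prompt and lacks --yes."""
+
+
+def confirm_destroy(
+    action: str, *, yes: bool, no_interactive: bool, ask: Callable[[str], str] = input
+) -> bool:
+    """Prompt interactively; with --no-interactive, require an explicit --yes."""
+    if yes:
+        return True
+    if no_interactive:
+        raise ConfirmationRequired(f"{action}: pass --yes with --no-interactive")
+    return ask(f"{action}? [y/N] ").strip().lower() in {"y", "yes"}
+
+
+def plan_up(
+    daemon_answering: bool, schema_version: int, expected_version: int
+) -> tuple[str, str]:
+    """Pick an `up` action, refusing to no-op over an old registry schema."""
+    if schema_version < expected_version:
+        return (
+            "refuse",
+            "registry schema is old: run `dispatch down`, then "
+            "`dispatch registry migrate`, then `dispatch up`",
+        )
+    if daemon_answering:
+        return ("noop", "daemon already running")
+    return ("start", "starting daemon")
+```
+
+The schema check runs before the daemon check on purpose: in the incident,
+`up` no-oped because a daemon answered, even though the registry was still v1.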
+
+### Chunk 4 - docs, skills, and release smoke
+
+- Update README, docs/usage, skills/dispatch, skills/dm if affected, plugin docs,
+  AGENTS/rules/ADRs where behavior or doctrine changed.
+- Add a documented pre-release/live-dogfood smoke that uses isolated state and
+  proves lane liveness.
+- Update examples/schema expectations.
+
+### Chunk 5 - local review and finalization
+
+- Run focused tests after each chunk.
+- Run `just check`.
+- Run a local review pass focused on P0/P1/P2:
+  - surface derivation drift;
+  - live-use trust;
+  - destructive/scripted safety;
+  - registry migration safety;
+  - docs/skill truthfulness.
+- Fix P2+; fix cheap P3s; record deferred P3s in RETRO.
+
+## Deferral policy
+
+Deferrals are acceptable only if recorded in RETRO with evidence:
+
+- account-specific model preflight if `model/list`/verification does not expose
+  reliable support in the current App Server;
+- optional Graphite/worktree ownership, which is useful but not the root control
+  plane trust failure;
+- long-lived streaming subscriptions beyond bounded `watch`.
+
+## Done
+
+Done only when tests and docs prove the full objective, `just check` passes, a
+review loop has no unresolved P0/P1/P2, and RETRO contains exact verification
+commands, final git state, remaining risks, and PR state if submitted.
diff --git a/.agents/plans/live-use-hardening/REFS.md b/.agents/plans/live-use-hardening/REFS.md
@@ -0,0 +1,59 @@
+# Live-use Hardening - references
+
+## Field report
+
+- `/tmp/trails-dispatch-real-use-feedback-2026-06-07.md`
+  - Registry schema v1 missing `lane_snapshots` and `lane_sync_sources`.
+  - `dispatch up --json` unsupported; `dispatch up` no-oped against running
+    daemon.
+  - `dispatch new --model gpt-5.5-codex --text "$goal_prompt"` used a stale,
+    guessed explicit model id and returned `sent: true`, `status: idle`, but no
+    assistant response and no goal state.
+  - `watch` surfaced the unsupported-model error; `get` did not.
+  - Destroy cleanup required `printf 'y\n' | ... --json`.
+
+## Architecture docs
+
+- `docs/adrs/0000-contract-first-surface-derived.md`
+  - Every surface is a pure projection of one op registry.
+  - Parity tests must check behavior, not only names.
+- `docs/adrs/0010-surface-projections-are-ergonomic-not-isomorphic.md`
+  - Surfaces may group/rename/compose, but may not restate schemas, examples,
+    safety intent, error behavior, or capability policy.
+- `.claude/rules/contracts.md`
+  - Overrides must be visible escape hatches, not default hand wiring.
+- `.claude/rules/surfaces.md`
+  - Surface modules contain projection wiring only.
+
+## Code hot spots
+
+- `src/outfitter/dispatch/contracts/derive_cli.py`
+  - CLI projection, custom route functions, schema route table, destroy prompt.
+- `src/outfitter/dispatch/surfaces/cli.py`
+  - `doctor`, `up`, `down`, and `mcp` hand-written control commands.
+- `src/outfitter/dispatch/core/handlers.py`
+  - `new_lane`, `show`, send/goal handlers.
+- `src/outfitter/dispatch/core/reactor.py`
+  - `TurnFailed` currently updates status but does not persist the failure
+    message.
+- `src/outfitter/dispatch/registry/store.py`
+  - Schema migrations and registry state.
+- `src/outfitter/dispatch/doctor.py`
+  - Registry diagnostics and recovery hints.
+
+## Existing tests
+
+- `tests/surfaces/test_parity.py`
+- `tests/surfaces/test_derive_cli.py`
+- `tests/core/test_handlers.py`
+- `tests/test_doctor.py`
+- `tests/integration/test_daemon_e2e.py`
+- `tests/integration/test_app_server.py`
+
+## Verification commands
+
+```bash
+uv run pytest tests/surfaces/test_parity.py tests/surfaces/test_derive_cli.py tests/test_doctor.py tests/core/test_handlers.py -q
+just check
+```
+
+Any optional live smoke must use isolated runtime paths (`DISPATCH_HOME`,
+`CODEX_HOME`).
diff --git a/.agents/plans/live-use-hardening/RETRO.md b/.agents/plans/live-use-hardening/RETRO.md
@@ -0,0 +1,64 @@
+# Live-use Hardening - execution ledger
+
+Durable execution ledger for the live-use trust hardening goal.
+
+## Timeline
+
+- 2026-06-07: Created packet on `feat/live-use-hardening` after the Trails
+  real-use report showed that green tests were missing operator-trust failures.
+- 2026-06-07: Implemented runtime turn-state persistence, native `new --goal`,
+  honest `message_accepted` launch output, scriptable daemon lifecycle output,
+  destroy-command confirmation flags, registry migration recovery, and explicit
+  CLI projection/control manifests.
+- 2026-06-07: Updated README, usage docs, dispatch skill, plugin README,
+  development design notes, agent rules, and ADR-0020.
+
+## Checks
+
+- `uv run pytest tests/test_doctor.py tests/surfaces/test_parity.py tests/surfaces/test_derive_cli.py tests/core/test_handlers.py tests/registry/test_store.py -q`
+  - `103 passed`
+- `uv run pytest -q`
+  - `210 passed, 9 deselected`
+- `just check`
+  - `ruff check`: passed
+  - `ruff format --check`: passed
+  - `mypy src tests`: passed
+  - `pytest`: `210 passed, 9 deselected`
+  - `uv build`: built `outfitter_dispatch-0.4.0` sdist/wheel
+  - `scripts/check_package_contents.py`: passed
+- CLI smoke:
+  - `uv run dispatch schema new | jq -r ...`
+    - verified goal and `message_accepted` schema descriptions.
+  - `uv run dispatch schema 'list --unmanaged' | jq -r .op`
+    - returned `discover`.
+  - `uv run dispatch schema 'tail --follow'`
+    - exited `2` with a clean unknown-command error, matching current docs.
+  - `uv run dispatch registry migrate --help`
+    - showed JSON/text, backup, and controlled-running options.
+
+## Review
+
+- P0/P1/P2 review pass:
+  - Verified projection guardrails cover op-backed CLI routes, schema spellings,
+    and full CLI surface-control allowlist.
+  - Verified synchronous `turn/start` failures no longer leave registered lanes
+    looking idle; `new`/`send` now persist latest error state before re-raising.
+  - Verified `TurnFailed.message` projects through reactor -> registry -> `get`.
+  - Verified `/goal ...` initial text is rejected unless callers use native
+    `--goal`, and native goal set happens before the initial turn.
+  - Verified old registry recovery has doctor guidance plus `registry migrate`
+    tests, including daemon-running refusal.
+  - Verified docs/skill/plugin/rules/ADR describe the changed behavior and
+    current limitations.
+- Unresolved P0/P1/P2: none found in local review.
+
+## Deferred
+
+- Account/model preflight remains deferred. The current App Server client
+  accepts model strings on thread/turn options but does not expose a cheap,
+  reliable account-specific model support check in dispatch's verified contract.
+  The implemented mitigation is to persist and expose App Server failures through
+  ordinary status surfaces instead of requiring raw `watch`.
+- Infinite streaming remains deferred. `watch` is still a bounded live event
+  sample over a request/response control socket; a subscription-capable control
+  socket remains future work.
diff --git a/.claude/rules/client.md b/.claude/rules/client.md
@@ -19,12 +19,13 @@ Demux the single stream: responses by request `id`, notifications by `threadId`,
 - `thread/start.sandbox` is a **string** enum (`read-only`/`workspace-write`/`danger-full-access`); `turn/start.sandboxPolicy` is an **object** (`{type:"readOnly", ...}`). Different encodings — model both.
 - `turn/steer` requires `expectedTurnId` (from `turn/started`).
 - `thread/list` results are under `result.data` (not `result.threads`); `useStateDbOnly:true` reads the persisted store.
+- Current `thread/list` supports native `archived`, `cwd`, `searchTerm`, `sourceKinds`, and sort filters; use them when they match dispatch semantics, then keep registry/authority filters in core.
 - `thread/search` is experimental; enable the experimental API capability before using it and keep the wrapper thin.
 - `thread/resume` of a *persisted* thread yields live event fan-out; pre-persistence it errors `no rollout found`.
 - Approvals are server→client requests: lane emits `thread/status/changed` `activeFlags:["waitingOnApproval"]`; reply `{id, result:{decision}}` (`accept`/`acceptForSession`/`decline`/`cancel`); server emits `serverRequest/resolved`. File-change approvals carry **no diff** — correlate by `itemId` to the `fileChange` item.
 - Threads persist by default (`ephemeral:false`). Pass `ephemeral:true` for throwaway/test lanes.
 
 ## Discipline
 
-- Pin the binary; regenerate wire models from `codex app-server generate-json-schema` for that version. Do NOT depend on the `openai-codex` Python SDK (it pins an older CLI).
+- Pin/record the binary; regenerate wire models from `codex app-server generate-json-schema` for that version. Do not assume the `openai-codex` Python SDK matches the installed CLI; it has lagged before.
 - No business logic here — this layer is transport + typed primitives only. Orchestration lives in `core/`.
diff --git a/.claude/rules/contracts.md b/.claude/rules/contracts.md
@@ -20,6 +20,11 @@ compose ops (for example `list --unmanaged` → `discover`, `goal status` →
 derived from the registry; never hand-implement the same behavior separately in a
 surface.
 
+If the CLI needs custom shell grammar, declare it in the CLI projection manifest
+(`CliRoute`, `cli_public_routes`, and when needed `cli_schema_routes`). Do not add
+or special-case a command path without a parity test proving the path reaches the
+canonical op and that `dispatch schema <command>` reports the canonical op schema.
+
 ## Derivation (never hand-write a surface per op)
 
 Surfaces are pure projections of the registry, mirroring Trails' `derive* → create* → surface`:
@@ -44,6 +49,9 @@ One `DispatchError` hierarchy in `errors.py` (e.g. `NotFoundError`, `LaneBusyErr
 - Adding capability = adding an op, registering it, and ensuring the derived
   projections route it intentionally. If a route is missing, the parity tests
   should fail.
+- If a route is intentionally a surface control rather than an op (`doctor`,
+  `up`, `down`, `registry migrate`, `schema`, `mcp`), document why and keep it out
+  of per-op business logic.
 - Every op exposed on MCP/remote must define `output`.
 - Keep handlers pure-ish: input in, output out (or raise). Side effects go through injected dependencies (the App Server client, the registry) passed via `ctx`, never imported ad hoc.
 - A parity test must stay green — and it checks **behavior/reachability, not

diff --git a/.claude/rules/surfaces.md b/.claude/rules/surfaces.md
@@ -9,6 +9,9 @@ Path: `src/outfitter/dispatch/surfaces/`. Each surface is a thin, generated proj
   tree may group/alias ops for shell ergonomics, but each command marshals
   contract input → calls the daemon control socket → renders the result with
   Rich. The CLI is a **sync** client; it does not import `core/` or `client/`.
+  Process/control commands such as `doctor`, `up`, `down`, `registry migrate`,
+  `schema`, and `mcp` are the allowed exceptions: they manage or inspect the
+  surface/runtime itself and must not duplicate op behavior.
 - **MCP** (`mcp.py`): a stdio MCP server (via the `mcp` SDK) from
   `derive_mcp(registry)`; grouped tool handlers route to the daemon control
   socket, same as the CLI. Spawned by the MCP client (Claude/Codex), not hosted
@@ -18,6 +21,8 @@ Path: `src/outfitter/dispatch/surfaces/`. Each surface is a thin, generated proj
 - Keep the **parity test** green: every registered op must be reachable through
   each surface's derived projection with matching schemas, annotations, and error
   projection. Surface names do not need to equal op ids.
+- Destroy-intent CLI routes must preserve the derived confirmation behavior:
+  prompt interactively, and require `--yes` when paired with `--no-interactive`.
 
 ## Why
 

diff --git a/README.md b/README.md
@@ -12,6 +12,8 @@ uv tool install outfitter-dispatch
 dispatch --help
 dispatchd --help
 dispatch doctor
+dispatch up --json
+dispatch down --json
 ```
 
 From a source checkout:
@@ -20,7 +22,7 @@ From a source checkout:
 uv sync
 uv run dispatch --help
 uv run dispatch doctor --no-app-server
-uv run dispatch up
+uv run dispatch up --json
 uv run dispatch daemon status
 ```
 
@@ -30,12 +32,13 @@ Create an owned managed thread, send it work, and inspect the daemon:
 uv run dispatch new \
   --name docs \
   --cwd /path/to/dispatch \
+  --goal "Finish the docs review." \
   --text "Please summarize the current stack state."
 uv run dispatch list
+uv run dispatch get <dispatch-ref>
 uv run dispatch tail <dispatch-ref> --limit 20
-uv run dispatch goal set <dispatch-ref> "Finish the docs review."
 uv run dispatch daemon log --limit 10
-uv run dispatch down
+uv run dispatch down --json
 ```
 
 Use owned managed threads for turn-writing work. Existing desktop Codex threads can be attached as
@@ -48,12 +51,19 @@ unmanaged Codex thread ids, and `search` can span both. Attach is metadata-only
 default; use `dispatch sync <selector>` when you want dispatch to refresh its local
 indexed view of an attached thread.
 
+`new` reports whether the first message was accepted by the App Server, not whether
+assistant work completed. Use `get` to inspect the latest turn state and persisted
+App Server errors, or `watch` for a bounded live event sample. Slash commands in
+`--text` are plain text; use `--goal` when creating a native App Server goal.
+
 For the operator guide, CLI/MCP examples, triggers, and plugin setup, start at
 [`docs/usage/README.md`](docs/usage/README.md).
 
 Start troubleshooting with `dispatch doctor`. It checks PATH visibility, the Codex CLI
 and auth footprint, daemon socket/pidfile state, registry schema/integrity, packaged
-skills/plugin assets, and a low-risk Codex App Server initialize smoke.
+skills/plugin assets, and a low-risk Codex App Server initialize smoke. If doctor reports
+an old registry schema, stop the daemon and run `dispatch registry migrate` before
+starting it again.
 
 ## Agent And Plugin Support