fix: replace orchestrators through safe canonical branch handoff #2338
nikhilachale wants to merge 4 commits
Exact flow for switching or replacing an orchestrator now:
spawnOrchestrator(projectId, true), which sends POST /api/v1/orchestrators
ao/<prefix>-orchestrator
During steps 7 to 8 there is a short no-orchestrator gap by design. After success, there is exactly one active orchestrator on the canonical branch.
illegalcall
left a comment
I found a few issues that should be fixed before merging:
- P2: Retry can get stuck after a partial replacement retirement failure
RetireForReplacement clears restore markers, marks the old orchestrator terminated, destroys the runtime best-effort, and only then calls ForceDestroy on the old worktree. If ForceDestroy fails, the API returns an error, but the old session is already terminated.
Example: the old canonical orchestrator worktree fails to force-remove because the path cannot be deleted or Git still has it registered. The user sees replacement failed and clicks Retry. On retry, SpawnOrchestrator(clean=true) lists only active orchestrators, so it no longer calls RetireForReplacement for the old session. It then tries to spawn a new orchestrator onto the same canonical branch/path and can keep failing because the old worktree still blocks it. The project is left with no active orchestrator and a retry button that cannot repair the stuck state.
Please keep the old session retirable until the branch/worktree release succeeds, or make the retry path detect and finish incomplete replacement retirements.
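A minimal sketch of the first option (keep the old session retirable until the worktree release succeeds). It models the backend logic in TypeScript for illustration only; the `retiring` state, the `OrchestratorRecord` shape, and both function names are hypothetical and not taken from the PR.

```typescript
// Hypothetical lifecycle: "retiring" is an assumed intermediate state that
// survives a failed worktree release, so a retry can still find the session.
type RetireState = "active" | "retiring" | "terminated";

interface OrchestratorRecord {
  id: string;
  state: RetireState;
}

// Marks the session terminated only after the release step succeeds.
// releaseWorktree stands in for the step that may fail (ForceDestroy in the
// review above) and is expected to throw on failure.
function retireForReplacement(
  rec: OrchestratorRecord,
  releaseWorktree: () => void,
): void {
  rec.state = "retiring";
  releaseWorktree(); // on throw, rec stays in "retiring"
  rec.state = "terminated";
}

// The retry path considers both active and half-retired sessions, so an
// incomplete retirement is finished before a new orchestrator is spawned.
function sessionsToRetire(all: OrchestratorRecord[]): OrchestratorRecord[] {
  return all.filter((r) => r.state === "active" || r.state === "retiring");
}
```

With this ordering, a failed release leaves the old session visible to the next `clean=true` attempt instead of stranding its worktree.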
- P2: The configured orchestrator agent is not hydrated into the workspace model
WorkspaceSummary now has orchestratorAgent, and orchestratorHealth depends on it to show restart_needed. But the normal workspace query is built from GET /api/v1/projects, whose Summary does not include project config, and fetchWorkspaces never maps project.config.orchestrator.agent.
Example: a project is configured to use codex as orchestrator, but the currently running orchestrator is claude-code. After app load or after Project Settings saves and invalidates the workspace query, workspace.orchestratorAgent is undefined, so orchestratorNeedsRestart returns false and the board never shows the restart-needed CTA.
Please either expose the configured orchestrator harness in the project summary endpoint and map it in useWorkspaceQuery, or hydrate the workspace list from project detail/config data.
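To illustrate why the missing field hides the CTA, here is a hedged sketch of a restart check. `WorkspaceSummaryLike`, `runningOrchestratorAgent`, and this body of `orchestratorNeedsRestart` are assumptions made for the example; the PR's actual implementation may differ.

```typescript
// Assumed, simplified workspace shape; only orchestratorAgent is named in the review.
interface WorkspaceSummaryLike {
  orchestratorAgent?: string; // configured agent, e.g. "codex"
  runningOrchestratorAgent?: string; // agent of the active orchestrator session (hypothetical field)
}

// Returns true only when both values are known and differ. If the configured
// agent was never hydrated, the check cannot fire and the CTA stays hidden.
function orchestratorNeedsRestart(ws: WorkspaceSummaryLike): boolean {
  if (!ws.orchestratorAgent || !ws.runningOrchestratorAgent) {
    return false;
  }
  return ws.orchestratorAgent !== ws.runningOrchestratorAgent;
}
```

Under these assumptions, a workspace whose `orchestratorAgent` is undefined always reports no restart needed, which matches the symptom described above.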
- P2: Duplicate-orchestrator UI still opens the first active orchestrator, not the newest
The new health message says “The newest one is used,” and the backend clean=false path now returns newestSession. The frontend does not match that behavior: findProjectOrchestrator uses .find(...), and SessionsBoard/Sidebar also pick the first active orchestrator from the session list.
Example: sessions are ordered by spawn number as proj-1 then proj-2, and both are active after a replacement race. The duplicate warning says the newest is used, but the Orchestrator button opens proj-1, the older/stale session.
Please make the frontend select the newest active orchestrator using the same CreatedAt/UpdatedAt/id fallback logic as the service, and reuse that helper everywhere the current orchestrator is opened.
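A possible shape for such a shared helper, as a sketch only. The `SessionLike` fields and the exact tie-break order (createdAt, then updatedAt, then id) are assumptions; the real order should be copied from the service.

```typescript
// Hypothetical session shape; field names are assumptions, not the app's model.
interface SessionLike {
  id: string;
  projectId: string;
  role: "orchestrator" | "worker";
  active: boolean;
  createdAt?: string; // ISO-8601 timestamp
  updatedAt?: string; // ISO-8601 timestamp
}

// Missing or unparseable timestamps become 0 so they sort as oldest.
function toMillis(value?: string): number {
  if (!value) return 0;
  const ms = Date.parse(value);
  return Number.isNaN(ms) ? 0 : ms;
}

// Positive when a is newer than b: createdAt, then updatedAt, then id.
function compareNewest(a: SessionLike, b: SessionLike): number {
  const byCreated = toMillis(a.createdAt) - toMillis(b.createdAt);
  if (byCreated !== 0) return byCreated;
  const byUpdated = toMillis(a.updatedAt) - toMillis(b.updatedAt);
  if (byUpdated !== 0) return byUpdated;
  // Numeric collation so that "proj-10" sorts after "proj-2".
  return a.id.localeCompare(b.id, undefined, { numeric: true });
}

// Picks the newest active orchestrator for a project instead of the first match.
function findNewestActiveOrchestrator(
  sessions: SessionLike[],
  projectId: string,
): SessionLike | undefined {
  return sessions
    .filter((s) => s.projectId === projectId && s.role === "orchestrator" && s.active)
    .reduce<SessionLike | undefined>(
      (newest, s) => (!newest || compareNewest(s, newest) > 0 ? s : newest),
      undefined,
    );
}
```

Calling one helper like this from every place that opens the current orchestrator would keep the button target in line with the duplicate warning.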
…kspace components
illegalcall
left a comment
Remaining issue on latest head 3983d37:
- P2: Project summaries expose full project config just to hydrate orchestrator agent
The previous replacement-flow issues look fixed, but this fix adds Config *domain.ProjectConfig to the GET /api/v1/projects summary shape and useWorkspaceQuery reads project.config?.orchestrator?.agent from it. That exposes much more than the workspace list needs: env, postCreate, symlinks, reviewer config, and agent config now travel through the project-list endpoint and the polling workspace query.
Example: if a project config contains env: { GITHUB_TOKEN: "..." } or other local runtime variables, opening the normal workspace list now fetches and caches that data even though the UI only needs the configured orchestrator harness.
Please expose a narrow field on the summary, e.g. orchestratorAgent, or a small role summary, instead of returning full ProjectConfig from the list endpoint. Keep full config on GET /api/v1/projects/{id} / settings flows.
Refs: backend/internal/service/project/types.go, backend/internal/service/project/service.go, backend/internal/domain/projectconfig.go, frontend/src/renderer/hooks/useWorkspaceQuery.ts
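The narrow-field suggestion can be sketched as a projection from the detail shape to the list shape. The interfaces and `toProjectSummary` below are hypothetical stand-ins written in TypeScript for illustration; the real summary is built in the Go service.

```typescript
// Assumed, simplified config shape; only the fields mentioned in the review.
interface ProjectConfigLike {
  env?: Record<string, string>;
  postCreate?: string[];
  orchestrator?: { agent?: string };
}

// Detail shape: carries full config, intended for the per-project and settings flows.
interface ProjectDetailLike {
  id: string;
  name: string;
  config?: ProjectConfigLike;
}

// List shape: carries only what the workspace list needs.
interface ProjectSummaryLike {
  id: string;
  name: string;
  orchestratorAgent?: string;
}

// Copies the configured orchestrator agent and nothing else from config,
// so env and other local runtime values never reach the list response.
function toProjectSummary(project: ProjectDetailLike): ProjectSummaryLike {
  const summary: ProjectSummaryLike = { id: project.id, name: project.name };
  const agent = project.config?.orchestrator?.agent;
  if (agent !== undefined) {
    summary.orchestratorAgent = agent;
  }
  return summary;
}
```

With a shape like this, the workspace query can read `orchestratorAgent` directly and no longer needs `project.config`.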
GET /api/v1/projects now returns each project summary like: { |

Summary
clean=true orchestrator restarts with a replacement-specific retire path instead of interactive Kill
Details
Backend:
RetireForReplacement, which:
- StashUncommitted
- session_worktrees restore markers
- ao/<prefix>-orchestrator can be claimed
Frontend:
Test Plan
- cd backend && GOCACHE=/private/tmp/go-build-ao-orch-replace go test ./internal/session_manager ./internal/service/session
- cd frontend && npm run typecheck
closes #2310