Pluggable agent harnesses: claude behind interface, codex/gemini ready by rr0hit · Pull Request #58 · Facets-cloud/flow

rr0hit · 2026-05-20T12:07:43Z

Summary

Introduces internal/harness package — a 14-method interface abstracting the agent CLI driver (Claude Code today; Codex/Gemini ready). Existing claude-specific code moves into internal/harness/claude/ behind claude.New().
Adds tasks.harness column for per-task adapter pinning. NULL = claude (back-compat for pre-column DBs). Set on first flow do / flow do --here from the ambient harness; --force is required to switch.
Adds tasks.session_cwd column — fixes a pre-existing transcript-not-found bug for --here-bound sessions where claude was started in a directory other than task.work_dir (commonly via --force binds). Session identity is now keyed on (cwd, session_id) to match claude's on-disk layout.
Ambient harness detection via env-var probe (CLAUDE_CODE_SESSION_ID, future CODEX_THREAD_ID, GEMINI_SESSION_ID) — flow do <new-task> from inside a codex shell would pick codex when that adapter lands; falls back to claude.
Schema changes are additive + nullable; no backfill; legacy rows behave exactly as before via NULL-fallback paths.

What this unlocks

Codex and Gemini adapters can drop in as one line each in allHarnesses(). Their per-CLI specifics (binary name, headless flag spelling, transcript location/format, hook config path) live entirely inside their own sub-packages. Flow's caller code never branches on harness name — it dispatches via the interface or via harnessForTask(task) / harnessForSpawn(task) / ambientHarness() helpers in internal/app/harness.go.

Design choices worth flagging

Pre-allocation strategy is per-harness. Claude's NewSessionID() mints a v4 UUID locally (passed via --session-id). Codex/Gemini will probe their CLI (codex exec / gemini -p) to mint and capture a session id from stdout before opening the user-facing tab. The interface contract is uniform — every impl returns a real id, callers have one spawn path.
No env-var injection. Flow never sets FLOW_TASK or any other env var on spawned harness processes. The harness's own session-id env var is the only correlator, and flow only reads it. Avoids polluting nested-session environments and keeps --here's discovery path symmetric with first-spawn binding.
Harness owns transcripts end-to-end. RenderTranscript(workDir, sessionID, compact, cutoff, w) lives on the interface; each harness handles both path resolution AND format decoding (claude jsonl, codex event log, gemini single-object json) and writes normalized text. The ~270 lines of claude jsonl decoders move into internal/harness/claude/transcript.go.
Cross-harness binding under --force. flow do --here --force on a task pinned to a different harness switches the pinning alongside the session_id rebind, with a warning that the prior transcript is orphaned from flow's view.

Commits (chronological)

Extract abstraction; claude impl behind interface
Close claude hardcoding leaks in spawner backends + app helpers (closing 7 leaks identified by an audit pass)
Tighten interface — rename PrepareSpawn → NewSessionID, HeadlessRun → SkipPermissionsRun, drop HookEnvForSpawn, replace TranscriptPath → RenderTranscript
Per-task harness column + ambient detection
session_cwd column — fix (cwd, session_id) identity for --here binds

Test plan

Schema migration is additive (verified by reviewing the ALTER statements; runs on first flow invocation after upgrade)
flow do <new-task> writes both harness and session_cwd columns; verify with sqlite3 ~/.flow/flow.db "SELECT harness, session_cwd FROM tasks WHERE slug='<slug>'"
flow do --here <task> from a directory ≠ work_dir writes session_cwd = os.Getwd(); subsequent flow transcript <task> finds the file
flow do --here --force <task> on a task pinned to a different harness emits the warning and switches the pinning
Pre-existing tasks (NULL harness / session_cwd) continue to render transcripts via the work_dir + claude fallbacks
flow done <task> still drives the headless close-out sweep against the task's pinned harness
make test passes locally (verified uncached on the branch)

🤖 Generated with Claude Code

Introduces internal/harness with a Harness interface that abstracts the agent CLI (Claude Code today; codex / gemini next). claude moves to internal/harness/claude/ behind New(); the app package wires it via a single defaultHarness() helper. The interface accommodates both pre-allocating harnesses (claude mints a UUID up front via --session-id) and self-allocating ones (codex/ gemini mint their own at startup) without per-harness branches in flow's caller code: - PrepareSpawn() returns "" for self-allocating harnesses; callers branch on the empty string rather than on harness identity. - HookEnvForSpawn(slug) returns nil for claude (DB binding is already committed pre-spawn) and {FLOW_TASK: slug} for harnesses that complete the bind via SessionStart hook. Symbols moved out of internal/app: - newUUID, EncodeCwdForClaude → claude.NewUUID, claude.EncodeCwd - claudeRunner → claude.HeadlessRunner - psRunner, liveClaudeSessions → claude.PSRunner, (LiveSessionIDs) - sessionUUIDRe → claude.ValidateSessionID - install/uninstallClaudeHook → claude.{Install,Uninstall}*Hook - skill install/version paths → claude.SkillInstallPath etc. Behavior is preserved end to end — the byte-identity tests in internal/harness/claude/claude_test.go pin LaunchCmd / ResumeCmd to the exact strings the pre-refactor do.go produced. Full test suite (go test ./...) passes. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…pers A follow-up audit on the harness abstraction found 7 sites where claude-specific knowledge was leaking outside internal/harness/claude/ and internal/app/harness.go. All fixed here: - spawner.FocusSession now takes a `binary string` parameter; each backend (iterm, terminal, zellij, kitty) filters its process / pane list by that name instead of the hardcoded literal "claude". Caller in do.go passes h.Binary(). Internal helpers renamed ttyForClaudeSession → ttyForHarnessSession (and similar for the zellij / kitty pane lookups); claudeSessionRowRe → sessionUUIDRowRe. - internal/app/helpers.go's currentSessionID() now reads defaultHarness().SessionIDEnvVar() instead of CLAUDE_CODE_SESSION_ID directly. Multi-harness probing will replace the single read later. - Two flag-help strings in do.go and run.go that named "claude" explicitly are now harness-agnostic ("the spawned harness"). After this, claude is named in code only in two places: the impl package internal/harness/claude/ and the single instantiation point internal/app/harness.go. A re-audit confirmed clean. Deferred (per the implementation plan, not addressed here): - internal/app/hook.go (still emits claude-shaped hook JSON) - internal/app/skill/SKILL.md (embedded skill content) - User-facing error message strings mentioning "Claude session" Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…transcript end-to-end Five interface-level changes: 1. PrepareSpawn → NewSessionID. Docstring now says every impl returns a real id; claude generates locally, codex/gemini will probe their CLI to mint and capture. Flow's caller has a single uniform spawn path — no empty-string branches, no deferred-bind logic. 2. HeadlessRun → SkipPermissionsRun. Seam HeadlessRunner → SkipPermissionsRunner. Inner runHeadless → runSkipPermissions. 3. LaunchOpts.SkipApprovals → SkipPermissions. Matches the method name; one word for the concept across the package. 4. HookEnvForSpawn dropped from the interface. Flow no longer injects any env vars into spawned harness processes; the only injection that survives is $FLOW_ROOT (flow's data root, not a binding correlator). Codex/Gemini deferred-bind will use the probe-and- mint approach in NewSessionID rather than a hook callback — keeping flow's env-var contract one-way (harness exports, flow reads, never the reverse). 5. TranscriptPath → RenderTranscript. The old API returned only a path string; callers in app/ then hardcoded claude's jsonl schema to parse it. That's the leak — transcript format differs per harness (claude jsonl with claude messages, codex jsonl with codex events, gemini single-object json) and the parser couldn't stay in app/. Each harness now owns end-to-end transcript handling and writes a normalized text stream. The ~270 lines of claude jsonl decoders move from internal/app/transcript.go to internal/harness/claude/transcript.go. Parser-level tests move to internal/harness/claude/transcript_test.go; cmdTranscript wiring tests stay in internal/app/. internal/app/transcript.go is now a thin dispatcher. Behavior preserved end to end — go test ./... passes uncached. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Vestige from before the harness refactor — the skip-permissions flag now propagates via LaunchOpts.SkipPermissions which each harness translates to its own flag (claude: --dangerously-skip-permissions, codex: --dangerously-bypass-…, gemini: --yolo). The local var was appended to but never read. Caught by the post-refactor audit pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Adds a tasks.harness column so flow remembers which agent CLI owns each session's transcript / resume / close-out lifecycle. New tasks adopt the ambient harness on first bind; pinned tasks resume in their own harness regardless of where you run flow from. Schema (additive, no backfill): - ALTER TABLE tasks ADD COLUMN harness TEXT (nullable). NULL is the back-compat signal for claude — every row that existed before this column landed reads as claude in the app layer. Bootstrap binding (cmdDo, fresh path): - harnessForSpawn(task) picks the adapter: pinned task.harness wins; else the harness running THIS flow process (ambient probe via each adapter's SessionIDEnvVar); else claude default. - The chosen name persists alongside session_id atomically — future invocations read the same adapter even from a different ambient context. In-session bind (cmdDoHere): - --here probes ambient explicitly; refuses if the user isn't inside any known harness. - Cross-harness binding (task pinned to X, ambient is Y) is rejected without --force; with --force the harness column updates alongside session_id and a warning notes the orphaned prior transcript. - The bind UPDATE always writes harness, so a previously-unpinned task gets pinned on first --here too. Routing through task.harness: - cmdDone close-out sweep, cmdTranscript renderer, and the per-task [live] markers (show.go, list.go) all now use harnessForTask(t) instead of defaultHarness(). list.go batches LiveSessionIDs to one call per unique harness across the result set. - currentSessionID() probes every known harness's env var; previous hardcoded claude env-var read is gone. Helper module (internal/app/harness.go): - allHarnesses() — registry; codex/gemini drop in as one line each. - harnessByName / harnessForTask / harnessForSpawn / ambientHarness / defaultHarness — covers the four context shapes (have task & it's pinned, have task & ambient matters, have neither, just want a default). Tests cover ambient detection, harnessForTask fallback, bootstrap persistence, --here persistence, cross-harness rejection without --force, --force harness switch. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Claude's on-disk transcript path is keyed by the pair (cwd, session_id): ~/.claude/projects/<encode(cwd)>/<session_id>.jsonl. Pre-this-commit flow assumed cwd == task.work_dir for that encode — true for fresh `flow do` (we spawn the tab with cwd=work_dir) but BROKEN for `flow do --here` when claude was started in a directory other than work_dir. Most visible after `flow do --here --force` binds where the user explicitly attaches a foreign session. Reproducing case: task run-2-2 with work_dir=/tmp got --here-bound from a claude session running in /Users/rohit. Transcript lookup hit ~/.claude/projects/-tmp/<sid>.jsonl (didn't exist); actual file was at ~/.claude/projects/-Users-rohit/<sid>.jsonl. Same shape for any --here bind across directory boundaries. Fix: record the cwd alongside session_id at bind time. New nullable column tasks.session_cwd: - fresh `flow do`: session_cwd = task.work_dir (equal by construction — we spawn the tab with cwd=work_dir). - `flow do --here`: session_cwd = os.Getwd() (inherited from claude's startup cwd via the shell). - legacy rows (NULL): fall back to task.work_dir for transcript lookups, matching pre-commit behavior. Transcript lookup now passes session_cwd (with work_dir fallback) to harness.RenderTranscript instead of work_dir. Error messages on lookup failure now explain which cwd was assumed, why the file might be missing, and how to recover (re-bind via `flow do --here` to record the right cwd). The claude impl distinguishes "file not found at expected path" from other open errors and names the project dir in the message. Tests cover the new column being written on both `flow do` bootstrap and `flow do --here`. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

anshulsao

Solid extraction overall — the interface seams are well-chosen, the schema migration is clean (additive + nullable, no backfill), and make test is green on the branch. The session_cwd fix is a real correctness win that the PR body undersells: it actually makes flow do --here + flow transcript work for the --force and "started claude elsewhere" cases, which I've hit personally.

Five non-blocking-ish items below, ordered by impact. Two are [should-fix] mostly because they'd bite the moment a second harness lands; the rest are nits.

Backend flow diagrams

`flow do <task>` — harness pick on bootstrap

BEFORE                                  AFTER
──────                                  ─────
cmdDo(args) → findTask → task           cmdDo(args) → findTask → task
        │                                       │
        ▼                                       ▼
(claude assumed everywhere)             [+ h = harnessForSpawn(task)]
        │                                  1. task.harness pinned → use it
        ▼                                  2. ambient env-var set → use it
liveClaudeSessions()                       3. else → claude
        │                                       │
        ▼                                       ▼
spawner.FocusSession(sid)               h.LiveSessionIDs() → map[id]count
        │                               spawner.FocusSession(sid, h.Binary())
        ▼                                       │
newUUID()                                       ▼
        │                               h.NewSessionID()
        ▼                                       │
UPDATE session_id, …                            ▼
        │                               UPDATE session_id,
        ▼                                      [+ session_cwd  = task.work_dir],
"claude --session-id <id> …" prompt            [+ harness      = h.Name()]
        │                                       │
        ▼                                       ▼
spawner.SpawnTab(...)                   command = h.LaunchCmd(sid, prompt, opts)
                                        spawner.SpawnTab(...)

`flow do --here <task>` — new mismatch gate

BEFORE                                  AFTER
──────                                  ─────
sid = $CLAUDE_CODE_SESSION_ID           [+ h = ambientHarness()]
v4 UUID regex check                        - probes every h.SessionIDEnvVar()
        │                                  - nil if none / multiple set
        ▼                                       │
findTask → task                                 ▼
        │                               sid = os.Getenv(h.SessionIDEnvVar())
        ▼                               [+ h.ValidateSessionID(sid)]
"session bound elsewhere?" check                │
        │                                       ▼
        ▼                               findTask → task
UPDATE session_id, status                       │
                                                ▼
                                        [+ if task.harness ≠ h.Name() and pinned:
                                             require --force; warn that prior
                                             transcript history orphans]
                                                │
                                                ▼
                                        "session bound elsewhere?" check
                                                │
                                                ▼
                                        UPDATE session_id,
                                              [+ session_cwd = os.Getwd()],
                                              [+ harness     = h.Name()],
                                              status

`flow transcript <task>` — data-flow rewire

BEFORE                                  AFTER
──────                                  ─────
sessionJSONLPath(task)                  [+ h = harnessForTask(task)]
  = ~/.claude/projects/                          │
    EncodeCwdForClaude(task.work_dir)/           ▼
    <session_id>.jsonl                  cwd = task.session_cwd
        │                                     ?? task.work_dir   ← legacy fallback
        ▼                                       │
renderTranscript(path, compact, cutoff)         ▼
  - jsonl decoder lives in              h.RenderTranscript(cwd, sid, compact,
    internal/app/transcript.go            cutoff, os.Stdout)
                                          - claude impl owns path resolution
                                            AND format decode; jsonl decoder
                                            moves to harness/claude/transcript.go
                                          - decoder targets io.Writer (was stdout)
                                          - error path: enhanced hint pointing
                                            users at `flow do --here` for legacy
                                            session_cwd=NULL rows

Schema delta — `tasks` row

BEFORE                                  AFTER
──────                                  ─────
tasks columns:                          tasks columns:
  ...                                     ...
  session_id           TEXT               session_id           TEXT
  session_started      TEXT               session_started      TEXT
  session_last_resumed TEXT               session_last_resumed TEXT
                                        [+ session_cwd        TEXT  ]
                                        [+ harness            TEXT  ]
                                          ALTER ADD COLUMN, nullable,
                                          no backfill. NULL = back-compat
                                          read paths (work_dir / claude).

Package layout

BEFORE                                  AFTER
──────                                  ─────
internal/app/                           internal/app/
  bootstrap.go  newUUID, EncodeCwd        harness.go   (dispatcher: 6 helpers)
  sessions.go   liveClaudeSessions        transcript.go (slim; delegates to h)
  transcript.go ~270L jsonl decoder
                                        internal/harness/      (new)
                                          harness.go    (Name, LaunchOpts,
                                                         14-method Harness iface)
                                          claude/       (new)
                                            claude.go
                                            transcript.go (jsonl decoder relocated)
                                            *_test.go

Vibecode dial

Vibecode dial:
                       ●
   tight ──────────────┴────────── vibecoded
                       ↑
       (mostly tight refactor; modest helper sprawl + a couple of
        unowned coercion paths nudge it one notch right)

Receipts:

Files changed:          36   (all in scope — harness extraction touches
                              every spawn backend by design)
Lines added (code):     ~1500
Lines added (test):     ~656 (harness_test.go 251 + claude tests 405)
Lines added (comments): heavy — design-rationale doc blocks throughout
New abstractions:       1   (Harness interface, 14 methods)
New helper funcs:       6   (harnessForTask, harnessForSpawn, ambientHarness,
                             defaultHarness, harnessByName, liveSessionsForTasks)
Adjacent refactors:     0
Stale-comment edits:    0   (signature-driven; necessary)
Drive-by changes:       0

What kept it from full-tight: 6 dispatch helpers in internal/app/harness.go where 2–3 would suffice; duplicated encodeCwd / EncodeCwd; one --here comment that asserts more invariant than the code actually upholds; harnessByName's silent unknown-value coercion has a write-back path that can erase non-NULL non-claude values on --fresh bootstrap.

[should-fix] - do.go duplicate-process warning and live-session error no longer hardcode "claude"; format string uses h.Binary() so a future codex/gemini duplicate-tab warning reads correctly. - Bootstrap UPDATE in cmdDo no longer naively overwrites the harness column. The harness column is "set once on first bind" (per the doc comment in flowdb/db.go); a COALESCE clause now guards the write so a pre-existing non-NULL pin survives. `flow do --here --force` remains the explicit lane for harness switches and writes the column unconditionally there. Adds a regression test (TestCmdDoBootstrapPreservesExistingHarnessPin) that pre-pins a task to "codex" and verifies cmdDo doesn't silently coerce it back to "claude". - harnessByName doc-comment now flags the read-side coercion hazard for unknown non-empty values so future maintainers see the constraint when the registry grows. - RenderTranscript's first parameter renamed `workDir` → `cwd` in both the interface signature (internal/harness/harness.go) and the claude impl. Matches the doc comment's wording and removes the false invariant that the value is always task.work_dir (callers pass task.session_cwd when set). Error message at the file-not-found path no longer claims `session_cwd=%q` when the caller actually passed work_dir as a legacy fallback. [nit] - --here cwd-invariant comment in do.go softened: notes that a chained `cd /elsewhere && flow do --here task` in a single Bash call yields os.Getwd() != claude's startup cwd. Same blind spot the legacy work_dir lookup had; documented as a known limitation rather than asserted as an invariant. - Dropped the duplicate `encodeCwd` in internal/harness/claude/. EncodeCwd is the single function; internal callers and the test now both use it. - TestHarnessForSpawnPrefersAmbientForUnpinned renamed to TestHarnessForSpawn_AllPathsLandOnClaudeToday; doc comment now acknowledges the test can't prove precedence (ambient vs fallback are indistinguishable when both are claude). Real precedence coverage waits on a second registered adapter. - Trimmed the stray blank line inside skill.go's import block. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

rr0hit · 2026-05-20T12:49:56Z

Thanks for the thorough pass @anshulsao — all 7 items addressed in 835a308. Per-comment replies above; net of the changes:

[should-fix]

do.go duplicate-process warning + live-session error now format with h.Binary() — won't say "claude processes" when a codex/gemini duplicate tab triggers it.
Bootstrap UPDATE now preserves a pre-existing harness pin via a CASE WHEN harness IS NULL OR harness = '' THEN ? ELSE harness END clause; the "set once on first bind" doc invariant is now upheld in code. flow do --here --force remains the explicit lane to switch harnesses. New regression test pre-pins to "codex" and verifies cmdDo doesn't silently overwrite.
RenderTranscript first param renamed workDir → cwd in interface + impl. Error message at file-not-found drops the misleading session_cwd=%q claim.

[nit]

--here cwd-invariant comment hedged for the chained-cd edge case.
Dropped the duplicate encodeCwd (kept EncodeCwd).
Test renamed to TestHarnessForSpawn_AllPathsLandOnClaudeToday with doc comment acknowledging precedence is uncoverable today.
Stray blank line in skill.go imports trimmed.

Deferred: tightening harnessByName to return (Harness, error) for unknown non-empty values (Anshul's option (a) on item 2) — meaningful only once the registry grows past claude. The COALESCE write-guard removes the actual hazard for now and a comment in harnessByName flags the read-side coercion for whoever wires codex.

Ready for another look when you have a minute.

The remaining deferred item from Anshul's review. harnessByName now returns (Harness, error) so callers see unsupported pins explicitly instead of silently coercing to claude. Behavior: empty/NULL → (claude, nil) # back-compat known → (adapter, nil) unknown → (nil, error) # was: (claude, nil) Propagated through harnessForTask and harnessForSpawn. Each caller handles the error per its context: - cmdDo: errors out with the message + registered-names list. The pin survives (no UPDATE runs), so a future build that registers the pinned harness can still pick the task up. - cmdDone: warns and skips the close-out sweep but keeps the status flip — the flip is the contract, the sweep is best-effort. - cmdTranscript: errors out. - show.go [live] marker: silently skips; the task still renders. - list.go liveSessionsForTasks: silently skips; the task still renders without a [live] marker. The previous "preserve the pin via COALESCE" guard (from earlier in this PR) becomes belt-and-braces: the bootstrap UPDATE still only writes harness when NULL, but cmdDo can no longer reach that UPDATE with a coerced wrong-harness value because harnessForSpawn errors out first. Test updates: - TestHarnessForTask: "unknown name → claude fallback" case becomes "unknown name → error". - TestCmdDoBootstrapPreservesExistingHarnessPin → renamed TestCmdDoRefusesUnsupportedHarnessPin. Asserts both the explicit refusal AND the pin's survival (no session_id written either), so the future-binary upgrade path is exercised. - TestHarnessForSpawn_AllPathsLandOnClaudeToday: wrapped its resolution in a t.Helper() that fails on unexpected error. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

anshulsao

All 7 asks addressed. CI rollup green (vet/test/build), go test ./... green locally. Approving.

Verification per comment:

#	Tag	Ask	Fix
1	should-fix	`do.go:177` hardcoded "claude processes"	both stderr templates now interpolate `h.Binary()` (live-duplicate warning + running-elsewhere error)
2	should-fix	`harnessByName` silent coercion	`harnessByName` → `(harness, error)`; empty/NULL → claude+nil for back-compat, unknown non-empty → nil+error listing registered names. Plus a belt-and-braces `CASE WHEN harness IS NULL OR harness = '' THEN ? ELSE harness END` on the bootstrap UPDATE so even a same-name pin doesn't get re-written. `TestCmdDoRefusesUnsupportedHarnessPin` pins the future-pin → refuse-and-preserve contract
3	should-fix	`workDir` param lies	interface + impl renamed `workDir`→`cwd`; error message switched to `cwd=%q` so it no longer claims a `session_cwd` source the caller may not have used
4	nit	`--here` cwd-invariant comment	rewritten to acknowledge the chained-`cd` blind spot explicitly
5	nit	duplicate `encodeCwd`/`EncodeCwd`	unexported version dropped; doc on the exported form picked up the merge
6	nit	test name oversells	renamed `TestHarnessForSpawn_AllPathsLandOnClaudeToday`; docstring now spells out what is and isn't testable today
7	nit	stray blank line in `skill.go` imports	removed

Two notes I like beyond the asks:

The COALESCE-style guard on the bootstrap UPDATE is a tighter answer than what I suggested. I asked for one of (a) error path or (b) preserve-pin; you did both, which means even if a future code path resolved an unknown pin to claude, the column wouldn't get overwritten silently anymore. Belt-and-braces is the right call for a "set once on first bind" invariant.
done.go's warn-and-skip on harnessForTask error keeps the status flip intact when the close-out sweep can't run. That's the right tradeoff — the flip is the user-facing contract; the sweep is best-effort plumbing.

Vibecode dial — re-review

Vibecode dial:
                ●
   tight ───────┴────────────── vibecoded
                ↑
       (tightened from near-tight to tight after fixes —
        no scope creep, fixes match asks, one extra test
        per concern, no drive-bys)

Receipts on the fix delta (bdfe9c8..386511a):

Files changed:          11   (all directly traceable to one of the asks)
Lines added (code):     ~150
Lines added (test):     ~70  (one regression test per should-fix:
                              future-pin refusal; existing tests
                              updated to thread the new error return)
Lines added (comments): heavy — but all of it is doc on the new
                        error semantics (where unknown values
                        error vs. where they're tolerated)
New abstractions:       0
New helper funcs:       1   (registeredHarnessNames — used in the
                              error message; reasonable)
Adjacent refactors:     0
Stale-comment edits:    1   (the encodeCwd→EncodeCwd doc merge)
Drive-by changes:       0

What kept it from full-tight: a single new helper (registeredHarnessNames) was added to format the error message. Could have been inlined into the fmt.Errorf, but it's small and named so I'd leave it. Otherwise nothing to nit — fixes are surgical, comments are load-bearing, no whilst-we're-here cleanups landed.

…59) flow do --here used to silently bind a claude session whose jsonl was written under a cwd different from task.work_dir. Future `flow do <slug>` resumes spawned at work_dir; claude --resume looked for the jsonl under work_dir's encoded path, didn't find it, and either errored (-p mode) or silently started a fresh conversation (interactive). That's GH #59. Design: - Reverted the session_cwd column from bdfe9c8. The column was the wrong shape — it patched the symptom (record where the jsonl is) rather than the cause (work_dir and session_id allowed to disagree). Since the column was unreleased, no migration baggage. - New invariant: any task with session_id non-null has work_dir equal to the cwd that session was created at. Resume always spawns at work_dir; the invariant guarantees the jsonl is there. - Enforcement via new harness method ValidateSession(workDir, sessionID). Claude impl stats ~/.claude/projects/<encode(workDir)>/<sid>.jsonl. Gemini will share the layout; codex (sid-only) will no-op. - Gates: flow do --here refuses on validation failure; flow update task --work-dir refuses if a session is bound AND the new work_dir doesn't satisfy the invariant. The latter is the supported "fix a broken row" path — point work_dir at where the jsonl actually lives. Honest check, not naive: comparing os.Getwd() to work_dir would be fooled by `cd <work_dir> && flow do --here task` from inside a claude Bash invocation — the subprocess cwd has nothing to do with where the session jsonl lives. Statting the on-disk path is the only check that's robust to that. Run-playbook --here: the inserted run-task now takes work_dir = os.Getwd() instead of pb.WorkDir, since the run is by definition about the binding session's actual cwd. Skill update: §4.16 explicitly documents the new safety property, adds a Cwd-mismatch sub-recipe (Open in new tab / Point work_dir at this session's directory), and adds anti-patterns warning against the cd-cheat and against --force as a bypass. Tests: - TestCmdDoHereRefusesWhenTranscriptMissing exercises the cheat-resistance directly (seed at cwd, stub stat as missing, still refuses). - claude.StatFn is the indirection point so app-level tests stub the on-disk check without materializing fake jsonls. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

rr0hit · 2026-05-21T09:40:59Z

New commit eda3a3c lands the GH #59 fix on this branch. Worth a second look since it reshapes a chunk of what bdfe9c8 introduced.

The shift in approach: bdfe9c8 introduced session_cwd to record where each session was actually started. That was a symptom-side fix — it let flow transcript find the jsonl, but flow do resume still consulted task.work_dir alone (the open half of #59). This commit reverts the session_cwd column entirely (it was unreleased, no migration baggage) and instead enforces the invariant *any task with session_id IS NOT NULL has work_dir == cwd of session at creation*. Resume always spawns at work_dir`; the invariant guarantees the jsonl is there.

Enforcement is per-harness: new Harness.ValidateSession(workDir, sessionID) error method. Claude's impl stats ~/.claude/projects/<encode(workDir)>/<sid>.jsonl. Gemini will share that shape; codex (sid-only sessions, prompts the user about cwd at resume) will no-op. This is the right home for the policy because each harness's transcript layout decides whether (cwd, sid) is a meaningful key.

Why stat, not os.Getwd(): comparing the subprocess's cwd to work_dir is fooled by cd <work_dir> && flow do --here task from inside a claude Bash invocation — the chained cd only affects flow's subprocess, not where the actual session jsonl was written. The stat catches that.

Gates:

flow do --here refuses on validation failure. --force does NOT override.
flow update task --work-dir refuses if a session is bound AND the new path doesn't satisfy ValidateSession. This is the supported "fix a broken row" path — point work_dir at where the jsonl actually lives.

flow run playbook --here: the inserted run-task now takes work_dir = os.Getwd() rather than the playbook's default, since the run is by definition about the binding session's cwd.

Skill update: §4.16 documents the new safety property and adds a cwd-mismatch sub-recipe (Open in new tab / Point work_dir at this session's directory). Anti-patterns explicitly warn against the cd-cheat and against --force as a bypass.

Existing rows that already violate the invariant (your audit found ≥6) error cleanly on next use; the flow update task --work-dir path is how they get fixed. No backfill subcommand — would have hardcoded claude paths, premature with the multi-harness future.

A cleanup commit follows to drop the orphan session_cwd column from any DB that ran bdfe9c8-era binaries during development.

bdfe9c8 introduced tasks.session_cwd to record where each harness session was started; eda3a3c reverted the column from the schema in favor of the cwd==work_dir invariant + Harness.ValidateSession check. Fresh DBs no longer have the column, but DBs that ran intermediate `harness-pluggable` builds during PR-58 development still carry it as an orphan. Add an idempotent ALTER TABLE DROP COLUMN to the migration block so those DBs converge to the clean shape on next open. Test exercises the happy path: ADD COLUMN to simulate a stale DB, re-open, confirm it's gone. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

This reverts commit 3cd894c.

rr0hit and others added 6 commits May 20, 2026 15:56

anshulsao requested changes May 20, 2026

View reviewed changes

anshulsao previously approved these changes May 21, 2026

View reviewed changes

rr0hit dismissed anshulsao’s stale review via eda3a3c May 21, 2026 09:40

rr0hit mentioned this pull request May 21, 2026

bug(do): resume loses prior context when bound session's cwd != task.work_dir #59

Open

rr0hit and others added 2 commits May 21, 2026 15:13

Revert "chore(flowdb): drop orphan session_cwd column on migration"

4b135f0

This reverts commit 3cd894c.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pluggable agent harnesses: claude behind interface, codex/gemini ready#58

Pluggable agent harnesses: claude behind interface, codex/gemini ready#58
rr0hit wants to merge 11 commits into
mainfrom
harness-pluggable

rr0hit commented May 20, 2026

Uh oh!

anshulsao left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

rr0hit commented May 20, 2026

Uh oh!

anshulsao left a comment

Uh oh!

rr0hit commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

rr0hit commented May 20, 2026

Summary

What this unlocks

Design choices worth flagging

Commits (chronological)

Test plan

Uh oh!

anshulsao left a comment

Choose a reason for hiding this comment

Backend flow diagrams

flow do <task> — harness pick on bootstrap

flow do --here <task> — new mismatch gate

flow transcript <task> — data-flow rewire

Schema delta — tasks row

Package layout

Vibecode dial

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

rr0hit commented May 20, 2026

Uh oh!

anshulsao left a comment

Choose a reason for hiding this comment

Vibecode dial — re-review

Uh oh!

rr0hit commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

`flow do <task>` — harness pick on bootstrap

`flow do --here <task>` — new mismatch gate

`flow transcript <task>` — data-flow rewire

Schema delta — `tasks` row