Skip to content

Pluggable agent harnesses: claude behind interface, codex/gemini ready#58

Open
rr0hit wants to merge 11 commits into
mainfrom
harness-pluggable
Open

Pluggable agent harnesses: claude behind interface, codex/gemini ready#58
rr0hit wants to merge 11 commits into
mainfrom
harness-pluggable

Conversation

@rr0hit
Copy link
Copy Markdown
Contributor

@rr0hit rr0hit commented May 20, 2026

Summary

  • Introduces internal/harness package — a 14-method interface abstracting the agent CLI driver (Claude Code today; Codex/Gemini ready). Existing claude-specific code moves into internal/harness/claude/ behind claude.New().
  • Adds tasks.harness column for per-task adapter pinning. NULL = claude (back-compat for pre-column DBs). Set on first flow do / flow do --here from the ambient harness; --force is required to switch.
  • Adds tasks.session_cwd column — fixes a pre-existing transcript-not-found bug for --here-bound sessions where claude was started in a directory other than task.work_dir (commonly via --force binds). Session identity is now keyed on (cwd, session_id) to match claude's on-disk layout.
  • Ambient harness detection via env-var probe (CLAUDE_CODE_SESSION_ID, future CODEX_THREAD_ID, GEMINI_SESSION_ID) — flow do <new-task> from inside a codex shell would pick codex when that adapter lands; falls back to claude.
  • Schema changes are additive + nullable; no backfill; legacy rows behave exactly as before via NULL-fallback paths.

What this unlocks

Codex and Gemini adapters can drop in as one line each in allHarnesses(). Their per-CLI specifics (binary name, headless flag spelling, transcript location/format, hook config path) live entirely inside their own sub-packages. Flow's caller code never branches on harness name — it dispatches via the interface or via harnessForTask(task) / harnessForSpawn(task) / ambientHarness() helpers in internal/app/harness.go.

Design choices worth flagging

  • Pre-allocation strategy is per-harness. Claude's NewSessionID() mints a v4 UUID locally (passed via --session-id). Codex/Gemini will probe their CLI (codex exec / gemini -p) to mint and capture a session id from stdout before opening the user-facing tab. The interface contract is uniform — every impl returns a real id, callers have one spawn path.
  • No env-var injection. Flow never sets FLOW_TASK or any other env var on spawned harness processes. The harness's own session-id env var is the only correlator, and flow only reads it. Avoids polluting nested-session environments and keeps --here's discovery path symmetric with first-spawn binding.
  • Harness owns transcripts end-to-end. RenderTranscript(workDir, sessionID, compact, cutoff, w) lives on the interface; each harness handles both path resolution AND format decoding (claude jsonl, codex event log, gemini single-object json) and writes normalized text. The ~270 lines of claude jsonl decoders move into internal/harness/claude/transcript.go.
  • Cross-harness binding under --force. flow do --here --force on a task pinned to a different harness switches the pinning alongside the session_id rebind, with a warning that the prior transcript is orphaned from flow's view.

Commits (chronological)

  1. Extract abstraction; claude impl behind interface
  2. Close claude hardcoding leaks in spawner backends + app helpers (closing 7 leaks identified by an audit pass)
  3. Tighten interface — rename PrepareSpawnNewSessionID, HeadlessRunSkipPermissionsRun, drop HookEnvForSpawn, replace TranscriptPathRenderTranscript
  4. Per-task harness column + ambient detection
  5. session_cwd column — fix (cwd, session_id) identity for --here binds

Test plan

  • Schema migration is additive (verified by reviewing the ALTER statements; runs on first flow invocation after upgrade)
  • flow do <new-task> writes both harness and session_cwd columns; verify with sqlite3 ~/.flow/flow.db "SELECT harness, session_cwd FROM tasks WHERE slug='<slug>'"
  • flow do --here <task> from a directory ≠ work_dir writes session_cwd = os.Getwd(); subsequent flow transcript <task> finds the file
  • flow do --here --force <task> on a task pinned to a different harness emits the warning and switches the pinning
  • Pre-existing tasks (NULL harness / session_cwd) continue to render transcripts via the work_dir + claude fallbacks
  • flow done <task> still drives the headless close-out sweep against the task's pinned harness
  • make test passes locally (verified uncached on the branch)

🤖 Generated with Claude Code

rr0hit and others added 6 commits May 20, 2026 15:56
Introduces internal/harness with a Harness interface that abstracts the
agent CLI (Claude Code today; codex / gemini next). claude moves to
internal/harness/claude/ behind New(); the app package wires it via a
single defaultHarness() helper.

The interface accommodates both pre-allocating harnesses (claude mints
a UUID up front via --session-id) and self-allocating ones (codex/
gemini mint their own at startup) without per-harness branches in
flow's caller code:

  - PrepareSpawn() returns "" for self-allocating harnesses; callers
    branch on the empty string rather than on harness identity.
  - HookEnvForSpawn(slug) returns nil for claude (DB binding is
    already committed pre-spawn) and {FLOW_TASK: slug} for harnesses
    that complete the bind via SessionStart hook.

Symbols moved out of internal/app:
  - newUUID, EncodeCwdForClaude   → claude.NewUUID, claude.EncodeCwd
  - claudeRunner                  → claude.HeadlessRunner
  - psRunner, liveClaudeSessions  → claude.PSRunner, (LiveSessionIDs)
  - sessionUUIDRe                 → claude.ValidateSessionID
  - install/uninstallClaudeHook   → claude.{Install,Uninstall}*Hook
  - skill install/version paths   → claude.SkillInstallPath etc.

Behavior is preserved end to end — the byte-identity tests in
internal/harness/claude/claude_test.go pin LaunchCmd / ResumeCmd to
the exact strings the pre-refactor do.go produced. Full test suite
(go test ./...) passes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…pers

A follow-up audit on the harness abstraction found 7 sites where
claude-specific knowledge was leaking outside internal/harness/claude/
and internal/app/harness.go. All fixed here:

- spawner.FocusSession now takes a `binary string` parameter; each
  backend (iterm, terminal, zellij, kitty) filters its process / pane
  list by that name instead of the hardcoded literal "claude". Caller
  in do.go passes h.Binary(). Internal helpers renamed
  ttyForClaudeSession → ttyForHarnessSession (and similar for the
  zellij / kitty pane lookups); claudeSessionRowRe →
  sessionUUIDRowRe.

- internal/app/helpers.go's currentSessionID() now reads
  defaultHarness().SessionIDEnvVar() instead of CLAUDE_CODE_SESSION_ID
  directly. Multi-harness probing will replace the single read later.

- Two flag-help strings in do.go and run.go that named "claude"
  explicitly are now harness-agnostic ("the spawned harness").

After this, claude is named in code only in two places: the impl
package internal/harness/claude/ and the single instantiation point
internal/app/harness.go. A re-audit confirmed clean.

Deferred (per the implementation plan, not addressed here):
- internal/app/hook.go (still emits claude-shaped hook JSON)
- internal/app/skill/SKILL.md (embedded skill content)
- User-facing error message strings mentioning "Claude session"

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…transcript end-to-end

Five interface-level changes:

1. PrepareSpawn → NewSessionID. Docstring now says every impl returns
   a real id; claude generates locally, codex/gemini will probe their
   CLI to mint and capture. Flow's caller has a single uniform spawn
   path — no empty-string branches, no deferred-bind logic.

2. HeadlessRun → SkipPermissionsRun. Seam HeadlessRunner →
   SkipPermissionsRunner. Inner runHeadless → runSkipPermissions.

3. LaunchOpts.SkipApprovals → SkipPermissions. Matches the method
   name; one word for the concept across the package.

4. HookEnvForSpawn dropped from the interface. Flow no longer injects
   any env vars into spawned harness processes; the only injection
   that survives is $FLOW_ROOT (flow's data root, not a binding
   correlator). Codex/Gemini deferred-bind will use the probe-and-
   mint approach in NewSessionID rather than a hook callback —
   keeping flow's env-var contract one-way (harness exports, flow
   reads, never the reverse).

5. TranscriptPath → RenderTranscript. The old API returned only a
   path string; callers in app/ then hardcoded claude's jsonl schema
   to parse it. That's the leak — transcript format differs per
   harness (claude jsonl with claude messages, codex jsonl with
   codex events, gemini single-object json) and the parser couldn't
   stay in app/. Each harness now owns end-to-end transcript handling
   and writes a normalized text stream. The ~270 lines of claude
   jsonl decoders move from internal/app/transcript.go to
   internal/harness/claude/transcript.go. Parser-level tests move to
   internal/harness/claude/transcript_test.go; cmdTranscript wiring
   tests stay in internal/app/. internal/app/transcript.go is now a
   thin dispatcher.

Behavior preserved end to end — go test ./... passes uncached.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Vestige from before the harness refactor — the skip-permissions flag
now propagates via LaunchOpts.SkipPermissions which each harness
translates to its own flag (claude: --dangerously-skip-permissions,
codex: --dangerously-bypass-…, gemini: --yolo). The local var was
appended to but never read.

Caught by the post-refactor audit pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds a tasks.harness column so flow remembers which agent CLI owns
each session's transcript / resume / close-out lifecycle. New tasks
adopt the ambient harness on first bind; pinned tasks resume in
their own harness regardless of where you run flow from.

Schema (additive, no backfill):
- ALTER TABLE tasks ADD COLUMN harness TEXT (nullable). NULL is the
  back-compat signal for claude — every row that existed before this
  column landed reads as claude in the app layer.

Bootstrap binding (cmdDo, fresh path):
- harnessForSpawn(task) picks the adapter: pinned task.harness wins;
  else the harness running THIS flow process (ambient probe via each
  adapter's SessionIDEnvVar); else claude default.
- The chosen name persists alongside session_id atomically — future
  invocations read the same adapter even from a different ambient
  context.

In-session bind (cmdDoHere):
- --here probes ambient explicitly; refuses if the user isn't inside
  any known harness.
- Cross-harness binding (task pinned to X, ambient is Y) is rejected
  without --force; with --force the harness column updates alongside
  session_id and a warning notes the orphaned prior transcript.
- The bind UPDATE always writes harness, so a previously-unpinned
  task gets pinned on first --here too.

Routing through task.harness:
- cmdDone close-out sweep, cmdTranscript renderer, and the per-task
  [live] markers (show.go, list.go) all now use harnessForTask(t)
  instead of defaultHarness(). list.go batches LiveSessionIDs to one
  call per unique harness across the result set.
- currentSessionID() probes every known harness's env var; previous
  hardcoded claude env-var read is gone.

Helper module (internal/app/harness.go):
- allHarnesses() — registry; codex/gemini drop in as one line each.
- harnessByName / harnessForTask / harnessForSpawn / ambientHarness /
  defaultHarness — covers the four context shapes (have task & it's
  pinned, have task & ambient matters, have neither, just want a
  default).

Tests cover ambient detection, harnessForTask fallback, bootstrap
persistence, --here persistence, cross-harness rejection without
--force, --force harness switch.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Claude's on-disk transcript path is keyed by the pair (cwd, session_id):
~/.claude/projects/<encode(cwd)>/<session_id>.jsonl. Pre-this-commit
flow assumed cwd == task.work_dir for that encode — true for fresh
`flow do` (we spawn the tab with cwd=work_dir) but BROKEN for
`flow do --here` when claude was started in a directory other than
work_dir. Most visible after `flow do --here --force` binds where the
user explicitly attaches a foreign session.

Reproducing case: task run-2-2 with work_dir=/tmp got --here-bound
from a claude session running in /Users/rohit. Transcript lookup hit
~/.claude/projects/-tmp/<sid>.jsonl (didn't exist); actual file was at
~/.claude/projects/-Users-rohit/<sid>.jsonl. Same shape for any
--here bind across directory boundaries.

Fix: record the cwd alongside session_id at bind time. New nullable
column tasks.session_cwd:

- fresh `flow do`: session_cwd = task.work_dir (equal by construction
  — we spawn the tab with cwd=work_dir).
- `flow do --here`: session_cwd = os.Getwd() (inherited from claude's
  startup cwd via the shell).
- legacy rows (NULL): fall back to task.work_dir for transcript
  lookups, matching pre-commit behavior.

Transcript lookup now passes session_cwd (with work_dir fallback) to
harness.RenderTranscript instead of work_dir.

Error messages on lookup failure now explain which cwd was assumed,
why the file might be missing, and how to recover (re-bind via
`flow do --here` to record the right cwd). The claude impl
distinguishes "file not found at expected path" from other open
errors and names the project dir in the message.

Tests cover the new column being written on both `flow do` bootstrap
and `flow do --here`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@anshulsao anshulsao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid extraction overall — the interface seams are well-chosen, the schema migration is clean (additive + nullable, no backfill), and make test is green on the branch. The session_cwd fix is a real correctness win that the PR body undersells: it actually makes flow do --here + flow transcript work for the --force and "started claude elsewhere" cases, which I've hit personally.

Five non-blocking-ish items below, ordered by impact. Two are [should-fix] mostly because they'd bite the moment a second harness lands; the rest are nits.

Backend flow diagrams

flow do <task> — harness pick on bootstrap

BEFORE                                  AFTER
──────                                  ─────
cmdDo(args) → findTask → task           cmdDo(args) → findTask → task
        │                                       │
        ▼                                       ▼
(claude assumed everywhere)             [+ h = harnessForSpawn(task)]
        │                                  1. task.harness pinned → use it
        ▼                                  2. ambient env-var set → use it
liveClaudeSessions()                       3. else → claude
        │                                       │
        ▼                                       ▼
spawner.FocusSession(sid)               h.LiveSessionIDs() → map[id]count
        │                               spawner.FocusSession(sid, h.Binary())
        ▼                                       │
newUUID()                                       ▼
        │                               h.NewSessionID()
        ▼                                       │
UPDATE session_id, …                            ▼
        │                               UPDATE session_id,
        ▼                                      [+ session_cwd  = task.work_dir],
"claude --session-id <id> …" prompt            [+ harness      = h.Name()]
        │                                       │
        ▼                                       ▼
spawner.SpawnTab(...)                   command = h.LaunchCmd(sid, prompt, opts)
                                        spawner.SpawnTab(...)

flow do --here <task> — new mismatch gate

BEFORE                                  AFTER
──────                                  ─────
sid = $CLAUDE_CODE_SESSION_ID           [+ h = ambientHarness()]
v4 UUID regex check                        - probes every h.SessionIDEnvVar()
        │                                  - nil if none / multiple set
        ▼                                       │
findTask → task                                 ▼
        │                               sid = os.Getenv(h.SessionIDEnvVar())
        ▼                               [+ h.ValidateSessionID(sid)]
"session bound elsewhere?" check                │
        │                                       ▼
        ▼                               findTask → task
UPDATE session_id, status                       │
                                                ▼
                                        [+ if task.harness ≠ h.Name() and pinned:
                                             require --force; warn that prior
                                             transcript history orphans]
                                                │
                                                ▼
                                        "session bound elsewhere?" check
                                                │
                                                ▼
                                        UPDATE session_id,
                                              [+ session_cwd = os.Getwd()],
                                              [+ harness     = h.Name()],
                                              status

flow transcript <task> — data-flow rewire

BEFORE                                  AFTER
──────                                  ─────
sessionJSONLPath(task)                  [+ h = harnessForTask(task)]
  = ~/.claude/projects/                          │
    EncodeCwdForClaude(task.work_dir)/           ▼
    <session_id>.jsonl                  cwd = task.session_cwd
        │                                     ?? task.work_dir   ← legacy fallback
        ▼                                       │
renderTranscript(path, compact, cutoff)         ▼
  - jsonl decoder lives in              h.RenderTranscript(cwd, sid, compact,
    internal/app/transcript.go            cutoff, os.Stdout)
                                          - claude impl owns path resolution
                                            AND format decode; jsonl decoder
                                            moves to harness/claude/transcript.go
                                          - decoder targets io.Writer (was stdout)
                                          - error path: enhanced hint pointing
                                            users at `flow do --here` for legacy
                                            session_cwd=NULL rows

Schema delta — tasks row

BEFORE                                  AFTER
──────                                  ─────
tasks columns:                          tasks columns:
  ...                                     ...
  session_id           TEXT               session_id           TEXT
  session_started      TEXT               session_started      TEXT
  session_last_resumed TEXT               session_last_resumed TEXT
                                        [+ session_cwd        TEXT  ]
                                        [+ harness            TEXT  ]
                                          ALTER ADD COLUMN, nullable,
                                          no backfill. NULL = back-compat
                                          read paths (work_dir / claude).

Package layout

BEFORE                                  AFTER
──────                                  ─────
internal/app/                           internal/app/
  bootstrap.go  newUUID, EncodeCwd        harness.go   (dispatcher: 6 helpers)
  sessions.go   liveClaudeSessions        transcript.go (slim; delegates to h)
  transcript.go ~270L jsonl decoder
                                        internal/harness/      (new)
                                          harness.go    (Name, LaunchOpts,
                                                         14-method Harness iface)
                                          claude/       (new)
                                            claude.go
                                            transcript.go (jsonl decoder relocated)
                                            *_test.go

Vibecode dial

Vibecode dial:
                       ●
   tight ──────────────┴────────── vibecoded
                       ↑
       (mostly tight refactor; modest helper sprawl + a couple of
        unowned coercion paths nudge it one notch right)

Receipts:

Files changed:          36   (all in scope — harness extraction touches
                              every spawn backend by design)
Lines added (code):     ~1500
Lines added (test):     ~656 (harness_test.go 251 + claude tests 405)
Lines added (comments): heavy — design-rationale doc blocks throughout
New abstractions:       1   (Harness interface, 14 methods)
New helper funcs:       6   (harnessForTask, harnessForSpawn, ambientHarness,
                             defaultHarness, harnessByName, liveSessionsForTasks)
Adjacent refactors:     0
Stale-comment edits:    0   (signature-driven; necessary)
Drive-by changes:       0

What kept it from full-tight: 6 dispatch helpers in internal/app/harness.go where 2–3 would suffice; duplicated encodeCwd / EncodeCwd; one --here comment that asserts more invariant than the code actually upholds; harnessByName's silent unknown-value coercion has a write-back path that can erase non-NULL non-claude values on --fresh bootstrap.

Comment thread internal/app/do.go Outdated
Comment thread internal/app/harness.go Outdated
Comment thread internal/harness/claude/transcript.go Outdated
Comment thread internal/app/do.go Outdated
Comment thread internal/harness/claude/claude.go Outdated
Comment thread internal/app/harness_test.go Outdated
Comment thread internal/app/skill.go Outdated
[should-fix]
- do.go duplicate-process warning and live-session error no longer
  hardcode "claude"; format string uses h.Binary() so a future
  codex/gemini duplicate-tab warning reads correctly.
- Bootstrap UPDATE in cmdDo no longer naively overwrites the
  harness column. The harness column is "set once on first bind"
  (per the doc comment in flowdb/db.go); a COALESCE clause now
  guards the write so a pre-existing non-NULL pin survives.
  `flow do --here --force` remains the explicit lane for harness
  switches and writes the column unconditionally there. Adds a
  regression test (TestCmdDoBootstrapPreservesExistingHarnessPin)
  that pre-pins a task to "codex" and verifies cmdDo doesn't
  silently coerce it back to "claude".
- harnessByName doc-comment now flags the read-side coercion
  hazard for unknown non-empty values so future maintainers see
  the constraint when the registry grows.
- RenderTranscript's first parameter renamed `workDir` → `cwd` in
  both the interface signature (internal/harness/harness.go) and
  the claude impl. Matches the doc comment's wording and removes
  the false invariant that the value is always task.work_dir
  (callers pass task.session_cwd when set). Error message at the
  file-not-found path no longer claims `session_cwd=%q` when the
  caller actually passed work_dir as a legacy fallback.

[nit]
- --here cwd-invariant comment in do.go softened: notes that a
  chained `cd /elsewhere && flow do --here task` in a single Bash
  call yields os.Getwd() != claude's startup cwd. Same blind spot
  the legacy work_dir lookup had; documented as a known
  limitation rather than asserted as an invariant.
- Dropped the duplicate `encodeCwd` in internal/harness/claude/.
  EncodeCwd is the single function; internal callers and the
  test now both use it.
- TestHarnessForSpawnPrefersAmbientForUnpinned renamed to
  TestHarnessForSpawn_AllPathsLandOnClaudeToday; doc comment now
  acknowledges the test can't prove precedence (ambient vs
  fallback are indistinguishable when both are claude). Real
  precedence coverage waits on a second registered adapter.
- Trimmed the stray blank line inside skill.go's import block.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@rr0hit
Copy link
Copy Markdown
Contributor Author

rr0hit commented May 20, 2026

Thanks for the thorough pass @anshulsao — all 7 items addressed in 835a308. Per-comment replies above; net of the changes:

[should-fix]

  • do.go duplicate-process warning + live-session error now format with h.Binary() — won't say "claude processes" when a codex/gemini duplicate tab triggers it.
  • Bootstrap UPDATE now preserves a pre-existing harness pin via a CASE WHEN harness IS NULL OR harness = '' THEN ? ELSE harness END clause; the "set once on first bind" doc invariant is now upheld in code. flow do --here --force remains the explicit lane to switch harnesses. New regression test pre-pins to "codex" and verifies cmdDo doesn't silently overwrite.
  • RenderTranscript first param renamed workDircwd in interface + impl. Error message at file-not-found drops the misleading session_cwd=%q claim.

[nit]

  • --here cwd-invariant comment hedged for the chained-cd edge case.
  • Dropped the duplicate encodeCwd (kept EncodeCwd).
  • Test renamed to TestHarnessForSpawn_AllPathsLandOnClaudeToday with doc comment acknowledging precedence is uncoverable today.
  • Stray blank line in skill.go imports trimmed.

Deferred: tightening harnessByName to return (Harness, error) for unknown non-empty values (Anshul's option (a) on item 2) — meaningful only once the registry grows past claude. The COALESCE write-guard removes the actual hazard for now and a comment in harnessByName flags the read-side coercion for whoever wires codex.

Ready for another look when you have a minute.

The remaining deferred item from Anshul's review. harnessByName now
returns (Harness, error) so callers see unsupported pins explicitly
instead of silently coercing to claude.

Behavior:
  empty/NULL → (claude, nil)        # back-compat
  known      → (adapter, nil)
  unknown    → (nil, error)          # was: (claude, nil)

Propagated through harnessForTask and harnessForSpawn. Each caller
handles the error per its context:

- cmdDo: errors out with the message + registered-names list. The
  pin survives (no UPDATE runs), so a future build that registers
  the pinned harness can still pick the task up.
- cmdDone: warns and skips the close-out sweep but keeps the status
  flip — the flip is the contract, the sweep is best-effort.
- cmdTranscript: errors out.
- show.go [live] marker: silently skips; the task still renders.
- list.go liveSessionsForTasks: silently skips; the task still
  renders without a [live] marker.

The previous "preserve the pin via COALESCE" guard (from earlier in
this PR) becomes belt-and-braces: the bootstrap UPDATE still only
writes harness when NULL, but cmdDo can no longer reach that UPDATE
with a coerced wrong-harness value because harnessForSpawn errors
out first.

Test updates:
- TestHarnessForTask: "unknown name → claude fallback" case becomes
  "unknown name → error".
- TestCmdDoBootstrapPreservesExistingHarnessPin → renamed
  TestCmdDoRefusesUnsupportedHarnessPin. Asserts both the explicit
  refusal AND the pin's survival (no session_id written either),
  so the future-binary upgrade path is exercised.
- TestHarnessForSpawn_AllPathsLandOnClaudeToday: wrapped its
  resolution in a t.Helper() that fails on unexpected error.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
anshulsao
anshulsao previously approved these changes May 21, 2026
Copy link
Copy Markdown
Contributor

@anshulsao anshulsao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All 7 asks addressed. CI rollup green (vet/test/build), go test ./... green locally. Approving.

Verification per comment:

# Tag Ask Fix
1 should-fix do.go:177 hardcoded "claude processes" both stderr templates now interpolate h.Binary() (live-duplicate warning + running-elsewhere error)
2 should-fix harnessByName silent coercion harnessByName(harness, error); empty/NULL → claude+nil for back-compat, unknown non-empty → nil+error listing registered names. Plus a belt-and-braces CASE WHEN harness IS NULL OR harness = '' THEN ? ELSE harness END on the bootstrap UPDATE so even a same-name pin doesn't get re-written. TestCmdDoRefusesUnsupportedHarnessPin pins the future-pin → refuse-and-preserve contract
3 should-fix workDir param lies interface + impl renamed workDircwd; error message switched to cwd=%q so it no longer claims a session_cwd source the caller may not have used
4 nit --here cwd-invariant comment rewritten to acknowledge the chained-cd blind spot explicitly
5 nit duplicate encodeCwd/EncodeCwd unexported version dropped; doc on the exported form picked up the merge
6 nit test name oversells renamed TestHarnessForSpawn_AllPathsLandOnClaudeToday; docstring now spells out what is and isn't testable today
7 nit stray blank line in skill.go imports removed

Two notes I like beyond the asks:

  • The COALESCE-style guard on the bootstrap UPDATE is a tighter answer than what I suggested. I asked for one of (a) error path or (b) preserve-pin; you did both, which means even if a future code path resolved an unknown pin to claude, the column wouldn't get overwritten silently anymore. Belt-and-braces is the right call for a "set once on first bind" invariant.
  • done.go's warn-and-skip on harnessForTask error keeps the status flip intact when the close-out sweep can't run. That's the right tradeoff — the flip is the user-facing contract; the sweep is best-effort plumbing.

Vibecode dial — re-review

Vibecode dial:
                ●
   tight ───────┴────────────── vibecoded
                ↑
       (tightened from near-tight to tight after fixes —
        no scope creep, fixes match asks, one extra test
        per concern, no drive-bys)

Receipts on the fix delta (bdfe9c8..386511a):

Files changed:          11   (all directly traceable to one of the asks)
Lines added (code):     ~150
Lines added (test):     ~70  (one regression test per should-fix:
                              future-pin refusal; existing tests
                              updated to thread the new error return)
Lines added (comments): heavy — but all of it is doc on the new
                        error semantics (where unknown values
                        error vs. where they're tolerated)
New abstractions:       0
New helper funcs:       1   (registeredHarnessNames — used in the
                              error message; reasonable)
Adjacent refactors:     0
Stale-comment edits:    1   (the encodeCwd→EncodeCwd doc merge)
Drive-by changes:       0

What kept it from full-tight: a single new helper (registeredHarnessNames) was added to format the error message. Could have been inlined into the fmt.Errorf, but it's small and named so I'd leave it. Otherwise nothing to nit — fixes are surgical, comments are load-bearing, no whilst-we're-here cleanups landed.

…59)

flow do --here used to silently bind a claude session whose jsonl was
written under a cwd different from task.work_dir. Future `flow do <slug>`
resumes spawned at work_dir; claude --resume looked for the jsonl under
work_dir's encoded path, didn't find it, and either errored (-p mode) or
silently started a fresh conversation (interactive). That's GH #59.

Design:

- Reverted the session_cwd column from bdfe9c8. The column was the
  wrong shape — it patched the symptom (record where the jsonl is)
  rather than the cause (work_dir and session_id allowed to disagree).
  Since the column was unreleased, no migration baggage.
- New invariant: any task with session_id non-null has work_dir equal
  to the cwd that session was created at. Resume always spawns at
  work_dir; the invariant guarantees the jsonl is there.
- Enforcement via new harness method ValidateSession(workDir, sessionID).
  Claude impl stats ~/.claude/projects/<encode(workDir)>/<sid>.jsonl.
  Gemini will share the layout; codex (sid-only) will no-op.
- Gates: flow do --here refuses on validation failure; flow update task
  --work-dir refuses if a session is bound AND the new work_dir doesn't
  satisfy the invariant. The latter is the supported "fix a broken row"
  path — point work_dir at where the jsonl actually lives.

Honest check, not naive: comparing os.Getwd() to work_dir would be
fooled by `cd <work_dir> && flow do --here task` from inside a claude
Bash invocation — the subprocess cwd has nothing to do with where the
session jsonl lives. Statting the on-disk path is the only check
that's robust to that.

Run-playbook --here: the inserted run-task now takes work_dir =
os.Getwd() instead of pb.WorkDir, since the run is by definition
about the binding session's actual cwd.

Skill update: §4.16 explicitly documents the new safety property,
adds a Cwd-mismatch sub-recipe (Open in new tab / Point work_dir at
this session's directory), and adds anti-patterns warning against
the cd-cheat and against --force as a bypass.

Tests:
- TestCmdDoHereRefusesWhenTranscriptMissing exercises the
  cheat-resistance directly (seed at cwd, stub stat as missing,
  still refuses).
- claude.StatFn is the indirection point so app-level tests stub
  the on-disk check without materializing fake jsonls.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rr0hit
Copy link
Copy Markdown
Contributor Author

rr0hit commented May 21, 2026

New commit eda3a3c lands the GH #59 fix on this branch. Worth a second look since it reshapes a chunk of what bdfe9c8 introduced.

The shift in approach: bdfe9c8 introduced session_cwd to record where each session was actually started. That was a symptom-side fix — it let flow transcript find the jsonl, but flow do resume still consulted task.work_dir alone (the open half of #59). This commit reverts the session_cwd column entirely (it was unreleased, no migration baggage) and instead enforces the invariant *any task with session_id IS NOT NULL has work_dir == cwd of session at creation*. Resume always spawns at work_dir`; the invariant guarantees the jsonl is there.

Enforcement is per-harness: new Harness.ValidateSession(workDir, sessionID) error method. Claude's impl stats ~/.claude/projects/<encode(workDir)>/<sid>.jsonl. Gemini will share that shape; codex (sid-only sessions, prompts the user about cwd at resume) will no-op. This is the right home for the policy because each harness's transcript layout decides whether (cwd, sid) is a meaningful key.

Why stat, not os.Getwd(): comparing the subprocess's cwd to work_dir is fooled by cd <work_dir> && flow do --here task from inside a claude Bash invocation — the chained cd only affects flow's subprocess, not where the actual session jsonl was written. The stat catches that.

Gates:

  • flow do --here refuses on validation failure. --force does NOT override.
  • flow update task --work-dir refuses if a session is bound AND the new path doesn't satisfy ValidateSession. This is the supported "fix a broken row" path — point work_dir at where the jsonl actually lives.

flow run playbook --here: the inserted run-task now takes work_dir = os.Getwd() rather than the playbook's default, since the run is by definition about the binding session's cwd.

Skill update: §4.16 documents the new safety property and adds a cwd-mismatch sub-recipe (Open in new tab / Point work_dir at this session's directory). Anti-patterns explicitly warn against the cd-cheat and against --force as a bypass.

Existing rows that already violate the invariant (your audit found ≥6) error cleanly on next use; the flow update task --work-dir path is how they get fixed. No backfill subcommand — would have hardcoded claude paths, premature with the multi-harness future.

A cleanup commit follows to drop the orphan session_cwd column from any DB that ran bdfe9c8-era binaries during development.

rr0hit and others added 2 commits May 21, 2026 15:13
bdfe9c8 introduced tasks.session_cwd to record where each harness
session was started; eda3a3c reverted the column from the schema in
favor of the cwd==work_dir invariant + Harness.ValidateSession check.
Fresh DBs no longer have the column, but DBs that ran intermediate
`harness-pluggable` builds during PR-58 development still carry it
as an orphan.

Add an idempotent ALTER TABLE DROP COLUMN to the migration block so
those DBs converge to the clean shape on next open. Test exercises
the happy path: ADD COLUMN to simulate a stale DB, re-open, confirm
it's gone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants