Commit 71a6770

Merge remote-tracking branch 'upstream/main' into dev
2 parents dbc13a9 + 883cef1 commit 71a6770

3 files changed

Lines changed: 254 additions & 9 deletions

ROADMAP.md

Lines changed: 157 additions & 0 deletions
@@ -830,6 +830,25 @@ Acceptance:
- channel status updates stay short and machine-grounded
- claws stop inferring state from raw build spam

### 137. Model-alias shorthand regression in test suite: bare alias parsing broken on `feat/134-135-session-identity` branch

**Filed:** 2026-04-21 from dogfood cycle. `cargo test --workspace` on `feat/134-135-session-identity` HEAD (`91ba54d`) shows 3 failing tests.

**Problem:** `tests::parses_bare_prompt_and_json_output_flag`, `tests::multi_word_prompt_still_uses_shorthand_prompt_mode`, and `tests::env_permission_mode_overrides_project_config_default` all panic with:
```
args should parse: "invalid model syntax: 'claude-opus'. Expected provider/model (e.g., anthropic/claude-opus-4-6) or known alias (opus, sonnet, haiku)"
```
The #134/#135 session-identity work tightened model-syntax validation, but the test fixtures still pass bare `claude-opus`-style strings that the new validator rejects. 162 tests pass; only the three tests using legacy bare-alias model names fail.

**Fix shape:**
- Update the three failing test fixtures to use either a valid alias (`opus`, `sonnet`, `haiku`) or a fully-qualified model id (`anthropic/claude-opus-4-6`)
- Alternatively, if `claude-opus` is an intended supported alias, add it to the alias registry
- Verify `cargo test --workspace` returns 0 failures before merging the feat branch to `main`

**Acceptance:**
- `cargo test --workspace` passes with 0 failures on the `feat/134-135-session-identity` branch
- No regression on the 162 tests currently passing

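The validation rule described in #137 can be sketched as follows. This is a minimal illustration, not the validator in `main.rs`: the function name `resolve_model`, the `ALIASES` table, and the pass-through handling of `provider/model` ids are assumptions. Only the `opus` target (`claude-opus-4-6`) and the error wording come from this commit; the `sonnet` and `haiku` targets are placeholders.

```rust
/// Hypothetical alias table. Only the `opus` target is taken from the test
/// fixtures in this commit; the other two targets are placeholders.
const ALIASES: &[(&str, &str)] = &[
    ("opus", "claude-opus-4-6"),
    ("sonnet", "<sonnet-model-id>"),
    ("haiku", "<haiku-model-id>"),
];

/// Resolve a `--model` argument: a known alias is expanded, and a
/// `provider/model` id is passed through unchanged (an assumption).
fn resolve_model(raw: &str) -> Result<String, String> {
    if let Some((_, target)) = ALIASES.iter().find(|(alias, _)| *alias == raw) {
        return Ok((*target).to_string());
    }
    // Assumption: a fully-qualified id has exactly one '/' with non-empty halves.
    if let Some((provider, model)) = raw.split_once('/') {
        if !provider.is_empty() && !model.is_empty() && !model.contains('/') {
            return Ok(raw.to_string());
        }
    }
    Err(format!(
        "invalid model syntax: '{raw}'. Expected provider/model \
         (e.g., anthropic/claude-opus-4-6) or known alias (opus, sonnet, haiku)"
    ))
}
```

Under these assumptions, `resolve_model("opus")` yields `claude-opus-4-6`, while the legacy fixture value `claude-opus` is rejected with the message quoted in the problem statement.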
### 133. Blocked-state subphase contract (was §6.5)
**Filed:** 2026-04-20 from dogfood cycle; previous cycle identified §4.44.5 provenance gap, this cycle targets §6.5 implementation.

@@ -5014,3 +5033,141 @@ ear], /color [scheme], /effort [low|medium|high], /fast, /summary, /tag [label],
**Blocker.** None. Reuses existing `stale_base` module; no new logic needed, just a missing call site.

**Source.** Jobdori dogfood 2026-04-20 against `/tmp/jobdori-129-mcp-cred-order` + `/tmp/stale-branch` in response to 10-min cron cycle. Confirmed: `claw doctor` on branch 5 commits behind main says "Status: ok" but `prompt` dispatch would warn "worktree HEAD does not match expected base commit." Gap is a missing invocation of the already-correct `run_stale_base_preflight()` in the `doctor` action handler. Joins **Boot preflight / doctor contract (#80–#83, #114)** family: doctor is the single machine-readable preflight surface; missing checks degrade operator trust. Also relates to **Silent-state inventory** cluster (#102/#127/#129/#245) because stale-base is a runtime truth ("my branch is behind main") that the preflight surface (doctor) does not expose.

## Pinpoint #135. `claw status --json` missing `active_session` boolean and `session.id` cross-reference: two surfaces that should be unified are inconsistent

**Gap.** `claw status --json` exposes a snapshot of the runtime state but does not include (1) a stable `session.id` field (filed as #134; the fix from the other side is to emit it in lane events, and the consumer side needs it queryable via `status` too) and (2) an `active_session: bool` that tells an orchestrator whether the runtime currently has a live session in flight. An external orchestrator (Clawhip, remote agent) running `claw status --json` after sending a prompt has no machine-readable way to confirm whether the session is alive, idle, or stalled without parsing log output.

**Trace path.**
- `claw status --json` (dispatcher in `main.rs` `CliAction::Status`) renders a `StatusReport` struct that includes `git_state`, `config`, `model`, `provider`, but no `session_id` or `active_session` fields.
- `claw status` (text mode) also omits both.
- The `session.id` fix from #134 introduces a UUID at session init; it should be threaded through to `StatusReport` so the round-trip is complete: emit on startup event → queryable via `status --json` → correlatable in lane events.

**Fix shape (~30 lines).**
1. Add `session_id: Option<String>` and `active_session: bool` to the `StatusReport` struct. When no session is active, `session_id` is `null` and `active_session` is `false`. When a session is running, `session_id` is the same UUID emitted in the startup lane event (#134).
2. Thread the session state into the `status` handler via a shared `Arc<Mutex<SessionState>>` or equivalent (same mechanism #134 uses for startup event emission).
3. Text-mode `claw status` surfaces the value: `Session: active (id: abc123)` or `Session: idle`.
4. Regression tests: (a) `claw status --json` before any prompt → `active_session: false, session_id: null`. (b) `claw status --json` during a prompt session → `active_session: true, session_id: <uuid>`. (c) UUID matches the `session.id` in the first lane event of the same run.
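
The two fields and both renderings from the fix shape can be sketched like this. It is an illustration under stated assumptions, not the real `StatusReport`: the type name `SessionStatus`, its constructors, and the hand-rolled JSON (used here only to avoid dependencies) are hypothetical, and the other report fields are omitted.

```rust
/// Session-related slice of a status report; all other fields are omitted.
/// `SessionStatus` is a hypothetical name for this sketch.
struct SessionStatus {
    session_id: Option<String>,
    active_session: bool,
}

impl SessionStatus {
    /// State before any prompt has started a session.
    fn idle() -> Self {
        SessionStatus { session_id: None, active_session: false }
    }

    /// State while a session with the given id is running.
    fn active(id: &str) -> Self {
        SessionStatus { session_id: Some(id.to_string()), active_session: true }
    }

    /// Hand-rolled JSON so the sketch has no dependencies. Assumes the id
    /// contains no characters that need JSON escaping (true for a UUID).
    fn to_json(&self) -> String {
        let id = match &self.session_id {
            Some(id) => format!("\"{id}\""),
            None => "null".to_string(),
        };
        format!(
            "{{\"active_session\":{},\"session_id\":{}}}",
            self.active_session, id
        )
    }

    /// Text-mode line, following the wording proposed in step 3.
    fn to_text(&self) -> String {
        match &self.session_id {
            Some(id) if self.active_session => format!("Session: active (id: {id})"),
            _ => "Session: idle".to_string(),
        }
    }
}
```

Regression cases (a) and (b) map onto `SessionStatus::idle().to_json()` and `SessionStatus::active(<uuid>).to_json()`; case (c) additionally needs the id from the first lane event, which this sketch does not model.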

**Acceptance.** An orchestrator can poll `claw status --json` and determine: is there a live session? What is its correlation ID? Does it match the ID from the last startup event? This closes the round-trip opened by #134.

**Blocker.** Depends on #134 (session.id generation at init). Can be filed and implemented together.

**Source.** Jobdori dogfood 2026-04-21 06:53 KST on main HEAD `2c42f8b` during recurring cron cycle. Direct sibling of #134: #134 covers the event-emission side, #135 covers the query side. Joins **Session identity completeness** (§4.7) and **status surface completeness** cluster (#80/#83/#114/#122). Natural bundle: **#134 + #135** closes the full session-identity round-trip. Session tally: ROADMAP #135.

## Pinpoint #134. No run/correlation ID at session boundary: every observer must infer session identity from timing or prompt content

**Gap.** When a `claw` session starts, no stable correlation ID is emitted in the first structured event (or any event). Every observer (lane event consumer, log aggregator, Clawhip router, test harness) has to infer session identity from timing proximity or prompt content. If two sessions start in close succession, there is no unambiguous way to attribute subsequent events to the correct session. `claw status --json` returns session metadata but does not expose an opaque stable ID that could be used as a correlation key across the event stream.

**Fix shape.**
- Emit `session.id` (opaque, stable, scoped to this boot) in the first structured event at startup
- Include same ID in all subsequent lane events as `session_id` field
- Expose via `claw status --json` so callers can retrieve the active session's ID from outside
- Add regression: golden-fixture asserting `session.id` is present in startup event and value matches across a multi-event trace
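
A rough sketch of the emit-and-correlate pattern from the fix shape follows. Everything named here is hypothetical: `new_session_id` is a stdlib-only stand-in for the UUID/nanoid the entry calls for, and `LaneEvent`, `EventEmitter`, and the `session.started` event name are invented for illustration rather than taken from the codebase.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

/// Process-wide counter so two ids minted in the same nanosecond still differ.
static NEXT_SEQ: AtomicU64 = AtomicU64::new(0);

/// Stand-in for a UUID/nanoid: opaque to callers, unique within this process.
fn new_session_id() -> String {
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos())
        .unwrap_or(0);
    let seq = NEXT_SEQ.fetch_add(1, Ordering::Relaxed);
    format!("sess-{nanos:x}-{seq:x}")
}

/// Hypothetical lane event carrying the correlation key.
struct LaneEvent {
    kind: &'static str,
    session_id: String,
}

/// Hypothetical emitter: the id is generated once at init and stamped on
/// every event it records.
struct EventEmitter {
    session_id: String,
    events: Vec<LaneEvent>,
}

impl EventEmitter {
    fn start() -> Self {
        let mut emitter = EventEmitter {
            session_id: new_session_id(),
            events: Vec::new(),
        };
        emitter.emit("session.started"); // assumed name for the startup event
        emitter
    }

    fn emit(&mut self, kind: &'static str) {
        self.events.push(LaneEvent {
            kind,
            session_id: self.session_id.clone(),
        });
    }
}

/// Golden-fixture style check: the trace opens with the startup event and
/// every event carries the same id as that first event.
fn trace_is_correlated(events: &[LaneEvent]) -> bool {
    match events.first() {
        Some(first) => {
            first.kind == "session.started"
                && events.iter().all(|e| e.session_id == first.session_id)
        }
        None => false,
    }
}
```

Generating the id in exactly one place (`start`) is what makes the regression check meaningful: any event built outside the emitter would fail `trace_is_correlated`.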

**Acceptance.** Any observer can correlate all events from a session using `session_id` without parsing prompt content or relying on timestamp proximity. `claw status --json` exposes the current session's ID.

**Blocker.** None. Requires a UUID/nanoid generated at session init and threaded through the event emitter.

**Source.** Jobdori dogfood 2026-04-21 01:54 KST on main HEAD `50e3fa3` during recurring cron cycle. Joins **Session identity completeness at creation time** (ROADMAP §4.7): §4.7 covers identity fields at creation time; #134 covers the stable correlation handle that ties those fields to downstream events. Joins **Event provenance / environment labeling** (§4.6): provenance requires a stable anchor; without `session.id` the provenance chain is broken at the root. Natural bundle with **#241** (no startup run/correlation id, filed by gaebal-gajae 2026-04-20): #241 approached from the startup cluster; #134 approaches from the event-stream observer side. Same root fix closes both. Session tally: ROADMAP #134.

## Pinpoint #136. `--compact` flag output is not machine-readable: compact turn emits plain text instead of JSON when `--output-format json` is also passed

**Gap.** `claw --compact <prompt>` runs a prompt turn with compacted output (tool-use suppressed, final assistant text only). But `run_with_output()` routes on `(output_format, compact)` with an explicit early-return match: `CliOutputFormat::Text if compact => run_prompt_compact(input)`. The `CliOutputFormat::Json` branch is never reached when `--compact` is set. Result: passing `--compact --output-format json` silently produces plain-text output: the compact flag wins and the format flag is ignored. No warning or error is emitted.

**Trace path.**
- `rust/crates/rusty-claude-cli/src/main.rs:3872-3879`: `run_with_output()` match:
  ```
  CliOutputFormat::Text if compact => self.run_prompt_compact(input),
  CliOutputFormat::Text => self.run_turn(input),
  CliOutputFormat::Json => self.run_prompt_json(input),
  ```
  The `Json` arm is unreachable when `compact = true` because the first arm matches first regardless of `output_format`.
- `run_prompt_compact()` at line 3879 calls `println!("{final_text}")`: always plain text, no JSON envelope.
- `run_prompt_json()` at line 3891 wraps output in a JSON object with `message`, `model`, `iterations`, `usage`, `tool_uses`, `tool_results`, etc.

**Fix shape (~20 lines).**
1. Add a `CliOutputFormat::Json if compact` arm (or merge compact flag into `run_prompt_json` as a parameter) that produces a JSON object with `message: <final_text>` and a `compact: true` marker. Tool-use fields remain present but empty arrays (consistent with compact semantics: tools ran but are not returned verbatim).
2. Emit a warning or `error.kind: "flag_conflict"` if conflicting flags are passed and one silently wins (or document the precedence explicitly in `--help`).
3. Regression tests: `claw --compact --output-format json <prompt>` must produce valid JSON with at minimum `{message: "...", compact: true}`.
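
The four-way routing that results from step 1 can be sketched in isolation. The arm order mirrors the `main.rs` hunk in this commit; the `Handler` enum, `route`, `json_escape`, and `compact_envelope` are hypothetical names for this sketch, and the envelope carries only the two minimum fields named in step 3, not the `model` and `usage` fields the real implementation adds.

```rust
#[derive(Clone, Copy)]
enum CliOutputFormat {
    Text,
    Json,
}

/// Hypothetical stand-ins for the four `run_*` methods.
#[derive(Debug, PartialEq)]
enum Handler {
    CompactJson,
    CompactText,
    Turn,
    Json,
}

/// Same arm order as the patched `run_with_output()`: the two guarded arms
/// come first, so each (format, compact) pair has exactly one destination.
fn route(output_format: CliOutputFormat, compact: bool) -> Handler {
    match output_format {
        CliOutputFormat::Json if compact => Handler::CompactJson,
        CliOutputFormat::Text if compact => Handler::CompactText,
        CliOutputFormat::Text => Handler::Turn,
        CliOutputFormat::Json => Handler::Json,
    }
}

/// Minimal JSON string escaping so the sketch needs no dependencies.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Minimum envelope from the regression requirement: message plus marker.
fn compact_envelope(final_text: &str) -> String {
    format!(
        "{{\"message\":\"{}\",\"compact\":true}}",
        json_escape(final_text)
    )
}
```

Because each guard is attached to a specific variant, the unguarded `Text` and `Json` arms still act as the fallbacks when `compact` is false.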

**Acceptance.** An orchestrator that requests compact output for token efficiency AND machine-readable JSON gets both. Silent flag override is never a correct behavior for a tool targeting machine consumers.

**Blocker.** None. Additive change to existing match arms.

**Source.** Jobdori dogfood 2026-04-21 12:25 KST on main HEAD `8b52e77` during recurring cron cycle. Joins **Output format completeness** cluster (#90/#91/#92/#127/#130): all surfaces that produce inconsistent or plain-text fallbacks when JSON is requested. Also joins **CLI/REPL parity** (§7.1): compact is available as both `--compact` flag and `/compact` REPL command; JSON output gap affects only the flag path. Session tally: ROADMAP #136.

## Pinpoint #138. Dogfood cycle report-gate opacity: nudge surface collapses "bundle converged", "follow-up landed", and "pre-existing flake only" into single closure shape

**Gap.** When a dogfood nudge triggers on a branch with landed work, the report surface emits status like "fixed 3 tests, pushed branch, 1 unrelated red remains", but downstream nudges cannot distinguish:
1. `bundle converged, merge-ready` (e.g., #134/#135 branch after fixes)
2. `follow-up landed on main, branch still valid` (e.g., #137 + #136 fixes after #134/#135 was ready)
3. `only pre-existing flake remains, no new regressions` (e.g., `resume_latest...` test failure on main that also fails on feature branch)
4. `work still in flight, blocker not yet resolved`
5. `merged and closed, re-nudge is a dup`

Result: repeat nudges look identical whether the prior work converged or is still broken. Claws re-open what was already resolved, burning cycles on rediscovery.

**Concrete example from this session:**
- 14:30 nudge triggered on bundle already clear (14:25)
- Reported finding was "nudge closure-state opacity" but manifested as "should we re-nudge or not?"
- No explicit surface like "status: done", "last-updated: 2026-04-21T14:25", "next-action: none" that stops re-nudges on unchanged state

**Fix shape (~30-50 lines, surfaces not code).**
1. Dogfood report should carry an explicit **closure state** field: `converged`, `follow-up-landed`, `pre-existing-flake-only`, `in-flight`, `merged`, `dup`.
2. Each state has a **last-updated timestamp** (when report was filed) and **next-action** (null if converged, or describe blocker).
3. Nudge logic checks prior report state: if the state is `converged` and the report is less than 10 min old, skip the nudge and post "still converged as of HH:MM, no action".
4. If state changed (e.g., new commits landed), emit **state transition** explicitly: "bundle done (14:25) → follow-up landed (14:42)".
5. Store closure state in a **shared metadata surface** (Discord message edit, ROADMAP inline, or compact JSON file) so next cycle can read it.
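
Steps 1, 3, and 4 can be sketched as a small decision function. This is a sketch under assumptions: the type and function names (`ClosureState`, `PriorReport`, `decide`, `transition`) are hypothetical, the report age is passed in as a `Duration` rather than computed from a stored timestamp, and the `next-action` field and storage surface (still an open design question) are left out.

```rust
use std::time::Duration;

/// The six closure states listed in step 1.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ClosureState {
    Converged,
    FollowUpLanded,
    PreExistingFlakeOnly,
    InFlight,
    Merged,
    Dup,
}

impl ClosureState {
    fn label(self) -> &'static str {
        match self {
            ClosureState::Converged => "converged",
            ClosureState::FollowUpLanded => "follow-up-landed",
            ClosureState::PreExistingFlakeOnly => "pre-existing-flake-only",
            ClosureState::InFlight => "in-flight",
            ClosureState::Merged => "merged",
            ClosureState::Dup => "dup",
        }
    }
}

/// Hypothetical record of the previous cycle's report.
struct PriorReport {
    state: ClosureState,
    /// Time elapsed since the report was filed.
    age: Duration,
}

#[derive(Debug, PartialEq)]
enum Decision {
    Skip,
    Nudge,
}

/// The 10-minute freshness window from step 3.
const FRESH_WINDOW: Duration = Duration::from_secs(10 * 60);

/// Skip only when the prior report is `converged` and still fresh.
fn decide(prior: Option<&PriorReport>) -> Decision {
    match prior {
        Some(r) if r.state == ClosureState::Converged && r.age < FRESH_WINDOW => Decision::Skip,
        _ => Decision::Nudge,
    }
}

/// Explicit "was X, now Y" line for step 4; `None` when nothing changed.
fn transition(from: (ClosureState, &str), to: (ClosureState, &str)) -> Option<String> {
    if from.0 == to.0 {
        return None;
    }
    Some(format!(
        "{} ({}) → {} ({})",
        from.0.label(),
        from.1,
        to.0.label(),
        to.1
    ))
}
```

Treating a missing prior report as `Nudge` is a deliberate default in this sketch: with no recorded closure state there is nothing to justify a skip.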

**Acceptance.**
- Repeat nudges on converged work are replaced with "no change since last report" (skip).
- State transitions are explicit: "was X, now Y" instead of ambiguous "X and also Y".
- Claws can scan closure states and prioritize fresh work over already-handled bundles.

**Blocker.** Design question: **where should closure state live?** Options:
- Edit the prior Discord message with a closure tag (e.g., 🟢 CONVERGED).
- Add a `.dogfood-closure.json` file to the worktree branch that tracks state.
- File a new ROADMAP entry per bundle completion (meta-tracking).
- Embed it in claw-code CLI output (machine-readable, but creates coupling).

Current state: **design question unresolved**. Implementation is straightforward once the closure-state model is settled.

**Source.** Jobdori dogfood 2026-04-21 14:25-14:47 KST: multi-cycle convergence pattern exposed by repeat nudges on #134/#135 bundle. Joins **Dogfood loop observability** (related to earlier §4.7 session-identity, but one level up: session-identity is plumbing, closure-state is the **reporting contract**). Also joins **False-green report gating** (from 14:05 finding); this is the downstream effect: unclear reports beget re-nudges on stale work.

Session tally: ROADMAP #138.

### Evidence for #138: feat/134-135-session-identity branch is pushed but no PR was opened (2026-04-21 15:05)

**Concrete gap observed:**
- Branch `feat/134-135-session-identity` pushed to `origin` at `7235260` (commits `f55612e`, `2b7095e`, `230d97a`, `7235260`)
- Dogfood loop declared bundle "merge-ready" at 14:25
- ~40 min elapsed; no PR opened, no merge, branch still unmerged
- Meanwhile #136 and #137 landed directly on main (`a8beca1`, `21adae9`) without going through the branch

**Direct verification of #135 on main:**
- `env -i $BIN status --output-format json` on main HEAD `768c1ab` shows `active_session: null, session_id: null`
- Fields exist in the JSON schema (possibly added by a schema-only change?) but values are None because the producer plumbing (`#134`) is not on main
- #135 consumer relies on #134 producer; both live on feat/134-135 only

**Impact:**
- `claw status --output-format json` on main returns JSON without the #135 session identity signals (because they're only on feat/134-135)
- Orchestrators that shipped using the 13:00 "round-trip proof" report, believing #134+#135 was merge-ready, will get null fields
- Evidence for #138: "closure-state" = "pushed branch" ≠ "merged" ≠ "in-PR"; the nudge surface collapses all three

**Proposed closure-state transition:**
1. `pushed`: branch exists on origin but no PR (current state for feat/134-135)
2. `in-PR`: PR open, review pending
3. `approved`: PR approved, awaiting merge
4. `merged`: in main
5. `deployed`: if applicable
6. `abandoned`: PR closed without merge

Nudge surface should report explicit state + timestamp: `"feat/134-135 state=pushed (no PR) since 13:00; no closure action taken"` instead of ambiguous "merge-ready."
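
The proposed lifecycle and report line can be sketched as below. The names `BranchState`, `can_move_to`, and `report_line` are hypothetical, and the set of legal transitions is an assumption read off the list order (with `abandoned` reachable only once a PR exists, per its definition); the entry itself does not specify which moves are allowed.

```rust
/// The six states from the proposed closure-state transition list.
#[derive(Debug, Clone, Copy, PartialEq)]
enum BranchState {
    Pushed,
    InPr,
    Approved,
    Merged,
    Deployed,
    Abandoned,
}

impl BranchState {
    fn label(self) -> &'static str {
        match self {
            BranchState::Pushed => "pushed",
            BranchState::InPr => "in-PR",
            BranchState::Approved => "approved",
            BranchState::Merged => "merged",
            BranchState::Deployed => "deployed",
            BranchState::Abandoned => "abandoned",
        }
    }

    /// Assumed legal moves: forward one step along the list, or from an
    /// open or approved PR to `abandoned`.
    fn can_move_to(self, next: BranchState) -> bool {
        use BranchState::*;
        matches!(
            (self, next),
            (Pushed, InPr)
                | (InPr, Approved)
                | (Approved, Merged)
                | (Merged, Deployed)
                | (InPr, Abandoned)
                | (Approved, Abandoned)
        )
    }
}

/// Render the explicit "state + timestamp" line in the format quoted above.
fn report_line(branch: &str, state: BranchState, since: &str) -> String {
    // Only the `pushed` state carries the "(no PR)" qualifier.
    let detail = match state {
        BranchState::Pushed => " (no PR)",
        _ => "",
    };
    format!(
        "{branch} state={}{detail} since {since}; no closure action taken",
        state.label()
    )
}
```

With these definitions, `report_line("feat/134-135", BranchState::Pushed, "13:00")` reproduces the example string, and a jump such as `pushed` directly to `merged` is rejected by `can_move_to`.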

**Token/permission note:**
- `code-yeongyu` token has write access to push branches to `ultraworkers/claw-code` but lacks `createPullRequest` permission (GraphQL 404)
- Issues are disabled on the repo (can't open issue-based tracking)
- Means closure-state tracking must live inside the repo (ROADMAP) or in an external surface (Discord message edits, `.dogfood-closure.json`)

**Filed:** 2026-04-21 15:05 KST as evidence for #138 by Jobdori dogfood loop.

rust/crates/rusty-claude-cli/src/main.rs

Lines changed: 42 additions & 9 deletions
@@ -4170,6 +4170,7 @@ impl LiveCli {
         compact: bool,
     ) -> Result<(), Box<dyn std::error::Error>> {
         match output_format {
+            CliOutputFormat::Json if compact => self.run_prompt_compact_json(input),
             CliOutputFormat::Text if compact => self.run_prompt_compact(input),
             CliOutputFormat::Text => self.run_turn(input),
             CliOutputFormat::Json => self.run_prompt_json(input),
@@ -4189,6 +4190,32 @@ impl LiveCli {
         Ok(())
     }
 
+
+    fn run_prompt_compact_json(&mut self, input: &str) -> Result<(), Box<dyn std::error::Error>> {
+        let (mut runtime, hook_abort_monitor) = self.prepare_turn_runtime(false)?;
+        let mut permission_prompter = CliPermissionPrompter::new(self.permission_mode);
+        let result = runtime.run_turn(input, Some(&mut permission_prompter));
+        hook_abort_monitor.stop();
+        let summary = result?;
+        self.replace_runtime(runtime)?;
+        self.persist_session()?;
+        println!(
+            "{}",
+            json!({
+                "message": final_assistant_text(&summary),
+                "compact": true,
+                "model": self.model,
+                "usage": {
+                    "input_tokens": summary.usage.input_tokens,
+                    "output_tokens": summary.usage.output_tokens,
+                    "cache_creation_input_tokens": summary.usage.cache_creation_input_tokens,
+                    "cache_read_input_tokens": summary.usage.cache_read_input_tokens,
+                },
+            })
+        );
+        Ok(())
+    }
+
     fn run_prompt_json(&mut self, input: &str) -> Result<(), Box<dyn std::error::Error>> {
         let (mut runtime, hook_abort_monitor) = self.prepare_turn_runtime(false)?;
         let mut permission_prompter = CliPermissionPrompter::new(self.permission_mode);
@@ -9371,15 +9398,15 @@ mod tests {
         let args = vec![
             "--output-format=json".to_string(),
             "--model".to_string(),
-            "claude-opus".to_string(),
+            "opus".to_string(),
             "explain".to_string(),
             "this".to_string(),
         ];
         assert_eq!(
             parse_args(&args).expect("args should parse"),
             CliAction::Prompt {
                 prompt: "explain this".to_string(),
-                model: "claude-opus".to_string(),
+                model: "claude-opus-4-6".to_string(),
                 output_format: CliOutputFormat::Json,
                 allowed_tools: None,
                 permission_mode: PermissionMode::DangerFullAccess,
@@ -10149,15 +10176,21 @@ mod tests {
     fn multi_word_prompt_still_uses_shorthand_prompt_mode() {
         let _guard = env_lock();
         std::env::remove_var("RUSTY_CLAUDE_PERMISSION_MODE");
-        // Input is ["help", "me", "debug"] so the joined prompt shorthand
-        // must be "help me debug". A previous batch accidentally rewrote
-        // the expected string to "$help overview" (copy-paste slip).
+        // Input is ["--model", "opus", "please", "debug", "this"] so the joined
+        // prompt shorthand must stay a normal multi-word prompt while still
+        // honoring alias validation at parse time.
         assert_eq!(
-            parse_args(&["help".to_string(), "me".to_string(), "debug".to_string()])
-                .expect("prompt shorthand should still work"),
+            parse_args(&[
+                "--model".to_string(),
+                "opus".to_string(),
+                "please".to_string(),
+                "debug".to_string(),
+                "this".to_string(),
+            ])
+            .expect("prompt shorthand should still work"),
             CliAction::Prompt {
-                prompt: "help me debug".to_string(),
-                model: DEFAULT_MODEL.to_string(),
+                prompt: "please debug this".to_string(),
+                model: "claude-opus-4-6".to_string(),
                 output_format: CliOutputFormat::Text,
                 allowed_tools: None,
                 permission_mode: crate::default_permission_mode(),
