itwizardo
diff --git a/‎ROADMAP.md‎
Lines changed: 12 additions & 1 deletion b/‎ROADMAP.md‎
Lines changed: 12 additions & 1 deletion
@@ -6253,4 +6253,15 @@ Original filing (2026-04-18): the session emitted `SessionStart hook (completed)
 
 246. **Dogfood reminder cron can self-fail by timing out during active cycles, so the nudge loop itself is not trustworthy as an observability surface** — dogfooded 2026-04-21 in `#clawcode-building-in-public` after multiple consecutive alerts: `Cron job "clawcode-dogfood-cycle-reminder" failed: cron: job execution timed out` at 14:14, 14:24, 14:34, 14:44, 15:13, and 15:23 KST while the same dogfood cycle was actively producing reports and fixes. This is not just scheduler noise — it is a clawability gap in the reminder/control loop itself. A downstream claw seeing both repeated dogfood nudges and repeated cron timeouts cannot tell whether the reminder actually delivered, partially delivered, duplicated, or died after side effects. **Required fix shape:** (a) classify reminder execution outcome explicitly (`delivered`, `timed_out_after_send`, `timed_out_before_send`, `suppressed_as_duplicate`, `skipped_due_to_active_cycle`) instead of a single generic timeout; (b) attach the target message/report cycle id and whether a Discord post was already emitted before timeout; (c) add a fast-path/no-op path when the cycle state is unchanged or an active report is already in flight so the reminder job can exit cleanly instead of hanging; (d) add regression coverage proving repeated unchanged-state cycles do not stack timeouts or duplicate nudges. **Why this matters:** if the reminder loop itself is ambiguous, claws waste time responding to scheduler artifacts instead of real product state, and the dogfood surface stops being a reliable source of truth. Source: live clawhip/Jobdori dogfood cycle on 2026-04-21 with repeated timeout alerts in `#clawcode-building-in-public`.
 
-247. **MCP memory permission prompts can recur after a transport failure, leaving an active worker blocked in a second consent loop instead of a typed degraded state** — dogfooded 2026-04-27 from live session `clawcode-human` while responding to the claw-code dogfood nudge. The session first asked permission for `omx_memory.project_memory_read`; after approval, the call failed with `Transport closed`, then the runtime immediately attempted `omx_memory.notepad_read` and blocked again on a fresh allow prompt. From the outside this looks like an automation-hostile MCP lifecycle gap: the worker is neither cleanly ready nor cleanly failed, and downstream claws must scrape the pane to learn that memory MCP is both consent-gated and transport-degraded. **Required fix shape:** (a) after an MCP transport closes, emit a typed degraded state such as `mcp_transport_closed` with server/tool identity; (b) suppress or batch follow-up permission prompts for the same failed MCP server until transport recovery is proven; (c) expose whether the task can continue without that MCP tool or is blocked on memory; (d) add regression coverage for `permission granted -> transport closed -> follow-up tool attempt` so it becomes one structured blocker instead of repeated interactive consent loops. **Why this matters:** MCP memory should either be available, explicitly degraded, or explicitly blocked; repeated permission prompts after a closed transport make prompt delivery and readiness ambiguous. Source: live `clawcode-human` pane on 2026-04-27 04:3x UTC.
+247. **MCP memory permission prompts can recur after a transport failure, leaving an active worker blocked in a second consent loop instead of a typed degraded state** — dogfooded 2026-04-27 from live session `clawcode-human` while responding to the claw-code dogfood nudge. The session first asked permission for `omx_memory.project_memory_read`; after approval, the call failed with `Transport closed`, then the runtime immediately attempted `omx_memory.notepad_read` and blocked again on a fresh allow prompt. From the outside this looks like an automation-hostile MCP lifecycle gap: the worker is neither cleanly ready nor cleanly failed, and downstream claws must scrape the pane to learn that memory MCP is both consent-gated and transport-degraded. **Required fix shape:** (a) after an MCP transport closes, emit a typed degraded state such as `mcp_transport_closed` with server/tool identity; (b) suppress or batch follow-up permission prompts for the same failed MCP server until transport recovery is proven; (c) expose whether the task can continue without that MCP tool or is blocked on memory; (d) add regression coverage for `permission granted -> transport closed -> follow-up tool attempt` so it becomes one structured blocker instead of repeated interactive consent loops. **Why this matters:** MCP memory should either be available, explicitly degraded, or explicitly blocked; repeated permission prompts after a closed transport make prompt delivery and readiness ambiguous. Source: live `clawcode-human` pane on 2026-04-27 04:3x UTC. **Fresh-run follow-up 2026-04-29:** owner-requested live session `claw-code-issue-247-human-fresh-run` used the actual `./rust/target/debug/claw` binary; `doctor` and `status` were green, so the remaining Phase-0 fresh-run evidence moved from MCP consent-loop reproduction to the non-interactive prompt silent-hang captured separately as #248.
+
+248. **Non-interactive prompt mode can exceed caller timeouts with no in-band startup/API phase event or partial status artifact** — dogfooded 2026-04-29 from live tmux session `claw-code-issue-247-human-fresh-run` after the owner explicitly asked gaebal-gajae to make a fresh session and use `claw-code` directly. The actual `./rust/target/debug/claw` binary was launched via `clawhip tmux new` on current main. `claw doctor --output-format json` and `claw status --output-format json` both succeeded and reported auth/config/workspace ok, but minimal non-interactive prompt calls (`timeout 120 ./rust/target/debug/claw --output-format json --dangerously-skip-permissions "echo hello"` and `timeout 120 ./rust/target/debug/claw --output-format json prompt "Reply with just the word hello"`) both timed out from the outer harness after roughly 150s with only `Command exceeded timeout` visible. There was no machine-readable `api_request_started`, `waiting_for_first_token`, provider/model/base-url identity, retry count, or partial status file/event that would let clawhip distinguish slow provider, network stall, auth/OAuth drift, stream parser hang, or prompt-mode bug. **Required fix shape:** (a) emit structured non-interactive lifecycle events for `startup_ok`, `api_request_started`, `first_byte/first_token`, retry/backoff, and terminal `timeout_or_stall` states; (b) include provider/model/base URL source and auth source category without leaking secrets; (c) support a CLI/request timeout flag or env override that returns a typed JSON error before the outer orchestrator kills the process; (d) write/emit a final partial status artifact on timeout so lane monitors do not have to infer state from a dead process. **Why this matters:** non-interactive prompt mode is the automation path; if it can hang past the caller's timeout while doctor/status are green, claws lose the ability to tell whether startup, auth, transport, provider latency, or stream consumption failed. Source: live session `claw-code-issue-247-human-fresh-run` on 2026-04-29.
+
+249. **`/issue` advertises GitHub issue creation but never reaches a GitHub/OAuth/auth preflight or creation path, and the non-interactive error suggests unusable resume forms** — dogfooded 2026-04-29 on current main `8e22f757` while chasing the remaining Phase-0 GitHub OAuth blocker. The visible help advertises `/issue [context]` as “Draft or create a GitHub issue from the conversation,” but the actual implementation path only renders a local `Issue` report (`format_issue_report`) and does not invoke `gh`, GitHub API, OAuth, token discovery, browser auth, or even a dry-run/auth-preflight surface. Direct non-interactive use (`./rust/target/debug/claw '/issue dogfood test'`) returns `slash command /issue dogfood test is interactive-only` and suggests `claw --resume SESSION.jsonl /issue ...` / `claw --resume latest /issue ...` “when the command is marked [resume]”, while `/help` does not mark `/issue` as resume-safe and resume dispatch rejects interactive-only commands. That leaves operators with a GitHub-labeled command whose real behavior is neither issue creation nor a clear GitHub OAuth blocker. **Required fix shape:** (a) split the contract explicitly: either rename/copy to “draft issue text” or implement a real `create` path with GitHub auth preflight; (b) surface a machine-readable GitHub auth state (`gh_cli_authenticated`, `github_token_present`, `oauth_required`, `creation_unavailable`) before any issue-create attempt; (c) make the direct-mode error avoid suggesting resume forms for commands not marked resume-safe; (d) add regression coverage proving `/issue` help, direct-mode rejection, resume support flags, and creation/draft behavior agree. **Why this matters:** Phase-0 GitHub OAuth verification cannot complete if the only GitHub issue surface stops at local prose while still advertising creation. Claws need to know whether they are missing GitHub auth, using a draft-only helper, or hitting an unimplemented creation path. Source: gaebal-gajae dogfood cycle in `#clawcode-building-in-public` on 2026-04-29.
+322. **Config deprecation warnings are emitted to stderr even under `--output-format json`, making JSON output unparseable from combined stdout+stderr capture** — dogfooded 2026-04-29 by Jobdori on current main (`8e22f75`). Running `cargo run --bin claw -- doctor --output-format json 2>&1 | python3 -c "import sys,json; json.loads(sys.stdin.read())"` fails with `Expecting value: line 1 column 1 (char 0)` because a `warning: /path/settings.json: field "enabledPlugins" is deprecated. Use "plugins.enabled" instead` line is emitted to stderr before the JSON body begins. When a caller captures combined output (the common automation pattern: `2>&1`, subprocess `STDOUT | STDERR`, PTY capture, or tmux pane scrape) the warning prefix breaks JSON parse for every downstream consumer. Root cause: `rust/crates/runtime/src/config.rs` line ~300 calls `eprintln!("warning: {warning}")` unconditionally during `ClawSettings::load_merged()` regardless of active output format. **Required fix shape:** (a) thread the active `CliOutputFormat` through the config loading path and suppress or defer human-readable warning strings when `json` mode is active; (b) instead, collect deprecation diagnostics and inject them into the JSON output as a top-level `"warnings": [...]` array (same field already used by `doctor`); (c) ensure the JSON body is always the first bytes on stdout and all prose warnings stay on stderr or are suppressed in json mode; (d) add regression coverage proving `claw <any-cmd> --output-format json` stdout is valid JSON regardless of config deprecation state. **Why this matters:** `--output-format json` is the automation/claw contract; if config warnings can silently corrupt the JSON stream, every orchestration layer that captures combined output gets broken parse-on-warning with no stable fallback. Source: Jobdori live dogfood on mengmotaHost, claw-code main `8e22f75`, 2026-04-29.
+
+323. **`status --output-format json` reports `session.session = "live-repl"` while simultaneously reporting `session_lifecycle.kind = "saved_only"` — contradictory session identity in a single status snapshot** — dogfooded 2026-04-29 by Jobdori on current main (`804d96b`). Running `claw status --output-format json` from an active REPL-style invocation produced `"session": "live-repl"` in the `workspace` block and `"session_lifecycle": {"kind": "saved_only", "pane_id": null, ...}` in the same object. Those two fields carry contradictory claims: `"live-repl"` asserts there is an active interactive session, while `"saved_only"` asserts there is no live tmux pane hosting the session — the session exists only as a saved artifact. A downstream claw reading this snapshot cannot tell which claim to trust: is this a running session whose pane is undetectable, or a saved-only session that the `session` field is misclassifying? Root cause: `"live-repl"` is a fallback sentinel emitted by `main.rs:6070` when `context.session_path` is `None`, while `session_lifecycle` is computed independently by `classify_session_lifecycle_for()` from tmux pane discovery; the two fields share no common source and can diverge. **Required fix shape:** (a) derive both `session.session` and `session_lifecycle.kind` from the same lifecycle classification result so they cannot diverge; (b) replace the `"live-repl"` free-form sentinel with a structured `session_kind` field (`live_repl`, `saved`, `resume`, etc.) that carries the same type vocabulary as `session_lifecycle.kind`; (c) when `session_lifecycle.kind = "saved_only"`, never emit `"session": "live-repl"` (or vice versa); (d) add a regression test proving `status --output-format json` never emits `session.kind = "live_repl"` and `session_lifecycle.kind = "saved_only"` simultaneously. **Why this matters:** `status --output-format json` is the machine-readable truth surface for session state; if two fields in the same snapshot contradict each other, every lane, monitor, and orchestrator has to pick a winner instead of reading a coherent state. Source: Jobdori live dogfood on mengmotaHost, claw-code `804d96b`, 2026-04-29.
+
+324. **Stale local debug binaries can impersonate the current workspace because version/status/doctor do not compare embedded build provenance to repo HEAD** — dogfooded 2026-04-29 on current `origin/main` / workspace HEAD `e7074f47` after PR #2838. The working tree was at `e7074f47`, but running `./rust/target/debug/claw version --output-format json` reported embedded `git_sha` `1f901988`. `status` and `doctor` remained green and exposed no warning that the executable under test was stale relative to the workspace HEAD, nor any structured build-provenance freshness signal that downstream claws could use to decide whether the observed behavior came from the checked-out code or an older debug artifact. This is a repo-identity opacity gap: the JSON truth surfaces can look authoritative while actually describing a different binary lineage than the source tree being dogfooded. **Required fix shape:** (a) compare the embedded build `git_sha` / build date with the current workspace git HEAD and dirty state when the binary can discover a containing worktree; (b) expose redaction-safe structured fields in `version --output-format json`, `status --output-format json`, and `doctor --output-format json`, including `binary_provenance`, `workspace_head`, and `stale_binary` (with enough reason/detail to distinguish clean match, dirty workspace, unknown workspace, and definite stale SHA mismatch); (c) warn in human/text mode when executing a stale local debug binary such as `./rust/target/debug/claw` so dogfooders do not trust old behavior as current-main evidence; (d) avoid leaking secrets or absolute sensitive paths beyond the existing workspace-identification policy; (e) add regression/fixture coverage for matching HEAD, dirty workspace, no-worktree/unknown provenance, and stale embedded SHA cases. **Why this matters:** status/doctor/version are supposed to be the machine-readable basis for dogfood truth. If a stale binary can report a different `git_sha` than the checked-out repo without any freshness warning, claws can file or verify bugs against the wrong code and waste cycles chasing already-fixed or not-yet-built behavior. Source: gaebal-gajae dogfood follow-up from current main `e7074f47` after PR #2838; observed `./rust/target/debug/claw version --output-format json` reporting `git_sha` `1f901988` with no stale-binary-vs-workspace-HEAD warning.
+
+325. **`help --output-format json` returns valid JSON but hides the actual help schema inside one prose `message` string** — dogfooded 2026-04-29 on current `origin/main` / workspace HEAD `d607ff36`. Running `./rust/target/debug/claw help --output-format json` produces parseable JSON, but the object only exposes top-level keys like `kind` and `message`; all command names, global flags, slash-command metadata, aliases, resume-safety, output-format support, auth/preflight notes, and descriptions are flattened into one human-oriented prose blob. That technically satisfies “valid JSON” while still forcing automation to scrape the same help text humans read, making `/issue`, `/help`, and resume-safety contracts opaque to claws. **Required fix shape:** (a) keep `message` as the compact human-rendered help summary, but add a documented structured schema with `schema` / `schema_version` fields; (b) expose first-class arrays/objects such as `commands[]`, `options[]`, and `slash_commands[]` with stable fields including `name`, `aliases`, `description`, `args`, `output_formats_supported`, `resume_safe`, `interactive_only`, and `creates_external_side_effects`; (c) include auth and creation preflight metadata where relevant, especially for GitHub/issue flows (`auth_preflight`, `creation_unavailable`, `gh_cli_authenticated`, `github_token_present`, or equivalent non-secret state); (d) make `/issue`, `/help`, aliases, and resume-dispatch safety machine-readable from the JSON payload instead of recoverable only by parsing prose markers; (e) add regression coverage proving `help --output-format json` is valid JSON and that `/issue`, `/help`, resume-safe vs interactive-only slash commands, aliases, descriptions, supported output formats, and side-effect/auth-preflight fields are present and internally consistent. **Why this matters:** help JSON is the discoverability surface automation uses before invoking commands. If it is just prose wrapped in JSON, claws cannot safely decide whether a command can run non-interactively, resume from a saved session, create external GitHub side effects, or requires auth/preflight without brittle text scraping. Source: gaebal-gajae dogfood follow-up from current main `d607ff36`; observed `./rust/target/debug/claw help --output-format json` returning valid JSON with only `{kind,message}` at the top level while the actionable command schema remained buried in `message`.