ROADMAP ultraworkers#117: -p flag is super-greedy, swallows all subsequent args into prompt; --help/--version/--model after -p silently consumed; flag-like prompts bypass emptiness check

code-yeongyu · code-yeongyu · commit b9331ae61b9e · 2026-04-18T15:01:47.000+09:00
Dogfooded 2026-04-18 on main HEAD f2d6538 from /tmp/cdSS. -p (Claude Code compat shortcut) at main.rs:524-538: "-p" => { let prompt = args[index + 1..].join(" "); if prompt.trim().is_empty() { return Err(...); } return Ok(CliAction::Prompt {...}); } args[index+1..].join(" ") = ABSORBS EVERY subsequent arg. return Ok(...) = short-circuits parser, discards wants_help etc. Failure modes: 1. Silent flag swallow: claw -p "test" --model sonnet --output-format json → prompt = "test --model sonnet --output-format json" → model: default (not sonnet), format: text (not json) → LLM receives literal string '--model sonnet' as user input → billable tokens burned on corrupted prompt 2. --help/--version defeated: claw -p "test" --help → sends 'test --help' to LLM claw -p "test" --version → sends 'test --version' to LLM claw --help -p "test" → wants_help=true set, then discarded by -p's early return. Help never prints. 3. Emptiness check too weak: claw -p --model sonnet → prompt = "--model sonnet" (non-empty) → passes is_empty() check → sends '--model sonnet' to LLM as the user prompt → no error raised 4. Flag-order invisible: claw --model sonnet -p "test" → WORKS (model parsed first) claw -p "test" --model sonnet → BROKEN (--model swallowed) Same flags, different order, different behavior. --help has zero warning about flag-order semantics. Compare Claude Code: claude -p "prompt" --model sonnet → works (model takes effect) claw -p "prompt" --model sonnet → silently broken Fix shape (~40 lines): - "-p" takes exactly args[index+1] as prompt, continues parsing: let prompt = args.get(index+1).cloned().unwrap_or_default(); if prompt.trim().is_empty() || prompt.starts_with('-') { return Err("-p requires a prompt string"); } pending_prompt = Some(prompt); index += 2; - Reject prompts that start with '-' unless preceded by '--': 'claw -p -- --literal-prompt' = literal '--literal-prompt' - Consult wants_help before returning from -p branch. - Regression tests: -p "prompt" --model sonnet → model takes effect -p "prompt" --help → help prints -p --foo → error --help -p "test" → help prints -p -- --literal → literal prompt sent Joins Silent-flag/documented-but-unenforced (ultraworkers#96-ultraworkers#101, ultraworkers#104, ultraworkers#108, ultraworkers#111, ultraworkers#115, ultraworkers#116) as 12th — -p is undocumented in --help yet actively broken. Joins Parallel-entry-point asymmetry (ultraworkers#91, ultraworkers#101, ultraworkers#104, ultraworkers#105, ultraworkers#108, ultraworkers#114) as 7th — three entry points (prompt TEXT, bare positional, -p TEXT) with subtly different arg-parsing. Joins Claude Code migration parity (ultraworkers#103, ultraworkers#109, ultraworkers#116) as 4th — users typing 'claude -p "..." --model ...' muscle memory get silent prompt corruption. Joins Truth-audit — parser lies about what it parsed. Natural bundles: ultraworkers#108 + ultraworkers#117 — billable-token silent-burn pair: typo fallthrough burns tokens (ultraworkers#108) + flag-swallow burns tokens (ultraworkers#117) ultraworkers#105 + ultraworkers#108 + ultraworkers#117 — model-resolution triangle: status ignores .claw.json model (ultraworkers#105) + typo statuss burns tokens (ultraworkers#108) + -p --model sonnet silently ignored (ultraworkers#117) Filed in response to Clawhip pinpoint nudge 1494933025857736836 in #clawcode-building-in-public.
diff --git a/ROADMAP.md b/ROADMAP.md
@@ -3704,3 +3704,99 @@ ear], /color [scheme], /effort [low|medium|high], /fast, /summary, /tag [label],
      **Blocker.** Policy decision: does the project want strict-by-default (current) or lax-by-default? The fix shape assumes lax-by-default with strict opt-in, matching industry-standard forward-compat conventions and easing Claude Code migration.
 
      **Source.** Jobdori dogfood 2026-04-18 against `/tmp/cdRR` on main HEAD `ad02761` in response to Clawhip pinpoint nudge at `1494925472239321160`. Joins **Claude Code migration parity** (#103, #109) as 3rd member — this is the most severe migration-parity break, since it's a HARD FAIL at startup rather than a silent drop (#103) or a stderr-prose warning (#109). Joins **Reporting-surface / config-hygiene** (#90, #91, #92, #110, #115) on the error-routing-vs-stdout axis: `--output-format json` consumers get empty stdout on config errors. Joins **Silent-flag / documented-but-unenforced** (#96–#101, #104, #108, #111, #115) because only the first error is reported and all subsequent errors are silent. Cross-cluster with **Truth-audit / diagnostic-integrity** (#80–#87, #89, #100, #102, #103, #105, #107, #109, #110, #112, #114, #115) because `validation.is_ok()` hides all-but-the-first structured problem. Natural bundle: **#103 + #109 + #116** — Claude Code migration parity triangle: `claw agents` drops `.md` (loss of compatibility) + config warnings stderr-prose (loss of structure) + config unknowns hard-fail (loss of forward-compat). Also **#109 + #116** — config validation reporting surface: only first warning surfaces structurally (#109) + only first error surfaces structurally and halts loading (#116). Session tally: ROADMAP #116.
+
+117. **`-p` (Claude Code compat shortcut for "prompt") is super-greedy: the parser at `main.rs:524-538` does `let prompt = args[index + 1..].join(" ")` and immediately returns, swallowing EVERY subsequent arg into the prompt text. `--model sonnet`, `--output-format json`, `--help`, `--version`, and any other flag placed AFTER `-p` are silently consumed into the prompt that gets sent to the LLM. Flags placed BEFORE `-p` are also dropped when parser-state variables like `wants_help` are set and then discarded by the early `return Ok(CliAction::Prompt {...})`. The emptiness check (`if prompt.trim().is_empty()`) is too weak: `claw -p --model sonnet` produces prompt=`"--model sonnet"` which is non-empty, so no error is raised and the literal flag string is sent to the LLM as user input** — dogfooded 2026-04-18 on main HEAD `f2d6538` from `/tmp/cdSS`.
+
+     **Concrete repro.**
+     ```
+     # Test: -p swallows --help (which should short-circuit):
+     $ claw -p "test" --help
+     # Expected: help output (--help short-circuits)
+     # Actual: tries to run prompt "test --help" — sends it to LLM
+     error: missing Anthropic credentials ...
+
+     # Test: --help BEFORE -p is silently discarded:
+     $ claw --help -p "test"
+     # Expected: help output (--help seen first)
+     # Actual: tries to run prompt "test" — wants_help=true was set, then discarded
+     error: missing Anthropic credentials ...
+
+     # Test: -p swallows --version:
+     $ claw -p "test" --version
+     # Expected: version output
+     # Actual: tries to run prompt "test --version"
+
+     # Test: -p with actual credentials — the SWALLOWING is visible:
+     $ ANTHROPIC_AUTH_TOKEN=sk-bogus claw -p "hello" --model sonnet
+     7[1G[2K[38;5;12m⠋ 🦀 Thinking...[0m8[1G[2K[38;5;9m✘ ❌ Request failed
+     error: api returned 401 Unauthorized (authentication_error)
+     # The 401 comes back AFTER the request went out. The --model sonnet was
+     # swallowed into the prompt "hello --model sonnet", the binary's default
+     # model was used (not sonnet), and the bogus token hit auth failure.
+
+     # Test: prompt-starts-with-flag sneaks past emptiness check:
+     $ claw -p --model sonnet
+     error: missing Anthropic credentials ...
+     # prompt = "--model sonnet" (non-empty, so check passes).
+     # No "-p requires a prompt string" error.
+     # The literal string "--model sonnet" is sent to the LLM.
+     ```
+
+     **Trace path.**
+     - `rust/crates/rusty-claude-cli/src/main.rs:524-538` — the `-p` branch:
+       ```rust
+       "-p" => {
+           // Claw Code compat: -p "prompt" = one-shot prompt
+           let prompt = args[index + 1..].join(" ");
+           if prompt.trim().is_empty() {
+               return Err("-p requires a prompt string".to_string());
+           }
+           return Ok(CliAction::Prompt {
+               prompt,
+               model: resolve_model_alias_with_config(&model),
+               output_format,
+               ...
+           });
+       }
+       ```
+       The `args[index + 1..].join(" ")` is the greedy absorption. The `return Ok(...)` short-circuits the parser loop, discarding any parser state set by earlier iterations.
+     - `rust/crates/rusty-claude-cli/src/main.rs:403` — `let mut wants_help = false;` declared but can be set and immediately dropped if `-p` returns.
+     - `rust/crates/rusty-claude-cli/src/main.rs:415-418` — `"--help" | "-h" if rest.is_empty() => { wants_help = true; index += 1; }`. The `-p` branch doesn't consult `wants_help` before returning.
+     - `rust/crates/rusty-claude-cli/src/main.rs:524-528` — emptiness check: `if prompt.trim().is_empty()`. Fails only on totally-empty joined string. `-p --foo` produces `"--foo"` which passes.
+     - Compare Claude Code's `-p`: `claude -p "prompt"` takes exactly ONE positional arg, subsequent flags are parsed normally. claw-code's `-p` is greedy and short-circuits the rest of the parser.
+     - The short-circuit also means flags set AFTER `-p` (e.g. `-p "text" --output-format json`) that actually do end up in the Prompt struct (like `output_format`) only work if they appear BEFORE `-p`. Anything after is swallowed.
+
+     **Why this is specifically a clawability gap.**
+     1. *Silent prompt corruption.* A claw building a command line via string concatenation ends up sending the literal string `"--model sonnet --output-format json"` to the LLM when that string is appended after `-p`. The LLM gets garbage prompts that weren't what the user/orchestrator meant. Billable tokens burned on corrupted prompts.
+     2. *Flag order sensitivity is invisible.* Nothing in `--help` warns that flags must be placed BEFORE `-p`. Users and claws try `-p "prompt" --model sonnet` based on Claude Code muscle memory and get silent misbehavior.
+     3. *`--help` and `--version` short-circuits are defeated.* `claw -p "test" --help` should print help. Instead it tries to run the prompt "test --help". `claw --help -p "test"` (flag-first) STILL tries to run the prompt — `wants_help` is set but dropped on -p's return. Help is inaccessible when -p is in the command line.
+     4. *Emptiness check too weak.* `-p --foo` produces prompt `"--foo"` which the check considers non-empty. So no guard. A claw or shell script that conditionally constructs `-p "$PROMPT" --output-format json` where `$PROMPT` is empty or missing silently sends `"--output-format json"` as the user prompt.
+     5. *Joins truth-audit.* The parser is lying about what it parsed. Presence of `--model sonnet` in the args does NOT mean the model got set. Depending on order, the same args produce different parse outcomes. A claw inspecting its own argv cannot predict behavior from arg composition alone.
+     6. *Joins parallel-entry-point asymmetry.* `-p "prompt"` and `claw prompt TEXT` and bare positional `claw TEXT` are three entry points to the same Prompt action. Each has different arg-parsing semantics. Inconsistent.
+     7. *Joins Claude Code migration parity.* `claude -p "..." --model ..."` works in Claude Code. The same command in claw-code silently corrupts the prompt. Users migrating get mysterious wrong-model-used or garbage-prompt symptoms.
+     8. *Combined with #108 (subcommand typos fall through to Prompt).* A typo like `claw -p helo --model sonnet` gets sent as "helo --model sonnet" to the LLM AND gets counted against token usage AND gets no warning. Two bugs compound: typo + swallow.
+
+     **Fix shape — `-p` takes exactly one argument, subsequent flags parse normally.**
+     1. *Take only `args[index + 1]` as the prompt; continue parsing afterward.* ~10 lines.
+       ```rust
+       "-p" => {
+           let prompt = args.get(index + 1).cloned().unwrap_or_default();
+           if prompt.trim().is_empty() || prompt.starts_with('-') {
+               return Err("-p requires a prompt string (use quotes for multi-word prompts)".to_string());
+           }
+           pending_prompt = Some(prompt);
+           index += 2;
+       }
+       ```
+       Then after the loop, if `pending_prompt.is_some()` and `rest.is_empty()`, build the Prompt action with the collected flags.
+     2. *Handle the emptiness check rigorously.* Reject prompts that start with `-` (likely a flag) with an error: `-p appears to be followed by a flag, not a prompt. Did you mean '-p "<prompt>"' or '-p -- -flag-as-prompt'?` ~5 lines.
+     3. *Support the `--` separator.* `claw -p -- --model` lets users opt into a literal `--model` string as the prompt. ~5 lines.
+     4. *Consult `wants_help` before returning.* If `wants_help` was set, print help regardless of -p. ~3 lines.
+     5. *Deprecate the current greedy behavior with a runtime warning.* For one release, detect the old-style invocation (multiple args after `-p` with some looking flag-like) and emit: `warning: "-p" absorption changed. See CHANGELOG.` ~15 lines.
+     6. *Regression tests.* (a) `-p "prompt" --model sonnet` uses sonnet model. (b) `-p "prompt" --help` prints help. (c) `-p --foo` errors out. (d) `--help -p "test"` prints help. (e) `claw -p -- --literal-prompt` sends "--literal-prompt" to the LLM.
+
+     **Acceptance.** `-p "prompt"` takes exactly ONE argument. Subsequent `--model`, `--output-format`, `--help`, `--version`, `--permission-mode`, etc. are parsed normally. `claw -p "test" --help` prints help. `claw -p --model sonnet` errors out with a message explaining flag-like prompts require `--`. `claw --help -p "test"` prints help. Token-burning silent corruption is impossible.
+
+     **Blocker.** None. Parser refactor is localized to one arm. Compatibility concern: anyone currently relying on `-p` greedy absorption (unlikely because it's silently-broken) would see a behavior change. Deprecation warning for one release softens the transition.
+
+     **Source.** Jobdori dogfood 2026-04-18 against `/tmp/cdSS` on main HEAD `f2d6538` in response to Clawhip pinpoint nudge at `1494933025857736836`. Joins **Silent-flag / documented-but-unenforced** (#96–#101, #104, #108, #111, #115, #116) as 12th member — `-p` is an undocumented-in-`--help` shortcut whose silent greedy behavior makes flag-order semantics invisible. Joins **Parallel-entry-point asymmetry** (#91, #101, #104, #105, #108, #114) as 7th — three entry points (`claw prompt TEXT`, bare positional `claw TEXT`, `claw -p TEXT`) with subtly different arg-parsing semantics. Joins **Truth-audit** — the parser is lying about what it parsed when `-p` is present. Joins **Claude Code migration parity** (#103, #109, #116) as 4th — users migrating `claude -p "..." --model ..."` silently get corrupted prompts. Cross-cluster with **Silent-flag** quartet (#96, #98, #108, #111) now quintet: #108 (subcommand typos fall through to Prompt, burning billed tokens) + **#117** (prompt flags swallowed into prompt text, ALSO burning billed tokens) — both are silent-token-burn failure modes. Natural bundle: **#108 + #117** — billable-token silent-burn pair: typo fallthrough + flag-swallow. Also **#105 + #108 + #117** — model-resolution triangle: `claw status` ignores .claw.json model (#105) + typo'd `claw statuss` burns tokens (#108) + `-p "test" --model sonnet` silently ignores the model (#117). Session tally: ROADMAP #117.