Skip to content

Commit b9331ae

Browse files
committed
ROADMAP ultraworkers#117: -p flag is super-greedy, swallows all subsequent args into prompt; --help/--version/--model after -p silently consumed; flag-like prompts bypass emptiness check
Dogfooded 2026-04-18 on main HEAD f2d6538 from /tmp/cdSS. -p (Claude Code compat shortcut) at main.rs:524-538: "-p" => { let prompt = args[index + 1..].join(" "); if prompt.trim().is_empty() { return Err(...); } return Ok(CliAction::Prompt {...}); } args[index+1..].join(" ") = ABSORBS EVERY subsequent arg. return Ok(...) = short-circuits parser, discards wants_help etc. Failure modes: 1. Silent flag swallow: claw -p "test" --model sonnet --output-format json → prompt = "test --model sonnet --output-format json" → model: default (not sonnet), format: text (not json) → LLM receives literal string '--model sonnet' as user input → billable tokens burned on corrupted prompt 2. --help/--version defeated: claw -p "test" --help → sends 'test --help' to LLM claw -p "test" --version → sends 'test --version' to LLM claw --help -p "test" → wants_help=true set, then discarded by -p's early return. Help never prints. 3. Emptiness check too weak: claw -p --model sonnet → prompt = "--model sonnet" (non-empty) → passes is_empty() check → sends '--model sonnet' to LLM as the user prompt → no error raised 4. Flag-order invisible: claw --model sonnet -p "test" → WORKS (model parsed first) claw -p "test" --model sonnet → BROKEN (--model swallowed) Same flags, different order, different behavior. --help has zero warning about flag-order semantics. Compare Claude Code: claude -p "prompt" --model sonnet → works (model takes effect) claw -p "prompt" --model sonnet → silently broken Fix shape (~40 lines): - "-p" takes exactly args[index+1] as prompt, continues parsing: let prompt = args.get(index+1).cloned().unwrap_or_default(); if prompt.trim().is_empty() || prompt.starts_with('-') { return Err("-p requires a prompt string"); } pending_prompt = Some(prompt); index += 2; - Reject prompts that start with '-' unless preceded by '--': 'claw -p -- --literal-prompt' = literal '--literal-prompt' - Consult wants_help before returning from -p branch. - Regression tests: -p "prompt" --model sonnet → model takes effect -p "prompt" --help → help prints -p --foo → error --help -p "test" → help prints -p -- --literal → literal prompt sent Joins Silent-flag/documented-but-unenforced (ultraworkers#96-ultraworkers#101, ultraworkers#104, ultraworkers#108, ultraworkers#111, ultraworkers#115, ultraworkers#116) as 12th — -p is undocumented in --help yet actively broken. Joins Parallel-entry-point asymmetry (ultraworkers#91, ultraworkers#101, ultraworkers#104, ultraworkers#105, ultraworkers#108, ultraworkers#114) as 7th — three entry points (prompt TEXT, bare positional, -p TEXT) with subtly different arg-parsing. Joins Claude Code migration parity (ultraworkers#103, ultraworkers#109, ultraworkers#116) as 4th — users typing 'claude -p "..." --model ...' muscle memory get silent prompt corruption. Joins Truth-audit — parser lies about what it parsed. Natural bundles: ultraworkers#108 + ultraworkers#117 — billable-token silent-burn pair: typo fallthrough burns tokens (ultraworkers#108) + flag-swallow burns tokens (ultraworkers#117) ultraworkers#105 + ultraworkers#108 + ultraworkers#117 — model-resolution triangle: status ignores .claw.json model (ultraworkers#105) + typo statuss burns tokens (ultraworkers#108) + -p --model sonnet silently ignored (ultraworkers#117) Filed in response to Clawhip pinpoint nudge 1494933025857736836 in #clawcode-building-in-public.
1 parent f2d6538 commit b9331ae

1 file changed

Lines changed: 96 additions & 0 deletions

File tree

ROADMAP.md

Lines changed: 96 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3704,3 +3704,99 @@ ear], /color [scheme], /effort [low|medium|high], /fast, /summary, /tag [label],
37043704
**Blocker.** Policy decision: does the project want strict-by-default (current) or lax-by-default? The fix shape assumes lax-by-default with strict opt-in, matching industry-standard forward-compat conventions and easing Claude Code migration.
37053705
37063706
**Source.** Jobdori dogfood 2026-04-18 against `/tmp/cdRR` on main HEAD `ad02761` in response to Clawhip pinpoint nudge at `1494925472239321160`. Joins **Claude Code migration parity** (#103, #109) as 3rd member — this is the most severe migration-parity break, since it's a HARD FAIL at startup rather than a silent drop (#103) or a stderr-prose warning (#109). Joins **Reporting-surface / config-hygiene** (#90, #91, #92, #110, #115) on the error-routing-vs-stdout axis: `--output-format json` consumers get empty stdout on config errors. Joins **Silent-flag / documented-but-unenforced** (#96–#101, #104, #108, #111, #115) because only the first error is reported and all subsequent errors are silent. Cross-cluster with **Truth-audit / diagnostic-integrity** (#80–#87, #89, #100, #102, #103, #105, #107, #109, #110, #112, #114, #115) because `validation.is_ok()` hides all-but-the-first structured problem. Natural bundle: **#103 + #109 + #116** — Claude Code migration parity triangle: `claw agents` drops `.md` (loss of compatibility) + config warnings stderr-prose (loss of structure) + config unknowns hard-fail (loss of forward-compat). Also **#109 + #116** — config validation reporting surface: only first warning surfaces structurally (#109) + only first error surfaces structurally and halts loading (#116). Session tally: ROADMAP #116.
3707+
3708+
117. **`-p` (Claude Code compat shortcut for "prompt") is super-greedy: the parser at `main.rs:524-538` does `let prompt = args[index + 1..].join(" ")` and immediately returns, swallowing EVERY subsequent arg into the prompt text. `--model sonnet`, `--output-format json`, `--help`, `--version`, and any other flag placed AFTER `-p` are silently consumed into the prompt that gets sent to the LLM. Flags placed BEFORE `-p` are also dropped when parser-state variables like `wants_help` are set and then discarded by the early `return Ok(CliAction::Prompt {...})`. The emptiness check (`if prompt.trim().is_empty()`) is too weak: `claw -p --model sonnet` produces prompt=`"--model sonnet"` which is non-empty, so no error is raised and the literal flag string is sent to the LLM as user input** — dogfooded 2026-04-18 on main HEAD `f2d6538` from `/tmp/cdSS`.
3709+
3710+
**Concrete repro.**
3711+
```
3712+
# Test: -p swallows --help (which should short-circuit):
3713+
$ claw -p "test" --help
3714+
# Expected: help output (--help short-circuits)
3715+
# Actual: tries to run prompt "test --help" — sends it to LLM
3716+
error: missing Anthropic credentials ...
3717+
3718+
# Test: --help BEFORE -p is silently discarded:
3719+
$ claw --help -p "test"
3720+
# Expected: help output (--help seen first)
3721+
# Actual: tries to run prompt "test" — wants_help=true was set, then discarded
3722+
error: missing Anthropic credentials ...
3723+
3724+
# Test: -p swallows --version:
3725+
$ claw -p "test" --version
3726+
# Expected: version output
3727+
# Actual: tries to run prompt "test --version"
3728+
3729+
# Test: -p with actual credentials — the SWALLOWING is visible:
3730+
$ ANTHROPIC_AUTH_TOKEN=sk-bogus claw -p "hello" --model sonnet
3731+
7[1G[2K[38;5;12m⠋ 🦀 Thinking...[0m8[1G[2K[38;5;9m✘ ❌ Request failed
3732+
error: api returned 401 Unauthorized (authentication_error)
3733+
# The 401 comes back AFTER the request went out. The --model sonnet was
3734+
# swallowed into the prompt "hello --model sonnet", the binary's default
3735+
# model was used (not sonnet), and the bogus token hit auth failure.
3736+
3737+
# Test: prompt-starts-with-flag sneaks past emptiness check:
3738+
$ claw -p --model sonnet
3739+
error: missing Anthropic credentials ...
3740+
# prompt = "--model sonnet" (non-empty, so check passes).
3741+
# No "-p requires a prompt string" error.
3742+
# The literal string "--model sonnet" is sent to the LLM.
3743+
```
3744+
3745+
**Trace path.**
3746+
- `rust/crates/rusty-claude-cli/src/main.rs:524-538` — the `-p` branch:
3747+
```rust
3748+
"-p" => {
3749+
// Claw Code compat: -p "prompt" = one-shot prompt
3750+
let prompt = args[index + 1..].join(" ");
3751+
if prompt.trim().is_empty() {
3752+
return Err("-p requires a prompt string".to_string());
3753+
}
3754+
return Ok(CliAction::Prompt {
3755+
prompt,
3756+
model: resolve_model_alias_with_config(&model),
3757+
output_format,
3758+
...
3759+
});
3760+
}
3761+
```
3762+
The `args[index + 1..].join(" ")` is the greedy absorption. The `return Ok(...)` short-circuits the parser loop, discarding any parser state set by earlier iterations.
3763+
- `rust/crates/rusty-claude-cli/src/main.rs:403` — `let mut wants_help = false;` declared but can be set and immediately dropped if `-p` returns.
3764+
- `rust/crates/rusty-claude-cli/src/main.rs:415-418` — `"--help" | "-h" if rest.is_empty() => { wants_help = true; index += 1; }`. The `-p` branch doesn't consult `wants_help` before returning.
3765+
- `rust/crates/rusty-claude-cli/src/main.rs:524-528` — emptiness check: `if prompt.trim().is_empty()`. Fails only on totally-empty joined string. `-p --foo` produces `"--foo"` which passes.
3766+
- Compare Claude Code's `-p`: `claude -p "prompt"` takes exactly ONE positional arg, subsequent flags are parsed normally. claw-code's `-p` is greedy and short-circuits the rest of the parser.
3767+
- The short-circuit also means flags set AFTER `-p` (e.g. `-p "text" --output-format json`) that actually do end up in the Prompt struct (like `output_format`) only work if they appear BEFORE `-p`. Anything after is swallowed.
3768+
3769+
**Why this is specifically a clawability gap.**
3770+
1. *Silent prompt corruption.* A claw building a command line via string concatenation ends up sending the literal string `"--model sonnet --output-format json"` to the LLM when that string is appended after `-p`. The LLM gets garbage prompts that weren't what the user/orchestrator meant. Billable tokens burned on corrupted prompts.
3771+
2. *Flag order sensitivity is invisible.* Nothing in `--help` warns that flags must be placed BEFORE `-p`. Users and claws try `-p "prompt" --model sonnet` based on Claude Code muscle memory and get silent misbehavior.
3772+
3. *`--help` and `--version` short-circuits are defeated.* `claw -p "test" --help` should print help. Instead it tries to run the prompt "test --help". `claw --help -p "test"` (flag-first) STILL tries to run the prompt — `wants_help` is set but dropped on -p's return. Help is inaccessible when -p is in the command line.
3773+
4. *Emptiness check too weak.* `-p --foo` produces prompt `"--foo"` which the check considers non-empty. So no guard. A claw or shell script that conditionally constructs `-p "$PROMPT" --output-format json` where `$PROMPT` is empty or missing silently sends `"--output-format json"` as the user prompt.
3774+
5. *Joins truth-audit.* The parser is lying about what it parsed. Presence of `--model sonnet` in the args does NOT mean the model got set. Depending on order, the same args produce different parse outcomes. A claw inspecting its own argv cannot predict behavior from arg composition alone.
3775+
6. *Joins parallel-entry-point asymmetry.* `-p "prompt"` and `claw prompt TEXT` and bare positional `claw TEXT` are three entry points to the same Prompt action. Each has different arg-parsing semantics. Inconsistent.
3776+
7. *Joins Claude Code migration parity.* `claude -p "..." --model ..."` works in Claude Code. The same command in claw-code silently corrupts the prompt. Users migrating get mysterious wrong-model-used or garbage-prompt symptoms.
3777+
8. *Combined with #108 (subcommand typos fall through to Prompt).* A typo like `claw -p helo --model sonnet` gets sent as "helo --model sonnet" to the LLM AND gets counted against token usage AND gets no warning. Two bugs compound: typo + swallow.
3778+
3779+
**Fix shape — `-p` takes exactly one argument, subsequent flags parse normally.**
3780+
1. *Take only `args[index + 1]` as the prompt; continue parsing afterward.* ~10 lines.
3781+
```rust
3782+
"-p" => {
3783+
let prompt = args.get(index + 1).cloned().unwrap_or_default();
3784+
if prompt.trim().is_empty() || prompt.starts_with('-') {
3785+
return Err("-p requires a prompt string (use quotes for multi-word prompts)".to_string());
3786+
}
3787+
pending_prompt = Some(prompt);
3788+
index += 2;
3789+
}
3790+
```
3791+
Then after the loop, if `pending_prompt.is_some()` and `rest.is_empty()`, build the Prompt action with the collected flags.
3792+
2. *Handle the emptiness check rigorously.* Reject prompts that start with `-` (likely a flag) with an error: `-p appears to be followed by a flag, not a prompt. Did you mean '-p "<prompt>"' or '-p -- -flag-as-prompt'?` ~5 lines.
3793+
3. *Support the `--` separator.* `claw -p -- --model` lets users opt into a literal `--model` string as the prompt. ~5 lines.
3794+
4. *Consult `wants_help` before returning.* If `wants_help` was set, print help regardless of -p. ~3 lines.
3795+
5. *Deprecate the current greedy behavior with a runtime warning.* For one release, detect the old-style invocation (multiple args after `-p` with some looking flag-like) and emit: `warning: "-p" absorption changed. See CHANGELOG.` ~15 lines.
3796+
6. *Regression tests.* (a) `-p "prompt" --model sonnet` uses sonnet model. (b) `-p "prompt" --help` prints help. (c) `-p --foo` errors out. (d) `--help -p "test"` prints help. (e) `claw -p -- --literal-prompt` sends "--literal-prompt" to the LLM.
3797+
3798+
**Acceptance.** `-p "prompt"` takes exactly ONE argument. Subsequent `--model`, `--output-format`, `--help`, `--version`, `--permission-mode`, etc. are parsed normally. `claw -p "test" --help` prints help. `claw -p --model sonnet` errors out with a message explaining flag-like prompts require `--`. `claw --help -p "test"` prints help. Token-burning silent corruption is impossible.
3799+
3800+
**Blocker.** None. Parser refactor is localized to one arm. Compatibility concern: anyone currently relying on `-p` greedy absorption (unlikely because it's silently-broken) would see a behavior change. Deprecation warning for one release softens the transition.
3801+
3802+
**Source.** Jobdori dogfood 2026-04-18 against `/tmp/cdSS` on main HEAD `f2d6538` in response to Clawhip pinpoint nudge at `1494933025857736836`. Joins **Silent-flag / documented-but-unenforced** (#96–#101, #104, #108, #111, #115, #116) as 12th member — `-p` is an undocumented-in-`--help` shortcut whose silent greedy behavior makes flag-order semantics invisible. Joins **Parallel-entry-point asymmetry** (#91, #101, #104, #105, #108, #114) as 7th — three entry points (`claw prompt TEXT`, bare positional `claw TEXT`, `claw -p TEXT`) with subtly different arg-parsing semantics. Joins **Truth-audit** — the parser is lying about what it parsed when `-p` is present. Joins **Claude Code migration parity** (#103, #109, #116) as 4th — users migrating `claude -p "..." --model ..."` silently get corrupted prompts. Cross-cluster with **Silent-flag** quartet (#96, #98, #108, #111) now quintet: #108 (subcommand typos fall through to Prompt, burning billed tokens) + **#117** (prompt flags swallowed into prompt text, ALSO burning billed tokens) — both are silent-token-burn failure modes. Natural bundle: **#108 + #117** — billable-token silent-burn pair: typo fallthrough + flag-swallow. Also **#105 + #108 + #117** — model-resolution triangle: `claw status` ignores .claw.json model (#105) + typo'd `claw statuss` burns tokens (#108) + `-p "test" --model sonnet` silently ignores the model (#117). Session tally: ROADMAP #117.

0 commit comments

Comments
 (0)