Problem
nodeRuntimeCommandWithGuard (internal/agent/node_install.go:72) wraps the agent's real command (e.g. claude -p ..., codex exec --json ...) with the nvm/Node.js bootstrap and ships the whole thing as a single multi-line shell script to one rt.Exec call.
This means the node-runtime bootstrap and the agent invocation share one Exec, one exit code, one stderr, and one stdout. They are operationally distinct phases but indistinguishable in the result.
Affected call sites
| Agent |
Call site |
Coupled command |
| Claude Code — Run |
internal/agent/claude_code.go:127 |
nodeRuntimeCommandWithGuard(\"claude\", buildClaudePrintCmd(...)) |
| Claude Code — MCP install |
internal/agent/claude_code.go:72 |
nodeRuntimeCommandWithGuard(\"claude\", cmd) |
| Codex — Run |
internal/agent/codex.go:236 |
nodeRuntimeCommandWithGuard(\"codex\", buildCodexRunCmdWithLastMessage(...)) |
| Codex — MCP install |
internal/agent/codex.go:136, :171 |
nodeRuntimeCommandWithGuard(\"codex\", cmd.String()) |
qodercli is not affected (its Run/InstallMCP don't use this wrapper, and its install path uses curl ... | bash standalone).
Why this affects judgment / debugging
-
Misleading error attribution. When the bootstrap fails (e.g. curl to the nvm installer times out, sha256 mismatch, nvm install fails on the offline mirror), the error surfaces as claude-code run failed: ... / codex run failed: .... Operators have to read the multi-line shell script to figure out it was actually a node/nvm problem, not an agent invocation problem.
-
Polluted stderr feeds signal detectors. providerAuthFailureSignal / providerRateLimitSignal in claude_code.go scan result.Stderr for substrings. Bootstrap noise (curl warnings, nvm install output, sha256 messages) is concatenated into the same buffer and widens the false-positive surface.
-
Mixed stdout. When the run path also writes stdout.json from the same Exec, bootstrap-stage stdout (e.g. nvm install banners) can prepend the actual agent JSONL stream.
-
Bootstrap re-runs after Install. The Install step is supposed to be the one place that prepares the runtime. The Run-time guard makes the bootstrap silently fire again when claude/codex is somehow not on PATH at Run time, masking what should be a clear "Install was never called / Install was wiped" failure.
-
Harder to read shell. The Exec'd script is now ~20 lines of mixed install + invocation logic, with set -e covering both phases — diagnosing partial failures from the recorded artifact takes extra parsing.
Proposed direction
A couple of options, not married to any one:
- A. Two Execs. Run the node bootstrap (when needed) as a separate
rt.Exec call, then run the agent command as its own Exec. Each has its own exit code, stderr, stdout and timestamps in the trace.
- B. Lean Run. Strip Run/InstallMCP down to just the agent command (plus optionally a one-liner
[ -s \"$NVM_DIR/nvm.sh\" ] && . \"$NVM_DIR/nvm.sh\" to source an already-installed nvm). Heavy bootstrap stays in Install only. If claude/codex is missing at Run time, return a structured error pointing the user at skill-up install (the current Check method is already designed for this).
- C. Tagged sections. If keeping one Exec is necessary, prefix each phase's output with sentinels and split them in
result.Stderr / result.Stdout before passing to detectors and persisters.
Option B (or A) seems most consistent with how qodercli already works.
Reproduction sketch
Temporarily break the nvm bootstrap (e.g. point NVM_SOURCE at a non-existent host) and call skill-up with the claude-code agent on a runtime that already has claude installed — the failure is reported as claude-code run failed, even though claude was never invoked.
Acceptance criteria
- A node bootstrap failure no longer surfaces as an agent-run/agent-MCP failure.
result.Stderr consumed by providerAuthFailureSignal / providerRateLimitSignal contains only output from the agent process.
stdout.json artifact contains only agent output, no bootstrap output prefix.
qodercli behavior is unchanged.
Problem
nodeRuntimeCommandWithGuard(internal/agent/node_install.go:72) wraps the agent's real command (e.g.claude -p ...,codex exec --json ...) with the nvm/Node.js bootstrap and ships the whole thing as a single multi-line shell script to onert.Execcall.This means the node-runtime bootstrap and the agent invocation share one Exec, one exit code, one stderr, and one stdout. They are operationally distinct phases but indistinguishable in the result.
Affected call sites
internal/agent/claude_code.go:127nodeRuntimeCommandWithGuard(\"claude\", buildClaudePrintCmd(...))internal/agent/claude_code.go:72nodeRuntimeCommandWithGuard(\"claude\", cmd)internal/agent/codex.go:236nodeRuntimeCommandWithGuard(\"codex\", buildCodexRunCmdWithLastMessage(...))internal/agent/codex.go:136,:171nodeRuntimeCommandWithGuard(\"codex\", cmd.String())qodercliis not affected (its Run/InstallMCP don't use this wrapper, and its install path usescurl ... | bashstandalone).Why this affects judgment / debugging
Misleading error attribution. When the bootstrap fails (e.g.
curlto the nvm installer times out, sha256 mismatch,nvm installfails on the offline mirror), the error surfaces asclaude-code run failed: .../codex run failed: .... Operators have to read the multi-line shell script to figure out it was actually a node/nvm problem, not an agent invocation problem.Polluted stderr feeds signal detectors.
providerAuthFailureSignal/providerRateLimitSignalinclaude_code.goscanresult.Stderrfor substrings. Bootstrap noise (curl warnings, nvm install output, sha256 messages) is concatenated into the same buffer and widens the false-positive surface.Mixed stdout. When the run path also writes
stdout.jsonfrom the same Exec, bootstrap-stage stdout (e.g. nvm install banners) can prepend the actual agent JSONL stream.Bootstrap re-runs after Install. The
Installstep is supposed to be the one place that prepares the runtime. The Run-time guard makes the bootstrap silently fire again whenclaude/codexis somehow not on PATH at Run time, masking what should be a clear "Install was never called / Install was wiped" failure.Harder to read shell. The Exec'd script is now ~20 lines of mixed install + invocation logic, with
set -ecovering both phases — diagnosing partial failures from the recorded artifact takes extra parsing.Proposed direction
A couple of options, not married to any one:
rt.Execcall, then run the agent command as its own Exec. Each has its own exit code, stderr, stdout and timestamps in the trace.[ -s \"$NVM_DIR/nvm.sh\" ] && . \"$NVM_DIR/nvm.sh\"to source an already-installed nvm). Heavy bootstrap stays inInstallonly. Ifclaude/codexis missing at Run time, return a structured error pointing the user atskill-up install(the currentCheckmethod is already designed for this).result.Stderr/result.Stdoutbefore passing to detectors and persisters.Option B (or A) seems most consistent with how
qoderclialready works.Reproduction sketch
Temporarily break the nvm bootstrap (e.g. point
NVM_SOURCEat a non-existent host) and callskill-upwith the claude-code agent on a runtime that already hasclaudeinstalled — the failure is reported asclaude-code run failed, even thoughclaudewas never invoked.Acceptance criteria
result.Stderrconsumed byproviderAuthFailureSignal/providerRateLimitSignalcontains only output from the agent process.stdout.jsonartifact contains only agent output, no bootstrap output prefix.qoderclibehavior is unchanged.