Skip to content

fix(spawn): wait for inner agent subshell before returning PID#327

Open
rubin-johnson wants to merge 1 commit intonyldn:mainfrom
rubin-johnson:fix/probe-pid-tracking-wait
Open

fix(spawn): wait for inner agent subshell before returning PID#327
rubin-johnson wants to merge 1 commit intonyldn:mainfrom
rubin-johnson:fix/probe-pid-tracking-wait

Conversation

@rubin-johnson
Copy link
Copy Markdown

@rubin-johnson rubin-johnson commented Apr 28, 2026

Problem

probe_discover backgrounds the outer spawn_agent wrappers and tracks their PIDs:

spawn_agent ... &
pids+=($!)

Inside spawn_agent, the real work runs in an inner subshell:

(
    # ... set up env, run claude/codex/gemini agent ...
) &
pid=$!

The outer wrapper captures that inner PID, writes it to the PID file, then exits — in about 1 second. display_rich_progress tracks the outer wrapper PIDs, sees them all go dead almost immediately, concludes all agents have finished, and triggers synthesis. At that point the inner agent subshells are still running and the output files are empty or partial.

Result: every multi-provider probe returns an empty synthesis, making the whole orchestration useless.

Fix

Add wait "$pid" 2>/dev/null || true in spawn_agent after the PID file write, before the log INFO line. This keeps the outer wrapper alive until the inner agent subshell completes. display_rich_progress now sees accurate liveness and synthesis fires only after real output exists.

+    wait "$pid" 2>/dev/null || true
+
     log INFO "Agent spawned with PID: $pid"
     echo "$pid"
 }

Verification

Tested on WSL2/Debian with OCTOPUS_DISPATCH_STRATEGY=full dispatching codex + gemini + claude-sonnet. Before the fix: synthesis triggered with 0B output files. After: all agents completed with real output (codex: 255KB, claude-sonnet: 23K+16K, gemini: 8K+8K).

Summary by CodeRabbit

  • Chores
    • Improved internal process management for background agent execution.

spawn_agent backgrounds an inner subshell ( ... ) & and captures its
PID. The outer wrapper then exits in ~1s. display_rich_progress tracks
the outer PIDs, sees them all dead almost immediately, and triggers
synthesis before any agent has produced output.

Adding wait "$pid" keeps the wrapper alive until the inner agent
finishes. Progress monitoring now sees accurate liveness and synthesis
fires only after real output exists.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 28, 2026

📝 Walkthrough

Walkthrough

Adds a best-effort synchronous wait on the backgrounded agent PID with stderr suppressed and failure ignored to scripts/lib/spawn.sh. This occurs before logging and echoing the spawn PID, with no other control flow or error handling modifications.

Changes

Cohort / File(s) Summary
Agent PID Synchronization
scripts/lib/spawn.sh
Adds best-effort synchronous wait on backgrounded agent PID with stderr suppressed before logging/echoing spawn PID.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Possibly related issues

  • Issue #326: The main change directly addresses the spawn agent PID-handling race condition, ensuring proper synchronization between inner provider PID and outer wrapper PID during agent spawning.

Poem

🐰 A rabbit waits with patient heart,
The agent's PID plays its part,
No races run in misaligned time,
With synchronous wait, all things align,
The spawn now flows so smooth and right! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main fix: adding a synchronous wait before returning the inner subshell PID in the spawn function, which directly addresses the core issue.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/lib/spawn.sh`:
- Around line 826-827: The unconditional wait on "$pid" in spawn_agent makes
spawn_agent blocking; change it to only wait when an explicit synchronous/probe
mode is requested (e.g., a flag like PROBE_MODE or BLOCK_ON_SPAWN set earlier in
scripts/lib/spawn.sh or passed into spawn_agent), so by default spawn_agent
returns the child PID immediately and async callers in
scripts/async-tmux-features.sh and scripts/orchestrate.sh continue to work;
alternatively, move progress/liveness tracking into the code that consumes the
PID instead of waiting here—update the check around wait "$pid" to gate it on
that mode/flag and ensure callers that need blocking behavior set that flag.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6a89dcdd-d155-44e4-86fe-48b1dcee6f0e

📥 Commits

Reviewing files that changed from the base of the PR and between 14e8ee4 and aa96043.

📒 Files selected for processing (1)
  • scripts/lib/spawn.sh

Comment thread scripts/lib/spawn.sh
Comment on lines +826 to +827
wait "$pid" 2>/dev/null || true

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Unconditional wait turns spawn_agent into a blocking API and breaks async callers

Line 826 now blocks until the child exits, so PID return is no longer immediate. This regresses async flows in scripts/async-tmux-features.sh (spawn + later coordinated wait) and makes scripts/orchestrate.sh spawn behave synchronously.

Gate this wait behind an explicit mode used only by probe/discover (or move progress/liveness tracking to inner PID consumers) so default spawn_agent behavior stays asynchronous.

Suggested minimal guard at this call site
-    wait "$pid" 2>/dev/null || true
+    if [[ "${OCTOPUS_WAIT_FOR_AGENT_COMPLETION:-false}" == "true" ]]; then
+        wait "$pid" 2>/dev/null || true
+    fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
wait "$pid" 2>/dev/null || true
if [[ "${OCTOPUS_WAIT_FOR_AGENT_COMPLETION:-false}" == "true" ]]; then
wait "$pid" 2>/dev/null || true
fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/lib/spawn.sh` around lines 826 - 827, The unconditional wait on
"$pid" in spawn_agent makes spawn_agent blocking; change it to only wait when an
explicit synchronous/probe mode is requested (e.g., a flag like PROBE_MODE or
BLOCK_ON_SPAWN set earlier in scripts/lib/spawn.sh or passed into spawn_agent),
so by default spawn_agent returns the child PID immediately and async callers in
scripts/async-tmux-features.sh and scripts/orchestrate.sh continue to work;
alternatively, move progress/liveness tracking into the code that consumes the
PID instead of waiting here—update the check around wait "$pid" to gate it on
that mode/flag and ensure callers that need blocking behavior set that flag.

@nyldn
Copy link
Copy Markdown
Owner

nyldn commented Apr 28, 2026

Thanks for jumping on this. I opened #328 with a related fix that keeps spawn_agent non-blocking and moves PID correctness into the parallel callers. That preserves async/tmux behavior while still making probe/develop/YAML/retry progress track the inner provider PID instead of the short-lived wrapper.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants