
feat(batch_execute): opt-in concurrency for parallel command runs#358

Open

sebastianbreguel wants to merge 2 commits into mksglu:next from sebastianbreguel:feat/batch-execute-parallel

Conversation

@sebastianbreguel
Contributor

Summary

Adds an opt-in concurrency field (default 1, max 8) to ctx_batch_execute. Independent shell commands now run in parallel when requested.

  • Serial path preserved bit-for-bit at concurrency=1 (default). Existing callers unaffected — same shared timeout budget, same cascading skip-on-timeout.
  • Parallel path uses inline worker pool (~10 LOC, no new deps). Order preserved via index-keyed array. Each command gets its full timeout in ms (not a shared budget).
  • Deny policy still pre-runs once per batch. __CM_FS__ markers parsed per command. bytesSandboxed accumulator stays race-free (JS single-thread).
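The inline worker pool described above can be sketched as follows. This is an illustrative TypeScript sketch; runPool and runOne are assumed names, not the PR's actual identifiers.

```typescript
// Sketch of an inline worker pool: N workers share a cursor (nextIdx) and
// write results into an index-keyed array, so output stays in input order.
async function runPool<T, R>(
  items: T[],
  concurrency: number,
  runOne: (item: T, idx: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let nextIdx = 0; // shared cursor; race-free because JS is single-threaded
  const worker = async (): Promise<void> => {
    while (nextIdx < items.length) {
      const idx = nextIdx++; // claim an index synchronously, before any await
      results[idx] = await runOne(items[idx], idx);
    }
  };
  // Cap workers at the item count so small batches don't spawn idle workers.
  const workers = Array.from(
    { length: Math.min(concurrency, items.length) },
    () => worker(),
  );
  await Promise.all(workers);
  return results; // input order, regardless of completion order
}
```

At concurrency=1 this degenerates to a single worker draining the list sequentially.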

Bench (macOS arm64, Node 22, 5× 500ms sleeps)

| concurrency | wall-clock | speedup |
|---|---|---|
| 1 | 3336ms | 1.00× |
| 2 | 2120ms | 1.57× |
| 4 | 1598ms | 2.09× |
| 8 | 984ms | 3.39× |

Reproduce: npm run bench:batch-parallel. Tunable via N, SLEEP_MS, LEVELS env vars. Bench script imports runBatchCommands directly — no MCP stdio in the loop.

Design notes

  • concurrency (integer 1–8) chosen over parallel: boolean for forward flexibility — single knob, clear semantics.
  • Default = 1 (opt-in). Conservative; no breaking change for callers depending on serial side-effects.
  • Inline semaphore (worker pool with shared nextIdx++) instead of p-limit dep.
  • Per-command timeout under concurrency: shared budget loses meaning when commands run in parallel. Documented in tool description.
  • Cascading skip-on-timeout dropped under concurrency (commands already started). Each times out independently.
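The per-command timeout semantics can be illustrated with a small wrapper. This is a sketch only; withTimeout and onTimeout are assumed names, and the PR's real handling (e.g. terminating the underlying process) will differ.

```typescript
// Sketch: race a command's promise against its own full timeout budget.
// Under concurrency > 1 every command gets its own timer; there is no
// shared batch budget and no cascading skip.
function withTimeout<T>(
  work: Promise<T>,
  ms: number,
  onTimeout: () => T,
): Promise<T> {
  return new Promise<T>((resolve) => {
    const timer = setTimeout(() => resolve(onTimeout()), ms);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      () => {
        clearTimeout(timer);
        resolve(onTimeout()); // sketch only: fold rejections into the timeout result
      },
    );
  });
}
```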

Test plan

  • Source-scan tests: schema field, tool description, bytesSandboxed callback shape
  • runBatchCommands serial path: happy path, cascading skip, shared timeout
  • runBatchCommands parallel path: happy path, order preservation under reverse delays, concurrency cap, per-cmd timeout, cap clamped at cmd count, FS bytes callback
  • Edge cases: empty commands, empty stdout, nodeOptsPrefix prepending
  • All 118 tests in tests/core/server.test.ts pass
  • npx tsc --noEmit clean
  • Local bench produces expected speedup curve

Files

  • src/server.ts — extracted runBatchCommands + types, added schema field, updated tool description
  • tests/core/server.test.ts — 3 new describe blocks (~180 LOC) + updated existing source-scan test
  • tests/bench/batch-execute-parallel.ts — manual bench script (not vitest)
  • package.json — bench:batch-parallel script

github-actions Bot and others added 2 commits April 26, 2026 12:35
Add concurrency field (default 1, max 8) to ctx_batch_execute. Serial path
preserved bit-for-bit at concurrency=1. Parallel path uses inline worker pool
(no new deps), preserves output order via index-keyed array, gives each command
its own full timeout instead of shared budget.

Bench (macOS arm64, 5x 500ms sleeps):
- c=1: 3336ms  (baseline)
- c=2: 2120ms  (1.57x)
- c=4: 1598ms  (2.09x)
- c=8:  984ms  (3.39x)

Includes tests/bench/batch-execute-parallel.ts for repeatable measurement.
@mksglu mksglu changed the base branch from main to next April 27, 2026 07:43
@sebastianbreguel
Contributor Author

Real end-to-end test on Claude Code MCP transport

Spawned npx tsx src/server.ts as a child, spoke MCP JSON-RPC over stdio (initialize → notifications/initialized → tools/list → tools/call), captured the live ctx_batch_execute responses. Not the unit tests, not the bench — the actual MCP path Claude Code uses.
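Framed the way the MCP stdio transport expects (newline-delimited JSON-RPC), that sequence reduces to something like the sketch below. The params payloads, the commands argument shape, and the protocol version string are illustrative assumptions, not the harness's exact messages.

```typescript
// Sketch of the JSON-RPC frames for: initialize -> notifications/initialized
// -> tools/list -> tools/call. MCP stdio framing is one JSON object per line.
type JsonRpcMsg = {
  jsonrpc: "2.0";
  id?: number;
  method: string;
  params?: unknown;
};

const frame = (msg: JsonRpcMsg): string => JSON.stringify(msg) + "\n";

const handshake: JsonRpcMsg[] = [
  {
    jsonrpc: "2.0",
    id: 1,
    method: "initialize",
    params: {
      protocolVersion: "2024-11-05", // assumed; use the server's negotiated version
      capabilities: {},
      clientInfo: { name: "harness", version: "0.0.0" },
    },
  },
  { jsonrpc: "2.0", method: "notifications/initialized" },
  { jsonrpc: "2.0", id: 2, method: "tools/list" },
  {
    jsonrpc: "2.0",
    id: 3,
    method: "tools/call",
    params: {
      name: "ctx_batch_execute",
      // argument shape is a guess for illustration
      arguments: { concurrency: 4, commands: ["sleep 0.8", "sleep 0.2"] },
    },
  },
];

// Written to the spawned server's stdin; responses come back one per line.
const wire = handshake.map(frame).join("");
```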

Schema (from tools/list):

```json
{"type":"integer","minimum":1,"maximum":8,"default":1,"description":"Max commands to run in parallel (1-8, default: 1). >1 switches to per-command timeouts (no shared budget) and individual `(timed out)` blocks instead of cascading skip."}
```

Wall-clock (4 commands with reverse delays: 800ms / 600ms / 400ms / 200ms):

| concurrency | wall-clock | speedup |
|---|---|---|
| 1 (serial) | 2546ms | (baseline) |
| 4 (parallel) | 1137ms | 2.24× |

Serial ≈ Σ delays = 2000ms (+ MCP/index overhead). Parallel ≈ max delay = 800ms (+ overhead).

Order preservation under reverse delays. cmd_d finishes first (200ms) but cmd_a is slowest (800ms). Index summary in the response:

```
## Indexed Sections

- cmd_a (0.0KB)
- cmd_b (0.0KB)
- cmd_c (0.0KB)
- cmd_d (0.0KB)
```

Input order, not completion order. Both concurrency=1 and concurrency=4 produce identical ordering — index-keyed worker pool is doing what it claims.

Validation: tools/call with concurrency: 99 is rejected by Zod (max=8) before the handler runs. concurrency: 1 is bit-for-bit the legacy serial path.

Harness script that produced this: https://gist.github.com/sebastianbreguel — happy to inline if you want it added under tests/ instead of as a one-shot.
