feat(bench): cold-start benchmark script (measurement only, no CI guard)#357

Open
sebastianbreguel wants to merge 2 commits into mksglu:next from sebastianbreguel:feat/cold-start-bench

@sebastianbreguel
Contributor

Summary

Adds tests/cold-start-benchmark.ts — a measurement-only script that spawns node start.mjs N times, polls each child's MCP readiness sentinel, and reports p50/p95/p99 of spawn-to-ready latency plus skip count.
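A minimal TypeScript sketch of the loop described above, assuming a per-iteration measurement function that resolves to a latency in milliseconds, or null for a skip. MeasureFn, BenchResult, and runBench are hypothetical names, not taken from tests/cold-start-benchmark.ts.

```typescript
// Hypothetical measurement hook: latency in ms, or null when the iteration is a SKIP.
type MeasureFn = () => Promise<number | null>;

interface BenchResult {
  samples: number[]; // latencies of successful iterations, in ms
  skips: number; // iterations that hit the timeout
}

async function runBench(measure: MeasureFn, iterations: number, warmup: number): Promise<BenchResult> {
  // Warmup runs execute the full path, but their results are discarded.
  for (let i = 0; i < warmup; i++) await measure();

  const samples: number[] = [];
  let skips = 0;
  // Sequential on purpose: overlapping children would compete for CPU and skew latency.
  for (let i = 0; i < iterations; i++) {
    const ms = await measure();
    if (ms === null) skips++;
    else samples.push(ms);
  }
  return { samples, skips };
}
```

Injecting the measurement keeps the loop testable without spawning anything; a fake MeasureFn can stand in for the real spawn-and-poll step.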

Measurement only. No vitest assertion, no budget, no CI gate. The goal is to give the project a real cold-start baseline so that a follow-up PR can add a regression guard with empirically grounded thresholds rather than guessed ones (the same measure-then-tune sequencing as #356).

This addresses cold-start latency, which #270 explicitly listed as a priority (with a Windows call-out).

What it measures

process spawn → start.mjs self-heal layers → ensure-deps → import server.bundle.mjs → server.connect(transport) → sentinel writeFileSync (src/server.ts:2412-2416).
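One way a single iteration could time this path, sketched under assumptions: the clock starts just before the spawn, and readiness is detected by polling for the PID-stamped sentinel with existsSync. measureOnce and its sentinelFor callback are hypothetical stand-ins (the real script uses sentinelPathForPid()), and keeping the child's stdin as an open pipe is an assumption about how an MCP stdio server reacts to EOF.

```typescript
import { spawn } from "node:child_process";
import { existsSync } from "node:fs";
import { performance } from "node:perf_hooks";

// Hypothetical single-iteration timer: resolves to spawn-to-ready latency in ms,
// or null (SKIP) on timeout or spawn failure.
// sentinelFor stands in for sentinelPathForPid(); its signature here is assumed.
function measureOnce(
  args: string[],
  sentinelFor: (pid: number) => string,
  timeoutMs = 30_000,
  pollMs = 10,
): Promise<number | null> {
  return new Promise((resolve) => {
    const start = performance.now(); // clock starts before the spawn
    // stdin stays an open pipe (assumption: an MCP stdio server may exit on EOF).
    const child = spawn(process.execPath, args, { stdio: ["pipe", "ignore", "ignore"] });
    let timer: ReturnType<typeof setInterval> | undefined;
    const finish = (result: number | null): void => {
      if (timer !== undefined) clearInterval(timer);
      child.kill("SIGTERM"); // the real script escalates to SIGKILL after 500ms
      resolve(result);
    };
    child.once("error", () => finish(null)); // e.g. spawn failure counts as a SKIP
    if (child.pid === undefined) return; // spawn failed; the 'error' handler resolves
    const sentinel = sentinelFor(child.pid);
    timer = setInterval(() => {
      const elapsed = performance.now() - start;
      if (existsSync(sentinel)) finish(elapsed);
      else if (elapsed >= timeoutMs) finish(null);
    }, pollMs);
  });
}
```

Because readiness is only checked every pollMs, each sample can overstate the true latency by roughly one poll interval (10ms by default).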

Detection uses the existing PID-stamped sentinel (context-mode-mcp-ready-<pid> via sentinelPathForPid() from hooks/core/mcp-ready.mjs). No new public APIs, no changes to the readiness contract.
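For readers unfamiliar with the sentinel helper, the naming and directory choice can be restated as a small sketch. It follows the commit message's description (hardcoded /tmp on Linux/macOS, tmpdir() on Windows) and the context-mode-mcp-ready-<pid> file name; sketchSentinelPath is a hypothetical illustration, and the real sentinelPathForPid() may differ in details.

```typescript
import { tmpdir } from "node:os";
import { posix, win32 } from "node:path";

// Hypothetical restatement of the sentinel location; NOT the real sentinelPathForPid().
// Linux/macOS: hardcoded /tmp. Windows: os.tmpdir().
function sketchSentinelPath(pid: number, platform: NodeJS.Platform = process.platform): string {
  const name = `context-mode-mcp-ready-${pid}`;
  return platform === "win32" ? win32.join(tmpdir(), name) : posix.join("/tmp", name);
}
```

The bench imports sentinelPathForPid() instead of re-deriving the path like this, which keeps the benchmark and the server from drifting apart.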

Local baseline (macOS arm64, Node v25.9.0, bundle prebuilt)

| Metric     | Value |
|------------|-------|
| ok-count   | 10    |
| skip-count | 0     |
| min (ms)   | 272.0 |
| p50 (ms)   | 290.5 |
| p95 (ms)   | 349.0 |
| p99 (ms)   | 349.0 |
| max (ms)   | 349.0 |
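The PR does not say which percentile definition the script uses. Assuming nearest-rank, the summary statistics can be sketched as below (percentile and summarize are hypothetical names). With only 10 samples, nearest-rank p95 and p99 both select the largest sample, which is consistent with p95 = p99 = max = 349.0 in the table above.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
// An assumed method for illustration; the bench's actual definition is not stated.
function percentile(sorted: readonly number[], p: number): number {
  if (sorted.length === 0) throw new RangeError("no samples");
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank, integer math first
  return sorted[Math.min(Math.max(rank, 1), sorted.length) - 1];
}

function summarize(samples: readonly number[]) {
  const sorted = [...samples].sort((a, b) => a - b); // numeric sort, not lexicographic
  return {
    min: sorted[0],
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    p99: percentile(sorted, 99),
    max: sorted[sorted.length - 1],
  };
}
```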
```
$ npm run bench:cold-start
Context Mode — Cold-Start Benchmark
====================================
Node:        v25.9.0
Platform:    darwin (arm64)
Bundle:      PRESENT
Iterations:  10 (warmup: 1)
Timeout:     30000ms per iteration
Warming up (1 iteration, discarded)...
  warmup 1: 282.6ms
  iteration 1: 349.0ms
  iteration 2: 305.6ms
  ...
```
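The Iterations, warmup, and Timeout lines in the header correspond to the env tunables listed under Scope. A hypothetical parser for them with the documented defaults (ITERATIONS=10, WARMUP=1, TIMEOUT_MS=30000, POLL_MS=10) might look like this; rejecting non-integer or out-of-range values is an assumption, not documented behavior.

```typescript
interface Tunables {
  iterations: number;
  warmup: number;
  timeoutMs: number;
  pollMs: number;
}

type Env = Record<string, string | undefined>;

// Hypothetical helper: unset or blank -> default; otherwise must be an integer >= min.
function readInt(env: Env, key: string, fallback: number, min: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === "") return fallback;
  const n = Number(raw);
  if (!Number.isInteger(n) || n < min) {
    throw new RangeError(`${key} must be an integer >= ${min}, got "${raw}"`);
  }
  return n;
}

function readTunables(env: Env = process.env): Tunables {
  return {
    iterations: readInt(env, "ITERATIONS", 10, 1),
    warmup: readInt(env, "WARMUP", 1, 0), // 0 disables warmup
    timeoutMs: readInt(env, "TIMEOUT_MS", 30_000, 1),
    pollMs: readInt(env, "POLL_MS", 10, 1),
  };
}
```

For example, TIMEOUT_MS=50 (as used in the timeout-path test) yields timeoutMs 50 with every other value at its default.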

Scope (deliberately small)

Included:

  • Cross-platform spawn via process.execPath, with start.mjs resolved relative to the script directory
  • Sentinel polling via existsSync at a 10ms cadence (more deterministic across platforms than fs.watch)
  • 30s per-iteration timeout with SKIP semantics (partial data beats no data)
  • SIGTERM → 500ms grace period → SIGKILL cleanup chain (the server's own graceful-shutdown handler unlinks its sentinel)
  • SIGINT cleanup hook on the bench script itself — no orphaned children if the bench is killed mid-run
  • Env tunables: ITERATIONS (default 10), WARMUP (1), TIMEOUT_MS (30000), POLL_MS (10)
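The cleanup chain and the SIGINT hook can be sketched as follows. killWithGrace, the live set, and the 130 exit status are illustrative assumptions rather than the script's actual code; the point is the ordering: SIGTERM first so the server's graceful-shutdown handler can unlink its sentinel, and SIGKILL only if the child is still alive after the grace period.

```typescript
import { spawn, ChildProcess } from "node:child_process";

// SIGTERM first (lets the server's shutdown handler unlink its sentinel),
// then SIGKILL if the child is still alive after graceMs.
function killWithGrace(child: ChildProcess, graceMs = 500): Promise<void> {
  return new Promise((resolve) => {
    if (child.exitCode !== null || child.signalCode !== null) {
      resolve(); // already exited, nothing to do
      return;
    }
    const escalate = setTimeout(() => child.kill("SIGKILL"), graceMs);
    child.once("exit", () => {
      clearTimeout(escalate);
      resolve();
    });
    child.kill("SIGTERM");
  });
}

// Illustrative SIGINT hook: remember live children so Ctrl-C on the bench
// does not leave them running.
const live = new Set<ChildProcess>();
process.once("SIGINT", () => {
  live.forEach((c) => c.kill("SIGTERM"));
  process.exit(130); // 128 + SIGINT (2), the conventional shell status
});

// Usage: register each spawned child, and drop it once it exits.
const child = spawn(process.execPath, ["-e", "setInterval(() => {}, 1000)"], { stdio: "ignore" });
live.add(child);
child.once("exit", () => live.delete(child));
```

On Windows, Node's subprocess.kill() terminates the process forcefully whatever signal is passed, so the graceful SIGTERM step only behaves as described on POSIX platforms.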

Explicitly NOT in scope (deferred to follow-up PRs):

  • Regression guard test with budget — needs the baseline numbers from this PR first
  • End-to-end latency to the first ctx_search via an MCP stdio round trip: needs MCP client SDK plumbing and would double the scope of this PR
  • Windows-specific cold-start audit — separate PR; this script will be the measurement tool that audit uses

Caveats (also noted in the script header)

  • Requires server.bundle.mjs to be present for representative numbers. Without the bundle, start.mjs falls into the slow npx tsc --silent first-build branch. Run npm run bundle first.
  • "Cold per spawn" means a fresh node process per iteration, not fresh disk state; disk caches stay warm across iterations. A truly ice-cold start (e.g. after rm -rf node_modules) depends on the environment and is out of scope.
  • Wall-clock noise on shared CI runners is significant, so p95 (not p99) is the headline number.

Test plan

  • Happy path: 10 iterations on macOS arm64 with the bundle prebuilt → p95 = 349ms, 0 skips, no orphaned node start.mjs processes after the run
  • Timeout path: TIMEOUT_MS=50 → every iteration records SKIP (timeout); the summary handles the all-skip case and exits with code 1
  • Cleanup: pgrep -f start.mjs after each test run shows only pre-existing dev MCP servers, no orphans from the bench
  • Typecheck clean (npm run typecheck)
  • Linux Node 22 — owner CI matrix
  • Windows Node 22 — owner CI matrix (sentinel uses tmpdir() on Windows via existing sentinelDir())
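The timeout-path test expects exit code 1 when every iteration is skipped. A minimal sketch of that rule, assuming (from the "partial data > no data" note) that runs with at least one successful iteration still exit 0; outcome and its field names are hypothetical.

```typescript
interface Outcome {
  okCount: number;
  skipCount: number;
  exitCode: number;
}

// Hypothetical exit-code rule: fail only when no iteration produced a sample,
// since percentiles are undefined for an empty set.
function outcome(results: ReadonlyArray<number | null>): Outcome {
  const okCount = results.filter((r) => r !== null).length;
  const skipCount = results.length - okCount;
  return { okCount, skipCount, exitCode: okCount === 0 ? 1 : 0 };
}

// Usage (assumed): process.exitCode = outcome(results).exitCode;
```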

The bench output can be pasted directly into a follow-up PR body, so the regression-guard PR can quote real numbers.

github-actions Bot and others added 2 commits April 26, 2026 12:35
Add tests/cold-start-benchmark.ts — spawns `node start.mjs` N times,
polls each child's MCP readiness sentinel (PID-stamped, written after
server.connect() resolves), reports p50/p95/p99 + skip count.

This PR is measurement only. No vitest assertion, no budget. Goal is
to give the owner a real cold-start baseline before a follow-up adds
a regression guard with empirically-grounded thresholds (mirrors the
fuzzyCache U1 → U2 sequencing).

What it measures: process spawn → start.mjs self-heal → ensure-deps →
import server.bundle.mjs → server.connect(transport) → sentinel write.

Reuses sentinelPathForPid() from hooks/core/mcp-ready.mjs so path
resolution stays cross-platform without re-implementing the
Linux/macOS hardcoded /tmp + Windows tmpdir() branching.

Includes:
- 30s per-iter timeout with skip semantics (partial data > no data)
- SIGTERM → 500ms → SIGKILL kill chain (graceful shutdown first)
- SIGINT cleanup hook on the bench script itself (no orphan leaks)
- Env tunables: ITERATIONS (default 10), WARMUP (1), TIMEOUT_MS (30000)

Local baseline (macOS arm64, Node v25.9.0, bundle prebuilt):
| Metric    | Value (ms) |
|-----------|------------|
| ok-count  |         10 |
| skip-count|          0 |
| min       |      272.0 |
| p50       |      290.5 |
| p95       |      349.0 |
| p99       |      349.0 |
| max       |      349.0 |

Run: `npm run bench:cold-start`
mksglu changed the base branch from main to next on April 27, 2026 07:43