feat(bench): cold-start benchmark script (measurement only, no CI guard) by sebastianbreguel · Pull Request #357 · mksglu/context-mode

sebastianbreguel · 2026-04-27T04:30:23Z

Summary

Adds tests/cold-start-benchmark.ts — a measurement-only script that spawns node start.mjs N times, polls each child's MCP readiness sentinel, and reports p50/p95/p99 of spawn-to-ready latency plus skip count.

Measurement only. No vitest assertion, no budget, no CI gate. Goal is to give the project a real cold-start baseline so a follow-up PR can add a regression guard with empirically-grounded thresholds rather than guessed ones (same measure-then-tune sequencing as #356).

This addresses cold-start latency, which #270 explicitly listed as a priority (with a Windows call-out).

What it measures

process spawn → start.mjs self-heal layers → ensure-deps → import server.bundle.mjs → server.connect(transport) → sentinel writeFileSync (src/server.ts:2412-2416).

Detection uses the existing PID-stamped sentinel (context-mode-mcp-ready-<pid> via sentinelPathForPid() from hooks/core/mcp-ready.mjs). No new public APIs, no changes to the readiness contract.
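The detection step can be sketched roughly as below. This is a minimal illustration rather than the script's actual code: `waitForSentinel` is a hypothetical name, and the caller is assumed to have already resolved the sentinel path (for example via the project's `sentinelPathForPid()`) and to have captured `startedAt` immediately before spawning the child.

```typescript
import { existsSync } from "node:fs";
import { performance } from "node:perf_hooks";

// Hypothetical helper: polls for a sentinel file and returns the elapsed
// milliseconds since `startedAt`, or null if the timeout passes first.
async function waitForSentinel(
  sentinelPath: string,
  startedAt: number, // performance.now() taken just before spawn
  timeoutMs = 30_000,
  pollMs = 10,
): Promise<number | null> {
  while (performance.now() - startedAt < timeoutMs) {
    if (existsSync(sentinelPath)) {
      return performance.now() - startedAt;
    }
    // Sleep one poll interval before checking again.
    await new Promise<void>((resolve) => setTimeout(resolve, pollMs));
  }
  return null; // caller records this iteration as a SKIP
}
```

A `null` result maps onto the SKIP semantics described under Scope, so a single slow spawn does not abort the whole run.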

Local baseline (macOS arm64, Node v25.9.0, bundle prebuilt)

| Metric     | Value (ms) |
|------------|------------|
| ok-count   | 10         |
| skip-count | 0          |
| min        | 272.0      |
| p50        | 290.5      |
| p95        | 349.0      |
| p99        | 349.0      |
| max        | 349.0      |
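With only 10 samples, p95, p99, and max all land on the same value, which is what a nearest-rank percentile would produce. The PR does not state which method the script uses, so the sketch below is an assumption: `percentile` is a hypothetical helper and the sample values are made up for illustration.

```typescript
// Nearest-rank percentile over an ascending-sorted array (assumed method).
function percentile(sortedAsc: number[], p: number): number {
  if (sortedAsc.length === 0) {
    throw new Error("no samples");
  }
  // Rank is 1-based: the smallest rank covering at least p% of the samples.
  const rank = Math.max(1, Math.ceil((p / 100) * sortedAsc.length));
  return sortedAsc[rank - 1];
}

// Illustrative data only, not the benchmark's measurements.
const samples = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100];
const p50 = percentile(samples, 50); // rank 5
const p95 = percentile(samples, 95); // rank 10, same as max
const p99 = percentile(samples, 99); // rank 10, same as max
```

At this sample size the tail percentiles carry no more information than the maximum; separating p95 from p99 would need a larger ITERATIONS value.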

$ npm run bench:cold-start
Context Mode — Cold-Start Benchmark
====================================
Node:        v25.9.0
Platform:    darwin (arm64)
Bundle:      PRESENT
Iterations:  10 (warmup: 1)
Timeout:     30000ms per iteration
Warming up (1 iteration, discarded)...
  warmup 1: 282.6ms
  iteration 1: 349.0ms
  iteration 2: 305.6ms
  ...

Scope (deliberately small)

Included:

Cross-platform spawn via process.execPath + start.mjs resolved relative to script dir
Sentinel polling via existsSync at 10ms cadence (deterministic across platforms vs fs.watch)
30s per-iter timeout → SKIP semantics (partial data > no data)
SIGTERM → 500ms grace → SIGKILL cleanup chain (server's own graceful shutdown handler unlinks its sentinel)
SIGINT cleanup hook on the bench script itself — no orphaned children if the bench is killed mid-run
Env tunables: ITERATIONS (default 10), WARMUP (1), TIMEOUT_MS (30000), POLL_MS (10)
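The SIGTERM → grace → SIGKILL chain from the list above might look something like the following. It is a sketch under assumptions, not the script's code: `terminate` and `demoTeardown` are hypothetical names, and the idle child stands in for `node start.mjs`.

```typescript
import { spawn } from "node:child_process";
import type { ChildProcess } from "node:child_process";

// Hypothetical helper: ask the child to exit, then force-kill it if it is
// still alive after the grace period. Resolves once the child has exited.
function terminate(child: ChildProcess, graceMs = 500): Promise<void> {
  return new Promise((resolve) => {
    if (child.exitCode !== null || child.signalCode !== null) {
      resolve(); // already gone
      return;
    }
    const timer = setTimeout(() => child.kill("SIGKILL"), graceMs);
    child.once("exit", () => {
      clearTimeout(timer);
      resolve();
    });
    child.kill("SIGTERM"); // give the graceful shutdown handler a chance
  });
}

// Usage sketch: spawn an idle Node child, then tear it down.
async function demoTeardown(): Promise<boolean> {
  const child = spawn(process.execPath, ["-e", "setInterval(() => {}, 1000)"], {
    stdio: "ignore",
  });
  await terminate(child);
  return child.exitCode !== null || child.signalCode !== null;
}
```

Sending SIGTERM first matters here because the server's own shutdown handler is what unlinks the sentinel; going straight to SIGKILL would leave stale sentinel files behind.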

Explicitly NOT in scope (deferred to follow-up PRs):

Regression guard test with budget — needs the baseline numbers from this PR first
First-ctx_search end-to-end latency via MCP stdio roundtrip: needs MCP client SDK plumbing and would roughly double the scope of this PR
Windows-specific cold-start audit — separate PR; this script will be the measurement tool that audit uses

Caveats (in script header + below)

Requires server.bundle.mjs present for representative numbers. Without the bundle, start.mjs falls into the npx tsc --silent first-build branch (slow). Run npm run bundle first.
"Cold per spawn" means fresh node process per iteration — not fresh disk state. Disk caches stay warm across iterations. True ice-cold (post-rm -rf node_modules) is environmental and out of scope.
Wall-clock noise on shared-runner CI is real. p95 (not p99) is the headline number.

Test plan

Happy path: 10 iterations on macOS arm64 with bundle prebuilt → p95 = 349ms, 0 skips, no orphaned node start.mjs processes after run
Timeout path: TIMEOUT_MS=50 → all iterations record SKIP (timeout), summary handles all-skip with exit code 1
Cleanup: pgrep -f start.mjs after each test run shows only pre-existing dev MCP servers, no orphans from the bench
Typecheck clean (npm run typecheck)
Linux Node 22 — owner CI matrix
Windows Node 22 — owner CI matrix (sentinel uses tmpdir() on Windows via existing sentinelDir())

Bench output is paste-ready into a follow-up PR body so the regression-guard PR can quote real numbers.

Add tests/cold-start-benchmark.ts: spawns `node start.mjs` N times, polls each child's MCP readiness sentinel (PID-stamped, written after server.connect() resolves), reports p50/p95/p99 + skip count.

This PR is measurement only. No vitest assertion, no budget. Goal is to give the owner a real cold-start baseline before a follow-up adds a regression guard with empirically-grounded thresholds (mirrors the fuzzyCache U1 → U2 sequencing).

What it measures: process spawn → start.mjs self-heal → ensure-deps → import server.bundle.mjs → server.connect(transport) → sentinel write.

Reuses sentinelPathForPid() from hooks/core/mcp-ready.mjs so path resolution stays cross-platform without re-implementing the Linux/macOS hardcoded /tmp + Windows tmpdir() branching.

Includes:
- 30s per-iter timeout with skip semantics (partial data > no data)
- SIGTERM → 500ms → SIGKILL kill chain (graceful shutdown first)
- SIGINT cleanup hook on the bench script itself (no orphan leaks)
- Env tunables: ITERATIONS (default 10), WARMUP (1), TIMEOUT_MS (30000)

Local baseline (macOS arm64, Node v25.9.0, bundle prebuilt):

| Metric     | Value (ms) |
|------------|------------|
| ok-count   | 10         |
| skip-count | 0          |
| min        | 272.0      |
| p50        | 290.5      |
| p95        | 349.0      |
| p99        | 349.0      |
| max        | 349.0      |

Run: `npm run bench:cold-start`

github-actions Bot and others added 2 commits April 26, 2026 12:35

ci: update server.bundle.mjs, cli.bundle.mjs & session hook bundles

033e2a4

mksglu changed the base branch from main to next April 27, 2026 07:43

mksglu force-pushed the next branch from 106d70c to 859b87c Compare April 27, 2026 15:19

sebastianbreguel wants to merge 2 commits into mksglu:next from sebastianbreguel:feat/cold-start-bench
