feat(bench): cold-start benchmark script (measurement only, no CI guard) #357
Open
sebastianbreguel wants to merge 2 commits into mksglu:next from
Add `tests/cold-start-benchmark.ts`, which spawns `node start.mjs` N times, polls each child's MCP readiness sentinel (PID-stamped, written after `server.connect()` resolves), and reports p50/p95/p99 plus a skip count.

This PR is measurement only: no vitest assertion, no budget. The goal is to give the owner a real cold-start baseline before a follow-up adds a regression guard with empirically grounded thresholds (mirroring the fuzzyCache U1 → U2 sequencing).

What it measures: process spawn → `start.mjs` self-heal → ensure-deps → `import server.bundle.mjs` → `server.connect(transport)` → sentinel write.

It reuses `sentinelPathForPid()` from `hooks/core/mcp-ready.mjs`, so path resolution stays cross-platform without re-implementing the branching between the hardcoded `/tmp` on Linux/macOS and `tmpdir()` on Windows.

Includes:
- 30s per-iteration timeout with skip semantics (partial data > no data)
- SIGTERM → 500ms → SIGKILL kill chain (graceful shutdown first)
- SIGINT cleanup hook on the bench script itself (no orphan leaks)
- Env tunables: `ITERATIONS` (default 10), `WARMUP` (1), `TIMEOUT_MS` (30000)

Local baseline (macOS arm64, Node v25.9.0, bundle prebuilt):

| Metric     | Value    |
|------------|----------|
| ok-count   | 10       |
| skip-count | 0        |
| min        | 272.0 ms |
| p50        | 290.5 ms |
| p95        | 349.0 ms |
| p99        | 349.0 ms |
| max        | 349.0 ms |

Run: `npm run bench:cold-start`
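The PR does not say which percentile definition the script uses. As a minimal sketch, here is a summary step using the nearest-rank definition, where skipped iterations (`null`) are counted but kept out of the latency statistics. `summarize`, `nearestRank`, and the `Summary` shape are hypothetical names for illustration; the all-skip exit code of 1 follows the test plan further down.

```typescript
// One bench iteration: a spawn-to-ready latency in ms, or null for a skip (timeout / early exit).
type Sample = number | null;

interface Stats { min: number; p50: number; p95: number; p99: number; max: number }

interface Summary {
  okCount: number;
  skipCount: number;
  stats: Stats | null; // null when every iteration was skipped
  exitCode: 0 | 1; // 1 on an all-skip run, matching the test plan
}

// Nearest-rank percentile on an ascending array: the value at 1-based rank ceil(p/100 * n).
function nearestRank(sorted: number[], p: number): number {
  const rank = Math.ceil((p / 100) * sorted.length);
  const clamped = Math.min(Math.max(rank, 1), sorted.length);
  return sorted[clamped - 1];
}

function summarize(samples: Sample[]): Summary {
  // Skips are counted but never folded into the latency statistics.
  const ok = samples.filter((s): s is number => s !== null).sort((a, b) => a - b);
  const skipCount = samples.length - ok.length;
  if (ok.length === 0) {
    return { okCount: 0, skipCount, stats: null, exitCode: 1 };
  }
  return {
    okCount: ok.length,
    skipCount,
    stats: {
      min: ok[0],
      p50: nearestRank(ok, 50),
      p95: nearestRank(ok, 95),
      p99: nearestRank(ok, 99),
      max: ok[ok.length - 1],
    },
    exitCode: 0,
  };
}
```

With 10 samples, nearest-rank p95 and p99 both resolve to rank 10 (the slowest run), which is consistent with the baseline above showing p95 = p99 = max. A follow-up guard keyed on p99 would likely need many more iterations before that percentile says anything the max does not.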
Summary
Adds `tests/cold-start-benchmark.ts`, a measurement-only script that spawns `node start.mjs` N times, polls each child's MCP readiness sentinel, and reports p50/p95/p99 of spawn-to-ready latency plus a skip count.

Measurement only. No vitest assertion, no budget, no CI gate. The goal is to give the project a real cold-start baseline so a follow-up PR can add a regression guard with empirically grounded thresholds rather than guessed ones (the same measure-then-tune sequencing as #356).
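A minimal sketch of one iteration, assuming `sentinelPathForPid()` has the shape `(pid: number) => string`: spawn the child, poll `existsSync` on its PID-stamped sentinel, turn a timeout or early exit into a skip, then shut the child down with SIGTERM → 500ms → SIGKILL. `measureOnce`, `killChain`, and `sleep` are hypothetical names, not the script's actual functions, and removing the sentinel afterwards is an assumption of this sketch.

```typescript
import { spawn, type ChildProcess } from "node:child_process";
import { existsSync, rmSync } from "node:fs";
import { performance } from "node:perf_hooks";

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Graceful first: SIGTERM, then SIGKILL if the child is still alive after graceMs.
async function killChain(child: ChildProcess, graceMs = 500): Promise<void> {
  if (child.exitCode !== null || child.signalCode !== null) return; // already gone
  const exited = new Promise<void>((resolve) => child.once("exit", () => resolve()));
  child.kill("SIGTERM");
  const exitedInTime = await Promise.race([
    exited.then(() => true),
    sleep(graceMs).then(() => false),
  ]);
  if (!exitedInTime) {
    child.kill("SIGKILL");
    await exited;
  }
}

// Spawns `node <args>` and polls for the child's PID-stamped sentinel.
// Returns spawn-to-ready latency in ms (resolution bounded by pollMs), or null for a skip.
async function measureOnce(
  args: string[],
  sentinelFor: (pid: number) => string, // assumed shape of sentinelPathForPid()
  timeoutMs: number,
  pollMs: number,
): Promise<number | null> {
  const t0 = performance.now();
  const child = spawn(process.execPath, args, { stdio: "ignore" });
  let exited = false;
  child.once("exit", () => { exited = true; });
  child.on("error", () => { exited = true; }); // spawn or signal-delivery failure
  if (child.pid === undefined) return null; // spawn failed: count as a skip
  const sentinel = sentinelFor(child.pid);
  try {
    while (performance.now() - t0 < timeoutMs) {
      if (existsSync(sentinel)) return performance.now() - t0;
      if (exited) return null; // child died before signalling readiness
      await sleep(pollMs);
    }
    return null; // timed out: skip rather than fail the whole run
  } finally {
    await killChain(child);
    rmSync(sentinel, { force: true }); // assumption: the bench cleans up the sentinel itself
  }
}
```

In the real script, `args` would presumably be the resolved `start.mjs` path and `sentinelFor` would be `sentinelPathForPid`. Polling at a fixed interval trades up to `pollMs` of extra latency per sample for identical behavior on every platform, which is the reason the PR gives for preferring `existsSync` over `fs.watch`.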
This addresses cold-start latency, which #270 explicitly listed as a priority (with a Windows call-out).
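The env tunables listed in this PR (`ITERATIONS`, `WARMUP`, `TIMEOUT_MS`, `POLL_MS`) could be read along these lines. This is a sketch only: `readConfig`, `readInt`, and `BenchConfig` are hypothetical names, and rejecting non-integer or out-of-range values (instead of silently running with `NaN`) is an assumption, not documented behavior of the script.

```typescript
// Bench knobs, defaulting to the values listed in this PR.
interface BenchConfig {
  iterations: number;
  warmup: number;
  timeoutMs: number;
  pollMs: number;
}

// Reads an integer env var; unset or blank falls back to the default.
// Throwing on bad input is an assumption of this sketch.
function readInt(name: string, fallback: number, min: number): number {
  const raw = process.env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value < min) {
    throw new Error(`${name} must be an integer >= ${min}, got "${raw}"`);
  }
  return value;
}

function readConfig(): BenchConfig {
  return {
    iterations: readInt("ITERATIONS", 10, 1),
    warmup: readInt("WARMUP", 1, 0),
    timeoutMs: readInt("TIMEOUT_MS", 30_000, 1),
    pollMs: readInt("POLL_MS", 10, 1),
  };
}
```

On a POSIX shell, `TIMEOUT_MS=50 npm run bench:cold-start` would then exercise the all-skip path described in the test plan.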
What it measures
process spawn → `start.mjs` self-heal layers → `ensure-deps` → `import server.bundle.mjs` → `server.connect(transport)` → sentinel `writeFileSync` (`src/server.ts:2412-2416`).

Detection uses the existing PID-stamped sentinel (`context-mode-mcp-ready-<pid>` via `sentinelPathForPid()` from `hooks/core/mcp-ready.mjs`). No new public APIs, no changes to the readiness contract.

Local baseline (macOS arm64, Node v25.9.0, bundle prebuilt)
Scope (deliberately small)
Included:
- `process.execPath` + `start.mjs` resolved relative to the script dir
- `existsSync` at 10ms cadence (deterministic across platforms, vs `fs.watch`)
- `ITERATIONS` (default 10), `WARMUP` (1), `TIMEOUT_MS` (30000), `POLL_MS` (10)

Explicitly NOT in scope (deferred to follow-up PRs):
- `ctx_search` end-to-end latency via MCP stdio roundtrip: needs MCP client SDK plumbing and would double the scope of this PR

Caveats (in script header + below)
- `server.bundle.mjs` must be present for representative numbers. Without the bundle, `start.mjs` falls into the `npx tsc --silent` first-build branch (slow). Run `npm run bundle` first.
- … `rm -rf node_modules`) is environmental and out of scope.

Test plan
- … `node start.mjs` processes after run
- `TIMEOUT_MS=50` → all iterations record `SKIP (timeout)`; summary handles all-skip with exit code 1
- `pgrep -f start.mjs` after each test run shows only pre-existing dev MCP servers, no orphans from the bench
- … `npm run typecheck`)
- … `tmpdir()` on Windows via existing `sentinelDir()`)

Bench output is paste-ready for a follow-up PR body, so the regression-guard PR can quote real numbers.
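The `pgrep -f start.mjs` orphan check depends on the bench never leaving children behind, including when it is interrupted with Ctrl+C. A minimal sketch of such a SIGINT cleanup hook, assuming the bench keeps every spawned child in a set; `liveChildren`, `spawnTracked`, and `killAllLive` are hypothetical names, and exit code 130 is the usual shell convention rather than something this PR specifies.

```typescript
import { spawn, type ChildProcess } from "node:child_process";

// Every child the bench spawns stays in this set until it exits.
const liveChildren = new Set<ChildProcess>();

// Hypothetical helper: spawn `node <args>` and register the child for cleanup.
function spawnTracked(args: string[]): ChildProcess {
  const child = spawn(process.execPath, args, { stdio: "ignore" });
  liveChildren.add(child);
  child.once("exit", () => liveChildren.delete(child));
  child.on("error", () => liveChildren.delete(child)); // spawn failure: nothing to kill
  return child;
}

// Synchronous, best-effort kill of anything still running. SIGKILL is used because
// the bench is about to exit and cannot wait out a SIGTERM grace period.
function killAllLive(): void {
  for (const child of liveChildren) {
    if (child.exitCode === null && child.signalCode === null) child.kill("SIGKILL");
  }
}

process.once("SIGINT", () => {
  killAllLive();
  process.exit(130); // 128 + SIGINT (2)
});
```

The per-iteration kill chain still handles the normal path; this hook only covers the case where the bench itself is interrupted mid-iteration.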