diff --git a/codev/projects/bugfix-1024-vscode-cli-preflight-400ms-tim/status.yaml b/codev/projects/bugfix-1024-vscode-cli-preflight-400ms-tim/status.yaml new file mode 100644 index 000000000..b4350a472 --- /dev/null +++ b/codev/projects/bugfix-1024-vscode-cli-preflight-400ms-tim/status.yaml @@ -0,0 +1,16 @@ +id: bugfix-1024 +title: vscode-cli-preflight-400ms-tim +protocol: bugfix +phase: pr +plan_phases: [] +current_plan_phase: null +gates: + pr: + status: pending + requested_at: '2026-06-10T09:42:53.334Z' +iteration: 1 +build_complete: false +history: [] +started_at: '2026-06-10T09:25:24.642Z' +updated_at: '2026-06-10T09:42:53.334Z' +pr_ready_for_human: true diff --git a/codev/state/bugfix-1024_thread.md b/codev/state/bugfix-1024_thread.md new file mode 100644 index 000000000..c99f0b09c --- /dev/null +++ b/codev/state/bugfix-1024_thread.md @@ -0,0 +1,79 @@ +# bugfix-1024 — CLI preflight 400ms timeout false "CLI missing" + +## Investigate + +Issue #1024: `VERSION_TIMEOUT_MS = 400` in `packages/vscode/src/preflight/preflight.ts:42` +is too tight. On slow envs (remote SSH, WSL, nvm shims, AV scanning) the `codev --version` +probe exceeds 400ms → `decidePreflight` returns `missing` → walkthrough opens, Status row +sticks at `missing`, guarded commands no-op. Root cause confirmed by reading the source: the +budget value, not the resolver chain. + +### Fix shape (mechanical, per issue) +1. Bump default 400 → 5000ms. +2. New setting `codev.cliVersionTimeoutMs` (number, default 5000, min 100, max 60000) in + `package.json`, read by the preflight via `getConfiguration('codev')`. +3. Log a `[Preflight]` line to the OutputChannel when the cap fires (timeout), naming the + value + recovery action. +4. Unit test `runCodevVersion` honours explicit `timeoutMs`; default falls back when unset. + +### Key design decision +`runCodevVersion` is **vscode-free** (only `spawn` + timer). Importing `preflight.ts` for a +unit test would drag in `EventEmitter` (constructed at module load), `TowerClient`, +`tower-starter` — fragile. So I relocate `runCodevVersion` + a pure `resolveVersionTimeout` +helper + the timeout constants into `preflight-core.ts` (loads only `node:path`). The probe +is unchanged (not rewritten — out-of-scope respected), just moved to the file the project +already unit-tests under vitest. Added a `timedOut` flag to the return so the glue can log +the timeout case distinctly from spawn-error / non-zero-exit. + +## Fix → PR + +Implemented. `porch check` green (build 4.4s, tests 20.1s). Note: the porch +`tests` check is `pnpm --filter @cluesmith/codev test` (the codev-package vitest), +NOT the vscode-package vitest — so it doesn't touch my files, but it passes. +The vscode-package vitest has 7 pre-existing FAILING files (unbuilt +@cluesmith/codev-core / codev-types workspace-dep resolution, e.g. +terminal-adapter/terminal-resolve/reconnect-link) — unrelated to this change. +My new `preflight-version-timeout.test.ts` (12 tests) + the existing +`preflight-core.test.ts` (31) both pass. + +PR #1026 created (`Fixes #1024`). Running CMAP (gemini/codex/claude, --type pr). + +Gotcha: consult auto-detect failed with "Multiple projects found" because this +worktree is `.builders/bugfix-1024` (no `-` suffix), which the consult +project-resolver regex `\.builders/[^/]*?-?(\d+)-([^/]+)` can't match. Fix: +pass `--project-id bugfix-1024` explicitly. + +## CMAP + PR gate + +CMAP verdicts on PR #1026: +- gemini: SKIPPED (agy/Antigravity unauthenticated — non-blocking per design) +- codex: APPROVE (MEDIUM) +- claude: APPROVE (HIGH) + +Both codex and claude independently flagged stale `400ms` comments (extension.ts +activation comment + the #983 Tower-divergence comment in preflight.ts). Both +non-blocking, but addressed in f855d227 (comment-only, version-agnostic now). + +Notified architect. Ran `porch done` → `porch gate` → now WAITING at the `pr` +gate for `porch approve bugfix-1024 pr`. STOP here per strict-mode protocol. + +## Simplification (architect review at pr gate) + +Architect pushed back on the `resolveVersionTimeout` helper + MIN/MAX constants +as over-built for a constant-bump bugfix. Walked through it: the helper's stated +purpose (unset-fallback) is dead at runtime — VSCode returns the package.json +`default` for an unset setting, so the undefined branch only ran in tests; its +only live value was the clamp + a non-number guard, both marginal for a +`"type": "number"` setting VSCode already validates in its UI. + +Collapsed to the codebase idiom (matches `overviewRefreshSeconds`): +`getConfiguration('codev').get(KEY, DEFAULT_VERSION_TIMEOUT_MS)`. +Deleted `resolveVersionTimeout`, `MIN_VERSION_TIMEOUT_MS`, `MAX_VERSION_TIMEOUT_MS`. +Bounds (100/60000) now live only in package.json (UI-enforced). `5000` lives in +package.json + `DEFAULT_VERSION_TIMEOUT_MS` (param default + test anchor) — the +irreducible minimum. Trade-off accepted: no runtime clamp of a hand-edited +out-of-range `settings.json` value. + +Tests: dropped the resolveVersionTimeout cases; runCodevVersion suite now covers +explicit-timeout (positive), default-when-omitted (negative), ok/spawn-error, +and the 5000 regression anchor. check-types + lint + vitest green (41 tests). diff --git a/packages/vscode/package.json b/packages/vscode/package.json index bc933aff2..682c032cc 100644 --- a/packages/vscode/package.json +++ b/packages/vscode/package.json @@ -730,6 +730,13 @@ "default": true, "description": "Auto-start Tower if not running on activation" }, + "codev.cliVersionTimeoutMs": { + "type": "number", + "default": 5000, + "minimum": 100, + "maximum": 60000, + "description": "Maximum time (in milliseconds) to wait for `codev --version` during the startup CLI preflight. Bump higher if you're on a slow remote / VM / network filesystem and the 'Get started with Codev' walkthrough keeps appearing despite codev being installed (check the Codev Output channel for `[Preflight] codev --version timed out` lines to confirm). Default 5000ms covers most local and remote-dev environments." + }, "codev.autoOpenBuilderTerminal": { "type": "string", "enum": ["off", "notify", "auto"], diff --git a/packages/vscode/src/__tests__/preflight-version-timeout.test.ts b/packages/vscode/src/__tests__/preflight-version-timeout.test.ts new file mode 100644 index 000000000..14c4f6e72 --- /dev/null +++ b/packages/vscode/src/__tests__/preflight-version-timeout.test.ts @@ -0,0 +1,81 @@ +/** + * Unit tests for the #1024 probe-timeout behavior. `runCodevVersion` lives in + * `preflight-core.ts` (vscode-free), so this runs under vitest with no vscode + * mock. It spawns a real process; we drive it with tiny temp scripts (a fast one + * that prints a version, a slow one that hangs) so the timeout path is exercised + * deterministically without depending on `codev`. + * + * The setting → timeout plumbing in `preflight.ts` is just an idiomatic + * `getConfiguration('codev').get(key, DEFAULT_VERSION_TIMEOUT_MS)` read + * (VSCode supplies the package.json default when unset, and enforces the + * contributed min/max in its settings UI), so it isn't unit-tested here. + */ + +import { describe, it, expect, beforeAll, afterAll } from 'vitest'; +import { mkdtempSync, writeFileSync, chmodSync, rmSync } from 'node:fs'; +import { tmpdir } from 'node:os'; +import { join } from 'node:path'; +import { + DEFAULT_VERSION_TIMEOUT_MS, + runCodevVersion, +} from '../preflight/preflight-core.js'; + +describe('runCodevVersion', () => { + let dir: string; + let fastBin: string; + let slowBin: string; + + beforeAll(() => { + dir = mkdtempSync(join(tmpdir(), 'codev-preflight-')); + // Ignores its args (so the hardcoded `--version` is irrelevant) and prints + // a version immediately. + fastBin = join(dir, 'fast.sh'); + writeFileSync(fastBin, '#!/bin/sh\necho 3.1.9\n'); + chmodSync(fastBin, 0o755); + // Hangs well past any test timeout so the probe's own timer is what settles it. + slowBin = join(dir, 'slow.sh'); + writeFileSync(slowBin, '#!/bin/sh\nsleep 30\n'); + chmodSync(slowBin, 0o755); + }); + + afterAll(() => { + rmSync(dir, { recursive: true, force: true }); + }); + + it('keeps the default budget at 5000ms (regression guard against the old 400ms)', () => { + expect(DEFAULT_VERSION_TIMEOUT_MS).toBe(5000); + }); + + it('resolves ok with stdout when the probe completes within budget', async () => { + const result = await runCodevVersion(fastBin, null, 5000); + expect(result.ok).toBe(true); + expect(result.timedOut).toBe(false); + expect(result.stdout).toContain('3.1.9'); + }); + + it('falls back to the default budget when no timeoutMs is passed', async () => { + // Negative case (#1024 acceptance): with the setting unset, the glue passes + // the default through; the probe must still succeed under it, not hang. + const result = await runCodevVersion(fastBin, null); + expect(result.ok).toBe(true); + expect(result.timedOut).toBe(false); + }); + + it('honours an explicit timeoutMs: kills a hung probe and reports timedOut', async () => { + // Positive case (#1024 acceptance): a binary that never returns is killed at + // the supplied budget, not left to hang. A generous budget would let `sleep + // 30` outlast the test, so the small explicit value is what makes this pass. + const start = Date.now(); + const result = await runCodevVersion(slowBin, null, 150); + const elapsed = Date.now() - start; + expect(result.ok).toBe(false); + expect(result.timedOut).toBe(true); + expect(elapsed).toBeLessThan(5000); + }); + + it('reports ok=false (not timedOut) when the binary cannot be spawned', async () => { + const result = await runCodevVersion(join(dir, 'does-not-exist'), null, 1000); + expect(result.ok).toBe(false); + expect(result.timedOut).toBe(false); + }); +}); diff --git a/packages/vscode/src/extension.ts b/packages/vscode/src/extension.ts index 0e6d40e2c..f96c47818 100644 --- a/packages/vscode/src/extension.ts +++ b/packages/vscode/src/extension.ts @@ -919,7 +919,8 @@ export async function activate(context: vscode.ExtensionContext) { // CLI preflight (#791): verify the codev CLI is installed and >= this // extension's version. Fire-and-forget so activation isn't blocked — the - // probe self-bounds at 400ms and caches its result for the session. Uses + // probe self-bounds at the `codev.cliVersionTimeoutMs` budget (#1024) and + // caches its result for the session. Uses // detectWorkspacePath() directly (connectionManager.getWorkspacePath() isn't // populated until initialize() resolves, which may wait on Tower auto-start). runPreflight(context, detectWorkspacePath(), outputChannel); diff --git a/packages/vscode/src/preflight/preflight-core.ts b/packages/vscode/src/preflight/preflight-core.ts index f179f6905..af0763d57 100644 --- a/packages/vscode/src/preflight/preflight-core.ts +++ b/packages/vscode/src/preflight/preflight-core.ts @@ -1,15 +1,68 @@ /** - * Pure, vscode-free logic for the startup CLI preflight (#791). + * vscode-free logic for the startup CLI preflight (#791). * * Kept free of any `vscode` import so it can be unit-tested under vitest - * (`src/__tests__/preflight-core.test.ts`). The vscode-dependent glue — - * spawning the CLI, toasts, walkthrough, the Status-view row — lives in - * `preflight.ts` and is reviewed by running the worktree at the - * `dev-approval` gate. + * (`src/__tests__/preflight-core.test.ts`). This includes the `codev --version` + * probe (`runCodevVersion`): it only needs `spawn` + a timer, no vscode, so it + * lives here where it's directly testable. The genuinely vscode-dependent glue + * — reading the timeout setting, toasts, walkthrough, the Status-view row — + * lives in `preflight.ts` and is reviewed by running the worktree. */ +import { spawn } from 'node:child_process'; import { resolve } from 'node:path'; +/** + * Default hard cap on the `codev --version` probe so a hung binary can't stall + * startup. Bumped from 400ms to 5000ms (#1024): 400ms was too tight against the + * legitimate latency sources on slow envs (remote SSH, WSL, nvm/fnm/volta shim + * re-resolution, AV scanning, network filesystems), causing a false `missing` + * → the `Get started with Codev` walkthrough opening despite a healthy install. + * Overridable per-workspace via the `codev.cliVersionTimeoutMs` setting. + */ +export const DEFAULT_VERSION_TIMEOUT_MS = 5000; + +/** + * Spawn `codev --version` with a hard timeout. Resolves `{ ok, stdout, timedOut }`; + * `ok` is false on spawn error (binary not on PATH), non-zero exit, or timeout + * (the child is killed). `timedOut` distinguishes the timeout case so the glue + * can log it — the false-`missing` symptom #1024 is about — distinctly from a + * genuinely absent / broken binary. + */ +export function runCodevVersion( + codevPath: string, + cwd: string | null, + timeoutMs: number = DEFAULT_VERSION_TIMEOUT_MS, +): Promise<{ ok: boolean; stdout: string; timedOut: boolean }> { + return new Promise((resolveResult) => { + let stdout = ''; + let settled = false; + const finish = (ok: boolean, timedOut: boolean) => { + if (!settled) { + settled = true; + resolveResult({ ok, stdout, timedOut }); + } + }; + + let child: ReturnType; + try { + child = spawn(codevPath, ['--version'], { cwd: cwd ?? undefined }); + } catch { + finish(false, false); + return; + } + + const timer = setTimeout(() => { + try { child.kill(); } catch { /* already gone */ } + finish(false, true); + }, timeoutMs); + + child.stdout?.on('data', (d: Buffer) => { stdout += d.toString(); }); + child.on('error', () => { clearTimeout(timer); finish(false, false); }); + child.on('close', (code) => { clearTimeout(timer); finish(code === 0, false); }); + }); +} + /** * The outcome of a preflight check. * diff --git a/packages/vscode/src/preflight/preflight.ts b/packages/vscode/src/preflight/preflight.ts index 741d01e34..f2ec54b99 100644 --- a/packages/vscode/src/preflight/preflight.ts +++ b/packages/vscode/src/preflight/preflight.ts @@ -16,15 +16,16 @@ */ import * as vscode from 'vscode'; -import { spawn } from 'node:child_process'; import { existsSync } from 'node:fs'; import type { TowerClient } from '@cluesmith/codev-core/tower-client'; import { decidePreflight, decideTowerStatus, + DEFAULT_VERSION_TIMEOUT_MS, parseCliVersion, preflightFeedbackMessage, resolveCodevPath, + runCodevVersion, towerDivergenceMessage, type PreflightStatus, type TowerStatus, @@ -38,8 +39,8 @@ const WALKTHROUGH_ID = 'cluesmith.codev-vscode#codevGettingStarted'; const WALKTHROUGH_SHOWN_KEY = 'codev.preflight.walkthroughShown'; /** Install docs surfaced from the outdated-CLI notification and walkthrough. */ export const INSTALL_DOCS_URL = 'https://github.com/cluesmith/codev#quick-start'; -/** Hard cap on the `codev --version` probe so a hung binary can't stall startup. */ -const VERSION_TIMEOUT_MS = 400; +/** The setting that overrides the `codev --version` probe timeout (#1024). */ +const VERSION_TIMEOUT_SETTING = 'cliVersionTimeoutMs'; /** The command users / UI invoke to re-verify after fixing the CLI. */ export const RECHECK_COMMAND = 'codev.recheckCli'; @@ -93,52 +94,13 @@ export function getPreflightState(): PreflightState { /** * Whether CLI-dependent commands may run. Optimistic: `pending` (preflight - * hasn't finished its <400ms background probe yet) counts as ready so a - * command fired during the startup window isn't falsely blocked. + * hasn't finished its background `codev --version` probe yet) counts as ready + * so a command fired during the startup window isn't falsely blocked. */ export function isCliReady(): boolean { return cachedStatus === 'ok' || cachedStatus === 'pending'; } -/** - * Spawn `codev --version` with a hard timeout. Resolves `{ ok, stdout }`; - * `ok` is false on spawn error (binary not on PATH), non-zero exit, or - * timeout (the child is killed). - */ -function runCodevVersion( - codevPath: string, - cwd: string | null, - timeoutMs = VERSION_TIMEOUT_MS, -): Promise<{ ok: boolean; stdout: string }> { - return new Promise((resolveResult) => { - let stdout = ''; - let settled = false; - const finish = (ok: boolean) => { - if (!settled) { - settled = true; - resolveResult({ ok, stdout }); - } - }; - - let child: ReturnType; - try { - child = spawn(codevPath, ['--version'], { cwd: cwd ?? undefined }); - } catch { - finish(false); - return; - } - - const timer = setTimeout(() => { - try { child.kill(); } catch { /* already gone */ } - finish(false); - }, timeoutMs); - - child.stdout?.on('data', (d: Buffer) => { stdout += d.toString(); }); - child.on('error', () => { clearTimeout(timer); finish(false); }); - child.on('close', (code) => { clearTimeout(timer); finish(code === 0); }); - }); -} - /** * Resolve → probe → decide → cache → dispatch UX. Shared by activation * (`runPreflight`) and the recheck command (`recheckCli`). @@ -151,7 +113,13 @@ async function performPreflight(): Promise { const extVersion = context.extension.packageJSON.version as string; const codevPath = resolveCodevPath(workspacePath, existsSync); - const { ok, stdout } = await runCodevVersion(codevPath, workspacePath); + // VSCode resolves an unset setting to the package.json-declared default, and + // enforces the contributed `minimum`/`maximum` in its settings UI; the inline + // default here is the belt-and-suspenders fallback for that read. + const timeoutMs = vscode.workspace + .getConfiguration('codev') + .get(VERSION_TIMEOUT_SETTING, DEFAULT_VERSION_TIMEOUT_MS); + const { ok, stdout, timedOut } = await runCodevVersion(codevPath, workspacePath, timeoutMs); const cliVersion = parseCliVersion(stdout); const status = decidePreflight({ cliFound: ok, cliVersion, extVersion }); @@ -169,6 +137,17 @@ async function performPreflight(): Promise { + `cli=${cliVersion ?? 'none'} ext=${extVersion}`, ); + // #1024: a timeout (not a genuinely absent binary) is the false-`missing` + // case. Surface it so the next person who hits a slow-env false-negative can + // diagnose it instead of being silently told their CLI is missing. + if (timedOut) { + outputChannel.appendLine( + `[${new Date().toISOString()}] [Preflight] codev --version timed out after ` + + `${timeoutMs}ms, falling back to 'missing'. Run \`codev --version\` manually; ` + + `if it succeeds slowly, raise the 'codev.cliVersionTimeoutMs' setting.`, + ); + } + if (status === 'missing') { maybeOpenWalkthrough(context); } else if (status === 'outdated') { @@ -177,7 +156,7 @@ async function performPreflight(): Promise { // #983: the Tower-divergence comparison needs the installed-CLI version this // check just resolved. If a Tower probe already ran (Tower connected before - // this ~400ms CLI check finished), it saw `installedCli = null` and reported + // this CLI check finished), it saw `installedCli = null` and reported // `ok`; re-run it now with the resolved version so the divergence isn't lost // to that startup race. No-op until the first probe sets `lastTowerClient`. if (lastTowerClient) {