fix(task): add per-request timeout to devFetch http.request() call to prevent indefinite hang #4292

@Nexory

Summary

The devFetch helper defined in src/task.ts — used by both runTask() and listTasks() — wraps http.request() without setting any timeout. If the dev server worker is alive (so _pidIsRunning passes) but stalled or not accepting connections on the socket, the Promise returned by devFetch never settles. The caller — typically a CLI command or a script in a CI pipeline — hangs indefinitely with no error, no diagnostic, and no way to recover short of a job-level kill signal.

What I observed

File: src/task.ts — introduced in PR #3268 (merged 2025-12-16).

// src/task.ts  lines 73-104
const request = http.request(
  url,
  {
    socketPath,
    method: options?.method,
    headers: {
      Accept: "application/json",
      "Content-Type": "application/json",
    },
    // ← no `timeout` option here
  },
  (response) => { /* … */ }
);

request.on("error", (e) => reject(e));
// ← no request.setTimeout() call, no AbortSignal

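http.request() supports a timeout field in its options object (Node.js ≥ 8), given in milliseconds, which sets the socket's idle timeout. When it is omitted, Node.js applies no timeout to the request at all. Because devFetch connects over a Unix domain socket (socketPath), there is no TCP-level timeout to fall back on either, so a connection that is accepted but never answered can stay open indefinitely. There is no fallback rejection path.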

Suggested fix

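Add a timeout option directly to the http.request() options. The option on its own only causes a 'timeout' event to be emitted on the request; it does not abort anything, so the handler must destroy the request explicitly for the connection to actually be torn down: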

const TIMEOUT_MS = 30_000; // 30 s — configurable via opts if desired

const request = http.request(
  url,
  {
    socketPath,
    method: options?.method,
    timeout: TIMEOUT_MS,
    headers: {
      Accept: "application/json",
      "Content-Type": "application/json",
    },
  },
  (response) => { /* … unchanged … */ }
);

request.on("timeout", () => {
  request.destroy(
    new Error(`devFetch timed out after ${TIMEOUT_MS}ms ${_devHint}`)
  );
});
request.on("error", (e) => reject(e));

Calling request.destroy(err) on 'timeout' ensures the 'error' handler fires with a meaningful message, so the existing reject(e) path surfaces to the caller. Note that this is an inactivity timeout: it fires after 30 s without any socket activity, not 30 s after the request started. The 30 s value is a reasonable default; it could optionally be exposed on TaskRunnerOptions for users whose tasks legitimately stay silent for longer than that.
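
To make the failure mode and the fix easy to check in isolation, here is a minimal, self-contained sketch. It is not the project's devFetch: fetchOverSocket, demo, the temporary socket path, and the 200 ms timeout are all hypothetical names and values chosen for illustration, and it assumes a POSIX system where Unix domain sockets are available. The server accepts the connection and never replies, which approximates a stalled worker.

```typescript
import * as fs from "fs";
import * as http from "http";
import * as net from "net";
import * as os from "os";
import * as path from "path";

// Hypothetical stand-in for devFetch, reduced to the timeout path.
function fetchOverSocket(
  socketPath: string,
  urlPath: string,
  timeoutMs: number
): Promise<string> {
  return new Promise((resolve, reject) => {
    const request = http.request(
      { socketPath, path: urlPath, method: "GET", timeout: timeoutMs },
      (response) => {
        let body = "";
        response.setEncoding("utf8");
        response.on("data", (chunk: string) => (body += chunk));
        response.on("end", () => resolve(body));
      }
    );
    // 'timeout' is only a notification; destroying the request is what
    // aborts the connection and routes the error to the 'error' handler.
    request.on("timeout", () => {
      request.destroy(new Error(`timed out after ${timeoutMs}ms`));
    });
    request.on("error", reject);
    request.end();
  });
}

// Starts a server that accepts connections but never responds, then
// reports how a request against it settles.
async function demo(): Promise<string> {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "devfetch-"));
  const socketPath = path.join(dir, "stalled.sock");
  const open = new Set<net.Socket>();
  const server = net.createServer((socket) => {
    open.add(socket);
    socket.on("error", () => {});
    socket.resume(); // read and discard the request, never reply
  });
  await new Promise<void>((resolve, reject) => {
    server.once("error", reject);
    server.listen(socketPath, () => resolve());
  });
  try {
    await fetchOverSocket(socketPath, "/", 200);
    return "resolved";
  } catch (e) {
    return (e as Error).message;
  } finally {
    open.forEach((socket) => socket.destroy());
    server.close();
    fs.rmSync(dir, { recursive: true, force: true });
  }
}

demo().then((outcome) => console.log(outcome));
```

With the 'timeout' handler in place the promise rejects after roughly 200 ms of inactivity. Removing the timeout option or the handler from the sketch should reproduce the hang described above.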

Notes

  • The _pidIsRunning guard (line 113) only checks whether the process is alive via kill(pid, 0) — it does not verify that the socket is accepting connections. So a zombie or newly-restarting worker can pass the guard while the socket is not yet ready, making this timeout the only backstop.
  • PR #3268 (refactor(task): use http.request) replaced ofetch with bare http.request specifically to support socketPath; the timeout was not addressed in that refactor.
  • A TaskRunnerOptions.timeout?: number extension would allow callers to set a tighter limit for CI use, but a safe default is the minimum viable fix.
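
The optional TaskRunnerOptions.timeout extension mentioned in the last note could look roughly like the sketch below. The real TaskRunnerOptions type is not shown in this issue, so TaskRunnerOptionsSketch, resolveTimeoutMs, and requestOptionsFor are hypothetical names, and rejecting non-positive values is one possible design choice rather than existing behavior.

```typescript
import * as http from "http";

// Hypothetical shape; the real TaskRunnerOptions may carry other fields.
interface TaskRunnerOptionsSketch {
  /** Socket inactivity timeout in milliseconds. */
  timeout?: number;
}

const DEFAULT_TIMEOUT_MS = 30_000;

// Returns the caller's timeout if one was given, else the default.
// Non-finite or non-positive values are rejected up front so that a typo
// cannot silently change the timeout behavior.
function resolveTimeoutMs(opts?: TaskRunnerOptionsSketch): number {
  const timeout = opts?.timeout;
  if (timeout === undefined) {
    return DEFAULT_TIMEOUT_MS;
  }
  if (!Number.isFinite(timeout) || timeout <= 0) {
    throw new TypeError(`timeout must be a positive number, got ${timeout}`);
  }
  return timeout;
}

// Builds the options object that would be passed to http.request().
function requestOptionsFor(
  socketPath: string,
  method: string,
  opts?: TaskRunnerOptionsSketch
): http.RequestOptions {
  return {
    socketPath,
    method,
    timeout: resolveTimeoutMs(opts),
    headers: {
      Accept: "application/json",
      "Content-Type": "application/json",
    },
  };
}
```

A CI caller could then pass something like { timeout: 5_000 } to fail fast, while everyone else keeps the 30 s default.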
