fix(core): bound aria-tree frame.evaluate with a timeout to prevent indefinite agent freeze#511
Conversation
There was a problem hiding this comment.
Pull request overview
This PR prevents indefinite hangs during accessibility-tree snapshotting by bounding Playwright page/frame.evaluate() calls (used by the aria-tree script) with an explicit timeout, allowing unresponsive ad/third-party iframes to be skipped while still surfacing main-frame failures.
Changes:
- Added a
withTimeout()helper and a defaultARIA_FRAME_EVALUATE_TIMEOUT_MSto time-bound aria-tree evaluation. - Introduced
ariaFrameEvaluateTimeoutMsoption onPlaywrightBrowserto configure the per-frame (and main-frame) evaluate timeout. - Added unit tests to verify hung child frames are skipped and main-frame timeouts surface as ordinary errors (not
BrowserDisconnectedError).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| packages/core/src/browser/playwrightBrowser.ts | Adds configurable timeout bounding for aria-tree evaluate() in main and child frames to prevent indefinite hangs. |
| packages/core/test/playwrightBrowser.test.ts | Adds regression tests covering child-frame hangs and main-frame timeout error classification. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| /** | ||
| * Timeout for evaluating the aria-tree script in a single frame, in milliseconds | ||
| * (default: 5000). A frame that exceeds it is skipped rather than hanging the snapshot. | ||
| * Primarily a testability seam; rarely needs overriding in production. | ||
| */ | ||
| ariaFrameEvaluateTimeoutMs?: number; |
There was a problem hiding this comment.
Fixed in b9f1101 — the option doc now states that a child frame that exceeds the timeout is skipped, while a main-frame timeout surfaces as an error.
| it("surfaces a main-frame evaluate timeout as an error (not a disconnect)", async () => { | ||
| (browser as any).page = { | ||
| evaluate: vi.fn().mockReturnValue(new Promise(() => {})), // never resolves | ||
| frames: vi.fn().mockReturnValue([]), | ||
| mainFrame: vi.fn(), | ||
| }; | ||
|
|
||
| await expect(browser.getTreeWithRefs()).rejects.toThrow(/timed out/i); | ||
| await expect(browser.getTreeWithRefs()).rejects.not.toThrow(BrowserDisconnectedError); | ||
| }); |
There was a problem hiding this comment.
Fixed in b9f1101 — the test now invokes getTreeWithRefs() once, captures the rejection, and asserts both the /timed out/i message and not.toBeInstanceOf(BrowserDisconnectedError) against that single error.
…ndefinite freeze Building the page snapshot evaluated the aria-tree script in every child frame via frame.evaluate, which has no timeout in Playwright and is not governed by setDefaultTimeout. A live-but-unresponsive ad iframe (busy main thread) left the evaluate pending forever — neither resolving nor rejecting — so the existing try/catch (which only catches rejections) never fired and the whole agent hung indefinitely. Reproduced and confirmed via DEBUG=pw:api against theverge.com: frame.evaluate logged "started" with no matching "succeeded" while the process sat idle at 0% CPU and the browser connection stayed alive. Wrap both the main-frame page.evaluate and the per-frame frame.evaluate in a withTimeout race (default 5s, overridable via the ariaFrameEvaluateTimeoutMs option). A per-frame timeout is caught and the frame skipped, exactly as a cross-origin or detached frame already is; a main-frame timeout propagates with a distinct message so it is not misclassified as a BrowserDisconnectedError. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
036e5e8 to
b9f1101
Compare
|
Filed the out-of-scope follow-ups (same class of latent hang, not addressed here):
|
Closes #515 and #516 (filed them as one PR — they overlap; a per-iteration watchdog is the natural mechanism to bound both the stalled stream and the iteration). ## Problem - **#516**: a single agent iteration had no wall-clock bound. `maxIterations` caps the *count*, not time, so one hung operation freezes the whole task forever. - **#515**: `generateAndProcessAction` passes `timeout`/`abortSignal` to `streamText`, but the `for await (fullStream)` loop + trailing `Promise.all` can stall mid-stream with no independent bound. Abort was previously **external-only** (`this.abortSignal` from the caller); no internal watchdog. ## Fix — per-iteration watchdog `runWithIterationWatchdog(work)` in `runMainLoop`: - Creates an `AbortController` + a timer (`iterationTimeoutMs`), and sets `this.currentIterationSignal = AbortSignal.any([external, watchdog])`. - Wraps the iteration body (snapshot + `generateAndProcessAction`) in `Promise.race([work(), timeout])`. **The race is the correctness guarantee** — the iteration is bounded even if an operation ignores the abort signal. The abort is best-effort cleanup of the in-flight LLM call. - The action `streamText` and the validation `generateTextWithRetry` now use `this.currentIterationSignal ?? this.abortSignal` (**#515** — a stalled stream gets aborted). Planning (runs before the loop) and tool creation (once) keep the external signal. - On timeout: `IterationTimeoutError` → dedicated catch branch → ends the task with the new `TaskErrorCode.ITERATION_TIMEOUT`, distinct from the user-abort path. ### Scope - `iterationTimeoutMs` is a **WebAgent constructor option** + `DEFAULT_ITERATION_TIMEOUT_MS = 300000` (5 min), mirroring #511's `ariaFrameEvaluateTimeoutMs`. Not wired through the CLI/PiloConfig system this pass (separable follow-up). 5 min is generous: an iteration legitimately spans an LLM call (~120s) + tool execution (extract can be another ~120s call), so it only fires on a genuine hang. - **Per-iteration only** and **watchdog-only for the stream** (no per-chunk inactivity timer) — both decided up front. The open question "does `abortSignal` interrupt an in-flight `for await`?" doesn't affect correctness here: the `Promise.race` bounds the iteration regardless; the abort is propagation/cleanup. A tighter stream inactivity timer remains a follow-up only if needed. ## Testing New "iteration watchdog" tests (real timers + `iterationTimeoutMs: 50`): - A hung LLM stream → `execute` returns `success:false`, `error.code === "ITERATION_TIMEOUT"` (vs. hanging forever today). - `streamText` receives a watchdog-derived abort signal that fires on timeout. Full core suite: **863 pass** (861 baseline + 2). Typecheck + `prettier --check` clean. One live run vs `https://theverge.com` confirms normal operation still completes (watchdog does not fire at the 300s default). ## Context Follow-up to #511 / #514. Last of the three hang-class reliability issues filed from #511. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…#517) Closes #514. ## Problem `PlaywrightBrowser.ensureOptimizedPageLoad()` waited for `domcontentloaded` with **no timeout**: ```ts await this.page.waitForLoadState("domcontentloaded"); // unbounded ``` The `load` wait immediately below is bounded by `actionTimeoutMs`, and the settle is a fixed 1s — only this first wait was unbounded. A navigation that commits to a page (or SPA soft-navigation) that never fires `DOMContentLoaded` as Playwright tracks it could hang the await indefinitely. Same class of freeze as #511, on the path that runs after **every** navigation (click-nav, `goto`, back/forward). ## Fix Pass `{ timeout: this.actionTimeoutMs }` to the `domcontentloaded` wait, matching the `load` wait. It's already inside a `try/catch` that continues on failure, so a timeout simply proceeds to the bounded `load` wait and the settle delay — **no behavior change on healthy pages**, just removes the unbounded-hang risk. Reuses `actionTimeoutMs` rather than adding a new knob (both waits answer "how long to wait for the page to be ready"). ## Testing - New `ensureOptimizedPageLoad` test asserting `waitForLoadState("domcontentloaded", { timeout: actionTimeoutMs })` is called (guards against the timeout being dropped again). Red before the fix, green after. - Full core suite: **862 pass** (861 baseline + 1). Typecheck + `prettier --check` clean. No live run: this is a latent path (needs a page that never fires `DOMContentLoaded`), not reliably reproducible on demand; the unit test is the meaningful guard and the change mirrors the already-bounded `load` wait. ## Context Follow-up to #511. Sibling reliability issues: #515 (stream timeout), #516 (agent-loop watchdog). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Problem
Pilo intermittently froze indefinitely when driving ad-heavy pages (reproduced against
https://theverge.com, observed with a remote CDP browser service). The agent stopped mid-run with no error, no progress, and 0% CPU.Root cause
getTreeWithRefsImpl()builds the page snapshot by evaluating the aria-tree script in the main frame and in every child frame (page.frames()) — including third-party ad iframes (doubleclick, GPT, etc.).The per-frame loop calls
await frame.evaluate(...), and Playwright'sevaluatehas no timeout option and is not governed bysetDefaultTimeout. When a child frame's execution context is alive but unresponsive (a busy ad-script main thread), the evaluate neither resolves nor rejects — so the existingtry/catch(which only catches detached/cross-origin rejections) never fires, and the await hangs forever.Confirmed with
DEBUG=pw:api: in a captured freeze,frame.evaluatelogged 5×startedbut only 4×succeeded— exactly one outstanding, hung ~9 minutes, while the process sat idle at 0% CPU and the browser connection stayed alive. Intermittent because it depends on which ad frames are present and their JS state at snapshot time.Fix
withTimeout(promise, ms, message)helper (Promise.race+clearTimeoutin.finally).page.evaluateand the per-frameframe.evaluatein it. Default 5000 ms, overridable per instance via a newariaFrameEvaluateTimeoutMsoption (primarily a testability seam).catch→ the frame is skipped, exactly as a cross-origin/detached frame already is. The snapshot completes from the responsive frames.BrowserDisconnectedError.Testing
New unit tests in
playwrightBrowser.test.ts(use a tiny injected timeout under real timers):evaluatenever resolves is skipped; the snapshot still returns the main + responsive-frame content (a hang before the fix).evaluatetimeout surfaces as a/timed out/ierror, not a disconnect.pnpm --filter pilo-core run test→ 861 pass (859 baseline + 2). Core typecheck and rootprettier --checkclean.Live verification (
DEBUG=pw:apivshttps://theverge.com)frame.evaluatePre-fix, the freeze reproduced on the first run. Post-fix, 3/3 runs complete (330 frame evaluations), with run #2 catching and recovering from a real timeout live.
Out of scope (potential follow-ups)
Same class of latent hang, not addressed here:
waitForLoadState("domcontentloaded")inensureOptimizedPageLoad.streamTextfor awaitLLM stream has no iteration-level timeout.🤖 Generated with Claude Code