Skip to content

fix(core): bound aria-tree frame.evaluate with a timeout to prevent indefinite agent freeze#511

Merged
lmorchard merged 1 commit into
mainfrom
fix/aria-tree-frame-evaluate-hang
Jun 4, 2026
Merged

fix(core): bound aria-tree frame.evaluate with a timeout to prevent indefinite agent freeze#511
lmorchard merged 1 commit into
mainfrom
fix/aria-tree-frame-evaluate-hang

Conversation

@lmorchard

Copy link
Copy Markdown
Collaborator

Problem

Pilo intermittently froze indefinitely when driving ad-heavy pages (reproduced against https://theverge.com, observed with a remote CDP browser service). The agent stopped mid-run with no error, no progress, and 0% CPU.

Root cause

getTreeWithRefsImpl() builds the page snapshot by evaluating the aria-tree script in the main frame and in every child frame (page.frames()) — including third-party ad iframes (doubleclick, GPT, etc.).

The per-frame loop calls await frame.evaluate(...), and Playwright's evaluate has no timeout option and is not governed by setDefaultTimeout. When a child frame's execution context is alive but unresponsive (a busy ad-script main thread), the evaluate neither resolves nor rejects — so the existing try/catch (which only catches detached/cross-origin rejections) never fires, and the await hangs forever.

Confirmed with DEBUG=pw:api: in a captured freeze, frame.evaluate logged started but only 4× succeeded — exactly one outstanding, hung ~9 minutes, while the process sat idle at 0% CPU and the browser connection stayed alive. Intermittent because it depends on which ad frames are present and their JS state at snapshot time.

Fix

  • Add a small withTimeout(promise, ms, message) helper (Promise.race + clearTimeout in .finally).
  • Wrap the main-frame page.evaluate and the per-frame frame.evaluate in it. Default 5000 ms, overridable per instance via a new ariaFrameEvaluateTimeoutMs option (primarily a testability seam).
  • Per-frame timeout → caught by the existing catch → the frame is skipped, exactly as a cross-origin/detached frame already is. The snapshot completes from the responsive frames.
  • Main-frame timeout → propagates with a distinct message (the page's own document is unusable), so it is not misclassified as a BrowserDisconnectedError.

Testing

New unit tests in playwrightBrowser.test.ts (use a tiny injected timeout under real timers):

  • A child frame whose evaluate never resolves is skipped; the snapshot still returns the main + responsive-frame content (a hang before the fix).
  • A main-frame evaluate timeout surfaces as a /timed out/i error, not a disconnect.

pnpm --filter pilo-core run test861 pass (859 baseline + 2). Core typecheck and root prettier --check clean.

Live verification (DEBUG=pw:api vs https://theverge.com)

Run frame.evaluate Outcome
1 89 / 89 Completed, full answer — zero false-positive timeouts on healthy frames
2 69 / 68 One unresponsive frame timed out & was skipped; agent recovered and completed
3 172 / 172 Completed, full answer

Pre-fix, the freeze reproduced on the first run. Post-fix, 3/3 runs complete (330 frame evaluations), with run #2 catching and recovering from a real timeout live.

Out of scope (potential follow-ups)

Same class of latent hang, not addressed here:

  • Unbounded waitForLoadState("domcontentloaded") in ensureOptimizedPageLoad.
  • No global per-iteration / per-task watchdog around the agent loop.
  • streamText for await LLM stream has no iteration-level timeout.

🤖 Generated with Claude Code

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR prevents indefinite hangs during accessibility-tree snapshotting by bounding Playwright page/frame.evaluate() calls (used by the aria-tree script) with an explicit timeout, allowing unresponsive ad/third-party iframes to be skipped while still surfacing main-frame failures.

Changes:

  • Added a withTimeout() helper and a default ARIA_FRAME_EVALUATE_TIMEOUT_MS to time-bound aria-tree evaluation.
  • Introduced ariaFrameEvaluateTimeoutMs option on PlaywrightBrowser to configure the per-frame (and main-frame) evaluate timeout.
  • Added unit tests to verify hung child frames are skipped and main-frame timeouts surface as ordinary errors (not BrowserDisconnectedError).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
packages/core/src/browser/playwrightBrowser.ts Adds configurable timeout bounding for aria-tree evaluate() in main and child frames to prevent indefinite hangs.
packages/core/test/playwrightBrowser.test.ts Adds regression tests covering child-frame hangs and main-frame timeout error classification.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +79 to +84
/**
* Timeout for evaluating the aria-tree script in a single frame, in milliseconds
* (default: 5000). A frame that exceeds it is skipped rather than hanging the snapshot.
* Primarily a testability seam; rarely needs overriding in production.
*/
ariaFrameEvaluateTimeoutMs?: number;

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b9f1101 — the option doc now states that a child frame that exceeds the timeout is skipped, while a main-frame timeout surfaces as an error.

Comment on lines +1325 to +1334
it("surfaces a main-frame evaluate timeout as an error (not a disconnect)", async () => {
(browser as any).page = {
evaluate: vi.fn().mockReturnValue(new Promise(() => {})), // never resolves
frames: vi.fn().mockReturnValue([]),
mainFrame: vi.fn(),
};

await expect(browser.getTreeWithRefs()).rejects.toThrow(/timed out/i);
await expect(browser.getTreeWithRefs()).rejects.not.toThrow(BrowserDisconnectedError);
});

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b9f1101 — the test now invokes getTreeWithRefs() once, captures the rejection, and asserts both the /timed out/i message and not.toBeInstanceOf(BrowserDisconnectedError) against that single error.

…ndefinite freeze

Building the page snapshot evaluated the aria-tree script in every child frame
via frame.evaluate, which has no timeout in Playwright and is not governed by
setDefaultTimeout. A live-but-unresponsive ad iframe (busy main thread) left the
evaluate pending forever — neither resolving nor rejecting — so the existing
try/catch (which only catches rejections) never fired and the whole agent hung
indefinitely. Reproduced and confirmed via DEBUG=pw:api against theverge.com:
frame.evaluate logged "started" with no matching "succeeded" while the process
sat idle at 0% CPU and the browser connection stayed alive.

Wrap both the main-frame page.evaluate and the per-frame frame.evaluate in a
withTimeout race (default 5s, overridable via the ariaFrameEvaluateTimeoutMs
option). A per-frame timeout is caught and the frame skipped, exactly as a
cross-origin or detached frame already is; a main-frame timeout propagates with
a distinct message so it is not misclassified as a BrowserDisconnectedError.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@lmorchard lmorchard force-pushed the fix/aria-tree-frame-evaluate-hang branch from 036e5e8 to b9f1101 Compare June 4, 2026 05:27
@lmorchard lmorchard requested a review from Copilot June 4, 2026 05:27

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

@lmorchard

Copy link
Copy Markdown
Collaborator Author

Filed the out-of-scope follow-ups (same class of latent hang, not addressed here):

@lmorchard lmorchard merged commit 16466a2 into main Jun 4, 2026
14 checks passed
lmorchard added a commit that referenced this pull request Jun 9, 2026
Closes #515 and #516 (filed them as one PR — they overlap; a
per-iteration watchdog is the natural mechanism to bound both the
stalled stream and the iteration).

## Problem

- **#516**: a single agent iteration had no wall-clock bound.
`maxIterations` caps the *count*, not time, so one hung operation
freezes the whole task forever.
- **#515**: `generateAndProcessAction` passes `timeout`/`abortSignal` to
`streamText`, but the `for await (fullStream)` loop + trailing
`Promise.all` can stall mid-stream with no independent bound.

Abort was previously **external-only** (`this.abortSignal` from the
caller); no internal watchdog.

## Fix — per-iteration watchdog

`runWithIterationWatchdog(work)` in `runMainLoop`:

- Creates an `AbortController` + a timer (`iterationTimeoutMs`), and
sets `this.currentIterationSignal = AbortSignal.any([external,
watchdog])`.
- Wraps the iteration body (snapshot + `generateAndProcessAction`) in
`Promise.race([work(), timeout])`. **The race is the correctness
guarantee** — the iteration is bounded even if an operation ignores the
abort signal. The abort is best-effort cleanup of the in-flight LLM
call.
- The action `streamText` and the validation `generateTextWithRetry` now
use `this.currentIterationSignal ?? this.abortSignal` (**#515** — a
stalled stream gets aborted). Planning (runs before the loop) and tool
creation (once) keep the external signal.
- On timeout: `IterationTimeoutError` → dedicated catch branch → ends
the task with the new `TaskErrorCode.ITERATION_TIMEOUT`, distinct from
the user-abort path.

### Scope

- `iterationTimeoutMs` is a **WebAgent constructor option** +
`DEFAULT_ITERATION_TIMEOUT_MS = 300000` (5 min), mirroring #511's
`ariaFrameEvaluateTimeoutMs`. Not wired through the CLI/PiloConfig
system this pass (separable follow-up). 5 min is generous: an iteration
legitimately spans an LLM call (~120s) + tool execution (extract can be
another ~120s call), so it only fires on a genuine hang.
- **Per-iteration only** and **watchdog-only for the stream** (no
per-chunk inactivity timer) — both decided up front. The open question
"does `abortSignal` interrupt an in-flight `for await`?" doesn't affect
correctness here: the `Promise.race` bounds the iteration regardless;
the abort is propagation/cleanup. A tighter stream inactivity timer
remains a follow-up only if needed.

## Testing

New "iteration watchdog" tests (real timers + `iterationTimeoutMs: 50`):
- A hung LLM stream → `execute` returns `success:false`, `error.code ===
"ITERATION_TIMEOUT"` (vs. hanging forever today).
- `streamText` receives a watchdog-derived abort signal that fires on
timeout.

Full core suite: **863 pass** (861 baseline + 2). Typecheck + `prettier
--check` clean. One live run vs `https://theverge.com` confirms normal
operation still completes (watchdog does not fire at the 300s default).

## Context

Follow-up to #511 / #514. Last of the three hang-class reliability
issues filed from #511.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
lmorchard added a commit that referenced this pull request Jun 9, 2026
…#517)

Closes #514.

## Problem

`PlaywrightBrowser.ensureOptimizedPageLoad()` waited for
`domcontentloaded` with **no timeout**:

```ts
await this.page.waitForLoadState("domcontentloaded");   // unbounded
```

The `load` wait immediately below is bounded by `actionTimeoutMs`, and
the settle is a fixed 1s — only this first wait was unbounded. A
navigation that commits to a page (or SPA soft-navigation) that never
fires `DOMContentLoaded` as Playwright tracks it could hang the await
indefinitely. Same class of freeze as #511, on the path that runs after
**every** navigation (click-nav, `goto`, back/forward).

## Fix

Pass `{ timeout: this.actionTimeoutMs }` to the `domcontentloaded` wait,
matching the `load` wait. It's already inside a `try/catch` that
continues on failure, so a timeout simply proceeds to the bounded `load`
wait and the settle delay — **no behavior change on healthy pages**,
just removes the unbounded-hang risk. Reuses `actionTimeoutMs` rather
than adding a new knob (both waits answer "how long to wait for the page
to be ready").

## Testing

- New `ensureOptimizedPageLoad` test asserting
`waitForLoadState("domcontentloaded", { timeout: actionTimeoutMs })` is
called (guards against the timeout being dropped again). Red before the
fix, green after.
- Full core suite: **862 pass** (861 baseline + 1). Typecheck +
`prettier --check` clean.

No live run: this is a latent path (needs a page that never fires
`DOMContentLoaded`), not reliably reproducible on demand; the unit test
is the meaningful guard and the change mirrors the already-bounded
`load` wait.

## Context

Follow-up to #511. Sibling reliability issues: #515 (stream timeout),
#516 (agent-loop watchdog).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants