bug(sandbox): "The operation was aborted." leaks into error tracking from daemon config timeout and idle abort paths #3763

@phdro88

Summary

Error 0b570eb9b0, "The operation was aborted.", is the #1 chronic error on the platform, with 20,440 lifetime hits since 2026-02-27 (103 days). It is currently flapping in studio (14,296 hits) and was previously widespread in mesh (6,144 hits). As of 2026-06-10 07:30 UTC it has 412 hits in the current 2h window, with a recent peak of 1,675/2h on Jun 9 18:00 UTC.

This is not a user-facing crash: these are control-flow aborts in the sandbox and idle paths that leak into error tracking as if they were failures.


Root cause (two paths)

Path 1 — Sandbox config timeout (primary)

packages/sandbox/server/daemon-client.ts sends POST /_sandbox/config with a hard 10s abort signal:

// daemon-client.ts
const CONFIG_TIMEOUT_MS = 10_000;
// ...
fetch("/_sandbox/config", { signal: AbortSignal.timeout(CONFIG_TIMEOUT_MS) });

When the daemon is slow to apply config (cold-start, heavy sandbox), the request exceeds 10s and throws a DOMException("The operation was aborted.").

apps/mesh/src/link-daemon/user-desktop-provider.ts classifies this as timeout-like via a regex:

function isTimeoutLike(err): boolean {
  return /operation was aborted|aborted|timed out/.test(err.message);
}

It then kills the daemon and emits a failed bring-up event — surfaced to error tracking as a spike.
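
One possible alternative to matching on message text is to classify by the error's name. The sketch below assumes a runtime that follows the DOM standard, where AbortSignal.timeout() aborts with a TimeoutError and AbortController.abort() defaults to an AbortError; runtimes can differ in the message text they attach. `AbortKind` and `classifyAbort` are hypothetical names, not code from the repo.

```typescript
// Sketch only: `AbortKind` and `classifyAbort` are hypothetical names.
type AbortKind = "timeout" | "abort" | "other";

// Classify by the error's `name` instead of matching on message text.
function classifyAbort(err: unknown): AbortKind {
  if (typeof err !== "object" || err === null) return "other";
  const name = (err as { name?: unknown }).name;
  if (name === "TimeoutError") return "timeout"; // reason set by AbortSignal.timeout()
  if (name === "AbortError") return "abort"; // default reason of AbortController.abort()
  return "other";
}

// An ordinary error whose message merely contains "aborted" is not swept up.
const unrelated = new Error("transaction aborted by peer");
console.log(classifyAbort(unrelated)); // "other"
```

With a classifier like this, the bring-up path could treat "timeout" as a slow daemon and reserve the failed bring-up event for "other".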

Path 2 — Idle session abort

deco-sites/admin/components/spaces/siteEditor/sdk.ts:455:

const unsubIdle = idleTracker.onIdle(() => ctrl.abort());

When a user is idle for DEFAULT_EDITOR_IDLE_MS (5 min), the AbortController fires on the in-flight /watch daemon request. The resulting AbortError is not suppressed before reaching error tracking.
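
A minimal sketch of one way to make the idle abort recognizable downstream: pass a tagged reason to abort() and check signal.reason before reporting. `IdleAbort` and `isIdleAbort` are hypothetical names, and the sketch assumes a runtime that supports abort reasons (AbortController.abort(reason) and AbortSignal.reason).

```typescript
// Sketch only: `IdleAbort` and `isIdleAbort` are hypothetical names.
class IdleAbort extends Error {
  constructor() {
    super("editor idle");
    this.name = "IdleAbort";
  }
}

// True when the signal was aborted by our own idle handler.
// Checks the reason's name so it does not depend on instanceof behavior.
function isIdleAbort(signal: AbortSignal): boolean {
  const reason = signal.reason as { name?: unknown } | undefined;
  return signal.aborted && reason?.name === "IdleAbort";
}

const ctrl = new AbortController();
// In sdk.ts this would be: idleTracker.onIdle(() => ctrl.abort(new IdleAbort()));
ctrl.abort(new IdleAbort());
console.log(isIdleAbort(ctrl.signal)); // true
```

The catch block around the /watch request could then skip error tracking when isIdleAbort(ctrl.signal) is true and report everything else as before.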


Impact

  • Error tracking polluted with expected control-flow events
  • No way to distinguish true sandbox startup failures from timeout/idle aborts
  • Spike alerts fire on coordinated idle transitions (multiple users, deployments)

Suggested fixes

  1. Raise CONFIG_TIMEOUT_MS or make it adaptive: 10s is too tight for cold sandbox startup; consider 30–60s or a two-phase timeout (liveness check first, then config).
  2. Suppress benign aborts before error tracking: check err.name === 'AbortError' in the daemon-client and idle paths, and log at debug/info rather than error. Per the DOM standard, AbortSignal.timeout() aborts with a TimeoutError rather than an AbortError, so the config-timeout path may need to check for that name as well.
  3. Separate telemetry: track "abort reason" (idle, config-timeout, client-disconnect) as distinct metrics so control-flow aborts don't all look identical in the error dashboard.
  4. SWR revalidation timeout (apps/mesh/src/api/app.ts:933, REVALIDATION_TIMEOUT_MS = 30_000): consider increasing it and adding a graceful teardown instead of an immediate abort.
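
Fix 3 could start as small as a per-reason counter. The sketch below is an in-memory stand-in for a real metrics client; the reason labels come from the list above, while `AbortMetrics` and its methods are hypothetical names.

```typescript
// Sketch only: `AbortMetrics` is a hypothetical in-memory stand-in for a metrics client.
type AbortReason = "idle" | "config-timeout" | "client-disconnect";

class AbortMetrics {
  private counts: Record<AbortReason, number> = {
    "idle": 0,
    "config-timeout": 0,
    "client-disconnect": 0,
  };

  // Count one abort under its reason instead of sending it to error tracking.
  record(reason: AbortReason): void {
    this.counts[reason] += 1;
  }

  count(reason: AbortReason): number {
    return this.counts[reason];
  }
}

const metrics = new AbortMetrics();
metrics.record("idle");
metrics.record("idle");
metrics.record("config-timeout");
console.log(metrics.count("idle"), metrics.count("client-disconnect")); // 2 0
```

Keeping the reasons in a closed union means a new abort path has to pick a label at compile time rather than falling into a generic bucket.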

References

  • Error ID: 0b570eb9b0
  • Services: studio (serviceId 922), mesh (serviceId 16)
  • Lifetime hits: 20,440 (first seen 2026-02-27)
  • Key files:
    • packages/sandbox/server/daemon-client.ts: CONFIG_TIMEOUT_MS = 10_000
    • apps/mesh/src/link-daemon/user-desktop-provider.ts: isTimeoutLike() + postConfig() abort handler
    • deco-sites/admin/components/spaces/siteEditor/sdk.ts:455: idle abort
    • apps/mesh/src/web/components/chat/store/thread-connection.ts: inflightPost?.abort()
    • apps/mesh/src/web/components/chat/highlight/parse-error-message.ts: classifies aborted|timeout as failures
  • Monitoring investigations: 5 filed (all status: investigating)
