Skip to content

[superlog] Use warn for tool-level errors to prevent false ERROR on succeeded jobs#504

Open
superlog-app[bot] wants to merge 1 commit into
stagingfrom
superlog/fix-tool-error-level
Open

[superlog] Use warn for tool-level errors to prevent false ERROR on succeeded jobs#504
superlog-app[bot] wants to merge 1 commit into
stagingfrom
superlog/fix-tool-error-level

Conversation

@superlog-app

@superlog-app superlog-app Bot commented Jun 27, 2026

Copy link
Copy Markdown

Summary

Insights generation jobs that succeed are being logged at ERROR level because individual SQL query tool failures (which the AI agent internally handles and retries) call requestLogger.error() on the job-level evlog logger. In evlog, once error() is called on a RequestLogger, it sets hasError = true permanently, so the final emit() at the end of the job uses ERROR severity even when job_status: "succeeded".

The setAiRequestLoggerProvider(getActiveInsightsLog) wiring in apps/insights/src/index.ts means createToolLogger's error() method writes directly to the job-level wide-event logger. Tool errors that the AI agent recovers from (e.g., a SQL query fails on the first attempt, the agent reformulates and retries successfully) were incorrectly escalating the entire job log to ERROR.

The fix changes createToolLogger.error() to call requestLogger.warn() instead of requestLogger.error(). Job-level failures still correctly log at ERROR because apps/insights/src/jobs.ts:180 calls logger.error(err) directly in the job's own catch block — that path is unchanged.

Alternative: a separate evlog RequestLogger scope could be created per tool call so tool errors are isolated from the job-level logger. That would give more granular scoping but requires a larger change across all tool implementations. The warn() approach is a minimal, low-risk fix with the same outcome.

Incident on Superlog


Was this PR helpful? Leave feedback — goes straight to the Superlog team.


Summary by cubic

Prevents successful insights jobs from being marked as ERROR by downgrading tool-level failures to warnings. Job-level failures still log as errors.

  • Bug Fixes
    • In packages/ai/src/ai/tools/utils/logger.ts, createToolLogger.error() now calls requestLogger.warn() instead of requestLogger.error().
    • Tool retries can fail and recover without escalating the job’s final severity.

Written for commit 1a436ea. Summary will update on new commits.

Review in cubic

@vercel

vercel Bot commented Jun 27, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
databuddy-status Ready Ready Preview, Comment Jun 27, 2026 11:51am
2 Skipped Deployments
Project Deployment Actions Updated (UTC)
dashboard Skipped Skipped Jun 27, 2026 11:51am
documentation Skipped Skipped Jun 27, 2026 11:51am

@unkey-deploy

unkey-deploy Bot commented Jun 27, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Unkey Deploy

Name Status Preview Inspect Updated (UTC)
api (preview) Ready Visit Preview Inspect Jun 27, 2026 11:51am

@greptile-apps

greptile-apps Bot commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR fixes a log-level escalation bug where completed insights generation jobs were emitted at ERROR severity because individual AI tool failures (retried internally by the agent) called requestLogger.error(), permanently setting hasError = true on the evlog RequestLogger. The fix changes createToolLogger.error() to call requestLogger.warn() so transient tool failures don't poison the job-level log severity.

  • Minimal, targeted fix: only packages/ai/src/ai/tools/utils/logger.ts changes; the fatal-error path in jobs.ts (line 180 logger.error(err)) is untouched, so true job failures still emit at ERROR.
  • Global scope: the change affects all three setAiRequestLoggerProvider consumers (apps/insights, apps/slack, apps/api), not just insights — tool errors will be demoted to WARN in Slack and per-request API contexts too.

Confidence Score: 4/5

Safe to merge for the insights use case; the change is small and the fatal-error path in jobs.ts is untouched.

The core fix is correct and well-documented — transient tool failures no longer permanently escalate the job-level log. The only nuance is that the change globally affects the apps/slack and apps/api requestLogger paths too, not just insights. Those contexts share the same createToolLogger package, so their tool-level errors also shift to WARN. This is likely fine given the same agent-recovery reasoning, but it was not explicitly analyzed in the PR.

The single changed file packages/ai/src/ai/tools/utils/logger.ts is straightforward. Worth a quick sanity-check on apps/slack and apps/api to confirm they are fine with tool errors surfacing only at WARN on their request loggers.

Important Files Changed

Filename Overview
packages/ai/src/ai/tools/utils/logger.ts Changes requestLogger.error() to requestLogger.warn() in the createToolLogger.error() method to prevent tool-level failures from permanently setting hasError = true on the evlog request logger. Also removes the now-unused new Error(message) allocation. The global fallback path (log.error(...)) is correctly unchanged.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Agent as AI Agent
    participant Tool as SQL Tool
    participant Logger as createToolLogger
    participant RL as RequestLogger (evlog)
    participant Job as jobs.ts catch block

    Note over Agent,Job: Tool failure that agent recovers from

    Agent->>Tool: execute()
    Tool->>Tool: chQuery() fails
    Tool->>Logger: .error("Query failed", ctx)

    rect rgb(255, 220, 220)
        Note over Logger,RL: BEFORE (bug)
        Logger->>RL: requestLogger.error(err, ctx)
        RL-->>RL: "hasError = true (permanent)"
    end

    rect rgb(220, 255, 220)
        Note over Logger,RL: AFTER (fix)
        Logger->>RL: requestLogger.warn(message, ctx)
        RL-->>RL: hasError stays false
    end

    Tool-->>Agent: throws error
    Agent->>Tool: retry with reformulated query
    Tool->>Tool: chQuery() succeeds
    Tool-->>Agent: result

    Agent-->>Job: job completes successfully
    Job->>RL: "emit({ job_status: succeeded })"

    Note over RL: BEFORE: emits at ERROR (hasError=true)
    Note over RL: AFTER: emits at INFO (hasError=false)

    Note over Job: Real failures still log at ERROR
    Job->>RL: logger.error(err) in catch block
    RL-->>RL: "hasError = true (correct)"
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant Agent as AI Agent
    participant Tool as SQL Tool
    participant Logger as createToolLogger
    participant RL as RequestLogger (evlog)
    participant Job as jobs.ts catch block

    Note over Agent,Job: Tool failure that agent recovers from

    Agent->>Tool: execute()
    Tool->>Tool: chQuery() fails
    Tool->>Logger: .error("Query failed", ctx)

    rect rgb(255, 220, 220)
        Note over Logger,RL: BEFORE (bug)
        Logger->>RL: requestLogger.error(err, ctx)
        RL-->>RL: "hasError = true (permanent)"
    end

    rect rgb(220, 255, 220)
        Note over Logger,RL: AFTER (fix)
        Logger->>RL: requestLogger.warn(message, ctx)
        RL-->>RL: hasError stays false
    end

    Tool-->>Agent: throws error
    Agent->>Tool: retry with reformulated query
    Tool->>Tool: chQuery() succeeds
    Tool-->>Agent: result

    Agent-->>Job: job completes successfully
    Job->>RL: "emit({ job_status: succeeded })"

    Note over RL: BEFORE: emits at ERROR (hasError=true)
    Note over RL: AFTER: emits at INFO (hasError=false)

    Note over Job: Real failures still log at ERROR
    Job->>RL: logger.error(err) in catch block
    RL-->>RL: "hasError = true (correct)"
Loading

Comments Outside Diff (1)

  1. packages/ai/src/ai/tools/utils/logger.ts, line 21-32 (link)

    P2 .error() silently downgrades to warn across all three evlog consumers

    The public createToolLogger interface exposes an error method that now calls warn internally whenever a requestLogger is active. This affects all three setAiRequestLoggerProvider consumers — apps/insights, apps/slack, and apps/api — not just insights. In the apps/api context (Elysia per-request logger via useLogger), and in apps/slack, tool errors that don't recover will still only appear as WARN on the request-level logger until the top-level handler logs the final failure at ERROR. If those contexts don't share the same hasError = true evlog escalation concern, the downgrade may suppress signal unnecessarily.

    The PR description acknowledges the alternative (per-tool RequestLogger scope), which would isolate the behavior to the insights path only. As-is, callers in any context who call toolLogger.error() expecting error-level visibility on their request logger will silently get warn-level instead.

Reviews (1): Last reviewed commit: "[superlog] Use warn for tool-level error..." | Re-trigger Greptile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

0 participants