Skip to content

[superlog] Log batch_union fallback as WARN instead of ERROR#502

Open
superlog-app[bot] wants to merge 7 commits into
stagingfrom
superlog/warn-batch-union-fallback
Open

[superlog] Log batch_union fallback as WARN instead of ERROR#502
superlog-app[bot] wants to merge 7 commits into
stagingfrom
superlog/warn-batch-union-fallback

Conversation

@superlog-app

@superlog-app superlog-app Bot commented Jun 27, 2026

Copy link
Copy Markdown

Summary

The POST /v1/query endpoint logs at ERROR level whenever the ClickHouse UNION ALL batch optimization fails, even though the code immediately falls back to individual queries and always returns HTTP 200 with data. This produces noisy incidents and can mask real errors.

The root cause is that captureError() is called in the runGroup catch block before the fallback runs. captureError calls requestLog.error(), which promotes the entire wide-event log to ERROR level regardless of the recovery that follows.

This patch adds a captureWarning helper to packages/ai/src/lib/tracing.ts that calls requestLog.warn() instead of requestLog.error(), and swaps captureError for captureWarning in the batch union catch block of packages/ai/src/query/batch-executor.ts. The batch_union_fallback, batch_union_error, and operation fields are still recorded — the only change is the log level for these recovered failures.

An alternative approach would be to omit any telemetry call in the catch block and rely solely on the mergeWideEvent({ batch_union_fallback: 1 }) fields for monitoring fallback rate, which would produce cleaner INFO-level logs. The warn approach was chosen to preserve the ability to detect when the fallback fires via log-level filtering.

Incident on Superlog


Was this PR helpful? Leave feedback — goes straight to the Superlog team.


Summary by cubic

Log batch-union fallback as WARN instead of ERROR in POST /v1/query to reduce noisy incidents while keeping visibility and telemetry intact.

  • Bug Fixes
    • Added captureWarning in packages/ai/src/lib/tracing.ts to log recovered failures with requestLog.warn(...).
    • Replaced captureError with captureWarning in packages/ai/src/query/batch-executor.ts for batch_union failures; retained batch_union_fallback, batch_union_error, and operation fields.

Written for commit 8d891ce. Summary will update on new commits.

Review in cubic

@vercel

vercel Bot commented Jun 27, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
databuddy-status Ready Ready Preview, Comment Jun 27, 2026 6:41am
2 Skipped Deployments
Project Deployment Actions Updated (UTC)
dashboard Skipped Skipped Jun 27, 2026 6:41am
documentation Skipped Skipped Jun 27, 2026 6:41am

@unkey-deploy

unkey-deploy Bot commented Jun 27, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Unkey Deploy

Name Status Preview Inspect Updated (UTC)
api (preview) Ready Visit Preview Inspect Jun 27, 2026 6:42am

@greptile-apps

greptile-apps Bot commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR downgrades the log level for batch UNION ALL fallback failures from ERROR to WARN by adding a captureWarning helper in tracing.ts and using it in the runGroup catch block. The fallback itself (running queries individually) and all diagnostic fields are unchanged.

  • packages/ai/src/lib/tracing.ts: New captureWarning function modelled after captureError, calling requestLog.warn() instead of requestLog.error(). It passes err.message (string) to the request logger whereas captureError passes the full Error object, so stack traces are absent from warn-level logs.
  • packages/ai/src/query/batch-executor.ts: One-line change swapping captureErrorcaptureWarning in the batch union catch block; all telemetry fields are preserved.

Confidence Score: 4/5

Safe to merge — the fallback logic and all telemetry fields are unchanged; only the log level is lowered for a fully-recovered failure path.

The change is narrow and correct. The one notable asymmetry is that captureWarning passes err.message (a plain string) to the request logger while captureError passes the full Error object, meaning stack traces won't appear in warn-level logs for fallback failures. This won't affect runtime behavior but could make future debugging harder.

packages/ai/src/lib/tracing.ts — the captureWarning implementation drops the stack trace when logging through the request logger path.

Important Files Changed

Filename Overview
packages/ai/src/lib/tracing.ts Adds captureWarning helper mirroring captureError but calling requestLog.warn() instead of requestLog.error(). The new function passes err.message (string) rather than the full Error object to the request logger, which drops stack trace information in the warn path unlike the error path.
packages/ai/src/query/batch-executor.ts Swaps captureError for captureWarning in the batch union catch block. All diagnostic fields (operation, batch_types, batch_size, batch_union_fallback, batch_union_error) are preserved; only the log level changes from ERROR to WARN.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Client
    participant executeBatch
    participant runGroup
    participant ClickHouse
    participant tracing

    Client->>executeBatch: POST /v1/query
    executeBatch->>runGroup: run compiled group
    runGroup->>ClickHouse: UNION ALL query
    alt success
        ClickHouse-->>runGroup: rows
        runGroup-->>executeBatch: "{unionCount:1, singleCount:0}"
    else failure (before this PR)
        ClickHouse-->>runGroup: error
        runGroup->>tracing: captureError() → requestLog.error() → wide-event promoted to ERROR
        runGroup->>ClickHouse: individual fallback queries
        ClickHouse-->>runGroup: rows
        runGroup-->>executeBatch: "{unionCount:0, singleCount:N}"
    else failure (after this PR)
        ClickHouse-->>runGroup: error
        runGroup->>tracing: captureWarning() → requestLog.warn() → wide-event stays at WARN
        runGroup->>ClickHouse: individual fallback queries
        ClickHouse-->>runGroup: rows
        runGroup-->>executeBatch: "{unionCount:0, singleCount:N}"
    end
    executeBatch-->>Client: HTTP 200 with data
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant Client
    participant executeBatch
    participant runGroup
    participant ClickHouse
    participant tracing

    Client->>executeBatch: POST /v1/query
    executeBatch->>runGroup: run compiled group
    runGroup->>ClickHouse: UNION ALL query
    alt success
        ClickHouse-->>runGroup: rows
        runGroup-->>executeBatch: "{unionCount:1, singleCount:0}"
    else failure (before this PR)
        ClickHouse-->>runGroup: error
        runGroup->>tracing: captureError() → requestLog.error() → wide-event promoted to ERROR
        runGroup->>ClickHouse: individual fallback queries
        ClickHouse-->>runGroup: rows
        runGroup-->>executeBatch: "{unionCount:0, singleCount:N}"
    else failure (after this PR)
        ClickHouse-->>runGroup: error
        runGroup->>tracing: captureWarning() → requestLog.warn() → wide-event stays at WARN
        runGroup->>ClickHouse: individual fallback queries
        ClickHouse-->>runGroup: rows
        runGroup-->>executeBatch: "{unionCount:0, singleCount:N}"
    end
    executeBatch-->>Client: HTTP 200 with data
Loading

Reviews (1): Last reviewed commit: "[superlog] Log batch_union fallback as W..." | Re-trigger Greptile

Comment on lines +23 to +30
if (requestLog) {
if (payload) {
requestLog.warn(err.message, payload);
} else {
requestLog.warn(err.message);
}
return;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 captureWarning passes the string err.message to requestLog.warn(), while captureError passes the full Error object (err) to requestLog.error(). This drops the stack trace from warn-level logs, so when the fallback fires it will be harder to determine where in ClickHouse (or the query compilation path) the union failure originated. Passing the Error object is consistent with the existing captureError pattern and most logger APIs accept an Error as the first argument to warn just as well as to error.

Suggested change
if (requestLog) {
if (payload) {
requestLog.warn(err.message, payload);
} else {
requestLog.warn(err.message);
}
return;
}
if (requestLog) {
if (payload) {
requestLog.warn(err, payload);
} else {
requestLog.warn(err);
}
return;
}

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant