ref: Only forward ERROR-level logs to Sentry as issues (keep WARN as logs) #8077

Merged
phacops merged 4 commits into master from claude/ecstatic-dirac-72fvfc
Jun 22, 2026
Conversation

@phacops phacops commented Jun 19, 2026

Contributor

Summary

Several of the highest-volume Snuba Sentry issues are transient operational/lifecycle noise — they recover on their own and aren't actionable. Rather than filtering each one by message/module (fragile, coupled to upstream log formats), this PR adopts the conventional policy: only ERROR and above become Sentry issues; warnings are kept as logs.

This came out of an RCA of all the unresolved Snuba errors seen in the last hour. The two genuine query-validation bugs found in the same sweep (SNUBA-9VC "Not a valid UUID string" and SNUBA-9VD "missing required conditions for project_id") are ERROR-level and intentionally left untouched — they're deterministic and should be fixed, not silenced.

Changes

Python — snuba/environment.py

  • LoggingIntegration(event_level=logging.WARNING) → logging.ERROR (the SDK default). Warnings are still captured as logs/breadcrumbs, just not turned into issues.
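
As a rough illustration of what moving the threshold does, here is a stdlib-only model of the policy. It is not the Sentry SDK's implementation: `ThresholdHandler` and `demo` are hypothetical names invented for this sketch, and the log messages are shortened versions of the ones quoted in this PR.

```python
import logging


class ThresholdHandler(logging.Handler):
    """Hypothetical stand-in for a Sentry-style logging integration.

    Records at or above `event_level` are counted as issues; everything else
    that passes the handler's own `level` is kept as a breadcrumb/log.
    """

    def __init__(self, level=logging.INFO, event_level=logging.ERROR):
        super().__init__(level=level)
        self.event_level = event_level
        self.issues = []
        self.breadcrumbs = []

    def emit(self, record):
        if record.levelno >= self.event_level:
            self.issues.append(record.getMessage())
        else:
            self.breadcrumbs.append(record.getMessage())


def demo(event_level):
    """Log one WARNING and one ERROR, return (issues, breadcrumbs)."""
    logger = logging.getLogger(f"snuba.demo.{event_level}")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep the demo output out of the root logger
    handler = ThresholdHandler(event_level=event_level)
    logger.addHandler(handler)
    logger.warning("Commit failed: transient rebalance")
    logger.error("Not a valid UUID string")
    logger.removeHandler(handler)
    return handler.issues, handler.breadcrumbs
```

In this model, `demo(logging.WARNING)` counts both records as issues, while `demo(logging.ERROR)` counts only the error and keeps the warning as a breadcrumb.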

Rust — rust_snuba/src/logging.rs

  • The tracing event_filter now maps WARN → EventFilter::Log (was Event | Log); only ERROR creates issues.
  • One explicit exception is retained: sentry_usage_accountant ERROR → Log. Those events (Purged in queue/flight) are emitted at ERROR level by a best-effort billing side channel whose Kafka producer is flushed on consumer shutdown/rebalance, so the blanket WARN rule doesn't cover them.
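
The routing rule described in the two bullets above can be sketched as a small decision function. This is a Python model written for readability, not the Rust code in logging.rs: `Disposition` and `route` are hypothetical names, matching the accountant by target prefix is an assumption, and so are the mappings for non-accountant ERROR (to both event and log) and for levels below WARN (to log), which the PR text does not spell out.

```python
from enum import Flag, auto


class Disposition(Flag):
    """Where a log record ends up (hypothetical model of the filter result)."""

    LOG = auto()    # kept as a Sentry log/breadcrumb
    EVENT = auto()  # turned into a Sentry issue


def route(level: str, target: str) -> Disposition:
    """Decide the disposition of a record from its level and tracing target."""
    if level.upper() == "ERROR":
        # Explicit exception: usage accountant errors are downgraded to logs.
        # Prefix matching on the target is an assumption of this sketch.
        if target.startswith("sentry_usage_accountant"):
            return Disposition.LOG
        # Assumption: ordinary errors become issues and are also kept as logs.
        return Disposition.EVENT | Disposition.LOG
    # WARN (and, by assumption, anything lower) is kept as a log only.
    return Disposition.LOG
```

For example, `route("WARN", "rust_arroyo::processing")` yields `Disposition.LOG`, and so does `route("ERROR", "sentry_usage_accountant::producer")`; both target strings are made up for the example.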

Why this approach

The previous revision of this PR filtered each transient issue individually (a datadog.dogstatsd ignore, an arroyo "Commit failed" before_send string match, and arroyo target matching). A reviewer (a bot) correctly flagged the string matching as fragile to upstream format changes. Since 4 of the 6 transient issues are WARN-level, treating warnings as logs removes that fragility entirely and is the more conventional policy. Only the two ERROR-level usage_accountant events need an explicit downgrade.
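
To make the fragility concrete, here is a minimal sketch contrasting the two styles of filter. Both functions are hypothetical and are not the code from either revision of this PR; the rebalance codes are the ones named in the earlier commit message, and the reworded message in the usage note is invented to stand in for an upstream format change.

```python
import logging

# Rebalance codes treated as transient in the earlier revision of this PR.
TRANSIENT_CODES = ("UNKNOWN_MEMBER_ID", "REBALANCE_IN_PROGRESS", "ILLEGAL_GENERATION")


def drop_by_message(message: str) -> bool:
    """String-matching style: drop only when the text looks as expected."""
    return message.startswith("Commit failed") and any(
        code in message for code in TRANSIENT_CODES
    )


def log_only_by_level(levelno: int) -> bool:
    """Level-based style: anything below ERROR stays a log, whatever it says."""
    return levelno < logging.ERROR
```

`drop_by_message("Commit failed: KafkaError{UNKNOWN_MEMBER_ID}")` is true, but the same condition reworded upstream as `"Offset commit error: KafkaError{UNKNOWN_MEMBER_ID}"` slips through. `log_only_by_level(logging.WARNING)` does not depend on the message text at all.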

Trade-off: this is a repo-wide change — no WARN-level log in Snuba (Python or Rust) becomes a Sentry issue anymore. Anything previously relying on a warning surfacing as an issue will now be a log/breadcrumb only.

Issues addressed

| Issue | Error | Level | Covered by |
|---|---|---|---|
| SNUBA-A3N | Error submitting packet: [Errno 111] Connection refused … | WARN | WARN→log |
| SNUBA-9WM | Commit failed: KafkaError{UNKNOWN_MEMBER_ID …} | WARN | WARN→log |
| SNUBA-4VF | Timeout reached while waiting for tasks to finish | WARN | WARN→log |
| SNUBA-4WS | Timeout Some(0ns) reached while waiting for tasks to finish | WARN | WARN→log |
| SNUBA-474 | Message production failed … PurgeQueue | ERROR | sentry_usage_accountant ERROR→log |
| SNUBA-475 | Message production failed … PurgeInflight | ERROR | sentry_usage_accountant ERROR→log |

🤖 Generated with Claude Code

https://claude.ai/code/session_01XRxGfhiUKoyuTUQsJrBahM

Several high-volume Snuba errors are operational/lifecycle noise that
recover on their own and aren't actionable. Filter them out where the
Sentry events are created instead of muting them in the Sentry UI.

Python (snuba/environment.py):
- ignore_logger("datadog.dogstatsd"): drops metrics-transport warnings
  like "[Errno 111] Connection refused, dropping the packet ..." emitted
  when the local statsd agent socket is briefly unavailable (SNUBA-A3N).
- before_send now drops "Commit failed" logs from
  arroyo.backends.kafka.consumer that carry a transient rebalance code
  (UNKNOWN_MEMBER_ID / REBALANCE_IN_PROGRESS / ILLEGAL_GENERATION). These
  self-heal once the new consumer generation re-commits (SNUBA-9WM).

Rust (rust_snuba/src/logging.rs):
- Downgrade ERROR/WARN from sentry_usage_accountant to logs only; the
  usage accountant is a best-effort billing side channel whose Kafka
  producer logs "Purged in queue/flight" on shutdown/rebalance
  (SNUBA-474, SNUBA-475).
- Downgrade arroyo run_task_in_threads / reduce task-join timeout WARNs
  (logged while draining work during shutdown) to logs only
  (SNUBA-4VF, SNUBA-4WS).

All of these remain visible as Sentry logs/breadcrumbs; they just no
longer create ongoing issues. Genuine errors from the same modules
(non-rebalance commit failures, arroyo ERRORs) are still reported.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XRxGfhiUKoyuTUQsJrBahM
@phacops phacops requested a review from a team as a code owner June 19, 2026 18:04
Comment thread snuba/environment.py Outdated
Switch from the targeted per-source filters to a blanket policy: only
ERROR and above become Sentry issues. Warnings are operational noise far
more often than they're actionable (consumer rebalance/shutdown
timeouts, transient transport failures, etc.) and remain captured as
logs/breadcrumbs.

This dissolves the fragile message/target string matching the previous
approach relied on:
- Python (environment.py): LoggingIntegration event_level WARNING -> ERROR.
  Removes the datadog.dogstatsd ignore_logger (SNUBA-A3N) and the arroyo
  "Commit failed" rebalance before_send filter (SNUBA-9WM) -- both are
  WARN, so the policy now covers them.
- Rust (logging.rs): WARN -> EventFilter::Log instead of Event. Drops the
  arroyo run_task_in_threads/reduce target matching (SNUBA-4VF, SNUBA-4WS,
  also WARN).

The usage accountant errors (SNUBA-474, SNUBA-475) are ERROR-level, so
they still need an explicit downgrade: sentry_usage_accountant ERROR ->
Log is retained.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XRxGfhiUKoyuTUQsJrBahM
@phacops phacops changed the title from "ref: Silence transient Sentry errors at the source" to "ref: Only forward ERROR-level logs to Sentry as issues (keep WARN as logs)" Jun 19, 2026
Comment thread snuba/environment.py
LoggingIntegration(event_level=ERROR) only covers the stdlib logging
path. structlog logs go through structlog-sentry's SentryProcessor,
which is now pinned to event_level=ERROR as well so WARNING-level
structlog events stay as logs/breadcrumbs rather than Sentry issues.
This is already structlog-sentry's default; setting it explicitly keeps
the policy consistent across both logging paths and robust to upstream
default changes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XRxGfhiUKoyuTUQsJrBahM
linear-code Bot commented Jun 22, 2026

EAP-573

@phacops phacops enabled auto-merge (squash) June 22, 2026 21:31
@phacops phacops merged commit 245a76a into master Jun 22, 2026
68 checks passed
@phacops phacops deleted the claude/ecstatic-dirac-72fvfc branch June 22, 2026 21:52