ref: Only forward ERROR-level logs to Sentry as issues (keep WARN as logs) #8077
Merged
Several high-volume Snuba errors are operational/lifecycle noise that
recover on their own and aren't actionable. Filter them out where the
Sentry events are created instead of muting them in the Sentry UI.
Python (snuba/environment.py):
- ignore_logger("datadog.dogstatsd"): drops metrics-transport warnings
like "[Errno 111] Connection refused, dropping the packet ..." emitted
when the local statsd agent socket is briefly unavailable (SNUBA-A3N).
- before_send now drops "Commit failed" logs from
arroyo.backends.kafka.consumer that carry a transient rebalance code
(UNKNOWN_MEMBER_ID / REBALANCE_IN_PROGRESS / ILLEGAL_GENERATION). These
self-heal once the new consumer generation re-commits (SNUBA-9WM).
Rust (rust_snuba/src/logging.rs):
- Downgrade ERROR/WARN from sentry_usage_accountant to logs only; the
usage accountant is a best-effort billing side channel whose Kafka
producer logs "Purged in queue/flight" on shutdown/rebalance
(SNUBA-474, SNUBA-475).
- Downgrade arroyo run_task_in_threads / reduce task-join timeout WARNs
(logged while draining work during shutdown) to logs only
(SNUBA-4VF, SNUBA-4WS).
All of these remain visible as Sentry logs/breadcrumbs; they just no
longer create ongoing issues. Genuine errors from the same modules
(non-rebalance commit failures, arroyo ERRORs) are still reported.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XRxGfhiUKoyuTUQsJrBahM
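The before_send filter described in this commit can be sketched roughly as below. This is an illustration, not the code from snuba/environment.py. The function name is hypothetical, and the event shape (a "logger" key plus a "logentry" dict with "message" and "params") is an assumption about what the SDK's logging integration produces. Returning None from a before_send hook drops the event.

```python
from typing import Any, Optional

# Logger and transient rebalance codes named in the commit message.
COMMIT_LOGGER = "arroyo.backends.kafka.consumer"
REBALANCE_CODES = ("UNKNOWN_MEMBER_ID", "REBALANCE_IN_PROGRESS", "ILLEGAL_GENERATION")


def drop_transient_commit_failures(
    event: dict[str, Any], hint: dict[str, Any]
) -> Optional[dict[str, Any]]:
    """Hypothetical before_send hook: return None to drop, the event to keep."""
    if event.get("logger") != COMMIT_LOGGER:
        return event
    # Assumed event shape: a message template plus its params.
    logentry = event.get("logentry") or {}
    text = f"{logentry.get('message', '')} {logentry.get('params', '')}"
    if "Commit failed" in text and any(code in text for code in REBALANCE_CODES):
        return None  # transient rebalance failure, do not create an issue
    return event  # any other commit failure is still reported
```

The substring checks on the message are what make this approach sensitive to upstream log format changes.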
Switch from the targeted per-source filters to a blanket policy: only ERROR and above become Sentry issues. Warnings are operational noise far more often than they're actionable (consumer rebalance/shutdown timeouts, transient transport failures, etc.) and remain captured as logs/breadcrumbs.
This dissolves the fragile message/target string matching the previous approach relied on:
- Python (environment.py): LoggingIntegration event_level WARNING -> ERROR. Removes the datadog.dogstatsd ignore_logger (SNUBA-A3N) and the arroyo "Commit failed" rebalance before_send filter (SNUBA-9WM) -- both are WARN, so the policy now covers them.
- Rust (logging.rs): WARN -> EventFilter::Log instead of Event. Drops the arroyo run_task_in_threads/reduce target matching (SNUBA-4VF, SNUBA-4WS, also WARN).
The usage accountant errors (SNUBA-474, SNUBA-475) are ERROR-level, so they still need an explicit downgrade: sentry_usage_accountant ERROR -> Log is retained.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XRxGfhiUKoyuTUQsJrBahM
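The routing rule this commit describes for the Rust side can be modeled in a few lines of Python. This is a sketch of the policy only, not a port of rust_snuba/src/logging.rs: the function name is hypothetical, and matching the override by target prefix is an assumption.

```python
# Targets whose ERROR-level logs stay log-only (named in the commit message).
LOG_ONLY_TARGETS = ("sentry_usage_accountant",)


def creates_issue(level: str, target: str) -> bool:
    """Return True if a log at `level` from `target` should become a Sentry issue."""
    if level != "ERROR":
        # WARN and anything below it is kept as a log only.
        return False
    # Assumption: the explicit downgrade matches on a target prefix.
    return not target.startswith(LOG_ONLY_TARGETS)
```

Under this model a WARN from any target never creates an issue, and an ERROR creates one unless its target starts with sentry_usage_accountant.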
LoggingIntegration(event_level=ERROR) only covers the stdlib logging path. structlog logs go through structlog-sentry's SentryProcessor, which is now pinned to event_level=ERROR as well so WARNING-level structlog events stay as logs/breadcrumbs rather than Sentry issues.
This is already structlog-sentry's default; setting it explicitly keeps the policy consistent across both logging paths and robust to upstream default changes.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XRxGfhiUKoyuTUQsJrBahM
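The effect of an event_level threshold can be illustrated with the standard library alone. The handler below is a hypothetical, simplified stand-in for an SDK integration and does not reproduce the internals of sentry_sdk or structlog-sentry. It assumes every handled record is kept as a breadcrumb and that records at or above event_level are additionally promoted to issues.

```python
import logging


class ThresholdRouter(logging.Handler):
    """Hypothetical model of an integration with two level thresholds."""

    def __init__(self, level: int = logging.INFO, event_level: int = logging.ERROR) -> None:
        super().__init__(level=level)  # records below `level` never reach emit()
        self.event_level = event_level
        self.breadcrumbs: list[str] = []
        self.issues: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        message = record.getMessage()
        self.breadcrumbs.append(message)  # everything handled is kept as a log
        if record.levelno >= self.event_level:
            self.issues.append(message)  # only ERROR and above become issues


def demo() -> ThresholdRouter:
    router = ThresholdRouter()
    logger = logging.getLogger("policy_demo")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    logger.handlers.clear()
    logger.addHandler(router)
    logger.debug("below the breadcrumb threshold")
    logger.warning("Commit failed: transient rebalance")
    logger.error("Not a valid UUID string")
    return router
```

Calling demo() leaves the warning in breadcrumbs only, while the error appears in both lists. Setting the threshold explicitly on each logging path keeps that split the same even if a library changes its default.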
MeredithAnya approved these changes on Jun 22, 2026
Summary
Several of the highest-volume Snuba Sentry issues are transient operational/lifecycle noise: they recover on their own and aren't actionable. Rather than filtering each one by message/module (fragile, coupled to upstream log formats), this PR adopts the conventional policy: only ERROR and above become Sentry issues; warnings are kept as logs.
This came out of an RCA of all the unresolved Snuba errors seen in the last hour. The two genuine query-validation bugs found in the same sweep (SNUBA-9VC "Not a valid UUID string" and SNUBA-9VD "missing required conditions for project_id") are ERROR-level and intentionally left untouched; they're deterministic and should be fixed, not silenced.
Changes
Python (snuba/environment.py):
- LoggingIntegration(event_level=logging.WARNING) → logging.ERROR (the SDK default). Warnings are still captured as logs/breadcrumbs, just not turned into issues.
Rust (rust_snuba/src/logging.rs):
- event_filter now maps WARN → EventFilter::Log (was Event | Log); only ERROR creates issues.
- sentry_usage_accountant ERROR → Log. Those events (Purged in queue/flight) are emitted at ERROR level by a best-effort billing side channel whose Kafka producer is flushed on consumer shutdown/rebalance, so the blanket WARN rule doesn't cover them.
Why this approach
The previous revision of this PR filtered each transient issue individually (a datadog.dogstatsd ignore, an arroyo "Commit failed" before_send string match, and arroyo target matching). A reviewer ([bot]) correctly flagged the string matching as fragile to upstream format changes. Since 4 of the 6 transient issues are WARN-level, treating warnings as logs removes that fragility entirely and is the more conventional policy. Only the two ERROR-level usage_accountant events need an explicit downgrade.
Trade-off: this is a repo-wide change. No WARN-level log in Snuba (Python or Rust) becomes a Sentry issue anymore. Anything previously relying on a warning surfacing as an issue will now be a log/breadcrumb only.
Issues addressed
- Error submitting packet: [Errno 111] Connection refused …
- Commit failed: KafkaError{UNKNOWN_MEMBER_ID …}
- Timeout reached while waiting for tasks to finish
- Timeout Some(0ns) reached while waiting for tasks to finish
- Message production failed … PurgeQueue (sentry_usage_accountant ERROR → log)
- Message production failed … PurgeInflight (sentry_usage_accountant ERROR → log)
🤖 Generated with Claude Code
https://claude.ai/code/session_01XRxGfhiUKoyuTUQsJrBahM