Summary
EndpointTraceItemTable (and occasionally other EAP endpoints) intermittently fail with a distributed-query cancellation (QUERY_WAS_CANCELLED, code 394). Sentry: SNUBA-A23, https://sentry.sentry.io/issues/SNUBA-A23 (≈126 occurrences, ongoing since 2025-12-12).
Filed as a follow-up while triaging the last hour of QueryExceptions alongside #8074.
Why this is tracked separately from the __set_* fix (#8074)
PR #8074 ("Inline constant IN-sets in SELECT-clause filters to fix mixed-version distributed reads (SNUBA-9W6)") fixes the mixed-version "Code: 10 ... Not found column ... While executing Remote." failures. Those failures are caused by constant IN-sets leaking unstable __set_<hash> identifiers into SELECT-clause column names (SNUBA-9W6, SNUBA-A1W, SNUBA-B6C, SNUBA-B62, SNUBA-B63, SNUBA-B67, SNUBA-A13).
SNUBA-A23 is a different failure: a distributed-query cancellation (QUERY_WAS_CANCELLED, code 394), not a query-construction error. The query in the failing event has no constant IN-set; its aggregate conditions are and(has(mapKeys(attributes_string_2), 'user'), True). The membership_as_has rewrite in #8074 therefore does not apply and will not resolve it.
Observed context:
Endpoint: EndpointTraceItemTable/v1, referrer tagstore.get_groups_user_counts (a count_unique(user) aggregation).
duration_group: <10s. The query is cancelled quickly, so this is not a long max_execution_time timeout.
The cancellation is reported by a remote shard (...arm-5-3:9000) via DB::QueryStatus::throwQueryWasCancelled.
Likely causes to investigate
A sibling shard/replica failing or erroring, causing the coordinator to cancel the query on the remaining shards.
A memory limit / overcommit tracker killing the query on a shard.
Upstream cancellation (client disconnect, rate limiter, or max_threads/concurrency limits).
The __set_* bug itself: if it made a shard error, in-flight sibling queries could be cancelled. Worth re-checking once #8074 is deployed to see whether A23's frequency drops.
Suggested next steps
Pull a few clickhouse-server logs from the EAP shards around an A23 timestamp to find the originating error that triggered the cancellation (the 394 is only the downstream symptom).
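The first likely cause, and the point that the 394 is only a downstream symptom, both come from fan-out cancellation. Below is a toy asyncio model of that pattern. It is not ClickHouse's implementation. A hypothetical coordinator starts one task per shard and cancels the rest as soon as one fails. The cancelled shards can then only report "cancelled", and the real error exists only on the shard that failed first. The names run_shard, coordinator, and ShardError are invented for illustration.

```python
import asyncio
from typing import Dict, List, Tuple


class ShardError(Exception):
    """Hypothetical stand-in for whatever error the first failing shard raises."""


async def run_shard(name: str, delay: float, fail: bool) -> str:
    # Simulate a shard doing `delay` seconds of work, then failing or succeeding.
    await asyncio.sleep(delay)
    if fail:
        raise ShardError(f"{name}: originating error")
    return f"{name}: ok"


async def coordinator(shards: List[Tuple[str, float, bool]]) -> Dict[str, str]:
    # Fan out one task per shard, remembering each task's shard name.
    tasks = {asyncio.create_task(run_shard(*spec)): spec[0] for spec in shards}
    # Wait until every shard finishes or the first one raises.
    _done, pending = await asyncio.wait(tasks.keys(), return_when=asyncio.FIRST_EXCEPTION)
    # Toy policy: on the first failure, cancel every shard that is still running.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    # Collect outcomes: cancelled shards never learn why they were cancelled.
    outcomes: Dict[str, str] = {}
    for task, name in tasks.items():
        if task.cancelled():
            outcomes[name] = "cancelled"
        elif task.exception() is not None:
            outcomes[name] = f"error: {task.exception()}"
        else:
            outcomes[name] = task.result()
    return outcomes


if __name__ == "__main__":
    print(asyncio.run(coordinator([
        ("shard-1", 0.01, True),
        ("shard-2", 0.5, False),
        ("shard-3", 0.5, False),
    ])))
```

In this model, shard-2 and shard-3 end up "cancelled" and only shard-1 carries the originating error. That is why the suggested next step looks past the 394 on ...arm-5-3:9000 to the other shards' logs.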
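For the log-pull step, here is a minimal sketch that scans clickhouse-server text-log lines for <Error> entries near an A23 timestamp. It assumes the logs from all EAP shards have been merged first. It drops the code-394 cancellations and leaves candidate originating errors. Several details are assumptions to verify against the actual logs: the line layout in LINE_RE, the helper name originating_errors, and the 30-second default window. Server log timestamps may also be in the server's local time zone, so the Sentry event time may need converting.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List

# Assumed text-log layout (check against the real clickhouse-server.log):
# 2025.12.12 10:00:00.123456 [ 1234 ] {query-id} <Error> executeQuery: Code: 241. DB::Exception: ...
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}\.\d{2}\.\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"\[ *(?P<thread>\d+) *\] \{(?P<query_id>[^}]*)\} "
    r"<(?P<level>\w+)> (?P<logger>[^:]+): (?P<message>.*)$"
)
CODE_RE = re.compile(r"Code: (\d+)\.")
QUERY_WAS_CANCELLED = 394


@dataclass
class LogError:
    ts: datetime
    query_id: str
    code: int
    message: str


def originating_errors(lines: Iterable[str], around: datetime,
                       window_s: float = 30.0) -> List[LogError]:
    """Return <Error> entries with a code other than 394 within +/- window_s of `around`, oldest first."""
    lo = around - timedelta(seconds=window_s)
    hi = around + timedelta(seconds=window_s)
    found: List[LogError] = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m is None or m["level"] != "Error":
            continue  # skip non-matching lines and non-error levels
        ts = datetime.strptime(m["ts"], "%Y.%m.%d %H:%M:%S.%f")
        code_match = CODE_RE.search(m["message"])
        if not (lo <= ts <= hi) or code_match is None:
            continue  # outside the window, or no "Code: N." in the message
        code = int(code_match.group(1))
        if code != QUERY_WAS_CANCELLED:
            found.append(LogError(ts, m["query_id"], code, m["message"]))
    return sorted(found, key=lambda e: e.ts)
```

The earliest non-394 entry in the result is the first candidate for the trigger. Remote shards log under their own query_id, so this sketch matches on time rather than on ID. If query logging is enabled, filtering system.query_log by initial_query_id with exception_code != 394 may be a more direct route.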