feat(eap-items): use indexed_name column for sentry.op/sentry.metric.name filters by phacops · Pull Request #8052 · getsentry/snuba

phacops · 2026-06-17T23:40:45Z

Summary

Filters on the per-item-type primary name attribute — sentry.op for spans, sentry.metric.name for metrics — are translated into an unindexed attributes_string bucket map lookup (arrayElement(attributes_string_N, ...)). That same value is also stored in the dedicated indexed_name String column, which carries a bloom filter index (added in migration 0057_add_name_column_and_index).

This PR adds an IndexedNameOptimizer logical query processor that rewrites such a filter to reference the indexed indexed_name column, so the bloom filter index is used instead of scanning the bucket map. The rewrite is gated behind a runtime config flag (default off).

How it works

equals(arrayElement(attributes_string, 'sentry.op'), 'db.query')
-> equals(indexed_name, 'db.query')

Matches the real RPC filter shape. The EAP RPC builder emits map-backed filters as arrayElement(column('attributes_string'), literal(key)) (see _map_backed_operands, added in fix(eap): Generate minimal NULL-free SQL for map-backed attribute filters #8035), not as a SubscriptableReference. The processor targets the arrayElement form (and also handles the hand-written SnQL SubscriptableReference form). It runs before HashBucketFunctionTransformer, so any access it doesn't rewrite still gets mapped to its hashed bucket column downstream.
Item-type scoped. indexed_name is item-type specific (sentry.op only for spans, sentry.metric.name only for metrics), so the rewrite is applied only when the query is unambiguously scoped to the owning item type via a single equals(item_type, N) condition. The EAP resolvers always inject that condition (resolver_trace_item_table.py), so RPC traffic is covered; otherwise it leaves the access untouched and falls back to the bucket lookup.
Gated behind a config flag. indexed_name was added (with no backfill) by migration 0057, so rows written before the column existed — or before the value started being populated — have an empty indexed_name while still carrying the value in attributes_string. Reading indexed_name for those rows would silently drop them, so the rewrite is gated behind the eap_items_use_indexed_name runtime flag (default 0). It must only be enabled once indexed_name is fully populated across the live retention window.

Why index-only (no bucket OR fallback)

An earlier iteration kept an or(equals(indexed_name, v), equals(bucket, v)) fallback for correctness. That negates the optimization: a skip index can only prune a granule when every branch of an OR is provably non-matching, and the bucket branch has no index — so ClickHouse would scan every granule and the bloom filter would never prune. The config flag gives the same safety (the rewrite simply stays off until backfill is done) while preserving the full index benefit once enabled.

Changes

snuba/query/processors/logical/indexed_name_optimizer.py — the processor.
snuba/datasets/configuration/events_analytics_platform/entities/eap_items.yaml — wired in ahead of the hash-bucket processors.
tests/query/processors/test_indexed_name_optimizer.py — unit tests.

Testing

Unit tests covering: span/metric rewrites in both arrayElement and SubscriptableReference forms, IN, an arbitrary operator (notEquals), SELECT alias preservation, mismatched key, non-indexed attribute, missing/unsupported/ambiguous item_type, and the flag-off no-op path.
Verified against the live RPC builder that the targeted filter shape is exactly equals(arrayElement(column('attributes_string'), literal('sentry.op')), literal('db.query')), and that with the flag on, running the processor followed by HashBucketFunctionTransformer yields equals(indexed_name, 'db.query') (vs. the unindexed attributes_string_19 bucket lookup when off). The builder's arrayElement shape is itself locked by tests/web/rpc/test_common.py::TestAnalyzerSafeFilters.
ruff (lint + format) pass.

Rollout

Merge with the flag off (no behavior change).
Backfill indexed_name across the retention window.
Set eap_items_use_indexed_name = 1 to start serving these filters from the bloom filter index.

Linear: EAP-588

🤖 Generated with Claude Code

…name filters Filtering on the per-item-type primary name attribute (sentry.op for spans, sentry.metric.name for metrics) was translated into an unindexed attributes_string bucket map lookup (arrayElement(attributes_string_N, ...)). The same value is also stored in the dedicated indexed_name String column, which has a bloom filter index (migration 0057). Add an IndexedNameOptimizer logical query processor that rewrites attributes_string['sentry.op'|'sentry.metric.name'] SubscriptableReferences to the indexed_name column so the bloom filter index is used. Because indexed_name is item-type specific, the rewrite is only applied when the query is unambiguously scoped to the owning item type via an equals(item_type, N) condition; otherwise it safely falls back to the bucket lookup. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01GXBnLMCV79qaAkVRkQu7Nw Agent transcript: https://claudescope.sentry.dev/share/cdjPeZ3xwrigGK-uGIWE0P3ZX1YWtOxOZwNTOWPmZMo

…values Keeps the optimizer tests in sync with the sentry_protos enum values they key off of, so a divergence in the upstream enum fails the tests instead of silently degrading the optimization. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01GXBnLMCV79qaAkVRkQu7Nw

linear-code · 2026-06-18T02:05:14Z

EAP-588

MeredithAnya · 2026-06-18T17:28:28Z

+
+        item_types: set[int] = set()
+        for cond in get_first_level_and_conditions(condition):
+            if not isinstance(cond, FunctionCall) or cond.function_name != "equals":


nit: ConditionFunctions.EQ

Good call — switched to ConditionFunctions.EQ. Since this branch is already up, the fix is in a small stacked PR off this branch: #8075.

— Claude Code

Generated by Claude Code

MeredithAnya

Not sure if it's worth writing a test for IN item_types

Address review nit: reference the ConditionFunctions.EQ constant rather than the hardcoded "equals" string in IndexedNameOptimizer. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01CajnvSCYTxvJ9Hw5jELRPH

…xed-name-filter-optimizer

…PC filter shape The IndexedNameOptimizer matched SubscriptableReference, but the EAP RPC builder emits map-backed filters as arrayElement(attributes_string, key) (since #8035), so the optimizer never fired on real traffic. It also replaced the bucket lookup outright, which would drop rows whose indexed_name is empty (migration 0057 adds the column with no backfill). Rewrite a WHERE-clause equals/in filter on the indexed attribute into or(<op on indexed_name>, <original op on the bucket>): the bloom filter index can prune granules via indexed_name while the bucket lookup keeps results correct for non-backfilled rows. Only non-empty string literals are pushed onto the index (indexed_name = '' would over-match every non-backfilled row), and SELECT/GROUP BY references keep reading the bucket. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Hax3wseAtUEKY7GfpcJNTx

…the fallback Replace the OR bucket fallback with a direct index-only rewrite (attributes_string[key] -> indexed_name), gated behind the eap_items_use_indexed_name runtime flag (default off). The fallback forced ClickHouse to scan every granule (an OR with the unindexed bucket branch is never prunable), so it negated the bloom filter index it was meant to use. Index-only filtering is only correct once indexed_name is fully populated across the retention window (migration 0057 adds the column without a backfill), so the flag must stay off until then; flipping it on lets the bloom filter index serve sentry.op / sentry.metric.name filters. The rewrite still matches the real arrayElement(attributes_string, key) shape emitted by the RPC builder (and the SnQL SubscriptableReference form). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Hax3wseAtUEKY7GfpcJNTx

…rm closure mypy widens the narrowed Optional[str] back to Optional[str] inside the nested transform() closure, failing the pre-commit mypy hook. Bind it to a str-typed local before the closure. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Hax3wseAtUEKY7GfpcJNTx

…ia indexed_name Add an EndpointTimeSeries test that writes METRIC items and runs a sentry.metric.name-filtered SUM time series with the IndexedNameOptimizer flag enabled. Asserts the name filter still returns the correct rows once it's rewritten onto the indexed_name column, and that results are identical with the flag off (bucket lookup) and on (indexed_name). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Hax3wseAtUEKY7GfpcJNTx

phacops requested review from a team as code owners June 17, 2026 23:40

sentry Bot reviewed Jun 17, 2026

View reviewed changes

Comment thread snuba/query/processors/logical/indexed_name_optimizer.py

MeredithAnya reviewed Jun 18, 2026

View reviewed changes

MeredithAnya approved these changes Jun 18, 2026

View reviewed changes

phacops mentioned this pull request Jun 19, 2026

ref(eap-items): use ConditionFunctions.EQ in IndexedNameOptimizer #8075

Merged

getsantry Bot and others added 6 commits June 19, 2026 03:18

[getsentry/action-github-commit] Auto commit

77b3cde

Merge remote-tracking branch 'origin/master' into feat/eap-items-inde…

034eed2

…xed-name-filter-optimizer

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(eap-items): use indexed_name column for sentry.op/sentry.metric.name filters#8052

feat(eap-items): use indexed_name column for sentry.op/sentry.metric.name filters#8052
phacops wants to merge 9 commits into
masterfrom
feat/eap-items-indexed-name-filter-optimizer

phacops commented Jun 17, 2026 •

edited

Loading

Uh oh!

Uh oh!

linear-code Bot commented Jun 18, 2026

Uh oh!

MeredithAnya Jun 18, 2026

Uh oh!

phacops Jun 19, 2026

Uh oh!

MeredithAnya left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

phacops commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

How it works

Why index-only (no bucket OR fallback)

Changes

Testing

Rollout

Uh oh!

Uh oh!

linear-code Bot commented Jun 18, 2026

Uh oh!

MeredithAnya Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

phacops Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

MeredithAnya left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

phacops commented Jun 17, 2026 •

edited

Loading