Skip to content

feat(eap-items): use indexed_name column for sentry.op/sentry.metric.name filters#8052

Open
phacops wants to merge 9 commits into
masterfrom
feat/eap-items-indexed-name-filter-optimizer
Open

feat(eap-items): use indexed_name column for sentry.op/sentry.metric.name filters#8052
phacops wants to merge 9 commits into
masterfrom
feat/eap-items-indexed-name-filter-optimizer

Conversation

@phacops

@phacops phacops commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Summary

Filters on the per-item-type primary name attribute — sentry.op for spans, sentry.metric.name for metrics — are translated into an unindexed attributes_string bucket map lookup (arrayElement(attributes_string_N, ...)). That same value is also stored in the dedicated indexed_name String column, which carries a bloom filter index (added in migration 0057_add_name_column_and_index).

This PR adds an IndexedNameOptimizer logical query processor that rewrites such a filter to reference the indexed indexed_name column, so the bloom filter index is used instead of scanning the bucket map. The rewrite is gated behind a runtime config flag (default off).

How it works

equals(arrayElement(attributes_string, 'sentry.op'), 'db.query')
-> equals(indexed_name, 'db.query')
  • Matches the real RPC filter shape. The EAP RPC builder emits map-backed filters as arrayElement(column('attributes_string'), literal(key)) (see _map_backed_operands, added in fix(eap): Generate minimal NULL-free SQL for map-backed attribute filters #8035), not as a SubscriptableReference. The processor targets the arrayElement form (and also handles the hand-written SnQL SubscriptableReference form). It runs before HashBucketFunctionTransformer, so any access it doesn't rewrite still gets mapped to its hashed bucket column downstream.
  • Item-type scoped. indexed_name is item-type specific (sentry.op only for spans, sentry.metric.name only for metrics), so the rewrite is applied only when the query is unambiguously scoped to the owning item type via a single equals(item_type, N) condition. The EAP resolvers always inject that condition (resolver_trace_item_table.py), so RPC traffic is covered; otherwise it leaves the access untouched and falls back to the bucket lookup.
  • Gated behind a config flag. indexed_name was added (with no backfill) by migration 0057, so rows written before the column existed — or before the value started being populated — have an empty indexed_name while still carrying the value in attributes_string. Reading indexed_name for those rows would silently drop them, so the rewrite is gated behind the eap_items_use_indexed_name runtime flag (default 0). It must only be enabled once indexed_name is fully populated across the live retention window.

Why index-only (no bucket OR fallback)

An earlier iteration kept an or(equals(indexed_name, v), equals(bucket, v)) fallback for correctness. That negates the optimization: a skip index can only prune a granule when every branch of an OR is provably non-matching, and the bucket branch has no index — so ClickHouse would scan every granule and the bloom filter would never prune. The config flag gives the same safety (the rewrite simply stays off until backfill is done) while preserving the full index benefit once enabled.

Changes

  • snuba/query/processors/logical/indexed_name_optimizer.py — the processor.
  • snuba/datasets/configuration/events_analytics_platform/entities/eap_items.yaml — wired in ahead of the hash-bucket processors.
  • tests/query/processors/test_indexed_name_optimizer.py — unit tests.

Testing

  • Unit tests covering: span/metric rewrites in both arrayElement and SubscriptableReference forms, IN, an arbitrary operator (notEquals), SELECT alias preservation, mismatched key, non-indexed attribute, missing/unsupported/ambiguous item_type, and the flag-off no-op path.
  • Verified against the live RPC builder that the targeted filter shape is exactly equals(arrayElement(column('attributes_string'), literal('sentry.op')), literal('db.query')), and that with the flag on, running the processor followed by HashBucketFunctionTransformer yields equals(indexed_name, 'db.query') (vs. the unindexed attributes_string_19 bucket lookup when off). The builder's arrayElement shape is itself locked by tests/web/rpc/test_common.py::TestAnalyzerSafeFilters.
  • ruff (lint + format) pass.

Rollout

  1. Merge with the flag off (no behavior change).
  2. Backfill indexed_name across the retention window.
  3. Set eap_items_use_indexed_name = 1 to start serving these filters from the bloom filter index.

Linear: EAP-588

🤖 Generated with Claude Code

…name filters

Filtering on the per-item-type primary name attribute (sentry.op for spans,
sentry.metric.name for metrics) was translated into an unindexed
attributes_string bucket map lookup (arrayElement(attributes_string_N, ...)).
The same value is also stored in the dedicated indexed_name String column, which
has a bloom filter index (migration 0057).

Add an IndexedNameOptimizer logical query processor that rewrites
attributes_string['sentry.op'|'sentry.metric.name'] SubscriptableReferences to
the indexed_name column so the bloom filter index is used. Because indexed_name
is item-type specific, the rewrite is only applied when the query is
unambiguously scoped to the owning item type via an equals(item_type, N)
condition; otherwise it safely falls back to the bucket lookup.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01GXBnLMCV79qaAkVRkQu7Nw

Agent transcript: https://claudescope.sentry.dev/share/cdjPeZ3xwrigGK-uGIWE0P3ZX1YWtOxOZwNTOWPmZMo
@phacops phacops requested review from a team as code owners June 17, 2026 23:40
Comment thread snuba/query/processors/logical/indexed_name_optimizer.py
…values

Keeps the optimizer tests in sync with the sentry_protos enum values they
key off of, so a divergence in the upstream enum fails the tests instead of
silently degrading the optimization.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01GXBnLMCV79qaAkVRkQu7Nw
@linear-code

linear-code Bot commented Jun 18, 2026

Copy link
Copy Markdown

EAP-588


item_types: set[int] = set()
for cond in get_first_level_and_conditions(condition):
if not isinstance(cond, FunctionCall) or cond.function_name != "equals":

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ConditionFunctions.EQ

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call — switched to ConditionFunctions.EQ. Since this branch is already up, the fix is in a small stacked PR off this branch: #8075.

— Claude Code


Generated by Claude Code

@MeredithAnya MeredithAnya left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it's worth writing a test for IN item_types

Address review nit: reference the ConditionFunctions.EQ constant rather
than the hardcoded "equals" string in IndexedNameOptimizer.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CajnvSCYTxvJ9Hw5jELRPH
getsantry Bot and others added 6 commits June 19, 2026 03:18
…PC filter shape

The IndexedNameOptimizer matched SubscriptableReference, but the EAP RPC
builder emits map-backed filters as arrayElement(attributes_string, key)
(since #8035), so the optimizer never fired on real traffic. It also
replaced the bucket lookup outright, which would drop rows whose
indexed_name is empty (migration 0057 adds the column with no backfill).

Rewrite a WHERE-clause equals/in filter on the indexed attribute into
or(<op on indexed_name>, <original op on the bucket>): the bloom filter
index can prune granules via indexed_name while the bucket lookup keeps
results correct for non-backfilled rows. Only non-empty string literals
are pushed onto the index (indexed_name = '' would over-match every
non-backfilled row), and SELECT/GROUP BY references keep reading the
bucket.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Hax3wseAtUEKY7GfpcJNTx
…the fallback

Replace the OR bucket fallback with a direct index-only rewrite
(attributes_string[key] -> indexed_name), gated behind the
eap_items_use_indexed_name runtime flag (default off). The fallback forced
ClickHouse to scan every granule (an OR with the unindexed bucket branch is
never prunable), so it negated the bloom filter index it was meant to use.

Index-only filtering is only correct once indexed_name is fully populated
across the retention window (migration 0057 adds the column without a
backfill), so the flag must stay off until then; flipping it on lets the
bloom filter index serve sentry.op / sentry.metric.name filters. The rewrite
still matches the real arrayElement(attributes_string, key) shape emitted by
the RPC builder (and the SnQL SubscriptableReference form).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Hax3wseAtUEKY7GfpcJNTx
…rm closure

mypy widens the narrowed Optional[str] back to Optional[str] inside the nested
transform() closure, failing the pre-commit mypy hook. Bind it to a str-typed
local before the closure.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Hax3wseAtUEKY7GfpcJNTx
…ia indexed_name

Add an EndpointTimeSeries test that writes METRIC items and runs a
sentry.metric.name-filtered SUM time series with the IndexedNameOptimizer flag
enabled. Asserts the name filter still returns the correct rows once it's
rewritten onto the indexed_name column, and that results are identical with the
flag off (bucket lookup) and on (indexed_name).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Hax3wseAtUEKY7GfpcJNTx
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants