feat(eap-items): use indexed_name column for sentry.op/sentry.metric.name filters#8052
Open
phacops wants to merge 9 commits into
Open
feat(eap-items): use indexed_name column for sentry.op/sentry.metric.name filters#8052phacops wants to merge 9 commits into
phacops wants to merge 9 commits into
Conversation
…name filters Filtering on the per-item-type primary name attribute (sentry.op for spans, sentry.metric.name for metrics) was translated into an unindexed attributes_string bucket map lookup (arrayElement(attributes_string_N, ...)). The same value is also stored in the dedicated indexed_name String column, which has a bloom filter index (migration 0057). Add an IndexedNameOptimizer logical query processor that rewrites attributes_string['sentry.op'|'sentry.metric.name'] SubscriptableReferences to the indexed_name column so the bloom filter index is used. Because indexed_name is item-type specific, the rewrite is only applied when the query is unambiguously scoped to the owning item type via an equals(item_type, N) condition; otherwise it safely falls back to the bucket lookup. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01GXBnLMCV79qaAkVRkQu7Nw Agent transcript: https://claudescope.sentry.dev/share/cdjPeZ3xwrigGK-uGIWE0P3ZX1YWtOxOZwNTOWPmZMo
…values Keeps the optimizer tests in sync with the sentry_protos enum values they key off of, so a divergence in the upstream enum fails the tests instead of silently degrading the optimization. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01GXBnLMCV79qaAkVRkQu7Nw
|
|
||
| item_types: set[int] = set() | ||
| for cond in get_first_level_and_conditions(condition): | ||
| if not isinstance(cond, FunctionCall) or cond.function_name != "equals": |
Member
There was a problem hiding this comment.
nit: ConditionFunctions.EQ
Contributor
Author
There was a problem hiding this comment.
Good call — switched to ConditionFunctions.EQ. Since this branch is already up, the fix is in a small stacked PR off this branch: #8075.
— Claude Code
Generated by Claude Code
MeredithAnya
approved these changes
Jun 18, 2026
MeredithAnya
left a comment
Member
There was a problem hiding this comment.
Not sure if it's worth writing a test for IN item_types
Address review nit: reference the ConditionFunctions.EQ constant rather than the hardcoded "equals" string in IndexedNameOptimizer. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01CajnvSCYTxvJ9Hw5jELRPH
…xed-name-filter-optimizer
…PC filter shape The IndexedNameOptimizer matched SubscriptableReference, but the EAP RPC builder emits map-backed filters as arrayElement(attributes_string, key) (since #8035), so the optimizer never fired on real traffic. It also replaced the bucket lookup outright, which would drop rows whose indexed_name is empty (migration 0057 adds the column with no backfill). Rewrite a WHERE-clause equals/in filter on the indexed attribute into or(<op on indexed_name>, <original op on the bucket>): the bloom filter index can prune granules via indexed_name while the bucket lookup keeps results correct for non-backfilled rows. Only non-empty string literals are pushed onto the index (indexed_name = '' would over-match every non-backfilled row), and SELECT/GROUP BY references keep reading the bucket. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Hax3wseAtUEKY7GfpcJNTx
…the fallback Replace the OR bucket fallback with a direct index-only rewrite (attributes_string[key] -> indexed_name), gated behind the eap_items_use_indexed_name runtime flag (default off). The fallback forced ClickHouse to scan every granule (an OR with the unindexed bucket branch is never prunable), so it negated the bloom filter index it was meant to use. Index-only filtering is only correct once indexed_name is fully populated across the retention window (migration 0057 adds the column without a backfill), so the flag must stay off until then; flipping it on lets the bloom filter index serve sentry.op / sentry.metric.name filters. The rewrite still matches the real arrayElement(attributes_string, key) shape emitted by the RPC builder (and the SnQL SubscriptableReference form). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Hax3wseAtUEKY7GfpcJNTx
…rm closure mypy widens the narrowed Optional[str] back to Optional[str] inside the nested transform() closure, failing the pre-commit mypy hook. Bind it to a str-typed local before the closure. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Hax3wseAtUEKY7GfpcJNTx
…ia indexed_name Add an EndpointTimeSeries test that writes METRIC items and runs a sentry.metric.name-filtered SUM time series with the IndexedNameOptimizer flag enabled. Asserts the name filter still returns the correct rows once it's rewritten onto the indexed_name column, and that results are identical with the flag off (bucket lookup) and on (indexed_name). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Hax3wseAtUEKY7GfpcJNTx
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Filters on the per-item-type primary name attribute —
sentry.opfor spans,sentry.metric.namefor metrics — are translated into an unindexedattributes_stringbucket map lookup (arrayElement(attributes_string_N, ...)). That same value is also stored in the dedicatedindexed_nameStringcolumn, which carries a bloom filter index (added in migration0057_add_name_column_and_index).This PR adds an
IndexedNameOptimizerlogical query processor that rewrites such a filter to reference the indexedindexed_namecolumn, so the bloom filter index is used instead of scanning the bucket map. The rewrite is gated behind a runtime config flag (default off).How it works
arrayElement(column('attributes_string'), literal(key))(see_map_backed_operands, added in fix(eap): Generate minimal NULL-free SQL for map-backed attribute filters #8035), not as aSubscriptableReference. The processor targets thearrayElementform (and also handles the hand-written SnQLSubscriptableReferenceform). It runs beforeHashBucketFunctionTransformer, so any access it doesn't rewrite still gets mapped to its hashed bucket column downstream.indexed_nameis item-type specific (sentry.oponly for spans,sentry.metric.nameonly for metrics), so the rewrite is applied only when the query is unambiguously scoped to the owning item type via a singleequals(item_type, N)condition. The EAP resolvers always inject that condition (resolver_trace_item_table.py), so RPC traffic is covered; otherwise it leaves the access untouched and falls back to the bucket lookup.indexed_namewas added (with no backfill) by migration0057, so rows written before the column existed — or before the value started being populated — have an emptyindexed_namewhile still carrying the value inattributes_string. Readingindexed_namefor those rows would silently drop them, so the rewrite is gated behind theeap_items_use_indexed_nameruntime flag (default0). It must only be enabled onceindexed_nameis fully populated across the live retention window.Why index-only (no bucket OR fallback)
An earlier iteration kept an
or(equals(indexed_name, v), equals(bucket, v))fallback for correctness. That negates the optimization: a skip index can only prune a granule when every branch of anORis provably non-matching, and the bucket branch has no index — so ClickHouse would scan every granule and the bloom filter would never prune. The config flag gives the same safety (the rewrite simply stays off until backfill is done) while preserving the full index benefit once enabled.Changes
snuba/query/processors/logical/indexed_name_optimizer.py— the processor.snuba/datasets/configuration/events_analytics_platform/entities/eap_items.yaml— wired in ahead of the hash-bucket processors.tests/query/processors/test_indexed_name_optimizer.py— unit tests.Testing
arrayElementandSubscriptableReferenceforms,IN, an arbitrary operator (notEquals),SELECTalias preservation, mismatched key, non-indexed attribute, missing/unsupported/ambiguousitem_type, and the flag-off no-op path.equals(arrayElement(column('attributes_string'), literal('sentry.op')), literal('db.query')), and that with the flag on, running the processor followed byHashBucketFunctionTransformeryieldsequals(indexed_name, 'db.query')(vs. the unindexedattributes_string_19bucket lookup when off). The builder'sarrayElementshape is itself locked bytests/web/rpc/test_common.py::TestAnalyzerSafeFilters.ruff(lint + format) pass.Rollout
indexed_nameacross the retention window.eap_items_use_indexed_name = 1to start serving these filters from the bloom filter index.Linear: EAP-588
🤖 Generated with Claude Code