feat(eap-items): double-write arrays into typed map columns by phacops · Pull Request #8079 · getsentry/snuba

phacops · 2026-06-19T19:55:25Z

Summary

Follow-up to #8056, which added the typed Map(String, Array(T)) columns for array-valued attributes (attributes_array_string, attributes_array_int, attributes_array_float, attributes_array_bool) but left populating them as a follow-up.

This PR populates those columns from EAPItemsProcessor (the Rust eap_items processor), double-writing them alongside the existing attributes_array JSON column. The JSON column keeps being written so the read path is unaffected until it is migrated to the typed columns in a separate change.

What it does

In AttributeMap::insert_array, each array attribute is split by element type into the matching typed bucket and inserted under the same key:

string elements → attributes_array_string
double elements → attributes_array_float
int elements → attributes_array_int and attributes_array_float
bool elements → attributes_array_bool

Homogeneous arrays — the common case — populate exactly one bucket; mixed arrays are split across buckets by element type.

Integer elements are also written as floats, mirroring the scalar insert_int double-write. The read path resolves numeric attribute types (val_int, val_float, val_double, and their array variants) to the float columns via _VALUE_TYPE_TO_COLUMN, so int arrays must be present in attributes_array_float to be queryable, while attributes_array_int retains exact integer values.

The four new columns are appended to EAPItemRow and to EAPItemRow::COLUMN_NAMES (sent explicitly with INSERT ... FORMAT RowBinary) in matching struct order. The serializer already encodes Vec<(String, Vec<T>)> as Map(String, Array(T)).

Tests

New test_array_typed_columns_double_write covers homogeneous arrays of each type, the int→float double-write, no cross-type bleed, and that the legacy JSON column is still written.
test_row_binary_attributes now also asserts the typed attributes_array_string column is populated.
test_column_names_match_struct_layout updated for the four new columns (96 → 100).

All processors::eap_items unit tests pass; cargo clippy --lib and cargo fmt are clean. (test_row_binary_clickhouse_insert requires a live ClickHouse and only runs in CI.)

🤖 Generated with Claude Code

Generated by Claude Code

Populate the typed `Map(String, Array(T))` columns added in #8056 (`attributes_array_string`, `attributes_array_int`, `attributes_array_float`, `attributes_array_bool`) from `EAPItemsProcessor`, alongside the existing `attributes_array` JSON column. This is a double write: the legacy JSON column keeps being written so the read path is unaffected until it is migrated to the typed columns. Each array attribute is split by element type into the matching typed bucket. Homogeneous arrays — the common case — populate exactly one bucket. Integer elements are also written as floats, mirroring the scalar `insert_int` double-write, because the read path resolves numeric attributes to the float columns (`_VALUE_TYPE_TO_COLUMN`). The new columns are appended to `EAPItemRow` and `COLUMN_NAMES` (which is sent explicitly on `INSERT ... FORMAT RowBinary`) in matching order. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_017PBiwoiUxf5TAnzWmbZebu

Satisfies clippy::unnecessary_get_then_check on the CI toolchain. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_017PBiwoiUxf5TAnzWmbZebu

linear-code · 2026-06-22T18:28:33Z

EAP-573

Address review feedback: - Drop the int->float double-write for arrays. Integer array elements now go only to `attributes_array_int`, not also to `attributes_array_float`. - Stop cloning array values (and keys) when populating the typed columns. Instead of building parallel typed maps in `insert_array` (which cloned every string and every key), `insert_array` is back to only building the `attributes_array` JSON representation. The typed `Map(String, Array(T))` columns are derived in `TryFrom<EAPItem> for EAPItemRow` by consuming that representation and moving the values into the typed buckets. A `push_typed_array` helper moves the attribute key into the single bucket a homogeneous array touches (the common case) and only clones it for the extra buckets a rare mixed-type array spans. This keeps the production RowBinary path (eap_items always runs with --use-row-binary) writing both the legacy JSON column and the typed columns, with no per-element clones. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_017PBiwoiUxf5TAnzWmbZebu

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 85f5f3a. Configure here.}

cursor · 2026-06-22T19:11:50Z

+                &mut remaining,
+                floats,
+            );
+            push_typed_array(&mut attributes_array_bool, &mut key, &mut remaining, bools);


Int arrays skip float column

Medium Severity

Homogeneous integer array attributes are written only to attributes_array_int, not also to attributes_array_float. That breaks parity with scalar insert_int and with _VALUE_TYPE_TO_COLUMN, which routes val_int_array to the float column family, so migrated read paths would not find int array data where queries expect it.

^{Reviewed by Cursor Bugbot for commit 85f5f3a. Configure here.}

The typed Map(String, Array(T)) columns were only being derived in the RowBinary EAPItemRow conversion, so the JSON (JSONEachRow) path stopped double-writing them. Move the typed maps back onto AttributeMap, which both write paths serialize, so the double-write happens on both the JSON and RowBinary paths. Simplify the write: insert_array now appends each element to its typed column as the value is read (entry API), instead of buffering into per-type vectors and deriving the columns afterward. The EAPItemRow conversion is a plain move of the maps again, and the push_typed_array helper is gone. The JSON<->RowBinary equivalence test now also asserts the typed array columns match across both paths. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_017PBiwoiUxf5TAnzWmbZebu

sentry · 2026-06-22T19:45:29Z

+                Some(Value::IntValue(int)) => {
+                    self.attributes_array_int
+                        .entry(k.clone())
+                        .or_default()
+                        .push(int);
+                    values.push(EAPValue::Int(int));


Bug: Integer arrays are only written to attributes_array_int, not the float column, making them unqueryable after the planned read-path migration, which is expected to read from the float column.
_{Severity: HIGH}

Suggested Fix

Modify the insert_array function for Value::IntValue. When an integer array is processed, it should be written to both attributes_array_int and attributes_array_float to mirror the behavior of scalar integers. This ensures that the data will be queryable when the read path is migrated to use the new typed array columns.

Prompt for AI Agent

Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid. Location: rust_snuba/src/processors/eap_items.rs#L502-L507 Potential issue: The `insert_array` method for integer values writes data exclusively to the `attributes_array_int` column. This is inconsistent with the existing behavior for scalar integers, which are double-written to both integer and float columns (`attributes_float`). The read path currently resolves numeric types to float columns to enable querying. When the read path is migrated for arrays in a future change, it will likely look for integer arrays in `attributes_array_float`. Since the data written by this PR will only be in `attributes_array_int`, queries for this data will return empty results, leading to silent data inaccessibility.

Also affects:

rust_snuba/src/processors/eap_items.rs:1393~1394

_{Did we get this right? 👍 / 👎 to inform future reviews.}

phacops requested a review from a team as a code owner June 19, 2026 19:55

fix(eap-items): use !contains_key instead of get().is_none() in test

2a1347f

Satisfies clippy::unnecessary_get_then_check on the CI toolchain. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_017PBiwoiUxf5TAnzWmbZebu

MeredithAnya approved these changes Jun 22, 2026

View reviewed changes

cursor Bot reviewed Jun 22, 2026

View reviewed changes

sentry Bot reviewed Jun 22, 2026

View reviewed changes

phacops enabled auto-merge (squash) June 22, 2026 21:53

Merge branch 'master' into claude/awesome-ride-pfuppy

72db08d

phacops merged commit d5f53d3 into master Jun 22, 2026
68 checks passed

phacops deleted the claude/awesome-ride-pfuppy branch June 22, 2026 22:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(eap-items): double-write arrays into typed map columns#8079

feat(eap-items): double-write arrays into typed map columns#8079
phacops merged 5 commits into
masterfrom
claude/awesome-ride-pfuppy

phacops commented Jun 19, 2026

Uh oh!

linear-code Bot commented Jun 22, 2026

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot Jun 22, 2026

Uh oh!

sentry Bot Jun 22, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

phacops commented Jun 19, 2026

Summary

What it does

Tests

Uh oh!

linear-code Bot commented Jun 22, 2026

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot Jun 22, 2026

Choose a reason for hiding this comment

Int arrays skip float column

Uh oh!

sentry Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants