feat(eap-items): double-write arrays into typed map columns#8079
Conversation
Populate the typed `Map(String, Array(T))` columns added in #8056 (`attributes_array_string`, `attributes_array_int`, `attributes_array_float`, `attributes_array_bool`) from `EAPItemsProcessor`, alongside the existing `attributes_array` JSON column. This is a double write: the legacy JSON column keeps being written so the read path is unaffected until it is migrated to the typed columns. Each array attribute is split by element type into the matching typed bucket. Homogeneous arrays — the common case — populate exactly one bucket. Integer elements are also written as floats, mirroring the scalar `insert_int` double-write, because the read path resolves numeric attributes to the float columns (`_VALUE_TYPE_TO_COLUMN`). The new columns are appended to `EAPItemRow` and `COLUMN_NAMES` (which is sent explicitly on `INSERT ... FORMAT RowBinary`) in matching order. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_017PBiwoiUxf5TAnzWmbZebu
Satisfies clippy::unnecessary_get_then_check on the CI toolchain. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_017PBiwoiUxf5TAnzWmbZebu
Address review feedback: - Drop the int->float double-write for arrays. Integer array elements now go only to `attributes_array_int`, not also to `attributes_array_float`. - Stop cloning array values (and keys) when populating the typed columns. Instead of building parallel typed maps in `insert_array` (which cloned every string and every key), `insert_array` is back to only building the `attributes_array` JSON representation. The typed `Map(String, Array(T))` columns are derived in `TryFrom<EAPItem> for EAPItemRow` by consuming that representation and moving the values into the typed buckets. A `push_typed_array` helper moves the attribute key into the single bucket a homogeneous array touches (the common case) and only clones it for the extra buckets a rare mixed-type array spans. This keeps the production RowBinary path (eap_items always runs with --use-row-binary) writing both the legacy JSON column and the typed columns, with no per-element clones. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_017PBiwoiUxf5TAnzWmbZebu
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 85f5f3a. Configure here.
| &mut remaining, | ||
| floats, | ||
| ); | ||
| push_typed_array(&mut attributes_array_bool, &mut key, &mut remaining, bools); |
There was a problem hiding this comment.
Int arrays skip float column
Medium Severity
Homogeneous integer array attributes are written only to attributes_array_int, not also to attributes_array_float. That breaks parity with scalar insert_int and with _VALUE_TYPE_TO_COLUMN, which routes val_int_array to the float column family, so migrated read paths would not find int array data where queries expect it.
Reviewed by Cursor Bugbot for commit 85f5f3a. Configure here.
The typed Map(String, Array(T)) columns were only being derived in the RowBinary EAPItemRow conversion, so the JSON (JSONEachRow) path stopped double-writing them. Move the typed maps back onto AttributeMap, which both write paths serialize, so the double-write happens on both the JSON and RowBinary paths. Simplify the write: insert_array now appends each element to its typed column as the value is read (entry API), instead of buffering into per-type vectors and deriving the columns afterward. The EAPItemRow conversion is a plain move of the maps again, and the push_typed_array helper is gone. The JSON<->RowBinary equivalence test now also asserts the typed array columns match across both paths. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_017PBiwoiUxf5TAnzWmbZebu
| Some(Value::IntValue(int)) => { | ||
| self.attributes_array_int | ||
| .entry(k.clone()) | ||
| .or_default() | ||
| .push(int); | ||
| values.push(EAPValue::Int(int)); |
There was a problem hiding this comment.
Bug: Integer arrays are only written to attributes_array_int, not the float column, making them unqueryable after the planned read-path migration, which is expected to read from the float column.
Severity: HIGH
Suggested Fix
Modify the insert_array function for Value::IntValue. When an integer array is processed, it should be written to both attributes_array_int and attributes_array_float to mirror the behavior of scalar integers. This ensures that the data will be queryable when the read path is migrated to use the new typed array columns.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location: rust_snuba/src/processors/eap_items.rs#L502-L507
Potential issue: The `insert_array` method for integer values writes data exclusively to
the `attributes_array_int` column. This is inconsistent with the existing behavior for
scalar integers, which are double-written to both integer and float columns
(`attributes_float`). The read path currently resolves numeric types to float columns to
enable querying. When the read path is migrated for arrays in a future change, it will
likely look for integer arrays in `attributes_array_float`. Since the data written by
this PR will only be in `attributes_array_int`, queries for this data will return empty
results, leading to silent data inaccessibility.
Also affects:
rust_snuba/src/processors/eap_items.rs:1393~1394
Did we get this right? 👍 / 👎 to inform future reviews.


Summary
Follow-up to #8056, which added the typed
Map(String, Array(T))columns for array-valued attributes (attributes_array_string,attributes_array_int,attributes_array_float,attributes_array_bool) but left populating them as a follow-up.This PR populates those columns from
EAPItemsProcessor(the Rusteap_itemsprocessor), double-writing them alongside the existingattributes_arrayJSON column. The JSON column keeps being written so the read path is unaffected until it is migrated to the typed columns in a separate change.What it does
In
AttributeMap::insert_array, each array attribute is split by element type into the matching typed bucket and inserted under the same key:attributes_array_stringattributes_array_floatattributes_array_intandattributes_array_floatattributes_array_boolHomogeneous arrays — the common case — populate exactly one bucket; mixed arrays are split across buckets by element type.
Integer elements are also written as floats, mirroring the scalar
insert_intdouble-write. The read path resolves numeric attribute types (val_int,val_float,val_double, and their array variants) to the float columns via_VALUE_TYPE_TO_COLUMN, so int arrays must be present inattributes_array_floatto be queryable, whileattributes_array_intretains exact integer values.The four new columns are appended to
EAPItemRowand toEAPItemRow::COLUMN_NAMES(sent explicitly withINSERT ... FORMAT RowBinary) in matching struct order. The serializer already encodesVec<(String, Vec<T>)>asMap(String, Array(T)).Tests
test_array_typed_columns_double_writecovers homogeneous arrays of each type, the int→float double-write, no cross-type bleed, and that the legacy JSON column is still written.test_row_binary_attributesnow also asserts the typedattributes_array_stringcolumn is populated.test_column_names_match_struct_layoutupdated for the four new columns (96 → 100).All
processors::eap_itemsunit tests pass;cargo clippy --libandcargo fmtare clean. (test_row_binary_clickhouse_insertrequires a live ClickHouse and only runs in CI.)🤖 Generated with Claude Code
Generated by Claude Code