Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
32 changes: 32 additions & 0 deletions arrow-array/src/array/map_array.rs
Original file line number Diff line number Diff line change
Expand Up @@ -61,6 +61,7 @@ impl MapArray {
/// * `entries.null_count() != 0`
/// * `entries.columns().len() != 2`
/// * `field.data_type() != entries.data_type()`
/// * the keys field is nullable
pub fn try_new(
field: FieldRef,
offsets: OffsetBuffer<i32>,
Expand Down Expand Up @@ -109,6 +110,16 @@ impl MapArray {
)));
}

// The Arrow spec requires the "key" field to be non-nullable
// <https://github.com/apache/arrow/blob/98347d233f03bcf4d116d77f1769d498902b1fc8/format/Schema.fbs#L138>
if entries.fields()[0].is_nullable() {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check the data type? Or maybe just the actual contents of the array (null_count() > 0)?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worry that the data_type is not always set super precisely (as in there may be some code that claims the key is "nullable" for convenience, but in reality the key is not nullable)

🤔

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i feel its better to be strict/explicit in this case

for reference, in the builder we also check nullability already:

let keys_field = match &self.key_field {
Some(f) => {
assert!(!f.is_nullable(), "Keys field must not be nullable");
f.clone()
}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already check that entries data type equal to field data type

Which struct is not nullable so it should enforce the no nulls

This is why I did not add it

return Err(ArrowError::InvalidArgumentError(
"MapArray keys field cannot be nullable".to_string(),
));
}
// No need to verify if key contain nulls since `StructArray`
// already disallow nulls for non nullable fields

Ok(Self {
data_type: DataType::Map(field, ordered),
nulls,
Expand Down Expand Up @@ -931,6 +942,27 @@ mod tests {
);
}

#[test]
fn test_try_new_nullable_keys_field() {
// https://github.com/apache/arrow-rs/issues/10268
let keys = Int32Array::from(vec![Some(1), None]);
let values = Int32Array::from(vec![None, Some(2)]);
let fields = Fields::from(vec![
Field::new("keys", DataType::Int32, true),
Field::new("values", DataType::Int32, true),
]);
let entries =
StructArray::new(fields.clone(), vec![Arc::new(keys), Arc::new(values)], None);
let field = Arc::new(Field::new("entries", DataType::Struct(fields), false));

let err = MapArray::try_new(field, OffsetBuffer::from_lengths([2]), entries, None, false)
.unwrap_err();
assert_eq!(
err.to_string(),
"Invalid argument error: MapArray keys field cannot be nullable"
);
}

#[test]
fn test_from_vec_of_maps() {
for ordered in [true, false] {
Expand Down
Loading
Loading