enable customizing list inner child element name?

When Spark outputs a parquet file, I believe it always uses the inner list item name of `element` as opposed to `item`:

```proto
message spark_schema {
  ....
  OPTIONAL group mylistcolumn (LIST) {
    REPEATED group list {
      OPTIONAL BYTE_ARRAY element (UTF8);
    }
  }
  ...
}
```

It appears this crate (or one of its dependencies, perhaps arrow2 itself?) always assumes that the inner field name of a list is `item` rather than `element`.

>Expected: Struct([Field { name: \"mylistcolumn\", data_type: List(Field { name: \"item\", data_type: Int32, is_nullable: false, metadata: {} }), is_nullable: false, metadata: {} }])

>Actual: Struct([Field { name: \"mylistcolumn\", data_type: List(Field { name: \"element\", data_type: Int32, is_nullable: false, metadata: {} }), is_nullable: false, metadata: {} }])
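To illustrate why the two schemas above fail to match, here is a minimal sketch. It assumes that schema equality compares the inner list field's name along with its type. The `DataType` and `Field` types below are simplified, hypothetical stand-ins written for this example, not the real arrow2 types, and `schema_with_inner_name` is a made-up helper.

```rust
// Simplified, hypothetical stand-ins for arrow2's `DataType` and `Field`.
// The real types have many more variants and also carry metadata.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    List(Box<Field>),
    Struct(Vec<Field>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    is_nullable: bool,
}

impl Field {
    fn new(name: &str, data_type: DataType, is_nullable: bool) -> Self {
        Field {
            name: name.to_string(),
            data_type,
            is_nullable,
        }
    }
}

/// Builds the schema shape from the error message, parameterized by the
/// name given to the inner list field (hypothetical helper).
fn schema_with_inner_name(inner: &str) -> DataType {
    DataType::Struct(vec![Field::new(
        "mylistcolumn",
        DataType::List(Box::new(Field::new(inner, DataType::Int32, false))),
        false,
    )])
}
```

With derived `PartialEq`, `schema_with_inner_name("item")` and `schema_with_inner_name("element")` compare unequal even though the element types are identical, which matches the shape of the error above.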

I'm guessing this is because of this line of code?

https://github.com/DataEngineeringLabs/arrow2-convert/blob/7d9e13254a74b853019ad2e731814bdb16284932/arrow2_convert/src/field.rs#L214

1. If this is controlled by arrow2-convert, can we perhaps customize this via an annotation on the struct member?
2. Should the default be re-evaluated if [parquet-mr / Spark uses `element`](https://github.com/apache/parquet-mr/blob/5608695f5777de1eb0899d9075ec9411cfdf31d3/parquet-column/src/main/java/org/apache/parquet/schema/ConversionPatterns.java#L34)?
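In case it helps frame the discussion, one possible shape for this (or for a user-side workaround) is to rewrite the inner list field name before comparing schemas. This is only a sketch under assumptions: the types are simplified, hypothetical stand-ins rather than the arrow2 ones, and `rename_list_items` is a made-up function, not something arrow2-convert provides today.

```rust
// Simplified, hypothetical stand-ins for arrow2's `DataType` and `Field`.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    List(Box<Field>),
    Struct(Vec<Field>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    is_nullable: bool,
}

/// Returns a copy of `data_type` in which every inner list field is renamed
/// to `item_name`. Struct field names and nullability are left untouched.
fn rename_list_items(data_type: &DataType, item_name: &str) -> DataType {
    match data_type {
        DataType::Int32 => DataType::Int32,
        // Rename the list's child field, then recurse into its own type
        // so that nested lists are handled as well.
        DataType::List(inner) => DataType::List(Box::new(Field {
            name: item_name.to_string(),
            data_type: rename_list_items(&inner.data_type, item_name),
            is_nullable: inner.is_nullable,
        })),
        // Keep struct field names as they are; only recurse into the types.
        DataType::Struct(fields) => DataType::Struct(
            fields
                .iter()
                .map(|f| Field {
                    name: f.name.clone(),
                    data_type: rename_list_items(&f.data_type, item_name),
                    is_nullable: f.is_nullable,
                })
                .collect(),
        ),
    }
}
```

Calling `rename_list_items(&spark_schema, "item")` on a schema read from a Spark-written file would produce the `item` naming that the derive currently expects. An annotation on the struct member could, in principle, feed the same name into the generated field instead.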

P.S. Likely not related, but I ran into a very similar error in this other crate as well: https://github.com/timvw/qv/issues/31
