is it possible to run try_into_collection on a Chunk instead of an Array? #82

@AlJohri

Starting with the parquet_read_parallel example from arrow2, I am trying to deserialize a Chunk into a Vec of structs.

Using the deserialize_parallel function as defined in the above example, the following code currently works for me:

pub struct Document {
    content: String,
}

...
let chunk = deserialize_parallel(&mut columns)?;
let array = StructArray::new(
    DataType::Struct(fields.clone()),
    chunk.arrays().to_vec(),
    None,
);
let documents: Vec<Document> = array.to_boxed().try_into_collection().unwrap();

Questions:

  1. With the currently exposed APIs in arrow2 and arrow2-convert, is there a better way to convert the Chunk into a Vec of structs? I suspect the extra conversion from Chunk to StructArray, followed by to_boxed, is not the most efficient route.
  2. Would it be possible to expose TryIntoCollection::try_into_collection directly on the Chunk as well?
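
To make question 2 concrete, here is a rough sketch of the shape I have in mind, written with std-only stand-in types so it compiles on its own. The `Chunk` struct, the `FromRow` trait, and the `ChunkTryIntoCollection` extension trait below are all hypothetical placeholders, not the actual arrow2 or arrow2-convert types; only the method name `try_into_collection` is borrowed from arrow2-convert.

```rust
// Stand-in types only: these are NOT arrow2's `Chunk` or arrow2-convert's traits.

#[derive(Debug, PartialEq)]
pub struct Document {
    pub content: String,
}

/// Hypothetical stand-in for a chunk: a set of equal-length string columns.
pub struct Chunk {
    pub columns: Vec<Vec<String>>,
}

/// Hypothetical row type that can be built from one value per column.
pub trait FromRow: Sized {
    fn from_row(row: &[&str]) -> Result<Self, String>;
}

impl FromRow for Document {
    fn from_row(row: &[&str]) -> Result<Self, String> {
        match row {
            [content] => Ok(Document { content: content.to_string() }),
            _ => Err(format!("expected 1 column, got {}", row.len())),
        }
    }
}

/// Hypothetical extension trait that puts `try_into_collection` on the chunk
/// itself, so the caller does not build an intermediate struct array.
pub trait ChunkTryIntoCollection {
    fn try_into_collection<T: FromRow>(&self) -> Result<Vec<T>, String>;
}

impl ChunkTryIntoCollection for Chunk {
    fn try_into_collection<T: FromRow>(&self) -> Result<Vec<T>, String> {
        // All columns must have the same number of rows.
        let len = self.columns.first().map_or(0, |c| c.len());
        if self.columns.iter().any(|c| c.len() != len) {
            return Err("columns have different lengths".to_string());
        }
        // Walk the rows, taking one value from each column per row.
        (0..len)
            .map(|i| {
                let row: Vec<&str> = self.columns.iter().map(|c| c[i].as_str()).collect();
                T::from_row(&row)
            })
            .collect()
    }
}
```

With something like this, the call site would reduce to `let documents: Vec<Document> = chunk.try_into_collection()?;`. Whether the real crates can support this without the intermediate StructArray is exactly what I am asking.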
