feat(sedonadb): `sd_read_zarr` UDTF via datafusion-ffi plugin handoff by james-willis · Pull Request #866 · apache/sedona-db

james-willis · 2026-05-21T21:32:44Z

Draft — stacked on top of #858. Don't review until #858 lands.

Restores the SQL UDTF surface for the Zarr plugin that was stripped from #858 per maintainer guidance ("we don't have a read_parquet() either"). Tracked as DB-34.

What this adds

sd_read_zarr SQL UDTF — SELECT * FROM sd_read_zarr('file:///path/to/foo.zarr') works alongside the existing con.read_format(ZarrFormatSpec(), uri) path.
Cross-cdylib UDTF handoff via datafusion-ffi's FFI_TableFunction capsule contract (the same wire format datafusion-python uses). Plugin and host don't share Rust types directly — all data flows through Arrow C Stream + the ABI-stable FFI structs.
Host-side session/codec capsule machinery in python/sedonadb/src/plugin.rs. InternalContext::register_udtf_capsule(name, spec) builds the codec capsule, calls the plugin's __datafusion_table_function__(session), and converts the returned capsule into an Arc<dyn TableFunctionImpl>.
Plugin-side ZarrTableFunction PyO3 class with __datafusion_table_function__, plus register(con) Python helper.

Why this is its own PR

The UDTF surface required ~1000 lines of plumbing (FFI helpers, codec capsules, TaskContextProvider lifetime adapter, etc.) for an ergonomic-but-not-essential surface. Splitting lets the zarr feature land without that complexity and gives the plugin UDTF pattern its own focused review — including whether we want to generalize it (a con.register_table_function(name, spec) Python helper that any plugin can use).

Status

Branch: `jw/sd-read-zarr-udtf`
All commits originally on feat(raster-zarr): sedona-raster-zarr crate + sedona-zarr plugin #858 through `8e44f572`, then the UDTF strip on feat(raster-zarr): sedona-raster-zarr crate + sedona-zarr plugin #858 is at `16eafd41`.
Tests pass locally (7 plugin tests including the UDTF smoke + dtype matrix).
Will need rebase onto main after feat(raster-zarr): sedona-raster-zarr crate + sedona-zarr plugin #858 merges.

Open design questions for the eventual review

Should there be a generic con.register_table_function(name, spec) so future plugins (`sedonadb-cog`, `sedonadb-icechunk`) get the UDTF pattern for free, instead of each one calling `_impl.register_udtf_capsule`?
Naming: `sd_read_zarr` matches `sd_random_geometry` in this repo; `zarr_scan` would match `delta_scan`/`iceberg_scan` from the wider DataFusion ecosystem.
`load_eager` is intentionally absent here too — the eager-fetch path lands with the async `RS_EnsureLoaded` resolver.

Adds a sedona-raster-zarr crate that maps Zarr groups onto SedonaDB's N-D raster Arrow schema, plus an sd_read_zarr DataFusion UDTF that exposes it to SQL. ## sedona-raster-zarr Two entry points emit one raster row per chunk position in the group's chunk grid, with one band per array: * group_to_indb_rasters(group_uri) — eager: fetches every chunk's bytes into the Arrow data column. * group_to_outdb_rasters(group_uri) — URI-only: emits chunk-anchor URIs in each band's outdb_uri (<store>#array=<path>&chunk=<i0>,...). Built on the zarrs 0.23 crate, chosen over GDAL MultiDim because zarrs reads Zarr v3 sharding and vlen-utf8 coordinate variables. Modules: * source_uri — group-URI normalisation (file:// + bare path; cloud schemes error pending the resolver work) and chunk anchor build/parse. * geozarr — proj:wkt2 / proj:projjson / proj:epsg precedence and spatial:transform / spatial:dims parsing from group attributes. * dtype — zarrs DataType ↔ BandDataType via the type-erased is::<T>() downcast (zarrs 0.23 wraps Arc<dyn DataTypeTraits>, not an enum). * loader — opens FilesystemStore + Group, enumerates child arrays (sorted by path for deterministic band order), validates shared chunk grid / chunk shape / dim names, walks the chunk grid row-major, computes per-chunk transforms by translating the group affine along the spatial axes. Tests: 31 in-module unit tests + 4 integration tests that build a real Zarr group on disk via zarrs::ArrayBuilder and verify byte-exact pixel content, per-chunk transforms, anchor URI format, and error paths. ## sd_read_zarr UDTF DataFusion table function exposing the loader to SQL. Mirrors sd_random_geometry's shape (Function + Provider + Exec) and registers in SedonaContext::new_from_context. SELECT raster FROM sd_read_zarr('file:///path/to/datacube.zarr'); SELECT count(*) FROM sd_read_zarr( 'file:///path/to/datacube.zarr', '{\"mode\": \"outdb\", \"rows_per_batch\": 256}' ); Returns a single-column table raster: Raster. All existing RS_* UDFs operate on the column unchanged. JSON options: * mode: \"indb\" (default) or \"outdb\". Phase 1 defaults to InDb so byte-reading kernels work end-to-end. Flips to \"outdb\" once a format-keyed OutDb dispatcher registers a zarr loader. * rows_per_batch: chunks per RecordBatch (default 1024). * num_partitions: scan partitions (default 1). Anything other than 1 errors — round-robin chunk partitioning lands with the resolver work. ZarrChunkProvider builds the StructArray once at plan time. scan() wraps the inner exec in ProjectionExec so empty / partial projections (e.g. SELECT count(*)) match the requested physical schema. ZarrChunkExec slices the StructArray into rows_per_batch-sized RecordBatches. Bounded, Incremental, single-partition. 6 in-module integration tests build a Zarr fixture and exercise the full SQL pipeline. ## Out of scope (follow-up PR) * OutDb byte resolver (the function the byte-loading hook calls to fetch zarr chunks) — depends on a format-keyed multi-backend dispatcher. * Default mode flip from \"indb\" to \"outdb\". * Cloud storage backends (zarrs_object_store) and the async runtime story they require. * Round-robin chunk partitioning past num_partitions=1. * CREATE EXTERNAL TABLE ... STORED AS ZARR.

Adds a `con.funcs.table.sd_read_zarr(uri, *, mode, rows_per_batch, num_partitions)` Python method that mirrors the SQL UDTF added in the companion Rust commit. Same shape as the existing `sd_random_geometry` wrapper: builds the JSON options object, filters out None entries, and runs the SQL UDTF through `self._ctx.sql`. Python test uses `pytest.importorskip("zarr")` so it cleanly skips when the optional `zarr` library isn't installed in the test environment. Builds a 2×2 UInt8 Zarr group on disk via the official `zarr` Python lib and exercises both the default-options form and the `rows_per_batch` override.

Round of review feedback on the sd_read_zarr loader and UDTF surface: - Drop "Phase 1" references from code, error messages, and docstrings; describe behaviour in absolute terms. - Bound-check the chunk-grid product so a malformed or hostile Zarr can't drive RasterBuilder capacity to overflow / OOM. - Always populate `outdb_uri` and `outdb_format` on every band as provenance metadata; InDb vs OutDb stays discriminated by `data.is_empty()`. - Fall back to xarray's `_ARRAY_DIMENSIONS` attribute when an array lacks the first-class `dimension_names` field (covers Zarr v2 and any v3 file that uses the xarray convention). - Auto-skip 1-D arrays at selection time (typical xarray coord variables) — they can never produce a valid raster band, so reading them would always fail downstream at spatial-dim resolution. - Add an `arrays = [...]` option (Python kwarg + UDTF JSON key) to read a named subset of arrays; unknown names and 1-D arrays both error early with clear messages pointing at the option. - Error loudly when a group declares a CRS but no `spatial:transform` attribute — the high-confidence-bug case where falling back to the identity transform would silently produce wrong spatial-join results. - Warn-log when both CRS and transform are absent and we fall back to pixel-coordinate identity, so spatial-join surprises are debuggable from logs. - Flip the InDb/OutDb selector from `mode: "indb"|"outdb"` to `indb: bool` end-to-end (Python kwarg, SQL JSON key, internal options struct). The schema-level discriminator is binary, so a boolean is the more honest UI. Tests cover every new branch: auto-skip, explicit array filter, 1-D rejection, _ARRAY_DIMENSIONS fallback, CRS-without-transform error, indb-bool plumbing through SQL.

zarrs 0.23 deprecated `store_chunk_elements::<T>` in favour of `store_chunk()`. Migration is mechanical for u8 element types since the byte representation is identity. Drops the `#[allow(deprecated)]` annotations the fixtures carried.

`load_eager` more accurately describes the user-facing knob: the boolean controls whether chunk bytes are materialised eagerly or left as chunk-anchor URIs, not which storage class the resulting raster lives in. Renames the JSON key (`indb` → `load_eager`), the Python kwarg, the internal `ZarrReadOptions` field, and updates docstrings to reflect the long-term plan: once an async `RS_EnsureLoaded` UDF lands, the planner will auto-inject it over the scan output rather than the loader doing eager fetching at plan time. The user-facing flag name doesn't change when that swap happens.

- Narrow the blosc Windows cfg gate from "any Windows" to "Windows MinGW only" (`target_env = "gnu"`). MSVC Windows users (Python wheels) now get blosc-compressed Zarr support back; only the R rtools45 MinGW build path still disables it. - Drop the hardcoded `rows_per_batch = 1024` default in `ZarrChunkExec`. When the UDTF option is unset, defer at execute time to `SessionConfig::batch_size` so users tune via the standard knob instead of the loader baking in its own constant. - Python test: lift `numpy` and `pytest` imports to module level (only `zarr` remains `importorskip`-gated); materialise the resulting DataFrame via `to_arrow_table()` and assert on the Raster struct row instead of just `count()`.

The loader had two near-duplicate code paths (`group_to_indb_rasters` and `group_to_outdb_rasters`) gated by a `Mode` enum. The InDb path fetched every chunk's bytes on a single thread at plan time — exactly the antipattern that the async `RS_EnsureLoaded` resolver will replace. Rather than ship the bad path and tear it out later, collapse to a single OutDb-producing entry point now. - New single entry: `group_to_rasters(uri, arrays)`. Always emits chunk-anchor URIs in `outdb_uri`; `data` is empty. - `Mode::InDb` / `Mode::OutDb` enum and the `match mode` branches in `build_rasters` are gone. - `retrieve_chunk_bytes` is kept (the only pixel-byte read primitive in the crate; the future async resolver will want it as a starting point) but is now `#[allow(dead_code)]` and exercised by a direct unit test rather than indirectly via the deleted InDb integration test. - `ArrayInfo::array` field dropped — only the InDb path used it. - `sd_read_zarr()` default for `load_eager` flips from `true` to `false`; explicit `load_eager = true` errors with a message pointing at the future `RS_EnsureLoaded` resolver. - Tests rewritten: `indb_round_trip_emits_one_row_per_chunk_position` is folded into a single `round_trip_emits_one_row_per_chunk_position_with_outdb_anchors` test that asserts on metadata and anchor URIs. Byte-content assertions removed. New UDTF test verifies `load_eager = true` errors cleanly. Plan-time resident size for a Zarr scan drops from `N_chunks × chunk_size` (potentially GBs) to roughly `N_chunks × few-hundred-bytes`. The remaining structural concern — build-everything-upfront vs streaming emission — is unchanged in this PR and tracked as separate follow-up work.

…nalFormatSpec + UDTF migration Moves all Zarr functionality out of the `sedona` crate so the main package no longer carries a zarr dep. Three structural changes: 1. **Streaming `ZarrChunkReader: RecordBatchReader`** — replaces the eager `group_to_rasters -> StructArray` function. Opens the group + parses metadata once in `try_new`; per-batch work is just transform arithmetic and anchor URI formatting. `next()` walks `batch_size` chunk positions of the chunk grid and emits one `RecordBatch`. The chunk anchors are still OutDb-style (empty `data`, populated `outdb_uri`); pixel-byte resolution is the future async resolver's job. 2. **`ZarrFormatSpec: ExternalFormatSpec`** — wraps the reader with the standard SedonaDB datasource API so `con.read_format(spec, uri)` works the same shape as PyogrioFormatSpec. Lives in `sedona-raster-zarr` gated under a `zarr` feature (default on). 3. **UDTF migrates from `sedona` to `sedona-raster-zarr`.** The `ZarrReadFunction` / `ZarrChunkProvider` / `ZarrChunkExec` code moves to `sedona-raster-zarr/src/udtf.rs` (also `zarr`-feature-gated). The exec is now genuinely streaming — it constructs a fresh `ZarrChunkReader` in `execute()` and adapts it via `RecordBatchStreamAdapter`. The new `register(ctx: &SessionContext)` helper is what the Python plugin package will call. The `sedona` crate's `zarr_read` module is deleted; the `sedona_raster_zarr` workspace dep is dropped; the `SedonaContext::new_from_context` UDTF registration is removed. The Python `sd_read_zarr` wrapper method in `functions/table.py` and the `test_read_zarr` test go too — they get re-introduced in the new `sedonadb-zarr` plugin package (next commit). Plan-time validation: `ZarrChunkProvider::try_new` opens the reader once for validation and drops it, so URI / metadata / arrays-filter errors still surface at `ctx.sql(...).await` rather than at collect time. Cost is one extra group-open per scan. Result: `sedona` crate has zero zarr code; `sedona-raster-zarr` is fully self-contained with both the format-spec and UDTF integration surfaces under `feature = "zarr"`. The plugin Python package wires both via `sedonadb_zarr.register(con)`.

Introduces `python/sedonadb-zarr` as a separate maturin-built Python package. After `sedonadb_zarr.register(con)`, the `sd_read_zarr` SQL UDTF is attached to the session; before that call (or without importing the package at all) the main `sedonadb` package has zero zarr awareness. Package contents: - `python/sedonadb-zarr/src/lib.rs` — PyO3 binding exposing a single `register_udtf(internal_ctx)` function that calls `sedona_raster_zarr::register_udtf` on the wrapped DataFusion `SessionContext`. The plugin imports `python/sedonadb`'s PyO3 classes via the `::_lib` extern crate (sedonadb's rustc crate name per its `lib.name = "_lib"` override that maturin requires). - `python/sedonadb-zarr/python/sedonadb_zarr/__init__.py` — Python `register(con)` helper. Locates the `_impl: InternalContext` on a sedonadb `Context` and forwards to the Rust extension. - `python/sedonadb-zarr/pyproject.toml`, `Cargo.toml`, `README.md` — build scaffolding mirroring `python/sedonadb`'s layout. Required upstream change: - `python/sedonadb/Cargo.toml` gains `rlib` alongside `cdylib` in `crate-type` so plugin crates can link against it as a Rust dependency. - `python/sedonadb/src/lib.rs` makes the `context` module `pub` so plugins can extract `InternalContext` from Python objects via PyO3's downcast machinery. The other modules stay private. - Workspace root `Cargo.toml` adds `python/sedonadb-zarr` to `[workspace] members` and exposes `sedonadb` as a workspace path-dep. Tests: - `python/sedonadb-zarr/tests/test_zarr.py` covers the smoke path (register + SQL works), the negative case (SQL fails before register), and idempotence (double-register is fine). - `python/sedonadb/tests/test_no_zarr_by_default.py` is the architectural regression test: it lives in the main package and asserts `sd_read_zarr` is unknown without the plugin imported. Catches accidental re-bundling. ExternalFormatSpec is not yet exposed through the Python plugin — the Rust impl exists in `sedona-raster-zarr` and is callable from Rust code; surfacing it via `con.read_format(ZarrFormatSpec(), uri)` requires wiring `ZarrChunkReader` through the Arrow C Stream FFI and is left as a follow-up.

Completes the second user-facing surface promised in the plugin design. After `sedonadb_zarr.register(con)`, both work: con.sql("SELECT * FROM sd_read_zarr(...)") con.read_format(sedonadb_zarr.ZarrFormatSpec(), uri) Implementation: - New PyO3 class `PyZarrChunkReader` in the plugin's Rust extension wraps `sedona_raster_zarr::ZarrChunkReader` and exposes `__arrow_c_stream__` via `arrow_array::ffi_stream::FFI_ArrowArrayStream`. The reader is consumed on first `__arrow_c_stream__` call (matches pyarrow's `RecordBatchReader` convention). - Python `ZarrFormatSpec(ExternalFormatSpec)` subclass in the plugin package wraps `PyZarrChunkReader`. Supports `with_options` for `load_eager` and `arrays`; `open_reader` returns the Rust-backed reader; default `infer_schema` works because the reader exposes schema via the Arrow C stream protocol. Tests cover the new surface end-to-end: read via `con.read_format`, options threading, and the `load_eager=True` rejection.

Per Dewey's review on test_funcs.py:58 — strengthen the format-spec roundtrip test to actually materialise and inspect the raster row. `arrow_tab["raster"][0].as_py()` returns a dict with the raster struct's fields (transform, bands, etc.). Asserts on the top-level shape (transform + bands present), the band-level shape (data is empty for OutDb scans), and that the chunk anchor URI threads through to the band's outdb_uri field. We don't have a great way to inspect Raster structs end-to-end in Python yet, but this is the minimum that proves the FFI bridge delivers a coherent struct value to the user.

- pre-commit (ruff format): test_zarr.py had a multi-line read_format call that ruff wanted as one line. - docs (`cargo doc --workspace`): both `sedonadb` and `sedonadb-zarr` originally declared `[lib] name = "_lib"` (maturin requires that for the Python `<pkg>._lib` import path). rustdoc errors with "document output filename collision" because both produce a `_lib` rustdoc output. Rename the plugin's rustc crate to `sedonadb_zarr_lib`; maturin still renames the compiled cdylib to `_lib.<ext>` for the wheel via `module-name`, so the Python import path is unaffected. - python ubuntu-latest: pytest collects from `python/` which now includes `python/sedonadb-zarr/tests/`. The plugin wasn't installed in the test env, so its `import sedonadb_zarr` errored at collection time. Install the plugin alongside sedonadb in the Install step.

The plugin refactor exposed `python/sedonadb` as an rlib so plugin crates (e.g. `sedonadb-zarr`) could link it and extract `InternalContext`. That worked on macOS but broke Linux CI with: rust-lld: error: duplicate symbol: PyInit__lib Both `sedonadb` and `sedonadb-zarr` declare `#[pymodule] fn _lib` (each needs the symbol so its respective Python wheel imports as `<pkg>._lib`). When the plugin's cdylib statically links sedonadb's rlib, both `PyInit__lib` symbols collide. macOS's two-level namespace tolerated it; GNU ld doesn't. Fix: put sedonadb's `#[pymodule] fn _lib` behind a new `extension-module` Cargo feature, default-on. Maturin's wheel build inherits the default and gets the symbol; plugin Rust deps set `default-features = false` (already done at the workspace path-dep level) and the pymodule's symbol vanishes from the rlib they link. Adds a blanket `#![cfg_attr(not(feature = "extension-module"), allow(dead_code, unused_imports))]` so the helper `#[pyfunction]`s that are only reachable via the pymodule don't generate warnings in plugin-rlib builds.

The Status section claimed ExternalFormatSpec was a Rust-only follow-up, but `con.read_format(ZarrFormatSpec(), uri)` was actually shipped in the same PR (PyZarrChunkReader + Python wrapper + tests). Tighten the README — show both surfaces, drop the stale "follow-up" claim, and unwrap paragraphs per GFM's hard-line-break rendering.

The feature was a defensive response to Dewey's "at the very very least feature-flag it" comment, but that was the fallback assuming we *didn't* move the UDTF out of the `sedona` crate. We did — the UDTF and ExternalFormatSpec now live in `sedona-raster-zarr` itself — so the load-bearing concern is addressed. The only consumer of this crate is `sedonadb-zarr`, which uses the full surface. No hypothetical "slim build" wants just the streaming reader without DataFusion. The feature gate added cfg-clutter without real value. Drop the gate, drop the `[features]` table, make the previously- optional deps required. `lib.rs` declares the modules unconditionally.

MATURIN_PEP517_ARGS="--features s2geography" set for sedonadb's install was still in scope for the subsequent pip install of sedonadb-zarr, which doesn't have that feature. Unset before the plugin install.

The previous fix gated the pymodule behind `extension-module` so plugin rlib consumers wouldn't duplicate `PyInit__lib`. The feature was default-on; Cargo's workspace-wide feature unification meant a workspace build (e.g. `cargo test --workspace`, `cargo build --workspace`) forced the feature on for everyone — including the plugin's rlib link of sedonadb — and the duplicate-symbol collision came back. Move `extension-module` out of `default`. Maturin enables it explicitly via `[tool.maturin] features` in `pyproject.toml` for the wheel build. Workspace cargo runs don't request it; sedonadb's rlib in those builds has no pymodule symbol; plugin crates link clean.

After moving `extension-module` out of sedonadb's default features (to avoid Cargo workspace-feature-unification dragging the pymodule symbol into plugin rlib links), the wheel builds need to enable it explicitly. Five workflows / scripts set MATURIN_PEP517_ARGS and need the feature added: * `.github/workflows/python.yml` — the test job's editable install * `.github/workflows/packaging.yml` — docs-and-deploy * `ci/scripts/wheels-build-{linux,macos,windows}.{sh,ps1}` — wheels Without it, the produced cdylib doesn't define `PyInit__lib` and any attempt to `import sedonadb` fails.

cibuildwheel's wheel build (macOS-arm64, linux-arm64) failed cargo metadata with: error: failed to select a version for the requirement `sedonadb = "^0.4.0"` The workspace dep table had `sedonadb` as a self-reference (the sedonadb crate exposed via its own path). Only sedonadb-zarr consumed it; inlining the path-dep on sedonadb-zarr's side and dropping the workspace entry removes the self-reference. cibuildwheel's isolated build env (where the workspace root may not be in scope the same way) no longer chokes.

`cargo build --workspace --all-features` (the build CI job) ignores the workspace dep's `default-features = false` because feature unification merges feature requests across the workspace. With sedonadb's `extension-module` forced on, its rlib's `PyInit__lib` symbol gets linked into sedonadb-zarr's cdylib, which also defines `PyInit__lib` — and the linker errors with duplicate symbol. Rename sedonadb-zarr's pymodule from `_lib` to `_zarr_lib`. The generated PyInit symbol is now `PyInit__zarr_lib`, distinct from sedonadb's `PyInit__lib` regardless of feature unification. This means the Python wrapper imports from `sedonadb_zarr._zarr_lib` instead of `sedonadb_zarr._lib`; the user-facing `register(con)` / `ZarrFormatSpec` API is unchanged.

PyO3's `#[pyclass]` extraction uses a per-cdylib type-id static, so `sedonadb_zarr` couldn't extract `sedonadb`'s `InternalContext` — the two extensions saw distinct types even though they linked against the same Rust definition. The plugin now hands its UDTF across the extension boundary in a `PyCapsule` carrying `Arc<dyn TableFunctionImpl>`, which sedonadb clones into its own `SessionContext`. The test helper `_read_format` bridges to `_impl.read_external_format` because `SedonaContext.read_format` doesn't exist yet; once a public helper lands the body collapses to a one-liner.

The Python sedonadb cdylib previously installed mimalloc as the `#[global_allocator]`. When sedonadb is loaded alongside a plugin cdylib (e.g. `sedonadb-zarr`), concrete Arrow types like `RecordBatch` get monomorphised separately in each cdylib, so the drop site runs the *consumer's* allocator regardless of who allocated. Plugin allocations go through libc malloc; sedonadb-side drops route to mimalloc free, and the heap corrupts — manifested in CI as a segfault on the first `to_arrow_table` of a plugin-served query. mimalloc remains available behind `--features mimalloc` for builds that don't load plugins.

End-to-end `con.read_format(ZarrFormatSpec(), uri)` requires directory- format support in `ExternalFormatSpec`'s `ListingTableUrl` path. Zarr groups are directories, not single files, so the listing layer returns zero matching objects — `Can't infer schema for zero objects`. The SQL UDTF path is unaffected (no listing involved). Drop the three tests that exercise the listing surface; keep one that pins the Python spec-class shape so the plumbing doesn't drift while the listing gap is being closed.

…ormats Adds an opt-in path through `external_table` for formats whose URI identifies a directory rather than a file. `ExternalFormatSpec::list_single_object` gates dispatch: - `false` (default): DataFusion's `ListingTable` discovers files matching the spec's extension at the URL prefix. - `true`: `SingleObjectExternalTable` synthesises one `PartitionedFile` per URI and hands it straight to the format's `FileSource`. No listing, no extension filtering — the URI is the object. Required for Zarr: a `.zarr` group is a directory, not a file. The listing layer returned zero matching objects, so `con.read_format(ZarrFormatSpec(), uri)` failed with "Can't infer schema for zero objects" even though the SQL UDTF (which bypasses listing) worked fine. `ZarrFormatSpec` now overrides `list_single_object()` to `true` and the Python end-to-end tests via `con.read_format` pass. Reuses the existing `ExternalFileFormat` / `ExternalFileSource` for the actual scan, so projections, filter pushdown, and streaming are identical between the two table-provider paths.

Addresses review feedback on the plugin handoff + single-object provider. **Capsule handoff (review items 1, 2, 10):** - Introduce `sedonadb::plugin::{UdtfCapsule, UDTF_CAPSULE_NAME, UDTF_CAPSULE_MAGIC}`. The capsule payload is now a `#[repr(C)]` struct with a leading u64 magic sentinel that the consumer verifies before dereferencing the `Arc<dyn TableFunctionImpl>` — the capsule name alone was a label, not a type guarantee. - Consumer uses PyO3's documented `unsafe { capsule.reference::<T>() }` rather than the `pointer() as *const T` cast, which depended on PyO3's private `CapsuleContents::value` field being at offset 0. - Capsule name carries a `.v1` ABI tag so a pre-built plugin wheel paired with a newer host fails fast at registration rather than corrupting on call. **`list_single_object` propagation:** The Python `ExternalFormatSpec` base class now exposes a `list_single_object` property (default `False`); `PyExternalFormat` caches the value at construction. Without this, the Python `ZarrFormatSpec`'s override was silently dropped before reaching the Rust dispatcher and `con.read_format` still hit the listing path. **Single-object provider (review items 4, 6, 7):** - One `FileGroup` per URI so multi-URI scans can fan out across partitions instead of being pinned to a single partition. - `synthetic_object_meta::size` is `u64::MAX` rather than `0`; some DataFusion optimisations treat zero-sized files as "skip me" and we never want directory objects to be skipped. - Mixed-object-store error prints the user's original URIs alongside the inferred store URLs. **mimalloc comment (review item 3):** The opt-in feature comment in `python/sedonadb/Cargo.toml` now explicitly warns that turning mimalloc back on re-introduces the plugin-coexistence segfault — the only safe path is dynamic linking of `libmimalloc.so`, which is a packaging change, not a feature.

…th; drop external_listing_table Two of the three review follow-ups on the directory-format provider: - `SingleObjectExternalTable::try_new` now resolves the `Arc<dyn ObjectStore>` from the session's runtime registry and threads it through to `spec.infer_schema`. The listing path has always done this; the single-object path previously passed `store: None`, which was a silent asymmetry for any future spec whose `infer_schema` needs to peek at the store. Zarr's schema is constant so this didn't bite in practice. - `external_listing_table` had no internal callers after the `external_table` dispatcher landed. Inlined as private `listing_table_provider`. One public entry point, one return type (`Arc<dyn TableProvider>`).

Three independent simplifications from Dewey's review: - `sedona-raster-zarr` no longer depends on the umbrella `datafusion` crate in `[dependencies]`. Replaced with the narrow sub-crates it actually uses: `datafusion-execution`, `datafusion-physical-expr`, `datafusion-physical-plan`. The umbrella moves to `[dev-dependencies]` for tests that build a real `SessionContext`. The `register(&SessionContext)` helper is gone — it had no callers (the plugin constructs `Arc::new(ZarrReadFunction::default())` directly), and removing it was what unblocked dropping the umbrella. - Drop the `load_eager` option from `ZarrFormatSpec` and `sd_read_zarr`. It currently errors anyway; pixel-byte materialisation lands with the async `RS_EnsureLoaded` resolver and we can decide the user- facing shape (option, separate UDF, planner injection) at that point. Removes the only surface defending a not-yet-supported flag. - `retrieve_chunk_bytes` is now `#[cfg(test)]` rather than `#[allow(dead_code)]`. The unit test for it lives in the same file so the implementation doesn't bit-rot before the resolver crate consumes it. The blanket lint-silencer is gone. Also drops `test_no_zarr_by_default.py` — the architectural regression test for zarr-not-in-default sedonadb. The build itself catches reintroduction at this point, so the test is overkill.

Rewires the cross-cdylib UDTF registration path onto datafusion-ffi's `FFI_TableFunction`, matching what `datafusion-python` does for its own plugins. Three big wins: 1. `python/sedonadb-zarr` no longer depends on `python/sedonadb` as a path rlib. The plugin links only `datafusion-catalog`, `datafusion-ffi`, `sedona-raster-zarr` and `pyo3` for its host- facing surface — no transitive sedonadb-* / pyo3 dupe of host internals. Resolves the cibuildwheel "another Python package's private crate" objection from review. 2. mimalloc returns to defaults. All cross-cdylib traffic now flows through Arrow C Data/Stream (release callbacks owned by the producing cdylib), so allocations are always freed in the cdylib that made them. The previous segfault required passing native `Arc<dyn TableProvider>` across cdylibs, which we no longer do. 3. Host-side codec + capsule machinery lives in `python/sedonadb/src/plugin.rs`: a `SessionTaskContextProvider` adapts our `SessionContext` to datafusion-ffi's `TaskContextProvider`, and `create_session_capsule` / `ffi_table_function_from_capsule` pack and unpack the `c"datafusion_logical_extension_codec"` and `c"datafusion_table_function"` capsules the way datafusion-python does. Plugins built against either ecosystem can register on a sedonadb context. Plugin side: `ZarrTableFunction` is the Python-visible spec class with `__datafusion_table_function__(session)`. The host calls it from `InternalContext.register_udtf_capsule(name, spec)`.

… lifetime The `FFI_LogicalExtensionCodec` produced by `create_session_capsule` stored a `Weak<dyn TaskContextProvider>`. The strong `Arc` was a local in the function, so by the time the plugin invoked `FFI_TableFunction::call` the `Weak` couldn't upgrade — DataFusion failed with `FFI error: TaskContextProvider went out of scope over FFI boundary`. Caught by running the plugin smoke tests locally against the FFI rework. Fix: stash one `Arc<dyn TaskContextProvider + Send + Sync>` on `InternalContext` at construction time. Every `register_udtf_capsule` call clones it into the session capsule. The strong ref lives as long as the session, so the FFI codec's `Weak` can always upgrade. All 7 sedonadb-zarr Python tests pass locally (including the mimalloc-on smoke test); 1699 sedonadb tests pass unchanged.

`register_udtf_capsule` and the `__datafusion_table_function__` contract it consumes are 0.4.0 features. Pinning prevents an install that picks up an older sedonadb at resolution time.

The target-gated `[target.'cfg(...)'.dependencies]` block existed for rtools45's pthread conflict when this crate got pulled into the R sedonadb.dll build. R sedonadb has no path into `sedona-raster-zarr` in this PR — the crate is only consumed by the Python sedonadb-zarr plugin — so the gate is dead and the `blosc` feature moves to the unconditional zarrs feature list. R parity is tracked as a follow-up issue.

Adds `test_dtype_mapping_roundtrips` covering every arm of `zarr_to_band_data_type` (bool / int{8,16,32,64} / uint{8,16,32,64} / float{32,64}). Each variant writes a tiny Zarr group and reads back two chunks via `sd_read_zarr` — fast sanity check that the dtype table doesn't silently drift. Per Dewey's review note on `rust/sedona-raster-zarr/src/dtype.rs:40`.

Both existed solely to support `sedonadb-zarr` linking `sedonadb` as an rlib, which the datafusion-ffi rework eliminated. With no rlib consumer: - `extension-module` Cargo feature on `python/sedonadb` is removed. The `#[pymodule] fn _lib` is unconditional. `[lib].crate-type` is back to just `cdylib`. CI workflows and wheel-build scripts drop the `extension-module` MATURIN_PEP517_ARGS entry. - `python/sedonadb-zarr` renames the rustc crate name and pymodule function from `_zarr_lib` back to the conventional `_lib`. No `PyInit_*` symbol collision is possible now. Python import path becomes `sedonadb_zarr._lib`, matching `sedonadb._lib`. Also adds a `.gitignore` for `python/sedonadb-zarr` mirroring the sedonadb one so editable-install `.so` files don't get staged.

Post-FFI cleanup pass on the plugin handoff and adjacent code: - `mod context;` and `mod plugin;` are now private. They were `pub` only when `sedonadb-zarr` linked the host crate as an rlib; no external Rust consumer remains. - `python/sedonadb/src/plugin.rs`: capsule-name constants and module docstring trimmed; consts are no longer `pub`. - `python/sedonadb-zarr/src/lib.rs`: module/struct/fn docstrings collapsed to one line each. Behaviour unchanged. - `python/sedonadb-zarr/python/sedonadb_zarr/__init__.py`: docstrings converted from ReST (`:func:`, `:meth:`, NumPy `Parameters\n----`) to Markdown (mkdocs reads these as Markdown). - `python/sedonadb-zarr/README.md`: architecture line referenced a non-existent `register_udtf` PyO3 function; updated to the actual `ZarrTableFunction` / `datafusion-ffi` capsule story. - `python/sedonadb-zarr/Cargo.toml`: dropped the "why no sedonadb dep" essay; the absence speaks for itself. - `rust/sedona-datasource/src/provider.rs`: `SingleObjectExternalTable` docs and helper-fn docs collapsed. - `python/sedonadb/src/context.rs::register_udtf_capsule` docstring trimmed; mechanism docs live in `crate::plugin`. No public Rust API surface changes (the constants that lost `pub` had no Rust consumers — capsule names are part of the Python-visible ABI).

paleolimbot · 2026-05-22T04:10:13Z

We can chat about this later too, but we may be able to avoid the table function FFI here by exposing in the sedona-datasource crate a generic table function wrapper around any ExternalFormatSpec. Or, alternatively, any Python object that exposes __arrow_c_array_stream__ (we have a generic table provider for a single-shot RecordBatchReader that powers sd.create_data_frame(), but we can back up slightly and store the object whose __arrow_c_array_stream__() call produces one too).

Getting a good extension system that can use DataFusion's FFI is great too...DataFusion's FFI just exposes everything and I'm wary of bringing in all of that until we really need the bells and whistles.

Sweeping audit follow-ups now that the UDTF is split out: - Delete `rust/sedona-raster-zarr/src/format_spec.rs` and its re-export. The Rust `ZarrFormatSpec` had no consumer on either this branch or PR apache#866 (the UDTF uses `ZarrChunkReader` via `ZarrChunkProvider` directly). The R wrapper (DB-32) and URL-as- table support (DB-33) will revive the impl with a real caller; ~80 lines of trait delegation is cheap to reconstruct from git history. - Drop `sedona-datasource`, `async-trait`, and `arrow-array` deps from `sedona-raster-zarr`'s `[dependencies]`: format_spec.rs was their only consumer. (arrow-array stays as a transitive dep through `arrow-schema` / `RecordBatch` in loader.rs.) - Wait, `arrow-array` is still used by `loader.rs` for `RecordBatch` / `RecordBatchReader` — kept. - Drop `zarrs_object_store` workspace dep — no crate references it. - Rewrite `rust/sedona-raster-zarr/src/lib.rs` module docstring to describe the actual streaming reader, not the removed "async OutDb resolver registered separately" path. - Drop the unreachable `json.loads(arrays)` branch in `ZarrFormatSpec.open_reader`. It was for the SQL UDTF's JSON-string option form; `with_options` callers always pass real lists. Removes the only `import json` from the module too. - Trim plugin's `src/lib.rs` module docstring — claimed "via `ExternalFormatSpec` + Arrow C Stream" but the Rust shim doesn't touch `ExternalFormatSpec` (Python side handles that). Reworded to name `PyZarrChunkReader` and Arrow C Stream only.

james-willis · 2026-05-22T19:12:45Z

yeah this is a parking lot PR- far down the stack and I think we can take out time to get it right as a general solution for UTF/UFTF/optimizer rules etc.

james-willis added 30 commits May 19, 2026 09:56

chore(sedona): drop redundant comment about zarr plugin

ce0bdea

style(sedonadb): collapse cfg_attr to one line per rustfmt

4423128

ci: don't leak sedonadb's s2geography feature into the plugin build

4462b94

MATURIN_PEP517_ARGS="--features s2geography" set for sedonadb's install was still in scope for the subsequent pip install of sedonadb-zarr, which doesn't have that feature. Unset before the plugin install.

james-willis added 6 commits May 21, 2026 12:31

chore(sedonadb-zarr): pin sedonadb>=0.4.0

e678bbc

`register_udtf_capsule` and the `__datafusion_table_function__` contract it consumes are 0.4.0 features. Pinning prevents an install that picks up an older sedonadb at resolution time.

github-actions Bot requested a review from prantogg May 21, 2026 21:59

james-willis mentioned this pull request May 22, 2026

feat(raster-zarr): sedona-raster-zarr crate + sedona-zarr plugin #858

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(sedonadb): `sd_read_zarr` UDTF via datafusion-ffi plugin handoff#866

feat(sedonadb): `sd_read_zarr` UDTF via datafusion-ffi plugin handoff#866
james-willis wants to merge 36 commits into
apache:mainfrom
james-willis:jw/sd-read-zarr-udtf

james-willis commented May 21, 2026

Uh oh!

paleolimbot commented May 22, 2026

Uh oh!

james-willis commented May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

james-willis commented May 21, 2026

What this adds

Why this is its own PR

Status

Open design questions for the eventual review

Uh oh!

paleolimbot commented May 22, 2026

Uh oh!

james-willis commented May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants