Error when filtering by UUID in table scan #2372

@sevakva

Description

Apache Iceberg version

main (development)

Please describe the bug 🐞

Problem
Scanning a PyIceberg table with a row filter that compares a UUID column raises pyarrow.lib.ArrowNotImplementedError: Function 'equal' has no kernel matching input types (extension<arrow.uuid>, extension<arrow.uuid>). The message indicates that PyArrow's 'equal' compute function has no kernel for comparing two values of the arrow.uuid extension type.

Environment
pyiceberg: Nightly build (expected to support UUIDs)
pyarrow: 21.0.0
Python: 3.13

Code to Reproduce

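import uuid
from pyiceberg.expressions import EqualTo

# `table` is an already-loaded PyIceberg table with a UUID column named `batch_id`.
# This fails with ArrowNotImplementedError once the scan is materialized.
df = table.scan(row_filter=EqualTo("batch_id", uuid.UUID("0190de80-647f-4bbc-a80e-efda686b910f")))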

Full Error Stack Trace

  File "/opt/homebrew/Cellar/python@3.13/3.13.3/Frameworks/Python.framework/Versions/3.13/lib/python3.13/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/opt/homebrew/Cellar/python@3.13/3.13.3/Frameworks/Python.framework/Versions/3.13/lib/python3.13/concurrent/futures/thread.py", line 59, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Users/.venv/lib/python3.13/site-packages/pyiceberg/io/pyarrow.py", line 1694, in batches_for_task
    return list(self._record_batches_from_scan_tasks_and_deletes([task], deletes_per_file))
  File "/Users/.venv/lib/python3.13/site-packages/pyiceberg/io/pyarrow.py", line 1732, in _record_batches_from_scan_tasks_and_deletes
    for batch in batches:
                 ^^^^^^^
  File "/Users/.venv/lib/python3.13/site-packages/pyiceberg/io/pyarrow.py", line 1518, in _task_to_record_batches
    fragment_scanner = ds.Scanner.from_fragment(
        fragment=fragment,
    ...<4 lines>...
        columns=[col.name for col in file_project_schema.columns],
    )
  File "pyarrow/_dataset.pyx", line 3692, in pyarrow._dataset.Scanner.from_fragment
  File "pyarrow/_dataset.pyx", line 3458, in pyarrow._dataset._populate_builder
  File "pyarrow/_compute.pyx", line 2732, in pyarrow._compute._bind
  File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: Function 'equal' has no kernel matching input types (extension<arrow.uuid>, extension<arrow.uuid>)

Expected Behavior
The table scan should successfully filter rows by UUID without throwing a kernel matching error.
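For illustration only, here is a minimal sketch of one way the comparison could be expressed so that PyArrow accepts it: compare the extension array's underlying fixed_size_binary(16) storage against the UUID's 16 raw bytes instead of comparing the extension type directly. The helper names `uuid_storage_bytes` and `uuid_equal_mask` are hypothetical and not part of PyIceberg or PyArrow. The sketch assumes the column is a pyarrow.ExtensionArray of the arrow.uuid type and that the 'equal' kernel accepts fixed-size binary inputs. It is not a claim about how PyIceberg should fix this.

```python
import uuid


def uuid_storage_bytes(value: uuid.UUID) -> bytes:
    """Return the 16 big-endian bytes that arrow.uuid keeps as its storage value."""
    return value.bytes


def uuid_equal_mask(column, value: uuid.UUID):
    """Hypothetical helper: boolean mask of rows in `column` equal to `value`.

    Assumes `column` is a pyarrow.ExtensionArray of the arrow.uuid type.
    The comparison runs on the fixed_size_binary(16) storage array, which
    sidesteps the missing kernel for the extension type itself.
    """
    # Imported lazily so the byte helper above has no third-party dependency.
    import pyarrow as pa
    import pyarrow.compute as pc

    needle = pa.scalar(uuid_storage_bytes(value), type=pa.binary(16))
    return pc.equal(column.storage, needle)
```

With a UUID extension array in hand, `uuid_equal_mask(column, uuid.UUID("0190de80-647f-4bbc-a80e-efda686b910f"))` would be expected to return a boolean array that can be passed to `column.filter(...)`. Whether a storage-level comparison like this belongs in the scan path is a separate design question.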

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
