Commit 9befbaa

committed
Add Bodo support
1 parent ae11ba4 commit 9befbaa

4 files changed

Lines changed: 64 additions & 0 deletions

mkdocs/docs/api.md

Lines changed: 46 additions & 0 deletions
@@ -1570,6 +1570,52 @@ print(ray_dataset.take(2))
 ]
 ```
 
+### Bodo
+
+PyIceberg interfaces closely with Bodo Dataframes (see [Bodo Iceberg Quick Start](https://docs.bodo.ai/latest/quick_start/quickstart_local_iceberg/)),
+which provides a drop-in replacement for Pandas that applies query, compiler and HPC optimizations automatically.
+Bodo accelerates and scales Python code from single laptops to large clusters without code rewrites.
+
+<!-- prettier-ignore-start -->
+
+!!! note "Requirements"
+    This requires [`bodo` to be installed](index.md).
+
+```python
+pip install pyiceberg['bodo']
+```
+<!-- prettier-ignore-end -->
+
+A table can be read easily into a Bodo Dataframe to perform Pandas operations:
+
+```python
+df = table.to_bodo()  # equivalent to `bodo.pandas.read_iceberg_table(table)`
+df = df[df["trip_distance"] >= 10.0]
+df = df[["VendorID", "tpep_pickup_datetime", "tpep_dropoff_datetime"]]
+print(df)
+```
+
+This creates a lazy query, optimizes it, and runs it on all available cores (print triggers execution):
+
+```python
+       VendorID tpep_pickup_datetime tpep_dropoff_datetime
+0             2  2023-01-01 00:27:12   2023-01-01 00:49:56
+1             2  2023-01-01 00:09:29   2023-01-01 00:29:23
+2             1  2023-01-01 00:13:30   2023-01-01 00:44:00
+3             2  2023-01-01 00:41:41   2023-01-01 01:19:32
+4             2  2023-01-01 00:22:39   2023-01-01 01:30:45
+...         ...                  ...                   ...
+245478        2  2023-01-31 22:32:57   2023-01-31 23:01:48
+245479        2  2023-01-31 22:03:26   2023-01-31 22:46:13
+245480        2  2023-01-31 23:25:56   2023-02-01 00:05:42
+245481        2  2023-01-31 23:18:00   2023-01-31 23:46:00
+245482        2  2023-01-31 23:18:00   2023-01-31 23:41:00
+
+[245483 rows x 3 columns]
+```
+
+Bodo is optimized to take advantage of Iceberg features such as hidden partitioning and various statistics for efficient reads.
+
 ### Daft
 
 PyIceberg interfaces closely with Daft Dataframes (see also: [Daft integration with Iceberg](https://www.getdaft.io/projects/docs/en/stable/integrations/iceberg/)) which provides a full lazily optimized query engine interface on top of PyIceberg tables.

mkdocs/docs/index.md

Lines changed: 1 addition & 0 deletions
@@ -52,6 +52,7 @@ You can mix and match optional dependencies depending on your needs:
 | pandas | Installs both PyArrow and Pandas |
 | duckdb | Installs both PyArrow and DuckDB |
 | ray | Installs PyArrow, Pandas, and Ray |
+| bodo | Installs Bodo |
 | daft | Installs Daft |
 | polars | Installs Polars |
 | s3fs | S3FS as a FileIO implementation to interact with the object store |

pyiceberg/table/__init__.py

Lines changed: 11 additions & 0 deletions
@@ -137,6 +137,7 @@
 from pyiceberg.utils.properties import property_as_bool
 
 if TYPE_CHECKING:
+    import bodo.pandas as bd
     import daft
     import pandas as pd
     import polars as pl
@@ -1484,6 +1485,16 @@ def to_daft(self) -> daft.DataFrame:
 
         return daft.read_iceberg(self)
 
+    def to_bodo(self) -> bd.DataFrame:
+        """Read a bodo DataFrame lazily from this Iceberg table.
+
+        Returns:
+            bd.DataFrame: Unmaterialized Bodo Dataframe created from the Iceberg table
+        """
+        import bodo.pandas as bd
+
+        return bd.read_iceberg_table(self)
+
     def to_polars(self) -> pl.LazyFrame:
         """Lazily read from this Apache Iceberg table.
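
The `to_bodo` change relies on two imports of the same module: one under `TYPE_CHECKING` for the return annotation and one inside the method body, so that `bodo` is only loaded when the method is actually called. A minimal sketch of that pattern follows. It assumes postponed evaluation of annotations (`from __future__ import annotations`), uses the standard-library `fractions` module as a stand-in for the optional dependency, and the `Scan` class and `to_fraction` method are hypothetical names for illustration, not part of PyIceberg.

```python
from __future__ import annotations  # annotations are stored as strings, not evaluated at runtime

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by static type checkers; this branch never runs at runtime.
    import fractions as fr  # stand-in for an optional dependency such as bodo.pandas


class Scan:
    """Hypothetical stand-in for the class that defines to_bodo()."""

    def to_fraction(self) -> fr.Fraction:
        # Deferred import: the dependency is loaded only when the method is called,
        # so importing this module does not require the optional package.
        import fractions as fr

        return fr.Fraction(1, 3)


scan = Scan()
result = scan.to_fraction()
print(result)  # 1/3
```

With this layout, a user who never calls the method does not need the optional package installed, while type checkers still see the precise return type.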

pyproject.toml

Lines changed: 6 additions & 0 deletions
@@ -79,6 +79,7 @@ huggingface-hub = { version = ">=0.24.0", optional = true }
 psycopg2-binary = { version = ">=2.9.6", optional = true }
 sqlalchemy = { version = "^2.0.18", optional = true }
 getdaft = { version = ">=0.2.12", optional = true }
+bodo = { version = ">=2025.7.2", optional = true }
 cachetools = ">=5.5,<7.0"
 pyiceberg-core = { version = "^0.5.1", optional = true }
 polars = { version = "^1.21.0", optional = true }
@@ -299,6 +300,7 @@ pandas = ["pandas", "pyarrow"]
 duckdb = ["duckdb", "pyarrow"]
 ray = ["ray", "pyarrow", "pandas"]
 daft = ["getdaft"]
+bodo = ["bodo"]
 polars = ["polars"]
 snappy = ["python-snappy"]
 hive = ["thrift"]
@@ -482,6 +484,10 @@ ignore_missing_imports = true
 module = "daft.*"
 ignore_missing_imports = true
 
+[[tool.mypy.overrides]]
+module = "bodo.*"
+ignore_missing_imports = true
+
 [[tool.mypy.overrides]]
 module = "pyparsing.*"
 ignore_missing_imports = true
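
Because `bodo` is declared as an optional extra, calling code may want to check that it is available before calling `to_bodo()`. The sketch below shows one way to do that with the standard library. The helpers `is_installed` and `require_extra` are hypothetical names introduced here for illustration and are not part of PyIceberg; the check is limited to top-level module names.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def require_extra(module_name: str, extra: str) -> None:
    """Raise a descriptive error if an optional dependency is missing (hypothetical helper)."""
    if not is_installed(module_name):
        raise ModuleNotFoundError(
            f"'{module_name}' is not installed; try: pip install 'pyiceberg[{extra}]'"
        )


# Usage: guard an optional code path instead of failing deep inside a call.
if is_installed("bodo"):
    print("bodo is available; table.to_bodo() can be used")
else:
    print("bodo is not installed; skipping the Bodo code path")
```

A guard like this keeps the failure message close to the install instruction rather than surfacing as a bare import error from inside the method.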
