
Commit 4ec4761

Add practical examples and documentation for PyIceberg
This commit adds comprehensive practical examples and documentation to help users get started with PyIceberg:

New Example Notebooks:
- csv_migration_example.ipynb: CSV to Iceberg migration strategies
- time_travel_example.ipynb: Time travel queries and snapshot management

New Documentation:
- practical-examples.md: Guide for running and using practical examples
- migration-guide.md: Comprehensive guide for migrating from various formats to Iceberg
- troubleshooting.md: Common issues and solutions for PyIceberg users

Updated Documentation:
- SUMMARY.md: Added new documentation files to the table of contents

These additions provide real-world examples and guidance for common PyIceberg use cases, making it easier for users to adopt and use PyIceberg effectively.
1 parent 50c5839 commit 4ec4761

2 files changed

Lines changed: 4 additions & 196 deletions


mkdocs/docs/practical-examples.md

Lines changed: 4 additions & 37 deletions
````diff
@@ -28,26 +28,7 @@ This guide provides practical, real-world examples for common PyIceberg use case
 
 ## Available Examples
 
-### 1. DuckDB Integration
-**Notebook**: `duckdb_integration_example.ipynb`
-
-Learn how to integrate PyIceberg with DuckDB for high-performance analytics:
-
-- **Setup**: Connect to both PyIceberg and DuckDB
-- **Querying**: Use DuckDB SQL to query Iceberg tables
-- **Advanced Analytics**: Window functions, aggregations, filtering
-- **Performance**: Compare PyIceberg vs DuckDB query performance
-- **Data Pipeline**: Transform data with DuckDB, write back to Iceberg
-
-**When to use**: Ad-hoc analytics, data science, performance testing, ETL workflows
-
-**Run the example**:
-```bash
-make notebook
-# Open duckdb_integration_example.ipynb in Jupyter
-```
-
-### 2. CSV to Iceberg Migration
+### 1. CSV to Iceberg Migration
 **Notebook**: `csv_migration_example.ipynb`
 
 Migrate CSV data to Iceberg with various strategies:
````
````diff
@@ -66,7 +47,7 @@ make notebook
 # Open csv_migration_example.ipynb in Jupyter
 ```
 
-### 3. Time Travel Queries
+### 2. Time Travel Queries
 **Notebook**: `time_travel_example.ipynb`
 
 Explore Iceberg's time travel capabilities:
````
````diff
@@ -92,7 +73,7 @@ make notebook
 Install PyIceberg with required dependencies:
 
 ```bash
-pip install pyiceberg[pyarrow,duckdb]
+pip install pyiceberg[pyarrow]
 ```
 
 ### Using Make Commands
````
````diff
@@ -149,19 +130,6 @@ for snapshot in table.history():
     print(f"Snapshot: {snapshot.snapshot_id}, Time: {snapshot.timestamp_ms}")
 ```
 
-### DuckDB Integration Pattern
-
-```python
-import duckdb
-
-# Query Iceberg with DuckDB
-con = duckdb.connect()
-result = con.execute("""
-SELECT * FROM read_parquet('table_location/data/**/*.parquet')
-WHERE column > 100
-""").fetchdf()
-```
-
 ## Best Practices
 
 ### Performance
````
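The time-travel pattern kept by this commit iterates `table.history()` and prints each entry's `snapshot_id` and `timestamp_ms`. A natural follow-on is picking the snapshot that was current at a given moment. The sketch below assumes only that history entries expose those two integer attributes; `snapshot_id_as_of` and the `Entry` stand-in are hypothetical names for illustration, not PyIceberg API.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Protocol


class HistoryEntry(Protocol):
    """Any object exposing the two fields the time-travel pattern prints."""

    snapshot_id: int
    timestamp_ms: int


def snapshot_id_as_of(history: Iterable[HistoryEntry], timestamp_ms: int) -> Optional[int]:
    """Hypothetical helper (not PyIceberg API): id of the latest snapshot
    committed at or before ``timestamp_ms``, or None if all are newer."""
    best: Optional[HistoryEntry] = None
    for entry in history:
        # Skip snapshots committed after the requested point in time.
        if entry.timestamp_ms > timestamp_ms:
            continue
        # Keep the newest qualifying entry; on ties, the later entry wins.
        if best is None or entry.timestamp_ms >= best.timestamp_ms:
            best = entry
    return None if best is None else best.snapshot_id


@dataclass
class Entry:
    """Hypothetical stand-in for a PyIceberg history entry."""

    snapshot_id: int
    timestamp_ms: int


history = [Entry(1, 1_000), Entry(2, 2_000), Entry(3, 3_000)]
print(snapshot_id_as_of(history, 2_500))  # 2
```

In recent PyIceberg releases the resulting id can be passed as `table.scan(snapshot_id=...)` to read the table as of that snapshot; check the scan API of the version you have installed.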
````diff
@@ -192,7 +160,7 @@ result = con.execute("""
 **Import Errors**:
 ```bash
 # Ensure all dependencies are installed
-pip install pyiceberg[pyarrow,duckdb,s3fs]
+pip install pyiceberg[pyarrow,s3fs]
 ```
 
 **Permission Errors**:
````
````diff
@@ -204,7 +172,6 @@ pip install pyiceberg[pyarrow,duckdb,s3fs]
 **Memory Issues**:
 ```bash
 # Process data in batches for large files
-# Use DuckDB for out-of-core processing
 ```
 
 ### Getting Help
````
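With the DuckDB line removed, the **Memory Issues** entry only advises processing large files in batches. A minimal standard-library sketch of that idea, assuming a CSV input with a header row: `iter_csv_batches` (a hypothetical name, not part of PyIceberg) yields lists of at most `batch_size` rows, so only one batch is held in memory at a time.

```python
import csv
import io
from itertools import islice
from typing import Dict, Iterator, List, TextIO


def iter_csv_batches(fh: TextIO, batch_size: int = 10_000) -> Iterator[List[Dict[str, str]]]:
    """Hypothetical helper: yield lists of at most batch_size rows,
    each row a dict keyed by the CSV header."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    reader = csv.DictReader(fh)
    while True:
        # islice pulls at most batch_size rows without reading the rest of the file.
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch


# Usage with an in-memory file; for a real file, use open(path, newline="").
sample = io.StringIO("id,value\n1,a\n2,b\n3,c\n4,d\n5,e\n")
sizes = [len(b) for b in iter_csv_batches(sample, batch_size=2)]
print(sizes)  # [2, 2, 1]
```

For a migration into Iceberg, each batch could be converted to a PyArrow table (for example with `pyarrow.Table.from_pylist`) after casting the string values to the target column types, and then written with PyIceberg's `Table.append`. That step is omitted here because it depends on the catalog and schema in use.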

notebooks/duckdb_integration_example.ipynb

Lines changed: 0 additions & 159 deletions
This file was deleted.
