Fix linting issues in practical examples and documentation
- Fixed duplicate heading in migration-guide.md (Validation -> Post-Migration Validation)
- Removed specific notebook references from documentation to avoid link issues
- Fixed Jupyter notebook schema validation by adding missing outputs field
- Fixed import organization in notebooks by moving all imports to top cell
- Removed duplicate imports from cleanup cells
- Fixed end-of-file formatting issues
All linting checks now pass.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
@@ -287,6 +291,7 @@ for i in range(0, len(data), batch_size):
 **Problem**: Incompatible data types between systems
 
 **Solution**:
+
 ```python
 # Custom type conversion
 def convert_type(value):
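The hunk above cuts off after the first line of the custom type-conversion helper, so the diff never shows its body. A minimal, self-contained sketch of what such a `convert_type` function could look like — the conversion rules here (int, then float, then ISO date, then string) are an assumption for illustration, not taken from the guide:

```python
from datetime import datetime

def convert_type(value):
    """Coerce a raw string value to a richer Python type (illustrative rules only)."""
    if value is None or value == "":
        return None
    # Try progressively narrower casts; fall through on failure.
    for caster in (int, float):
        try:
            return caster(value)
        except ValueError:
            pass
    try:
        # Accept ISO-8601 dates/timestamps such as "2024-01-01".
        return datetime.fromisoformat(value)
    except ValueError:
        # Nothing matched: keep the original string.
        return value
```

In practice the set of casts would be driven by the target Iceberg schema rather than guessed per value, but the try/except-cascade shape is a common starting point.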
@@ -303,14 +308,15 @@ def convert_type(value):
 **Problem**: Optimal partitioning unclear
 
 **Solution**:
+
 - Analyze query patterns
 - Choose high-cardinality columns for partitioning
 - Consider date/time-based partitioning for time-series data
 - Test different partitioning strategies
 
 ## Post-Migration Steps
 
-### Validation
+### Post-Migration Validation
 
 1. **Data integrity**: Verify data accuracy
 2. **Query testing**: Test all critical queries
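The first bullet in that solution — analyzing your data before choosing partition columns — can be made concrete with a small helper that reports the distinct-value count per column. This is a rough stdlib-only sketch (the function name and sample rows are invented for illustration; a real analysis would run over the actual dataset or its statistics):

```python
def column_cardinalities(rows):
    """Count distinct values per column for a list of row dicts."""
    distinct = {}
    for row in rows:
        for col, val in row.items():
            distinct.setdefault(col, set()).add(val)
    return {col: len(vals) for col, vals in distinct.items()}

# Hypothetical sample: a time-series-ish event table.
rows = [
    {"event_date": "2024-01-01", "user_id": 1},
    {"event_date": "2024-01-01", "user_id": 2},
    {"event_date": "2024-01-02", "user_id": 3},
]
print(column_cardinalities(rows))  # {'event_date': 2, 'user_id': 3}
```

Cardinality alone does not decide the question — as the last bullet says, candidate strategies should still be tested against real query patterns.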
@@ -353,11 +359,9 @@ def convert_type(value):
 - **Trino**: SQL query engine with Iceberg support
 - **Pandas**: Data analysis with Iceberg integration
 
-### Example Notebooks
+### Additional Resources
 
-Example notebooks are available in the `notebooks/` directory of the repository:
-- `csv_migration_example.ipynb` - CSV to Iceberg migration
-- `time_travel_example.ipynb` - Time travel queries and snapshot management
+For detailed implementation examples and patterns, see the [practical examples guide](practical-examples.md).
 
 
 ## Getting Help
@@ -368,4 +372,4 @@ Example notebooks are available in the `notebooks/` directory of the repository:
 
 ## Conclusion
 
-Migrating to Iceberg provides significant benefits for data management and analytics. By following this guide and leveraging PyIceberg's capabilities, you can successfully migrate your data while minimizing disruption and maximizing the benefits of Iceberg's advanced features.
+Migrating to Iceberg provides significant benefits for data management and analytics. By following this guide and leveraging PyIceberg's capabilities, you can successfully migrate your data while minimizing disruption and maximizing the benefits of Iceberg's advanced features.
mkdocs/docs/practical-examples.md (+97 −74 lines)
@@ -24,83 +24,66 @@ hide:
 # Practical Examples
 
-This guide provides practical, real-world examples for common PyIceberg use cases. Each example is available as a Jupyter notebook that you can run and modify for your specific needs.
+This guide provides practical guidance for common PyIceberg use cases and implementation patterns.
 
-## Available Examples
+## Common Use Cases
 
-### 1. CSV to Iceberg Migration
-**Notebook**: `csv_migration_example.ipynb`
+### CSV Migration
 
-Migrate CSV data to Iceberg with various strategies:
+Migrating CSV files to Iceberg tables involves reading CSV data, converting it to Iceberg's schema, and writing it to Iceberg tables. This is one of the most common migration scenarios.
 
-- **Simple Migration**: Direct CSV to Iceberg conversion
-- **Schema Enhancement**: Add computed columns during migration
-- **Partitioned Migration**: Organize data for better performance
-- **Data Quality**: Validate and clean data during migration
-- **Best Practices**: Production migration considerations
+**Key Steps**:
 
-**When to use**: Transitioning from CSV to modern table formats, data lakehouse migration
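The new "CSV Migration" text above describes the read-convert-write flow, but the diff truncates before the key steps themselves. A stdlib-only sketch of the read-and-convert half of such a migration — the helper name, sample data, and naive type inference are all assumptions, and the actual Iceberg write is indicated only in comments so the example stays runnable without PyIceberg installed:

```python
import csv
import io

def read_csv_rows(text):
    """Parse CSV text into row dicts with naive type inference."""
    def coerce(v):
        for caster in (int, float):
            try:
                return caster(v)
            except ValueError:
                pass
        return v
    reader = csv.DictReader(io.StringIO(text))
    return [{k: coerce(v) for k, v in row.items()} for row in reader]

raw = "id,amount,city\n1,9.99,Paris\n2,12.50,Lyon\n"
rows = read_csv_rows(raw)
# In a real migration these rows (typically converted to an Arrow table)
# would be written to an Iceberg table -- e.g. creating the table via a
# PyIceberg catalog and appending the data -- which is omitted here.
print(rows[0])  # {'id': 1, 'amount': 9.99, 'city': 'Paris'}
```

For production use you would let the target Iceberg schema drive the conversion instead of inferring types per cell, and handle nulls, encodings, and malformed rows explicitly.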