mkdocs/docs/api.md
+48 -8 (48 additions, 8 deletions)
@@ -1004,6 +1004,30 @@ To show only data files or delete files in the current snapshot, use `table.insp
1004
1004
1005
1005
Expert Iceberg users may choose to commit existing parquet files to the Iceberg table as data files, without rewriting them.
1006
1006
1007
+
<!-- prettier-ignore-start -->
1008
+
1009
+
!!! note "Name Mapping"
1010
+
Because `add_files` uses existing files without writing new parquet files that are aware of the Iceberg table's schema, it requires the Iceberg table to have a [Name Mapping](https://iceberg.apache.org/spec/?h=name+mapping#name-mapping-serialization) (the Name Mapping maps the field names within the parquet files to the Iceberg field IDs). Hence, `add_files` requires that there are no field IDs in the parquet files' metadata, and creates a new Name Mapping based on the table's current schema if the table doesn't already have one.
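As a rough illustration of what a Name Mapping contains, the sketch below builds the JSON form described in the Iceberg spec (a list of entries, each with a `field-id` and a list of `names`) from a flat schema. The helper `build_name_mapping` and the example schema are hypothetical and are not part of PyIceberg; real name mappings can also nest entries for struct fields, which this sketch ignores.

```python
import json


def build_name_mapping(schema_fields):
    """Build a flat name mapping from (field_id, column_name) pairs.

    Hypothetical helper for illustration only; it is not a PyIceberg API.
    """
    return [
        {"field-id": field_id, "names": [name]}
        for field_id, name in schema_fields
    ]


# Hypothetical flat table schema: (Iceberg field ID, parquet column name)
schema_fields = [(1, "id"), (2, "event_ts"), (3, "payload")]

name_mapping = build_name_mapping(schema_fields)

# Serialized form, as it would appear in the table property
# `schema.name-mapping.default` defined by the Iceberg spec
name_mapping_json = json.dumps(name_mapping)
print(name_mapping_json)
```

A reader that opens a parquet file without field IDs can then resolve each column name to an Iceberg field ID by looking it up in this list.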
1011
+
1012
+
!!! note "Partitions"
1013
+
`add_files` only requires the client to read the existing parquet files' metadata footers to infer the partition value of each file. This implementation also supports adding files to Iceberg tables with partition transforms like `MonthTransform` and `TruncateTransform`, which preserve the order of the values after the transformation (any Transform that has the `preserves_order` property set to True is supported). Please note that if the column statistics of the `PartitionField`'s source column are not present in the parquet metadata, the partition value is inferred as `None`.
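The sketch below illustrates why an order-preserving transform allows a partition value to be inferred from column statistics alone: if the transformed lower and upper bounds of the source column agree, every value in the file must fall in that same partition. The functions `month_transform` and `infer_partition_value` are hypothetical stand-ins written for this example, not PyIceberg APIs, and raising an error when the bounds disagree is an assumption about how such a check could behave.

```python
from datetime import date


def month_transform(d):
    """Months since 1970-01: an order-preserving transform (illustrative)."""
    return (d.year - 1970) * 12 + (d.month - 1)


def infer_partition_value(lower, upper, transform):
    """Infer a single partition value from min/max column statistics.

    Returns None when statistics are missing, mirroring the note above.
    Raising on mismatched bounds is an assumption made for this sketch.
    """
    if lower is None or upper is None:
        return None
    low_partition = transform(lower)
    high_partition = transform(upper)
    if low_partition != high_partition:
        raise ValueError("file spans more than one partition value")
    return low_partition


# All values lie within March 2024, so one partition value can be inferred
march_partition = infer_partition_value(
    date(2024, 3, 1), date(2024, 3, 31), month_transform
)

# Statistics missing from the parquet footer: partition value is None
missing_stats_partition = infer_partition_value(None, None, month_transform)

print(march_partition, missing_stats_partition)
```

Because the transform preserves order, checking only the two bounds is enough; a transform without that property could map values between the bounds to a different partition.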
1014
+
1015
+
!!! warning "Maintenance Operations"
1016
+
Because `add_files` commits the existing parquet files to the Iceberg table like any other data file, destructive maintenance operations like expiring snapshots will remove them.
# A new snapshot is committed to the table with manifests pointing to the existing parquet files
1020
1044
```
1021
1045
1022
-
<!-- prettier-ignore-start -->
1046
+
Add files to Iceberg table with custom snapshot properties:
1047
+
```python
1048
+
# Assume an existing Iceberg table object `tbl`
1023
1049
1024
-
!!! note "Name Mapping"
1025
-
Because `add_files` uses existing files without writing new parquet files that are aware of the Iceberg's schema, it requires the Iceberg's table to have a [Name Mapping](https://iceberg.apache.org/spec/?h=name+mapping#name-mapping-serialization) (The Name mapping maps the field names within the parquet files to the Iceberg field IDs). Hence, `add_files` requires that there are no field IDs in the parquet file's metadata, and creates a new Name Mapping based on the table's current schema if the table doesn't already have one.
1050
+
file_paths = [
1051
+
"s3a://warehouse/default/existing-1.parquet",
1052
+
"s3a://warehouse/default/existing-2.parquet",
1053
+
]
1026
1054
1027
-
!!! note "Partitions"
1028
-
`add_files`only requires the client to read the existing parquet files' metadata footer to infer the partition value of each file. This implementation also supports adding files to Iceberg tables with partition transforms like `MonthTransform`, and `TruncateTransform` which preserve the order of the values after the transformation (Any Transform that has the `preserves_order` property set to True is supported). Please note that if the column statistics of the `PartitionField`'s source column are not present in the parquet metadata, the partition value is inferred as `None`.
1055
+
# Custom snapshot properties
1056
+
snapshot_properties = {"abc": "def"}
1029
1057
1030
-
!!! warning "Maintenance Operations"
1031
-
Because `add_files` commits the existing parquet files to the Iceberg Table as any other data file, destructive maintenance operations like expiring snapshots will remove them.
1058
+
# Enable duplicate file checking
1059
+
check_duplicate_files = True
1032
1060
1033
-
<!-- prettier-ignore-end -->
1061
+
# Add the Parquet files to the Iceberg table without rewriting
1062
+
tbl.add_files(
1063
+
file_paths=file_paths,
1064
+
snapshot_properties=snapshot_properties,
1065
+
check_duplicate_files=check_duplicate_files
1066
+
)
1067
+
1068
+
# NameMapping must have been set to enable reads
1069
+
assert tbl.name_mapping() is not None
1070
+
1071
+
# Verify that the snapshot property was set correctly