Commit 995c794

add more notes
1 parent ae85ac9 commit 995c794

1 file changed

Lines changed: 21 additions & 18 deletions

mkdocs/docs/api.md

@@ -2071,17 +2071,17 @@ import pyarrow as pa
 | `DateType` | `pa.date32()` |
 | `TimeType` | `pa.time64("us")` |
 | `TimestampType` | `pa.timestamp("us")` |
-| `TimestampNanoType` | `pa.timestamp("ns")` |
-| `TimestamptzType` | `pa.timestamp("us", tz="UTC")` |
-| `TimestamptzNanoType` | `pa.timestamp("ns", tz="UTC")` |
+| `TimestampNanoType` (format version 3 only) | `pa.timestamp("ns")` [[2]](#notes) |
+| `TimestamptzType` | `pa.timestamp("us", tz="UTC")` [[1]](#notes) |
+| `TimestamptzNanoType` (format version 3 only) | `pa.timestamp("ns", tz="UTC")` [[1]](#notes) [[2]](#notes) |
 | `StringType` | `pa.large_string()` |
 | `UUIDType` | `pa.uuid()` |
 | `BinaryType` | `pa.large_binary()` |
 | `FixedType(L)` | `pa.binary(L)` |
 | `StructType` | `pa.struct()` |
 | `ListType(e)` | `pa.large_list(e)` |
 | `MapType(k, v)` | `pa.map_(k, v)` |
-| `UnknownType` | `pa.null()` |
+| `UnknownType` (format version 3 only) | `pa.null()` [[2]](#notes) |
 
 ---
 
@@ -2090,7 +2090,7 @@ import pyarrow as pa
 | PyArrow type | PyIceberg type class |
 |------------------------------------|-----------------------------|
 | `pa.bool_()` | `BooleanType` |
-| `pa.int32()` | `IntegerType` |
+| `pa.int8()` / `pa.int16()` / `pa.int32()` | `IntegerType` |
 | `pa.int64()` | `LongType` |
 | `pa.float32()` | `FloatType` |
 | `pa.float64()` | `DoubleType` |
@@ -2099,24 +2099,27 @@ import pyarrow as pa
 | `pa.date32()` | `DateType` |
 | `pa.date64()` | Unsupported |
 | `pa.time64("us")` | `TimeType` |
-| `pa.timestamp("us")` | `TimestampType` |
-| `pa.timestamp("ns")` | `TimestampNanoType` |
-| `pa.timestamp("us", tz="UTC")` | `TimestamptzType` |
-| `pa.timestamp("ns", tz="UTC")` | `TimestamptzNanoType` |
-| `pa.string()` / `pa.large_string()`| `StringType` |
+| `pa.timestamp("s")` / `pa.timestamp("ms")` / `pa.timestamp("us")` | `TimestampType` |
+| `pa.timestamp("ns")` | `TimestampNanoType` (format version 3 only) [[2]](#notes) |
+| `pa.timestamp("s", tz="UTC")` / `pa.timestamp("ms", tz="UTC")` / `pa.timestamp("us", tz="UTC")` | `TimestamptzType` [[1]](#notes) |
+| `pa.timestamp("ns", tz="UTC")` | `TimestamptzNanoType` (format version 3 only) [[1]](#notes) [[2]](#notes) |
+| `pa.string()` / `pa.large_string()` / `pa.string_view()` | `StringType` |
 | `pa.uuid()` | `UUIDType` |
-| `pa.binary()` / `pa.large_binary()`| `BinaryType` |
+| `pa.binary()` / `pa.large_binary()` / `pa.binary_view()` | `BinaryType` |
 | `pa.binary(L)` | `FixedType(L)` |
 | `pa.struct([...])` | `StructType` |
-| `pa.list_(e)` / `pa.large_list(e)` | `ListType(e)` |
+| `pa.list_(e)` / `pa.large_list(e)` / `pa.list_(e, fixed_size)` | `ListType(e)` |
 | `pa.map_(k, v)` | `MapType(k, v)` |
-| `pa.null()` | `UnknownType` |
+| `pa.null()` | `UnknownType` (format version 3 only) [[2]](#notes) |
 
 ---
 
-***Notes***
+#### Notes
 
-- PyIceberg `GeometryType` and `GeographyType` types are mapped to a GeoArrow WKB extension type.
-Otherwise, falls back to `pa.large_binary()` which stores WKB bytes.
-- For timestamp types (`TimestampNanoType`, `TimestamptzType`, `TimestamptzNanoType`), writing in format version 3 (which supports the `ns` unit) is not yet implemented
-(see [Github issue](https://github.com/apache/iceberg-python/issues/1551)). Only the `UTC` timezone and its aliases are supported.
+[1] Only the `UTC` timezone and its aliases are supported for PyArrow-to-PyIceberg timestamp-with-timezone conversion.
+
+[2] The PyArrow-to-PyIceberg mappings for `pa.timestamp("ns")`, `pa.timestamp("ns", tz="UTC")`, and `pa.null()` require Iceberg format version 3. By default, `pyarrow_to_schema()` uses format version 2. `TimestampNanoType`, `TimestamptzNanoType`, and `UnknownType` are likewise format-version-3-only Iceberg types.
+
+[3] For nanosecond Iceberg timestamp types (`TimestampNanoType` and `TimestamptzNanoType`), writing in format version 3 is not yet implemented (see [GitHub issue #1551](https://github.com/apache/iceberg-python/issues/1551)).
+
+[4] The mappings are not fully symmetric. Converting from PyArrow to PyIceberg collapses several PyArrow types into one Iceberg type, and converting back from PyIceberg to PyArrow always emits one canonical PyArrow type. For example, `pa.int8()` and `pa.int16()` map to `IntegerType`, which converts back to `pa.int32()`; `pa.string()` maps to `StringType`, which converts back to `pa.large_string()`; `pa.binary()` maps to `BinaryType`, which converts back to `pa.large_binary()`; `pa.list_(...)` maps to `ListType`, which converts back to `pa.large_list(...)`; and `pa.timestamp("s")` / `pa.timestamp("ms")` map to `TimestampType`, which converts back to `pa.timestamp("us")`.
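The asymmetry described in note [4], together with the format-version gating in note [2], can be illustrated with a small self-contained Python sketch. This is a hypothetical toy model, not PyIceberg's API: type names are plain strings chosen to resemble PyArrow's type names, only a subset of the table rows is modeled, and the names `ARROW_TO_ICEBERG`, `ICEBERG_TO_ARROW`, `V3_ONLY`, `arrow_to_iceberg`, and `round_trip` are invented for illustration. It also does not model options such as nanosecond downcasting; it simply rejects format-version-3-only types below version 3.

```python
"""Hypothetical toy model of the PyArrow <-> PyIceberg type mappings in the tables.

Types are plain strings here; this is NOT the PyIceberg API.
"""

# PyArrow type name -> Iceberg type name (several Arrow types collapse onto one Iceberg type).
ARROW_TO_ICEBERG = {
    "int8": "IntegerType",
    "int16": "IntegerType",
    "int32": "IntegerType",
    "int64": "LongType",
    "string": "StringType",
    "large_string": "StringType",
    "string_view": "StringType",
    "binary": "BinaryType",
    "large_binary": "BinaryType",
    "binary_view": "BinaryType",
    "timestamp[s]": "TimestampType",
    "timestamp[ms]": "TimestampType",
    "timestamp[us]": "TimestampType",
    "timestamp[ns]": "TimestampNanoType",
    "null": "UnknownType",
}

# Iceberg type name -> the single canonical PyArrow type emitted when converting back.
ICEBERG_TO_ARROW = {
    "IntegerType": "int32",
    "LongType": "int64",
    "StringType": "large_string",
    "BinaryType": "large_binary",
    "TimestampType": "timestamp[us]",
    "TimestampNanoType": "timestamp[ns]",
    "UnknownType": "null",
}

# Iceberg types in this toy model that exist only in format version 3.
V3_ONLY = {"TimestampNanoType", "UnknownType"}


def arrow_to_iceberg(arrow_type: str, format_version: int = 2) -> str:
    """Map an Arrow type name to an Iceberg type name, rejecting v3-only types below v3."""
    iceberg_type = ARROW_TO_ICEBERG[arrow_type]
    if iceberg_type in V3_ONLY and format_version < 3:
        raise ValueError(f"{arrow_type} requires Iceberg format version 3")
    return iceberg_type


def round_trip(arrow_type: str, format_version: int = 2) -> str:
    """Convert Arrow -> Iceberg -> Arrow and return the canonical Arrow type that comes back."""
    return ICEBERG_TO_ARROW[arrow_to_iceberg(arrow_type, format_version)]


if __name__ == "__main__":
    for name in ["int8", "string", "binary", "timestamp[ms]", "timestamp[ns]"]:
        try:
            print(f"{name:>14} -> {round_trip(name)}")
        except ValueError as err:
            print(f"{name:>14} -> error: {err}")
    # The same nanosecond type is accepted once format version 3 is requested.
    print(f"{'timestamp[ns]':>14} -> {round_trip('timestamp[ns]', format_version=3)} (v3)")
```

Running the script shows `int8`, `string`, `binary`, and `timestamp[ms]` coming back as `int32`, `large_string`, `large_binary`, and `timestamp[us]`, while `timestamp[ns]` is rejected at the toy default of format version 2 and round-trips unchanged at version 3.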
