Commit a8fe03d

chore: documented new retention methods
1 parent 3da1528 commit a8fe03d

1 file changed

Lines changed: 55 additions & 0 deletions

mkdocs/docs/api.md

@@ -1376,6 +1376,61 @@ def cleanup_old_snapshots(table_name: str, snapshot_ids: list[int]):
cleanup_old_snapshots("analytics.user_events", [12345, 67890, 11111])
```

#### Advanced Retention Strategies

PyIceberg provides additional retention helpers on `ExpireSnapshots` that balance safety against cleanup.

Key table properties used as defaults (all optional):

- `history.expire.max-snapshot-age-ms`: Default age threshold for `with_retention_policy`
- `history.expire.min-snapshots-to-keep`: Minimum total snapshots to retain
- `history.expire.max-ref-age-ms`: (Reserved for future protected ref/branch cleanup logic)

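As a rough illustration of how these defaults could be prepared, the sketch below builds the two active properties as a string-valued mapping. The helper name `retention_properties` is hypothetical and not part of PyIceberg. The commented-out usage assumes a loaded `table` and the transaction-based `set_properties` call, so treat it as an assumption to verify against your PyIceberg version.

```python
from datetime import timedelta


def retention_properties(max_age: timedelta, min_snapshots: int) -> dict[str, str]:
    """Build optional retention defaults as string-valued table properties.

    Hypothetical helper for illustration; not a PyIceberg API.
    """
    if min_snapshots < 1:
        raise ValueError("min_snapshots must be >= 1")
    # The age property is expressed in milliseconds
    max_age_ms = int(max_age.total_seconds() * 1000)
    return {
        "history.expire.max-snapshot-age-ms": str(max_age_ms),
        "history.expire.min-snapshots-to-keep": str(min_snapshots),
    }


props = retention_properties(timedelta(days=7), min_snapshots=5)
print(props)

# Assumed usage against a loaded table (not executed here):
# with table.transaction() as transaction:
#     transaction.set_properties(props)
```

Keeping the values in one place makes it easier to review the defaults before any automated expiration relies on them.
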
Protected snapshots (those referenced by branches or tags) are never expired by these APIs.

Keep only the last N snapshots (plus any protected snapshots):

```python
table.maintenance.expire_snapshots().retain_last_n(5).commit()
```

Expire snapshots older than a cutoff while always keeping the most recent N and never dropping below a minimum total:

```python
from datetime import datetime, timedelta

# Cutoff in epoch milliseconds: snapshots older than 7 days are candidates
cutoff = int((datetime.now() - timedelta(days=7)).timestamp() * 1000)
table.maintenance.expire_snapshots().older_than_with_retention(
    timestamp_ms=cutoff,
    retain_last_n=3,
    min_snapshots_to_keep=4,
).commit()
```

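The cutoff computation can be factored into a small helper so the clock is injectable in tests. This is a minimal sketch, and the name `cutoff_ms` is hypothetical rather than a PyIceberg function. It uses a timezone-aware UTC clock to make the epoch conversion explicit; a naive `datetime.now()` also works because `.timestamp()` interprets it as local time.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def cutoff_ms(max_age: timedelta, now: Optional[datetime] = None) -> int:
    """Return the epoch-millisecond cutoff for snapshots older than max_age.

    Hypothetical helper for illustration; not a PyIceberg API.
    """
    now = now or datetime.now(timezone.utc)
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    return int((now - max_age).timestamp() * 1000)


# A fixed clock keeps the example deterministic
fixed_now = datetime(2024, 1, 8, tzinfo=timezone.utc)
seven_day_cutoff = cutoff_ms(timedelta(days=7), now=fixed_now)
print(seven_day_cutoff)
```

The result can be passed as `timestamp_ms` in the call shown above.
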
1409+
1410+
Unified policy that also reads table property defaults:
1411+
1412+
```python
# Uses table properties when arguments are omitted
table.maintenance.expire_snapshots().with_retention_policy().commit()

# Override selectively
table.maintenance.expire_snapshots().with_retention_policy(
    retain_last_n=2,  # keep the 2 newest regardless of age
    min_snapshots_to_keep=5,  # never go below 5 total
    # timestamp_ms omitted -> falls back to history.expire.max-snapshot-age-ms if set
).commit()
```

Parameter interaction rules:

- The `retain_last_n` most recent snapshots are always kept (in addition to protected refs)
- `timestamp_ms` restricts expiration candidates to snapshots older than the given timestamp
- `min_snapshots_to_keep` stops expiration before the total number of retained snapshots would drop below the floor
- If all of (`timestamp_ms`, `retain_last_n`, `min_snapshots_to_keep`) are `None` in `with_retention_policy`, nothing is expired
- Passing an invalid value (`< 1`) for a count raises `ValueError`

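To make the interaction concrete, here is a pure-Python model of the rules above. It is a sketch for reasoning about outcomes, not PyIceberg's implementation: the function `select_expired`, the `(snapshot_id, timestamp_ms)` tuple layout, the oldest-first expiration order, and the choice to count protected snapshots toward the floor are all assumptions.

```python
from typing import Optional


def select_expired(
    snapshots: list[tuple[int, int]],  # (snapshot_id, timestamp_ms), assumed layout
    protected_ids: set[int],
    timestamp_ms: Optional[int] = None,
    retain_last_n: Optional[int] = None,
    min_snapshots_to_keep: Optional[int] = None,
) -> list[int]:
    """Model which snapshot IDs the rules would expire (illustrative only)."""
    for count in (retain_last_n, min_snapshots_to_keep):
        if count is not None and count < 1:
            raise ValueError("counts must be >= 1")
    # With no criteria at all, nothing is expired
    if timestamp_ms is None and retain_last_n is None and min_snapshots_to_keep is None:
        return []

    newest_first = sorted(snapshots, key=lambda s: s[1], reverse=True)
    keep = set(protected_ids)
    if retain_last_n is not None:
        keep.update(sid for sid, _ in newest_first[:retain_last_n])

    # Candidates: not kept, and older than the cutoff when one is given
    candidates = [
        (sid, ts)
        for sid, ts in snapshots
        if sid not in keep and (timestamp_ms is None or ts < timestamp_ms)
    ]
    candidates.sort(key=lambda s: s[1])  # assumption: expire oldest first

    remaining = len(snapshots)
    expired: list[int] = []
    for sid, _ in candidates:
        # Stop before the total would drop below the floor
        if min_snapshots_to_keep is not None and remaining - 1 < min_snapshots_to_keep:
            break
        expired.append(sid)
        remaining -= 1
    return expired


history = [(1, 100), (2, 200), (3, 300), (4, 400), (5, 500), (6, 600)]
result = select_expired(
    history,
    protected_ids={2},
    timestamp_ms=450,
    retain_last_n=1,
    min_snapshots_to_keep=4,
)
print(result)
```

In this example, snapshot 2 is protected, snapshot 6 is kept by `retain_last_n`, and snapshot 5 is newer than the cutoff. That leaves 1, 3, and 4 as candidates, and the floor of 4 stops expiration after 1 and 3.
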
Safety tip: Start with a higher `min_snapshots_to_keep` value when first enabling automated cleanup.

## Views

PyIceberg supports view operations.
