Parent: #589
Depends on #866 (the DV bitmap reader should land first to settle the codec). Format v3 prefers deletion vectors (DVs) over Parquet position-delete files. This would make iceberg-go the first non-Java client with a DV writer: pyiceberg writes Parquet position deletes today, and iceberg-rust hasn't shipped a DV writer yet either.
Shape:
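```go
// New file: table/internal/dv_writer.go
type DVWriter struct {
	fs    io.IO         // file IO used to write the Puffin file
	blobs []puffin.Blob // buffered deletion-vector-v1 blobs
}

// Add records deleted row positions for one data file.
func (w *DVWriter) Add(dataFilePath string, positions []int64) error

// Flush writes the buffered blobs to a Puffin file at location and returns
// the delete-file metadata for the manifest.
func (w *DVWriter) Flush(ctx context.Context, location string) (DataFile, error)
```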
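One possible shape for the state behind `Add` and the offsets `Flush` would record is sketched below. It uses hypothetical names (`dvBuffer`, `planBlobs`, `blobSpan`) rather than the proposed `DVWriter` fields. Positions are deduplicated per data file, and the set size is what the blob's `cardinality` property would carry. In this sketch, blobs are laid out back to back after the 4-byte `PFA1` magic that opens a Puffin file, so each blob's start and length map to that entry's `ContentOffset` and `ContentSizeInBytes`. Encoded blob sizes are passed in as an assumption, because the encoder is not part of this sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// puffinMagicLen is the size of the "PFA1" magic that opens a Puffin file.
const puffinMagicLen = 4

// dvBuffer is a hypothetical stand-in for DVWriter's state: deleted row
// positions keyed by the data file they apply to.
type dvBuffer struct {
	byFile map[string]map[int64]struct{}
}

// Add validates all positions first, then records them; repeats collapse.
func (b *dvBuffer) Add(dataFilePath string, positions []int64) error {
	for _, p := range positions {
		if p < 0 {
			return fmt.Errorf("negative position %d for %s", p, dataFilePath)
		}
	}
	if b.byFile == nil {
		b.byFile = map[string]map[int64]struct{}{}
	}
	set := b.byFile[dataFilePath]
	if set == nil {
		set = map[int64]struct{}{}
		b.byFile[dataFilePath] = set
	}
	for _, p := range positions {
		set[p] = struct{}{}
	}
	return nil
}

// Cardinality is the number of distinct deleted rows for one data file.
func (b *dvBuffer) Cardinality(dataFilePath string) int {
	return len(b.byFile[dataFilePath])
}

// blobSpan holds what a manifest entry would record as ContentOffset and
// ContentSizeInBytes for one referenced data file.
type blobSpan struct {
	Path   string
	Offset int64
	Length int64
}

// planBlobs places blobs back to back after the magic, in sorted path order
// so the layout is deterministic. sizes maps data file path to blob size.
func planBlobs(sizes map[string]int64) []blobSpan {
	paths := make([]string, 0, len(sizes))
	for p := range sizes {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	off := int64(puffinMagicLen)
	spans := make([]blobSpan, 0, len(paths))
	for _, p := range paths {
		spans = append(spans, blobSpan{Path: p, Offset: off, Length: sizes[p]})
		off += sizes[p]
	}
	return spans
}

func main() {
	var b dvBuffer
	_ = b.Add("data/a.parquet", []int64{3, 1, 3})
	fmt.Println(b.Cardinality("data/a.parquet")) // 2
	fmt.Println(planBlobs(map[string]int64{"data/b.parquet": 28, "data/a.parquet": 30}))
	// [{data/a.parquet 4 30} {data/b.parquet 34 28}]
}
```

Sorting paths keeps the Puffin output deterministic, which makes golden-file tests simpler. Validating before inserting means a rejected `Add` leaves the buffer untouched.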
`Flush` serializes a Roaring 64-bit bitmap → Puffin blob (`blob_type=deletion-vector-v1`) → manifest entry with `content=1` (position deletes), `file_format=PUFFIN`, `ReferencedDataFile`, `ContentOffset`, and `ContentSizeInBytes`. It commits through the existing `RowDelta.AddDeletes()` API, so the rest of the producer stack is unchanged.
Add a table property `write.delete.format=position|dv`. On v3 tables the default flips to `dv`; v2 tables stay on `position`. When the property is `dv`, the existing position-delete writer in `table/internal/parquet_files.go` is bypassed in favor of `DVWriter`.
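The gating could reduce to a small resolver. The sketch below uses hypothetical names (`resolveDeleteFormat`, `DeleteFormat`): an explicit property value wins, and otherwise the default follows the table's format version as described above. Rejecting `dv` on a v2 table is an added assumption (deletion vectors are defined for v3), not something this issue has settled.

```go
package main

import "fmt"

// DeleteFormat is a hypothetical type for the proposed write.delete.format values.
type DeleteFormat string

const (
	DeleteFormatPosition DeleteFormat = "position"
	DeleteFormatDV       DeleteFormat = "dv"
)

const deleteFormatKey = "write.delete.format"

// resolveDeleteFormat picks the delete-file format for a commit.
func resolveDeleteFormat(formatVersion int, props map[string]string) (DeleteFormat, error) {
	v, ok := props[deleteFormatKey]
	if !ok {
		if formatVersion >= 3 {
			return DeleteFormatDV, nil // v3 default
		}
		return DeleteFormatPosition, nil // v2 default
	}
	switch f := DeleteFormat(v); f {
	case DeleteFormatPosition:
		return f, nil
	case DeleteFormatDV:
		// Assumption: DVs are a v3 feature, so refuse them on older tables.
		if formatVersion < 3 {
			return "", fmt.Errorf("%s=dv requires format version 3, table is v%d", deleteFormatKey, formatVersion)
		}
		return f, nil
	default:
		return "", fmt.Errorf("invalid %s %q: want position or dv", deleteFormatKey, v)
	}
}

func main() {
	f, _ := resolveDeleteFormat(3, nil)
	fmt.Println(f) // dv
	f, _ = resolveDeleteFormat(2, nil)
	fmt.Println(f) // position
	_, err := resolveDeleteFormat(2, map[string]string{deleteFormatKey: "dv"})
	fmt.Println(err != nil) // true
}
```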
Scope is large enough that it should land across multiple PRs. One reasonable split: roaring serialization + writer skeleton, then producer wiring + property gating, then cross-client tests. Discussion of the breakdown is welcome in this thread before any code lands.
Spec: Iceberg deletion vectors, Puffin format, RoaringBitmap serialization. Cross-client coverage: write a DV via iceberg-go, read it back with Java/pyiceberg, and assert that the filtered rows match.
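On the Go side, the cross-client assertion needs an oracle for which rows should survive. The minimal sketch below assumes a single data file whose rows are returned in file order, and it uses hypothetical names (`survivingRows`, `checkReader`). How the Java or pyiceberg rows are actually fetched is left out.

```go
package main

import "fmt"

// survivingRows is the expected reader output after a DV is applied: every
// row whose 0-based position in the data file is not marked deleted.
func survivingRows[T any](rows []T, deleted []int64) []T {
	gone := make(map[int64]bool, len(deleted))
	for _, p := range deleted {
		gone[p] = true
	}
	out := make([]T, 0, len(rows))
	for i, r := range rows {
		if !gone[int64(i)] {
			out = append(out, r)
		}
	}
	return out
}

// checkReader compares the rows another client returned (fetching them is out
// of scope here) with the expected survivors, in file order.
func checkReader[T comparable](client string, rows []T, deleted []int64, got []T) error {
	want := survivingRows(rows, deleted)
	if len(got) != len(want) {
		return fmt.Errorf("%s: got %d rows, want %d", client, len(got), len(want))
	}
	for i := range want {
		if got[i] != want[i] {
			return fmt.Errorf("%s: row %d is %v, want %v", client, i, got[i], want[i])
		}
	}
	return nil
}

func main() {
	rows := []string{"a", "b", "c", "d", "e"}
	fmt.Println(survivingRows(rows, []int64{1, 3, 3}))                                  // [a c e]
	fmt.Println(checkReader("pyiceberg", rows, []int64{1}, []string{"a", "c", "d", "e"})) // <nil>
}
```

Comparing lengths before elements means a reader that returns `nil` for a fully deleted file still matches an empty expectation.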