Support insert-only Iceberg changelog native scan #2253

@weimingdiit

Is your feature request related to a problem? Please describe.

Currently, Iceberg changelog scans cannot be converted to a native scan and must fall back to Spark. For insert-only changelog queries, the scan reads only added data files, so it can safely reuse the existing native Parquet/ORC scan path.

Describe the solution you'd like

Support native scan conversion for insert-only Iceberg changelog scans.

The initial scope is intentionally limited:

  • Support SparkChangelogScan with AddedRowsScanTask
  • Support the INSERT changelog operation only
  • Support Parquet and ORC data files
  • Materialize supported metadata columns through per-task partition values
  • Keep delete/update/equality-delete changelog tasks on the Spark fallback path
  • Keep mixed file formats and unsupported metadata columns on the fallback path
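
The scope above could be expressed as a single eligibility check plus a per-task constant map. The following is only a rough sketch to make the boundaries concrete, not the proposed implementation: `TaskKind`, `Operation`, `FileFormat`, `ChangelogTask`, and `InsertOnlyChangelogSupport` are hypothetical stand-ins rather than the real Iceberg or native-engine classes, and the set of "supported" metadata columns is an assumption (the issue does not list them).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins for the Iceberg concepts named in this issue.
enum TaskKind { ADDED_ROWS, DELETED_DATA_FILE, DELETED_ROWS }
enum Operation { INSERT, DELETE, UPDATE_BEFORE, UPDATE_AFTER }
enum FileFormat { PARQUET, ORC, AVRO }

final class ChangelogTask {
    final TaskKind kind;
    final Operation operation;
    final FileFormat format;
    final int changeOrdinal;
    final long commitSnapshotId;

    ChangelogTask(TaskKind kind, Operation operation, FileFormat format,
                  int changeOrdinal, long commitSnapshotId) {
        this.kind = kind;
        this.operation = operation;
        this.format = format;
        this.changeOrdinal = changeOrdinal;
        this.commitSnapshotId = commitSnapshotId;
    }
}

final class InsertOnlyChangelogSupport {
    // Assumption: these changelog metadata columns are the "supported" ones.
    static final Set<String> SUPPORTED_METADATA_COLUMNS = new HashSet<>(
        Arrays.asList("_change_type", "_change_ordinal", "_commit_snapshot_id"));

    // Returns true only when every task fits the insert-only scope;
    // anything else stays on the Spark fallback path.
    static boolean canConvertToNative(List<ChangelogTask> tasks,
                                      List<String> metadataColumns) {
        if (!SUPPORTED_METADATA_COLUMNS.containsAll(metadataColumns)) {
            return false; // unsupported metadata column requested
        }
        FileFormat scanFormat = null;
        for (ChangelogTask task : tasks) {
            if (task.kind != TaskKind.ADDED_ROWS || task.operation != Operation.INSERT) {
                return false; // delete/update tasks are out of scope
            }
            if (task.format != FileFormat.PARQUET && task.format != FileFormat.ORC) {
                return false; // only Parquet and ORC data files
            }
            if (scanFormat == null) {
                scanFormat = task.format;
            } else if (scanFormat != task.format) {
                return false; // mixed file formats within one scan
            }
        }
        return true;
    }

    // Metadata values are constant within one task, so they can be attached
    // to the task the same way partition values are.
    static Map<String, Object> metadataConstants(ChangelogTask task) {
        Map<String, Object> constants = new LinkedHashMap<>();
        constants.put("_change_type", task.operation.name());
        constants.put("_change_ordinal", task.changeOrdinal);
        constants.put("_commit_snapshot_id", task.commitSnapshotId);
        return constants;
    }
}
```

In this sketch a single out-of-scope task rejects the whole scan, which matches the conservative fallback described above; a real implementation may draw that boundary differently.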

Describe alternatives you've considered

A broader implementation could support delete/update changelog records and position/equality delete handling in native scan. That requires native delete-file semantics and is better handled as a separate follow-up.

Additional context

This improves native Iceberg coverage for append-only changelog workloads while keeping correctness boundaries conservative.
