Iceberg ingestion can fail with duplicate AWS SDK IdentityProperty Bucket when S3 extension is also loaded #19636

@mfroembgen

Affected Version

Observed with a downstream Druid image built from the 37.0.0 line (37.0.0-SNAPSHOT at runtime) using AWS SDK v2 2.40.0 and Iceberg 1.10.0.

The same packaging pattern appears to still exist on current master (38.0.0-SNAPSHOT):

  • distribution/docker/deduplicate_jars.sh keeps /opt/druid/lib jars as canonical, then symlinks duplicate extension jars by filename.
  • extensions-core/s3-extensions declares AWS SDK v2 S3-related jars.
  • extensions-contrib/druid-iceberg-extensions also declares AWS SDK v2 S3/Glue/KMS-related jars.

Description

When both druid-s3-extensions and druid-iceberg-extensions are loaded, an Iceberg batch ingestion task using Glue/S3 can fail before it starts running subtasks. The failure happens while Iceberg constructs the AWS SDK S3 client.

The cluster that hit this used the Kubernetes indexing service (peon tasks), S3 for deep storage and task logs, and an extension load list of this shape:

druid.extensions.loadList=[...,"druid-s3-extensions",...,"druid-iceberg-extensions"]

The task failed with:

java.lang.ExceptionInInitializerError
Caused by: java.lang.IllegalArgumentException: No duplicate IdentityProperty names allowed but both IdentityProperties 5fb0a09e and 536fc38 have the same namespace (java.lang.String) and name (Bucket). IdentityProperty should be referenced from a shared static constant to protect against erroneous or unexpected collisions.

Relevant stack frames:

software.amazon.awssdk.services.s3.internal.s3express.S3ExpressPlugin.configureClient(S3ExpressPlugin.java:35)
software.amazon.awssdk.services.s3.DefaultS3BaseClientBuilder.invokePlugins
software.amazon.awssdk.core.client.builder.SdkDefaultClientBuilder.syncClientConfiguration
org.apache.iceberg.aws.AwsClientFactories$DefaultAwsClientFactory.s3
org.apache.iceberg.aws.s3.PrefixedS3Client.s3
org.apache.iceberg.aws.s3.S3InputFile.fromLocation
org.apache.iceberg.aws.s3.S3FileIO.newInputFile
org.apache.iceberg.TableMetadataParser.read
org.apache.iceberg.aws.glue.GlueTableOperations.doRefresh
org.apache.druid.iceberg.input.IcebergCatalog.extractSnapshotDataFiles
org.apache.druid.iceberg.input.IcebergInputSource.createSplits
org.apache.druid.indexing.common.task.batch.parallel.SinglePhaseParallelIndexTaskRunner.subTaskSpecIterator
org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexSupervisorTask.runTask

The resulting runtime jar layout had AWS SDK core/identity jars in the parent Druid lib path, but several S3 service-side jars remained extension-owned. For example:

/opt/druid/lib/identity-spi-2.40.0.jar
/opt/druid/lib/aws-core-2.40.0.jar
/opt/druid/lib/sdk-core-2.40.0.jar

/opt/druid/extensions/druid-s3-extensions/s3-2.40.0.jar -> /opt/druid/extensions/druid-iceberg-extensions/s3-2.40.0.jar
/opt/druid/extensions/druid-iceberg-extensions/s3-2.40.0.jar

/opt/druid/extensions/druid-s3-extensions/aws-xml-protocol-2.40.0.jar -> /opt/druid/extensions/druid-iceberg-extensions/aws-xml-protocol-2.40.0.jar
/opt/druid/extensions/druid-iceberg-extensions/aws-xml-protocol-2.40.0.jar
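A layout like the one listed above can be checked mechanically. The following is a minimal sketch, not part of Druid: the function name `classify_extension_jars` and the three labels are hypothetical, and it assumes only the `lib/` and `extensions/<name>/` directory layout shown in this report.

```python
from pathlib import Path


def classify_extension_jars(druid_home):
    """Classify each extension jar by where its file actually lives.

    Returns a dict mapping jar path (str) to one of:
      "own-copy"         regular file inside the extension directory
      "parent-lib"       symlink that resolves into <druid_home>/lib
      "other-location"   symlink that resolves anywhere else, for example
                         into a different extension directory
    """
    home = Path(druid_home).resolve()
    lib_dir = home / "lib"
    result = {}
    for jar in sorted((home / "extensions").glob("*/*.jar")):
        if not jar.is_symlink():
            result[str(jar)] = "own-copy"
        elif jar.resolve().parent == lib_dir:
            result[str(jar)] = "parent-lib"
        else:
            result[str(jar)] = "other-location"
    return result


if __name__ == "__main__":
    # /opt/druid is the install path used in this report.
    for path, kind in classify_extension_jars("/opt/druid").items():
        if kind != "parent-lib":
            print(kind, path)
```

Against the layout in this report, the `s3-2.40.0.jar` symlink under `druid-s3-extensions` would be reported as `other-location`, because it resolves into `druid-iceberg-extensions` rather than into `/opt/druid/lib`.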

My read of this is that identity-spi / AWS SDK core classes are loaded from the parent Druid classloader, while S3 service classes can still be loaded through an extension classloader. When the Iceberg path initializes the S3 client, the AWS SDK S3 Express identity properties collide in the shared registry.
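To make that reading concrete, here is a toy model in Python. It is not the AWS SDK implementation: `SharedRegistry`, `load_service_classes`, and `simulate` are hypothetical names, and the model assumes that the registry lives in a class loaded once from the parent classloader while the service-side class that defines the `Bucket` constant gets initialized more than once in the same process.

```python
class SharedRegistry:
    """Stand-in for state held by a class that is loaded once (parent loader)."""

    def __init__(self):
        self._by_key = {}

    def register(self, namespace, name, prop):
        key = (namespace, name)
        existing = self._by_key.get(key)
        # Same key with a different object is treated as a collision,
        # mirroring the wording of the exception in this report.
        if existing is not None and existing is not prop:
            raise ValueError(f"duplicate property {key}")
        self._by_key[key] = prop


def load_service_classes(registry):
    """Stand-in for one classloader initializing the service-side classes.

    Every call creates a fresh "static constant" object, as a second
    classloader would when it defines the same class again.
    """
    bucket = object()
    registry.register("java.lang.String", "Bucket", bucket)
    return bucket


def simulate(loader_count):
    """Initialize the service classes `loader_count` times against one registry."""
    registry = SharedRegistry()
    try:
        for _ in range(loader_count):
            load_service_classes(registry)
    except ValueError as exc:
        return f"failed: {exc}"
    return "ok"


if __name__ == "__main__":
    print(simulate(1))
    print(simulate(2))
```

In this model a single initialization succeeds and a second one fails with a duplicate-name error, which matches the shape of the observed `IllegalArgumentException`. Whether the real failure follows exactly this path is still a hypothesis.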

This did not look like an IAM, S3, or Glue permission problem. The failure happens during S3 client construction, and the same Druid role was able to read the Glue table and S3 metadata object outside the failing task path.

Steps to reproduce, at a high level:

  1. Build a Druid distribution image with both druid-s3-extensions and druid-iceberg-extensions available.
  2. Load both extensions in the same Druid process.
  3. Configure S3 deep storage/task logs, so druid-s3-extensions is needed.
  4. Submit an index_parallel task that reads an Iceberg table via GlueCatalog with metadata stored in S3.
  5. The task can fail while IcebergInputSource.createSplits constructs the Iceberg AWS S3 client.

One possible packaging fix is to load the AWS SDK v2 jars that are duplicated across extensions, and that can participate in shared static identity/plugin state, from the parent /opt/druid/lib classpath rather than from an extension classloader. In a downstream image, promoting duplicated AWS SDK v2 extension jars such as s3, aws-xml-protocol, arns, and crt-core into /opt/druid/lib before the duplicate-symlinking step made the final extension copies point to the parent lib path instead of to another extension directory.
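The promotion step could look roughly like the sketch below. It is an illustration only and not what distribution/docker/deduplicate_jars.sh does today: the function `promote_shared_jars` and the filename prefixes are hypothetical, and unlike the downstream image (which promoted the jars and then let the existing script create the symlinks) this sketch does both in one pass.

```python
import shutil
from collections import defaultdict
from pathlib import Path


def promote_shared_jars(druid_home, name_prefixes):
    """Move jars that occur in two or more extensions into <druid_home>/lib.

    Only jars whose filename starts with one of `name_prefixes` are considered.
    Each extension copy is replaced by a symlink to the lib jar.
    Returns the sorted list of promoted filenames.
    """
    home = Path(druid_home).resolve()
    lib_dir = home / "lib"
    copies = defaultdict(list)
    for jar in (home / "extensions").glob("*/*.jar"):
        if jar.name.startswith(tuple(name_prefixes)):
            copies[jar.name].append(jar)

    promoted = []
    for name, paths in copies.items():
        if len(paths) < 2:
            continue  # not duplicated across extensions, leave it alone
        canonical = lib_dir / name
        if not canonical.exists():
            # resolve() follows an existing symlink to the real file
            shutil.copy2(paths[0].resolve(), canonical)
        for path in paths:
            path.unlink()
            path.symlink_to(canonical)
        promoted.append(name)
    return sorted(promoted)


if __name__ == "__main__":
    # Prefixes taken from the jar names mentioned above; adjust as needed.
    print(promote_shared_jars("/opt/druid", ["s3-", "aws-xml-protocol-", "arns-", "crt-core-"]))
```

Matching by filename prefix is a deliberately crude choice here. A prefix such as `s3-` can match more jars than intended, so a real fix would more likely select artifacts by their Maven coordinates.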

Please let me know if a smaller reproducer would be useful; I can try to reduce this to a Docker/distribution-layout repro if that would help maintainers.
