Affected Version
Observed with a downstream Druid image built from the 37.0.0 line (37.0.0-SNAPSHOT at runtime) using AWS SDK v2 2.40.0 and Iceberg 1.10.0.
The same packaging pattern appears to still exist on current master (38.0.0-SNAPSHOT):
- distribution/docker/deduplicate_jars.sh keeps /opt/druid/lib jars as canonical, then symlinks duplicate extension jars by filename.
- extensions-core/s3-extensions declares AWS SDK v2 S3-related jars.
- extensions-contrib/druid-iceberg-extensions also declares AWS SDK v2 S3/Glue/KMS-related jars.
Description
When both druid-s3-extensions and druid-iceberg-extensions are loaded, an Iceberg batch ingestion task using Glue/S3 can fail before it starts running subtasks. The failure happens while Iceberg constructs the AWS SDK S3 client.
The cluster configuration that hit this used Kubernetes indexing service / peon tasks, S3 deep storage and task logs, and this extension load list shape:
druid.extensions.loadList=[...,"druid-s3-extensions",...,"druid-iceberg-extensions"]
The task failed with:
java.lang.ExceptionInInitializerError
Caused by: java.lang.IllegalArgumentException: No duplicate IdentityProperty names allowed but both IdentityProperties 5fb0a09e and 536fc38 have the same namespace (java.lang.String) and name (Bucket). IdentityProperty should be referenced from a shared static constant to protect against erroneous or unexpected collisions.
Relevant stack frames:
software.amazon.awssdk.services.s3.internal.s3express.S3ExpressPlugin.configureClient(S3ExpressPlugin.java:35)
software.amazon.awssdk.services.s3.DefaultS3BaseClientBuilder.invokePlugins
software.amazon.awssdk.core.client.builder.SdkDefaultClientBuilder.syncClientConfiguration
org.apache.iceberg.aws.AwsClientFactories$DefaultAwsClientFactory.s3
org.apache.iceberg.aws.s3.PrefixedS3Client.s3
org.apache.iceberg.aws.s3.S3InputFile.fromLocation
org.apache.iceberg.aws.s3.S3FileIO.newInputFile
org.apache.iceberg.TableMetadataParser.read
org.apache.iceberg.aws.glue.GlueTableOperations.doRefresh
org.apache.druid.iceberg.input.IcebergCatalog.extractSnapshotDataFiles
org.apache.druid.iceberg.input.IcebergInputSource.createSplits
org.apache.druid.indexing.common.task.batch.parallel.SinglePhaseParallelIndexTaskRunner.subTaskSpecIterator
org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexSupervisorTask.runTask
The resulting runtime jar layout had AWS SDK core/identity jars in the parent Druid lib path, but several S3 service-side jars remained extension-owned. For example:
/opt/druid/lib/identity-spi-2.40.0.jar
/opt/druid/lib/aws-core-2.40.0.jar
/opt/druid/lib/sdk-core-2.40.0.jar
/opt/druid/extensions/druid-s3-extensions/s3-2.40.0.jar -> /opt/druid/extensions/druid-iceberg-extensions/s3-2.40.0.jar
/opt/druid/extensions/druid-iceberg-extensions/s3-2.40.0.jar
/opt/druid/extensions/druid-s3-extensions/aws-xml-protocol-2.40.0.jar -> /opt/druid/extensions/druid-iceberg-extensions/aws-xml-protocol-2.40.0.jar
/opt/druid/extensions/druid-iceberg-extensions/aws-xml-protocol-2.40.0.jar
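For anyone who wants to check their own image for the same layout, here is a minimal sketch of a diagnostic. It is not part of Druid; the function name classify_extension_jars is hypothetical, and it assumes the usual layout of one directory per extension under the extensions directory. It reports, for each extension jar, whether it is a real file in its own extension, a symlink into the parent lib directory, or a symlink into another extension directory (the case shown above).

```python
from pathlib import Path


def classify_extension_jars(extensions_dir, lib_dir):
    """Classify every jar found at <extensions_dir>/<extension>/<jar>.

    Returns a dict mapping "<extension>/<jar>" to one of:
      "own"             - the jar resolves to a file inside its own extension dir
      "lib"             - the jar resolves to a file inside lib_dir
      "other-extension" - the jar resolves to a file somewhere else,
                          typically another extension directory
    """
    extensions_dir = Path(extensions_dir).resolve()
    lib_dir = Path(lib_dir).resolve()
    result = {}
    for jar in sorted(extensions_dir.glob("*/*.jar")):
        owner = jar.parent.resolve()   # the extension directory holding this entry
        target = jar.resolve()         # follows symlinks to the real file
        if target.parent == lib_dir:
            kind = "lib"
        elif target.parent == owner:
            kind = "own"
        else:
            kind = "other-extension"
        result[jar.relative_to(extensions_dir).as_posix()] = kind
    return result
```

Calling classify_extension_jars("/opt/druid/extensions", "/opt/druid/lib") on the image described here would be expected to flag druid-s3-extensions/s3-2.40.0.jar as "other-extension".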
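My read of this is that identity-spi / AWS SDK core classes are loaded from the parent Druid classloader, while S3 service classes can still be loaded through an extension classloader. Since each Druid extension gets its own classloader, the S3 service classes can presumably be initialized more than once against state that lives only once in the parent. When the Iceberg path initializes the S3 client, the AWS SDK S3 Express identity properties collide in that shared registry.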
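To make that hypothesis concrete, here is a small model of it. This is an analogy only, not AWS SDK or Druid code: IdentityPropertyRegistry, initialize_service_classes, and simulate are hypothetical names. The registry stands in for static state owned by classes in the parent classloader, and each call to initialize_service_classes stands in for one extension classloader initializing its own copy of the S3 classes.

```python
class IdentityPropertyRegistry:
    """Stand-in for static state that exists once, in the parent classloader."""

    def __init__(self):
        self._seen = set()

    def create(self, namespace, name):
        # Reject a second registration of the same (namespace, name) pair,
        # mirroring the "No duplicate IdentityProperty names allowed" check.
        key = (namespace, name)
        if key in self._seen:
            raise ValueError(
                f"duplicate identity property: namespace={namespace}, name={name}"
            )
        self._seen.add(key)
        return key


def initialize_service_classes(registry):
    """Stand-in for the static initializer of one loaded copy of the S3 classes."""
    return registry.create("java.lang.String", "Bucket")


def simulate(loader_count):
    """Initialize `loader_count` copies of the service classes against one registry."""
    registry = IdentityPropertyRegistry()
    outcomes = []
    for _ in range(loader_count):
        try:
            initialize_service_classes(registry)
            outcomes.append("ok")
        except ValueError:
            outcomes.append("collision")
    return outcomes
```

In this model, simulate(1) succeeds, while simulate(2) succeeds once and then collides. That matches the shape of the failure above, where the first S3 client user is fine and the second one to initialize hits the duplicate-name check. Whether the real SDK state behaves exactly this way is an assumption on my part.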
This did not look like an IAM, S3, or Glue permission problem. The failure happens during S3 client construction, and the same Druid role was able to read the Glue table and S3 metadata object outside the failing task path.
Steps to reproduce, at a high level:
- Build a Druid distribution image with both druid-s3-extensions and druid-iceberg-extensions available.
- Load both extensions in the same Druid process.
- Configure S3 deep storage/task logs, so druid-s3-extensions is needed.
- Submit an index_parallel task that reads an Iceberg table via GlueCatalog with metadata stored in S3.
- The task can fail while IcebergInputSource.createSplits constructs the Iceberg AWS S3 client.
One possible packaging fix is to ensure AWS SDK v2 jars that are duplicated across extensions and can participate in shared static identity/plugin state are loaded from the parent /opt/druid/lib classpath rather than from an extension classloader. In a downstream image, promoting duplicated AWS SDK v2 extension jars such as s3, aws-xml-protocol, arns, and crt-core into /opt/druid/lib before duplicate symlinking made the final extension copies point to the parent lib path instead of to another extension directory.
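A rough sketch of that promotion step, written as a standalone script rather than as a patch to deduplicate_jars.sh. The function name promote_duplicated_jars, the version-stripping regex, and the choice to copy rather than move are all assumptions of this sketch, not what the downstream image literally runs. It promotes a jar only if its artifact name is in the given set and it appears in at least two extension directories.

```python
import re
import shutil
from pathlib import Path

# Assumes jar names look like "<artifact>-<version>.jar" with a version that
# starts with a digit, e.g. "aws-xml-protocol-2.40.0.jar".
_JAR_NAME = re.compile(r"^(.+?)-\d.*\.jar$")


def promote_duplicated_jars(extensions_dir, lib_dir, artifacts):
    """Copy selected duplicated extension jars into lib_dir and relink the copies.

    Returns the sorted list of jar filenames that were promoted.
    """
    extensions_dir = Path(extensions_dir)
    lib_dir = Path(lib_dir)

    # Group extension jars by filename across all extension directories.
    by_name = {}
    for jar in sorted(extensions_dir.glob("*/*.jar")):
        by_name.setdefault(jar.name, []).append(jar)

    promoted = []
    for name, copies in sorted(by_name.items()):
        match = _JAR_NAME.match(name)
        if match is None or match.group(1) not in artifacts or len(copies) < 2:
            continue
        canonical = lib_dir / name
        if not canonical.exists():
            # Use the first real (non-symlink) copy as the source of truth.
            source = next((c for c in copies if not c.is_symlink()), None)
            if source is None:
                continue
            shutil.copy2(source, canonical)
        # Replace every extension copy with a symlink to the lib copy.
        for copy in copies:
            copy.unlink()
            copy.symlink_to(canonical.resolve())
        promoted.append(name)
    return promoted
```

For the layout in this report, the call would be something like promote_duplicated_jars("/opt/druid/extensions", "/opt/druid/lib", {"s3", "aws-xml-protocol", "arns", "crt-core"}), run before the existing duplicate symlinking. Matching on the exact artifact name rather than a filename prefix avoids accidentally promoting unrelated jars whose names merely start with "s3-".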
Please let me know if a smaller reproducer would be useful; I can try to reduce this to a Docker/distribution-layout repro if that would help maintainers.