Description
Add support for AWS S3 Multi-Region Access Points (MRAPs) and S3 Access Point ARNs in Druid's S3 extension.
Currently, the bucket field in Druid's S3 configuration accepts only standard DNS-compliant bucket names. AWS Access Point ARNs (e.g., arn:aws:s3::123456789123:accesspoint:bucket.mrap) are rejected at construction time in CloudObjectLocation because they fail the URL-encoding equality check used to enforce DNS naming rules. Additionally, some tools produce ARNs with a slash separator (accesspoint/alias) instead of the colon-delimited form (accesspoint:alias) expected by the AWS SDK, which causes further failures downstream.
This change:
- Relaxes the bucket name validation in CloudObjectLocation to permit valid S3 Access Point ARNs alongside DNS-compliant names.
- Adds S3Utils.normalizeBucketName() to canonicalize the slash-delimited form to the colon-delimited form at the configuration entry points where bucket names are read (S3DataSegmentPusherConfig, S3LoadSpec).
- Supports both regional Access Point ARNs (arn:aws:s3:<region>:<account>:accesspoint:<name>) and MRAP ARNs (arn:aws:s3::<account>:accesspoint:<name>.mrap).
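A minimal sketch of what the slash-to-colon normalization could look like. Only the method name normalizeBucketName comes from this change; the class name BucketNameNormalizer, the regex, and the matching rules (12-digit account ID, a single name segment after accesspoint/) are assumptions for illustration, not the actual S3Utils implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BucketNameNormalizer
{
  // Hypothetical pattern: an S3 ARN whose resource part is "accesspoint/<name>".
  // The region field may be empty (MRAP) or a region code (regional access point).
  private static final Pattern SLASH_ACCESS_POINT_ARN =
      Pattern.compile("^(arn:[^:]+:s3:[^:]*:\\d{12}:accesspoint)/([^/:]+)$");

  /**
   * Rewrites "...:accesspoint/<name>" to "...:accesspoint:<name>".
   * Anything else (plain bucket names, already colon-delimited ARNs) is returned unchanged.
   */
  static String normalizeBucketName(String bucket)
  {
    if (bucket == null) {
      return null;
    }
    Matcher m = SLASH_ACCESS_POINT_ARN.matcher(bucket);
    return m.matches() ? m.group(1) + ":" + m.group(2) : bucket;
  }

  public static void main(String[] args)
  {
    // prints arn:aws:s3::123456789123:accesspoint:bucket.mrap
    System.out.println(normalizeBucketName("arn:aws:s3::123456789123:accesspoint/bucket.mrap"));
    // prints arn:aws:s3:us-east-1:123456789123:accesspoint:my-ap
    System.out.println(normalizeBucketName("arn:aws:s3:us-east-1:123456789123:accesspoint/my-ap"));
    // prints my-druid-bucket
    System.out.println(normalizeBucketName("my-druid-bucket"));
  }
}
```

In this sketch, plain bucket names and colon-delimited ARNs pass through untouched, so applying the helper is idempotent and leaves existing plain-name configurations unaffected.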
No API surface changes; the bucket configuration field continues to accept plain bucket names unchanged.
Motivation
Use case
AWS Multi-Region Access Points provide a single global S3 endpoint that routes requests to the nearest healthy bucket replica across regions. Operators use MRAPs for:
- Active-active multi-region Druid deployments backed by S3 Cross-Region Replication (CRR).
- Disaster recovery setups where deep storage must remain accessible during a regional outage.
- Simplifying Druid configuration across regions: one ARN in druid.storage.bucket instead of per-region overrides.
- Beyond MRAPs, single-region Access Point ARNs are also used to enforce fine-grained IAM access controls on shared buckets without exposing the underlying bucket name.
Why the current behavior blocks this
CloudObjectLocation enforces:
```java
Preconditions.checkArgument(
    this.bucket.equals(StringUtils.urlEncode(this.bucket)),
    "bucket must follow DNS-compliant naming conventions"
);
```
An ARN like arn:aws:s3::123456789123:accesspoint:bucket.mrap URL-encodes to arn%3Aaws%3As3%3A%3A123456789123%3Aaccesspoint%3Abucket.mrap (every colon becomes %3A), so the equality check always fails for ARNs. There is no escape hatch: users who configure an MRAP ARN as the Druid storage bucket receive an IllegalArgumentException at startup, with no workaround short of patching the code.
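The failure can be reproduced outside Druid. This sketch assumes Druid's StringUtils.urlEncode behaves like java.net.URLEncoder with UTF-8 and "+" rewritten to "%20"; the class DnsCheckDemo and its helper methods are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class DnsCheckDemo
{
  // Approximation of Druid's StringUtils.urlEncode (assumption): UTF-8 form encoding, "+" mapped to "%20".
  static String urlEncode(String s)
  {
    return URLEncoder.encode(s, StandardCharsets.UTF_8).replace("+", "%20");
  }

  // Mirrors the existing CloudObjectLocation precondition: the name must survive URL-encoding unchanged.
  static boolean passesCurrentCheck(String bucket)
  {
    return bucket.equals(urlEncode(bucket));
  }

  public static void main(String[] args)
  {
    String arn = "arn:aws:s3::123456789123:accesspoint:bucket.mrap";
    System.out.println(urlEncode(arn));                          // arn%3Aaws%3As3%3A%3A123456789123%3Aaccesspoint%3Abucket.mrap
    System.out.println(passesCurrentCheck(arn));                 // false
    System.out.println(passesCurrentCheck("my-druid-bucket"));   // true
  }
}
```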
The AWS SDK for Java (v1 and v2) accepts ARN strings wherever a bucket name is expected, so no SDK-level changes are required. The fix is purely a validation relaxation and a normalization helper.
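One way the relaxed check could be structured is sketched here: keep the existing URL-encoding equality test for plain names and accept access point ARNs as an alternative. The two regexes are illustrative assumptions, looser than AWS's full naming rules, and BucketValidation is a hypothetical class; inside CloudObjectLocation the final check would presumably remain a Preconditions.checkArgument call.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

class BucketValidation
{
  // Illustrative patterns (assumptions). Regional access point:
  //   arn:<partition>:s3:<region>:<12-digit account>:accesspoint:<name>
  private static final Pattern REGIONAL_ACCESS_POINT_ARN =
      Pattern.compile("^arn:[a-z-]+:s3:[a-z0-9-]+:\\d{12}:accesspoint:[a-z0-9-]+$");
  // Multi-Region Access Point: empty region field, name ending in ".mrap".
  private static final Pattern MRAP_ARN =
      Pattern.compile("^arn:[a-z-]+:s3::\\d{12}:accesspoint:[a-z0-9-]+\\.mrap$");

  // Existing heuristic: a name counts as DNS-compliant if URL-encoding leaves it unchanged.
  static boolean passesUrlEncodingCheck(String bucket)
  {
    return bucket.equals(URLEncoder.encode(bucket, StandardCharsets.UTF_8).replace("+", "%20"));
  }

  static boolean isValidBucket(String bucket)
  {
    // Reject null/empty up front (a choice made for this sketch).
    if (bucket == null || bucket.isEmpty()) {
      return false;
    }
    return passesUrlEncodingCheck(bucket)
           || REGIONAL_ACCESS_POINT_ARN.matcher(bucket).matches()
           || MRAP_ARN.matcher(bucket).matches();
  }

  // Stand-in for the Preconditions.checkArgument(...) call in CloudObjectLocation.
  static String requireValidBucket(String bucket)
  {
    if (!isValidBucket(bucket)) {
      throw new IllegalArgumentException(
          "bucket must be a DNS-compliant name or an S3 Access Point ARN: " + bucket);
    }
    return bucket;
  }

  public static void main(String[] args)
  {
    System.out.println(isValidBucket("my-druid-bucket"));                                     // true
    System.out.println(isValidBucket("arn:aws:s3::123456789123:accesspoint:bucket.mrap"));    // true
    System.out.println(isValidBucket("arn:aws:s3:us-east-1:123456789123:accesspoint:my-ap")); // true
    System.out.println(isValidBucket("arn:aws:s3::123456789123:accesspoint/bucket.mrap"));    // false
  }
}
```

Under these patterns the slash-delimited form is rejected, so in this sketch normalization has to run before validation.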