Add AWS S3 Multi-Region Access Point (MRAP) support to the S3 extension #19608

@vivek807

Description

Add support for AWS S3 Multi-Region Access Points (MRAPs) and S3 Access Point ARNs in Druid's S3 extension.

Currently, the bucket field in Druid's S3 configuration only accepts standard DNS-compliant bucket names. AWS Access Point ARNs (e.g., arn:aws:s3::123456789123:accesspoint:bucket.mrap) are rejected at construction time in CloudObjectLocation because they fail the URL-encoding equality check used to enforce DNS naming rules. Additionally, some tools produce ARNs with a slash separator (accesspoint/alias) instead of the colon-delimited form (accesspoint:alias) expected by the AWS SDK, which causes further failures downstream.

This change:

  • Relaxes the bucket name validation in CloudObjectLocation to permit valid S3 Access Point ARNs alongside DNS-compliant names.
  • Adds S3Utils.normalizeBucketName() to canonicalize the slash-delimited form to the colon-delimited form at ingestion points (S3DataSegmentPusherConfig, S3LoadSpec).
  • Supports both regional Access Point ARNs (arn:aws:s3:<region>:<account>:accesspoint:<name>) and MRAP ARNs (arn:aws:s3::<account>:accesspoint:<name>.mrap).

No API surface changes; the bucket configuration field continues to accept plain bucket names unchanged.
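A minimal sketch of what the relaxed validation and the normalization helper could look like is shown below. The class name, the regular expression, and the isAccessPointArn helper are illustrative assumptions, not the actual patch; only the name normalizeBucketName comes from the proposal above.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; not the real S3Utils implementation.
class BucketNameSketch {
    // Assumed pattern: the region segment is empty for MRAP ARNs, and the
    // separator before the access point name may be ':' or '/'.
    private static final Pattern ACCESS_POINT_ARN = Pattern.compile(
        "^arn:(aws[a-z-]*):s3:([a-z0-9-]*):(\\d{12}):accesspoint[:/]([A-Za-z0-9.-]+)$"
    );

    // True when the value looks like a regional or multi-region Access Point ARN.
    static boolean isAccessPointArn(String bucket) {
        return bucket != null && ACCESS_POINT_ARN.matcher(bucket).matches();
    }

    // Rewrites "accesspoint/<name>" to "accesspoint:<name>"; plain bucket
    // names and anything that is not an Access Point ARN pass through unchanged.
    static String normalizeBucketName(String bucket) {
        if (!isAccessPointArn(bucket)) {
            return bucket;
        }
        return bucket.replaceFirst(":accesspoint/", ":accesspoint:");
    }

    public static void main(String[] args) {
        System.out.println(normalizeBucketName("arn:aws:s3::123456789123:accesspoint/bucket.mrap"));
        System.out.println(normalizeBucketName("my-druid-bucket"));
    }
}
```

Under these assumptions, a validator in CloudObjectLocation could accept a bucket when either the existing DNS check passes or isAccessPointArn returns true.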

Motivation

Use case

AWS Multi-Region Access Points provide a single global S3 endpoint that routes requests to the nearest healthy bucket replica across regions. Operators use MRAPs for:

  • Active-active multi-region Druid deployments backed by S3 Cross-Region Replication (CRR).
  • Disaster recovery setups where deep storage must remain accessible during a regional outage.
  • Simplifying Druid configuration across regions: one ARN in druid.storage.bucket instead of per-region overrides.

Single-region Access Point ARNs are also used more broadly to enforce fine-grained IAM access controls on shared buckets without exposing the bucket name.

Why the current behavior blocks this

CloudObjectLocation enforces:

Preconditions.checkArgument(
    this.bucket.equals(StringUtils.urlEncode(this.bucket)),
    "bucket must follow DNS-compliant naming conventions"
);

An ARN like arn:aws:s3::123456789123:accesspoint:bucket.mrap URL-encodes to arn%3Aaws%3As3%3A%3A123456789123%3Aaccesspoint%3Abucket.mrap because every colon becomes %3A, so the equality check always fails. There is no escape hatch: users who configure an MRAP ARN as the Druid storage bucket receive an IllegalArgumentException at startup, with no workaround short of patching the code.
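The failure can be reproduced outside Druid with a small sketch. It assumes that Druid's StringUtils.urlEncode behaves like java.net.URLEncoder with UTF-8 plus a "+" to "%20" substitution; the class and method names here are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical stand-in for the equality check in CloudObjectLocation.
class UrlEncodeCheckSketch {
    // Assumed equivalent of StringUtils.urlEncode: form encoding with UTF-8,
    // with '+' (encoded space) rewritten to "%20".
    static String urlEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always supported by the JVM.
            throw new IllegalStateException(e);
        }
    }

    // Mirrors the precondition: the bucket must be unchanged by URL encoding.
    static boolean passesCheck(String bucket) {
        return bucket.equals(urlEncode(bucket));
    }

    public static void main(String[] args) {
        String arn = "arn:aws:s3::123456789123:accesspoint:bucket.mrap";
        System.out.println(urlEncode(arn));             // colons become %3A
        System.out.println(passesCheck(arn));           // false
        System.out.println(passesCheck("my-druid-bucket")); // true
    }
}
```

Because the colon is not in URLEncoder's set of unreserved characters, any ARN fails this check regardless of the access point name.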

The AWS SDK for Java (v1 and v2) accepts ARN strings wherever a bucket name is expected, so no SDK-level changes are required. The fix is purely a validation relaxation and a normalization helper.
