apache
diff --git a/‎.github/workflows/api-test.yml‎
Lines changed: 2 additions & 0 deletions b/‎.github/workflows/api-test.yml‎
Lines changed: 2 additions & 0 deletions
diff --git a/‎docs/configs/docsdev.js‎
Lines changed: 8 additions & 0 deletions b/‎docs/configs/docsdev.js‎
Lines changed: 8 additions & 0 deletions
diff --git a/‎docs/docs/en/guide/task/emr-serverless.md‎
Lines changed: 144 additions & 0 deletions b/‎docs/docs/en/guide/task/emr-serverless.md‎
Lines changed: 144 additions & 0 deletions
diff --git a/‎docs/docs/zh/guide/task/emr-serverless.md‎
Lines changed: 144 additions & 0 deletions b/‎docs/docs/zh/guide/task/emr-serverless.md‎
Lines changed: 144 additions & 0 deletions
diff --git a/‎docs/img/tasks/demo/emr_serverless_create.png‎
284 KB b/‎docs/img/tasks/demo/emr_serverless_create.png‎
284 KB
@@ -121,6 +121,8 @@ jobs:
             class: org.apache.dolphinscheduler.api.test.cases.OidcLoginAPITest
           - name: DependentTaskAPITest
             class: org.apache.dolphinscheduler.api.test.cases.tasks.DependentTaskAPITest
+          - name: EmrServerlessTaskAPITest
+            class: org.apache.dolphinscheduler.api.test.cases.tasks.EmrServerlessTaskAPITest
     env:
       RECORDING_PATH: /tmp/recording-${{ matrix.case.name }}
     steps:
 
@@ -153,6 +153,10 @@ export default {
                                 title: 'Amazon EMR',
                                 link: '/en-us/docs/dev/user_doc/guide/task/emr.html',
                             },
+                            {
+                                title: 'Amazon EMR Serverless',
+                                link: '/en-us/docs/dev/user_doc/guide/task/emr-serverless.html',
+                            },
                             {
                                 title: 'Apache Zeppelin',
                                 link: '/en-us/docs/dev/user_doc/guide/task/zeppelin.html',
@@ -877,6 +881,10 @@ export default {
                                 title: 'Amazon EMR',
                                 link: '/zh-cn/docs/dev/user_doc/guide/task/emr.html',
                             },
+                            {
+                                title: 'Amazon EMR Serverless',
+                                link: '/zh-cn/docs/dev/user_doc/guide/task/emr-serverless.html',
+                            },
                             {
                                 title: 'Apache Zeppelin',
                                 link: '/zh-cn/docs/dev/user_doc/guide/task/zeppelin.html',
 
@@ -0,0 +1,144 @@
+# Amazon EMR Serverless
+
+## Overview
+
+Amazon EMR Serverless task type, for submitting and monitoring job runs on [Amazon EMR Serverless](https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/emr-serverless.html) applications.
+Unlike traditional EMR on EC2, EMR Serverless requires no cluster infrastructure management and automatically scales compute resources on demand, suitable for Spark and Hive workloads.
+
+Using [aws-java-sdk](https://aws.amazon.com/cn/sdk-for-java/) in the background code, to transfer JSON parameters to a [StartJobRunRequest](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/emrserverless/model/StartJobRunRequest.html) object
+and submit it to AWS via the [StartJobRun API](https://docs.aws.amazon.com/emr-serverless/latest/APIReference/API_StartJobRun.html), then poll job status via the [GetJobRun API](https://docs.aws.amazon.com/emr-serverless/latest/APIReference/API_GetJobRun.html) until completion.
+
+## Create Task
+
+- Click `Project Management -> Project Name -> Workflow Definition`, click the `Create Workflow` button to enter the DAG editing page.
+- Drag `AmazonEMRServerless` task from the toolbar to the artboard to complete the creation.
+
+## Task Parameters
+
+[//]: # (TODO: use the commented anchor below once our website template supports this syntax)
+[//]: # (- Please refer to [DolphinScheduler Task Parameters Appendix]&#40;appendix.md#default-task-parameters&#41; `Default Task Parameters` section for default parameters.)
+
+- Please refer to [DolphinScheduler Task Parameters Appendix](appendix.md) `Default Task Parameters` section for default parameters.
+
+|      **Parameter**      |                                                                                                                                                                                                 **Description**                                                                                                                                                                                                  |
+|-------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Application Id          | EMR Serverless application ID (e.g. `00fkht2eodujab09`), obtainable from the [EMR Serverless Console](https://console.aws.amazon.com/emr/home#/serverless)                                                                                                                                                                                                                                                       |
+| Execution Role Arn      | ARN of the IAM role for job execution (e.g. `arn:aws:iam::123456789012:role/EMRServerlessRole`), this role needs permissions to access S3, Glue, and other services                                                                                                                                                                                                                                              |
+| Job Name                | Job name (optional), used to identify the job in the EMR Serverless console                                                                                                                                                                                                                                                                                                                                      |
+| StartJobRunRequest JSON | JSON corresponding to the `JobDriver` and `ConfigurationOverrides` portions of the [StartJobRunRequest](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/emrserverless/model/StartJobRunRequest.html), see examples below. **Note**: `ApplicationId` and `ExecutionRoleArn` do not need to be included in the JSON as they are automatically injected from the form parameters above |
+
+![RUN_JOB_FLOW](../../../../img/tasks/demo/emr_serverless_create.png)
+
+## Task Example
+
+### Submit a Spark Job
+
+This example shows how to create an `EMR_SERVERLESS` task node to submit a Spark job to an EMR Serverless application.
+
+StartJobRunRequest JSON example (Spark):
+
+```json
+{
+  "JobDriver": {
+    "SparkSubmit": {
+      "EntryPoint": "s3://my-bucket/scripts/my-spark-job.jar",
+      "EntryPointArguments": [
+        "s3://my-bucket/input/",
+        "s3://my-bucket/output/"
+      ],
+      "SparkSubmitParameters": "--class com.example.MySparkApp --conf spark.executor.cores=4 --conf spark.executor.memory=8g --conf spark.executor.instances=10"
+    }
+  },
+  "ConfigurationOverrides": {
+    "MonitoringConfiguration": {
+      "S3MonitoringConfiguration": {
+        "LogUri": "s3://my-bucket/emr-serverless-logs/"
+      }
+    }
+  }
+}
+```
+
+### Submit a Hive Job
+
+This example shows how to create an `EMR_SERVERLESS` task node to submit a Hive query job.
+
+StartJobRunRequest JSON example (Hive):
+
+```json
+{
+  "JobDriver": {
+    "HiveSQL": {
+      "Query": "s3://my-bucket/scripts/my-hive-query.sql",
+      "Parameters": "--hiveconf hive.exec.dynamic.partition=true --hiveconf hive.exec.dynamic.partition.mode=nonstrict"
+    }
+  },
+  "ConfigurationOverrides": {
+    "MonitoringConfiguration": {
+      "S3MonitoringConfiguration": {
+        "LogUri": "s3://my-bucket/emr-serverless-logs/"
+      }
+    },
+    "ApplicationConfiguration": [
+      {
+        "Classification": "hive-site",
+        "Properties": {
+          "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
+        }
+      }
+    ]
+  }
+}
+```
+
+## AWS Authentication Configuration
+
+The EMR Serverless task reads AWS credentials from the DolphinScheduler `aws.yaml` configuration file, under the `aws.emr` section at `conf/aws.yaml`.
+
+### Using IAM Role (Recommended)
+
+If the DolphinScheduler Worker node runs on an EC2 instance with an attached IAM Role:
+
+```yaml
+aws:
+  emr:
+    credentials.provider.type: InstanceProfileCredentialsProvider
+    region: us-east-1
+```
+
+### Using Access Key
+
+If you need to authenticate using AK/SK:
+
+```yaml
+aws:
+  emr:
+    credentials.provider.type: AWSStaticCredentialsProvider
+    access.key.id: your-access-key-id
+    access.key.secret: your-secret-access-key
+    region: us-east-1
+```
+
+> **Note**: The `aws.emr` section configuration is shared by both EMR on EC2 and EMR Serverless task types.
+
+## Job State Transitions
+
+After an EMR Serverless job is submitted, DolphinScheduler polls the job status every 10 seconds:
+
+```
+SUBMITTED → PENDING → SCHEDULED → RUNNING → SUCCESS
+                                           → FAILED
+                                           → CANCELLED
+```
+
+- When a job reaches `SUCCESS` state, the task is marked as successful
+- When a job reaches `FAILED` or `CANCELLED` state, the task is marked as failed
+- If a DolphinScheduler task is killed, it automatically calls the [CancelJobRun API](https://docs.aws.amazon.com/emr-serverless/latest/APIReference/API_CancelJobRun.html) to cancel the running job
+
+## Notice
+
+- The **Application Id** must correspond to a pre-existing EMR Serverless application (created via the AWS Console or API) in `STARTED` or `CREATED` state
+- The **Execution Role** requires the following minimum permissions: `emr-serverless:StartJobRun`, `emr-serverless:GetJobRun`, `emr-serverless:CancelJobRun`, plus S3, Glue and other data access permissions required by the job
+- `StartJobRunRequest JSON` should NOT include `ApplicationId` or `ExecutionRoleArn` fields — they are automatically injected from the form parameters
+- EMR Serverless task supports failover: when a Worker node fails, a new Worker can recover tracking of running jobs through `appIds` (the `jobRunId`)
+
@@ -0,0 +1,144 @@
+# Amazon EMR Serverless
+
+## 综述
+
+Amazon EMR Serverless 任务类型，用于向 [Amazon EMR Serverless](https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/emr-serverless.html) 应用程序提交并监控作业运行。
+与传统的 EMR on EC2 不同，EMR Serverless 无需管理集群基础设施，按需自动扩缩容计算资源，适用于 Spark 和 Hive 工作负载。
+
+后台使用 [aws-java-sdk](https://aws.amazon.com/cn/sdk-for-java/) 将 JSON 参数转换为 [StartJobRunRequest](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/emrserverless/model/StartJobRunRequest.html) 对象，
+通过 [StartJobRun API](https://docs.aws.amazon.com/emr-serverless/latest/APIReference/API_StartJobRun.html) 提交到 AWS，并通过 [GetJobRun API](https://docs.aws.amazon.com/emr-serverless/latest/APIReference/API_GetJobRun.html) 轮询作业状态直到完成。
+
+## 创建任务
+
+- 点击 `项目管理 -> 项目名称 -> 工作流定义`，点击 `创建工作流` 按钮进入 DAG 编辑页面。
+- 从工具栏中拖拽 `AmazonEMRServerless` 任务到画布中完成创建。
+
+## 任务参数
+
+[//]: # (TODO: use the commented anchor below once our website template supports this syntax)
+[//]: # (- 默认参数说明请参考[DolphinScheduler任务参数附录]&#40;appendix.md#默认任务参数&#41;`默认任务参数`一栏。)
+
+- 默认参数说明请参考[DolphinScheduler任务参数附录](appendix.md)`默认任务参数`一栏。
+
+|        **任务参数**         |                                                                                                                                        **描述**                                                                                                                                        |
+|-------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Application Id          | EMR Serverless 应用程序 ID（格式如 `00fkht2eodujab09`），可在 [EMR Serverless 控制台](https://console.aws.amazon.com/emr/home#/serverless) 获取                                                                                                                                                       |
+| Execution Role Arn      | 作业执行 IAM 角色的 ARN（格式如 `arn:aws:iam::123456789012:role/EMRServerlessRole`），该角色需要有访问 S3、Glue 等服务的权限                                                                                                                                                                                     |
+| Job Name                | 作业名称（可选），用于在 EMR Serverless 控制台中标识作业                                                                                                                                                                                                                                                 |
+| StartJobRunRequest JSON | [StartJobRunRequest](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/emrserverless/model/StartJobRunRequest.html) 中 `JobDriver` 和 `ConfigurationOverrides` 部分对应的 JSON，详细定义见下方示例。**注意**：`ApplicationId` 和 `ExecutionRoleArn` 无需在 JSON 中重复填写，系统会自动从上方参数注入 |
+
+![RUN_JOB_FLOW](../../../../img/tasks/demo/emr_serverless_create.png)
+
+## 任务样例
+
+### 提交 Spark 作业
+
+该样例展示了如何创建 `EMR_SERVERLESS` 任务节点来提交一个 Spark 作业到 EMR Serverless 应用程序。
+
+StartJobRunRequest JSON 参数样例（Spark）：
+
+```json
+{
+  "JobDriver": {
+    "SparkSubmit": {
+      "EntryPoint": "s3://my-bucket/scripts/my-spark-job.jar",
+      "EntryPointArguments": [
+        "s3://my-bucket/input/",
+        "s3://my-bucket/output/"
+      ],
+      "SparkSubmitParameters": "--class com.example.MySparkApp --conf spark.executor.cores=4 --conf spark.executor.memory=8g --conf spark.executor.instances=10"
+    }
+  },
+  "ConfigurationOverrides": {
+    "MonitoringConfiguration": {
+      "S3MonitoringConfiguration": {
+        "LogUri": "s3://my-bucket/emr-serverless-logs/"
+      }
+    }
+  }
+}
+```
+
+### 提交 Hive 作业
+
+该样例展示了如何创建 `EMR_SERVERLESS` 任务节点来提交一个 Hive 查询作业。
+
+StartJobRunRequest JSON 参数样例（Hive）：
+
+```json
+{
+  "JobDriver": {
+    "HiveSQL": {
+      "Query": "s3://my-bucket/scripts/my-hive-query.sql",
+      "Parameters": "--hiveconf hive.exec.dynamic.partition=true --hiveconf hive.exec.dynamic.partition.mode=nonstrict"
+    }
+  },
+  "ConfigurationOverrides": {
+    "MonitoringConfiguration": {
+      "S3MonitoringConfiguration": {
+        "LogUri": "s3://my-bucket/emr-serverless-logs/"
+      }
+    },
+    "ApplicationConfiguration": [
+      {
+        "Classification": "hive-site",
+        "Properties": {
+          "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
+        }
+      }
+    ]
+  }
+}
+```
+
+## AWS 认证配置
+
+EMR Serverless 任务通过 DolphinScheduler 的 `aws.yaml` 配置文件读取 AWS 认证信息，配置路径为 `conf/aws.yaml` 中的 `aws.emr` 段。
+
+### 使用 IAM Role（推荐）
+
+如果 DolphinScheduler Worker 节点运行在 EC2 实例上并已绑定 IAM Role，配置如下：
+
+```yaml
+aws:
+  emr:
+    credentials.provider.type: InstanceProfileCredentialsProvider
+    region: us-east-1
+```
+
+### 使用 Access Key
+
+如果需要使用 AK/SK 方式认证：
+
+```yaml
+aws:
+  emr:
+    credentials.provider.type: AWSStaticCredentialsProvider
+    access.key.id: your-access-key-id
+    access.key.secret: your-secret-access-key
+    region: us-east-1
+```
+
+> **注意**：`aws.emr` 段的配置同时被 EMR on EC2 和 EMR Serverless 任务类型共享。
+
+## 作业状态流转
+
+EMR Serverless 作业提交后，DolphinScheduler 会每 10 秒轮询一次作业状态：
+
+```
+SUBMITTED → PENDING → SCHEDULED → RUNNING → SUCCESS
+                                           → FAILED
+                                           → CANCELLED
+```
+
+- 作业进入 `SUCCESS` 状态时，任务标记为成功
+- 作业进入 `FAILED` 或 `CANCELLED` 状态时，任务标记为失败
+- 如果 DolphinScheduler 任务被终止，会自动调用 [CancelJobRun API](https://docs.aws.amazon.com/emr-serverless/latest/APIReference/API_CancelJobRun.html) 取消正在运行的作业
+
+## 注意事项
+
+- **Application Id** 对应的 EMR Serverless 应用程序需要预先在 AWS 控制台或通过 API 创建，并确保处于 `STARTED` 或 `CREATED` 状态
+- **Execution Role** 需要有以下最小权限：`emr-serverless:StartJobRun`、`emr-serverless:GetJobRun`、`emr-serverless:CancelJobRun`，以及作业所需的 S3、Glue 等数据访问权限
+- `StartJobRunRequest JSON` 中无需填写 `ApplicationId` 和 `ExecutionRoleArn` 字段，系统会自动从表单参数注入
+- EMR Serverless 任务支持故障转移（Failover）：当 Worker 节点发生故障时，新的 Worker 可以通过 `appIds`（即 `jobRunId`）恢复对正在运行作业的跟踪
+