[Improvement-16994][TaskPlugin] support retry for every api call for serverless spark#17476

Merged
EricGao888 merged 5 commits into apache:dev from abzymeatsjtu:feat_serverless_spark_plugin_opt_1
Sep 8, 2025

Conversation

abzymeatsjtu (Contributor) commented Sep 4, 2025

Support retrying every API call for EMR Serverless Spark. This will improve the robustness of the task plugin against temporary malfunctions of the remote service.

part of #16994

@github-actions github-actions Bot added the test label Sep 4, 2025
Comment on lines +151 to +157
StartJobRunRequest startJobRunRequest = buildStartJobRunRequest(aliyunServerlessSparkParameters);
StartJobRunResponse startJobRunResponse = RetryUtils.retryFunction(() -> {
    try {
        return aliyunServerlessSparkClient.startJobRun(
                aliyunServerlessSparkParameters.getWorkspaceId(), startJobRunRequest);
    } catch (Exception e) {
        throw new AliyunServerlessSparkTaskException("Failed to start job run!");
    }
});
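The pattern above wraps each SDK call in a retrying helper. A minimal, self-contained sketch of such a helper is shown below; the class and parameter names here are illustrative assumptions, not the actual `RetryUtils` implementation in DolphinScheduler.

```java
import java.util.function.Supplier;

// Minimal sketch of a generic retry helper in the spirit of
// RetryUtils.retryFunction. Names, retry count, and backoff strategy
// are assumptions for illustration only.
public class SimpleRetry {

    /**
     * Invokes the supplier, retrying up to maxRetries times on a
     * RuntimeException, sleeping backoffMillis between attempts.
     */
    public static <T> T retryFunction(Supplier<T> supplier, int maxRetries, long backoffMillis) {
        RuntimeException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return supplier.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and retry
                try {
                    Thread.sleep(backoffMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw e; // give up immediately if interrupted
                }
            }
        }
        throw last; // all attempts exhausted
    }
}
```

A caller would wrap the API call in a lambda, exactly as the diff does with `aliyunServerlessSparkClient.startJobRun(...)`, so a transient remote failure on one attempt does not fail the whole task.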
Member
There seems to be a timeout issue here: if the HTTP request times out, the client will retry, but the server side might have already handled the previous request, so the request could end up being handled twice. I'm unsure whether the service side has implemented idempotency handling. Because a new token is passed on each call, the server side cannot tell that the second request is a retry.

Member
@ruanwenjun @abzymeatsjtu Looks like the token is generated and set when initializing the request (line #257), therefore I assume idempotency is alright here?

Member
Ah, I missed that. If the server side implements idempotency handling via tokens, that would be a great solution.
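The reviewers' conclusion hinges on the token being generated once per logical request, so every retry carries the same token and the server can deduplicate. A hypothetical sketch of that contract (the dedup table and method names are illustrative, not the actual Aliyun service):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch of token-based idempotency. The client generates the
// token ONCE when building the request; retries reuse it, so the server can
// recognize and deduplicate them. Names are illustrative only.
public class IdempotencyDemo {

    // Simulated server-side dedup table: token -> job run id.
    static final Map<String, String> handled = new HashMap<>();
    static int jobCounter = 0;

    // Simulated server endpoint: returns the existing result for a token
    // it has already seen instead of starting a second job run.
    static String startJobRun(String clientToken) {
        return handled.computeIfAbsent(clientToken, t -> "jr-" + (++jobCounter));
    }

    public static void main(String[] args) {
        String token = UUID.randomUUID().toString(); // generated once, before any retry
        String first = startJobRun(token);
        String retry = startJobRun(token); // a timeout-driven retry reuses the same token
        System.out.println(first.equals(retry)); // prints "true": no duplicate job run
    }
}
```

If the token were regenerated inside the retry lambda instead, each retry would look like a brand-new request to the server, which is exactly the double-execution risk raised above.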

@EricGao888 EricGao888 added this to the 4.0.0-alpha milestone Sep 5, 2025
Member

@ruanwenjun ruanwenjun left a comment


LGTM

@ruanwenjun ruanwenjun added the improvement label Sep 6, 2025
sonarqubecloud Bot commented Sep 8, 2025

Quality Gate failed

Failed conditions
Reliability Rating on New Code: C (required ≥ A)

See analysis details on SonarQube Cloud


@EricGao888 EricGao888 merged commit 4ec7c4b into apache:dev Sep 8, 2025
80 of 117 checks passed
davidzollo pushed a commit to davidzollo/dolphinscheduler that referenced this pull request Oct 27, 2025
…serverless spark (apache#17476)

* [Improvement-16994][TaskPlugin] support retry for every api call for serverless spark

---------

Co-authored-by: sunyifan.syf <sunyifan.syf@alibaba-inc.com>
Co-authored-by: Eric Gao <ericgao.apache@gmail.com>

Labels

backend · improvement · test

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants