Commit 934d610

Support configurable maximum runtime for workflow/task instances
1 parent 4b0427d commit 934d610

8 files changed

Lines changed: 101 additions & 44 deletions

deploy/kubernetes/dolphinscheduler/values.yaml

Lines changed: 4 additions & 0 deletions
@@ -565,6 +565,10 @@ master:
 MASTER_SERVER_LOAD_PROTECTION_MAX_SYSTEM_MEMORY_USAGE_PERCENTAGE_THRESHOLDS: 0.7
 # -- Master max disk usage , when the master's disk usage is smaller then this value, master server can execute workflow.
 MASTER_SERVER_LOAD_PROTECTION_MAX_DISK_USAGE_PERCENTAGE_THRESHOLDS: 0.7
+# Maximum allowed running time for a workflow instance. If the running duration exceeds this value, the instance will be killed. The default value of 0d indicates no limit.
+MASTER_SERVER_LOAD_PROTECTION_MAX_WORKFLOW_INSTANCE_RUNTIME: 0d
+# Maximum allowed running time for a task instance. If the running duration exceeds this value, the instance will be killed. The default value of 0d indicates no limit.
+MASTER_SERVER_LOAD_PROTECTION_MAX_TASK_INSTANCE_RUNTIME: 0d
 # -- Master failover interval, the unit is minute
 MASTER_FAILOVER_INTERVAL: "10m"
 # -- Master kill application when handle failover
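
The value `0d` used here and the `0m` default listed in the configuration docs both denote a zero-length duration. As a rough illustration, the following is a minimal sketch of a parser for this `<number><unit>` notation. The class `SimpleDurations` and its `parse` method are hypothetical names written for this example and are not part of the commit; the real server may rely on its framework's own duration conversion instead.

```java
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not code from the commit.
final class SimpleDurations {

    // A non-negative integer followed by one unit suffix: s, m, h or d.
    private static final Pattern FORMAT = Pattern.compile("^(\\d+)([smhd])$");

    private SimpleDurations() {
    }

    // Parses strings such as "0d", "10m" or "2h" into a java.time.Duration.
    static Duration parse(String text) {
        Matcher matcher = FORMAT.matcher(text.trim());
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Unsupported duration: " + text);
        }
        long amount = Long.parseLong(matcher.group(1));
        switch (matcher.group(2)) {
            case "s":
                return Duration.ofSeconds(amount);
            case "m":
                return Duration.ofMinutes(amount);
            case "h":
                return Duration.ofHours(amount);
            default:
                return Duration.ofDays(amount);
        }
    }
}
```

Under this sketch, `SimpleDurations.parse("0d")` and `SimpleDurations.parse("0m")` are equal, so either spelling expresses "no limit".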

docs/docs/en/architecture/configuration.md

Lines changed: 18 additions & 16 deletions
@@ -275,22 +275,24 @@ Location: `api-server/conf/application.yaml`

 Location: `master-server/conf/application.yaml`

-| Parameters | Default value | Description |
-|-----------------------------------------------------------------------------|------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------|
-| master.listen-port | 5678 | master listen port |
-| master.logic-task-config.task-executor-thread-count | 2 * CPU +1 | The thread size used to execute logic task |
-| master.worker-load-balancer-configuration-properties.type | DYNAMIC_WEIGHTED_ROUND_ROBIN | Master will use the worker's cpu/memory/threadPool usage to calculate the worker load, the lower load will have more change to be dispatched task |
-| master.max-heartbeat-interval | 10s | master max heartbeat interval |
-| master.server-load-protection.enabled | true | If set true, will open master overload protection |
-| master.server-load-protection.max-system-cpu-usage-percentage-thresholds | 0.8 | Master max system cpu usage, when the master's system cpu usage is smaller then this value, master server can execute workflow. |
-| master.server-load-protection.max-jvm-cpu-usage-percentage-thresholds | 0.8 | Master max JVM cpu usage, when the master's jvm cpu usage is smaller then this value, master server can execute workflow. |
-| master.server-load-protection.max-system-memory-usage-percentage-thresholds | 0.8 | Master max system memory usage , when the master's system memory usage is smaller then this value, master server can execute workflow. |
-| master.server-load-protection.max-disk-usage-percentage-thresholds | 0.8 | Master max disk usage , when the master's disk usage is smaller then this value, master server can execute workflow. |
-| master.server-load-protection.max-concurrent-workflow-instances | 2147483647 | Master max concurrent workflow instances, when the master's workflow instance count reaches or exceeds this value, master server will be marked as busy. |
-| master.worker-group-refresh-interval | 10s | The interval to refresh worker group from db to memory |
-| master.command-fetch-strategy.type | ID_SLOT_BASED | The command fetch strategy, only support `ID_SLOT_BASED` |
-| master.command-fetch-strategy.config.id-step | 1 | The id auto incremental step of t_ds_command in db |
-| master.command-fetch-strategy.config.fetch-size | 10 | The number of commands fetched by master |
+| Parameters | Default value | Description |
+|-----------------------------------------------------------------------------|------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| master.listen-port | 5678 | master listen port |
+| master.logic-task-config.task-executor-thread-count | 2 * CPU +1 | The thread size used to execute logic task |
+| master.worker-load-balancer-configuration-properties.type | DYNAMIC_WEIGHTED_ROUND_ROBIN | Master will use the worker's cpu/memory/threadPool usage to calculate the worker load, the lower load will have more change to be dispatched task |
+| master.max-heartbeat-interval | 10s | master max heartbeat interval |
+| master.server-load-protection.enabled | true | If set true, will open master overload protection |
+| master.server-load-protection.max-system-cpu-usage-percentage-thresholds | 0.8 | Master max system cpu usage, when the master's system cpu usage is smaller then this value, master server can execute workflow. |
+| master.server-load-protection.max-jvm-cpu-usage-percentage-thresholds | 0.8 | Master max JVM cpu usage, when the master's jvm cpu usage is smaller then this value, master server can execute workflow. |
+| master.server-load-protection.max-system-memory-usage-percentage-thresholds | 0.8 | Master max system memory usage , when the master's system memory usage is smaller then this value, master server can execute workflow. |
+| master.server-load-protection.max-disk-usage-percentage-thresholds | 0.8 | Master max disk usage , when the master's disk usage is smaller then this value, master server can execute workflow. |
+| master.server-load-protection.max-concurrent-workflow-instances | 2147483647 | Master max concurrent workflow instances, when the master's workflow instance count reaches or exceeds this value, master server will be marked as busy. |
+| master.server-load-protection.max-workflow-instance-runtime | 0m | Maximum allowed running time for a workflow instance. If the running duration exceeds this value, the instance will be killed. The default value of 0d indicates no limit; the minimum value is 1m. |
+| master.server-load-protection.max-task-instance-runtime | 0m | Maximum allowed running time for a task instance. If the running duration exceeds this value, the instance will be killed. The default value of 0d indicates no limit; the minimum value is 1m. |
+| master.worker-group-refresh-interval | 10s | The interval to refresh worker group from db to memory |
+| master.command-fetch-strategy.type | ID_SLOT_BASED | The command fetch strategy, only support `ID_SLOT_BASED` |
+| master.command-fetch-strategy.config.id-step | 1 | The id auto incremental step of t_ds_command in db |
+| master.command-fetch-strategy.config.fetch-size | 10 | The number of commands fetched by master |

 ### Worker Server related configuration
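
The two new `max-*-instance-runtime` rows describe a limit in which zero means "no limit". A minimal sketch of how such a check could look is given below. `RuntimeLimit.isExceeded` is a hypothetical helper written for this example; the commit's actual enforcement logic is not visible in this diff and may differ.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper for illustration only; not code from the commit.
final class RuntimeLimit {

    private RuntimeLimit() {
    }

    // Returns true when the time elapsed since startTime is strictly greater
    // than maxRuntime. A zero maxRuntime is treated as "no limit", matching
    // the documented default.
    static boolean isExceeded(Instant startTime, Instant now, Duration maxRuntime) {
        if (maxRuntime.isZero()) {
            return false;
        }
        Duration elapsed = Duration.between(startTime, now);
        return elapsed.compareTo(maxRuntime) > 0;
    }
}
```

With a one-hour limit, an instance that started two hours ago would be reported as exceeded, while the same instance under a zero limit would never be.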

docs/docs/zh/architecture/configuration.md

Lines changed: 18 additions & 16 deletions
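@@ -303,22 +303,24 @@ The common.properties configuration file is currently mainly used to configure hadoop/s3/yarn/applicationId

 Location: `worker-server/conf/application.yaml`

-| Parameters | Default value | Description |
-|-----------------------------------------------------------------------------|-----------|-----------------------------------------------------------------------------------------|
-| worker.listen-port | 1234 | worker listen port |
-| worker.max-heartbeat-interval | 10s | worker max heartbeat interval |
-| worker.host-weight | 100 | Weight of the worker host when dispatching tasks |
-| worker.tenant-auto-create | true | A tenant corresponds to a system user, which the worker uses to submit jobs. If the system does not have this user, it is created automatically when the parameter worker.tenant.auto.create is true. |
-| worker.server-load-protection.enabled | true | Whether to enable the system protection policy |
-| worker.server-load-protection.max-system-cpu-usage-percentage-thresholds | 0.8 | Worker max system CPU usage. The worker server can accept tasks only when the current system CPU usage is below this value. The default value of 0.8 means 80% of the operating system CPU is used. |
-| worker.server-load-protection.max-jvm-cpu-usage-percentage-thresholds | 0.8 | Worker max JVM CPU usage. The worker server can accept tasks only when the current JVM CPU usage is below this value. The default value of 0.8 means 80% of the JVM CPU is used. |
-| worker.server-load-protection.max-system-memory-usage-percentage-thresholds | 0.8 | Worker max system memory usage. The worker server can accept tasks only when the current system memory usage is below this value. The default value of 0.8 means 80% of the operating system memory is used. |
-| worker.server-load-protection.max-disk-usage-percentage-thresholds | 0.8 | Worker max system disk usage. The worker server can accept tasks only when the current system disk usage is below this value. The default value of 0.8 means 80% of the operating system disk space is used. |
-| worker.alert-listen-host | localhost | alert listen host |
-| worker.alert-listen-port | 50052 | alert listen port |
-| worker.physical-task-config.task-executor-thread-size | 100 | Maximum task concurrency in the worker |
-| worker.tenant-config.auto-create-tenant-enabled | true | A tenant corresponds to a system user, which the worker uses to submit jobs. If the system does not have this user, it is created automatically when the parameter worker.tenant.auto.create is true. |
-| worker.tenant-config.default-tenant-enabled | false | If set to true, the user that started the worker server is used as the `default` tenant. |
+| Default value | Parameters | Description |
+|-----------|-----------------------------------------------------------------------------|-----------------------------------------------------------------------------------------|
+| 1234 | worker.listen-port | worker listen port |
+| 10s | worker.max-heartbeat-interval | worker max heartbeat interval |
+| 100 | worker.host-weight | Weight of the worker host when dispatching tasks |
+| true | worker.tenant-auto-create | A tenant corresponds to a system user, which the worker uses to submit jobs. If the system does not have this user, it is created automatically when the parameter worker.tenant.auto.create is true. |
+| true | worker.server-load-protection.enabled | Whether to enable the system protection policy |
+| 0.8 | worker.server-load-protection.max-system-cpu-usage-percentage-thresholds | Worker max system CPU usage. The worker server can accept tasks only when the current system CPU usage is below this value. The default value of 0.8 means 80% of the operating system CPU is used. |
+| 0.8 | worker.server-load-protection.max-jvm-cpu-usage-percentage-thresholds | Worker max JVM CPU usage. The worker server can accept tasks only when the current JVM CPU usage is below this value. The default value of 0.8 means 80% of the JVM CPU is used. |
+| 0.8 | worker.server-load-protection.max-system-memory-usage-percentage-thresholds | Worker max system memory usage. The worker server can accept tasks only when the current system memory usage is below this value. The default value of 0.8 means 80% of the operating system memory is used. |
+| 0.8 | worker.server-load-protection.max-disk-usage-percentage-thresholds | Worker max system disk usage. The worker server can accept tasks only when the current system disk usage is below this value. The default value of 0.8 means 80% of the operating system disk space is used. |
+| 0m | master.server-load-protection.max-workflow-instance-runtime | Maximum running time for a workflow instance. If this time is exceeded, the instance is killed. The default value of 0d indicates no limit; the minimum value is 1 minute. |
+| 0m | master.server-load-protection.max-task-instance-runtime | Maximum running time for a task instance. If this time is exceeded, the instance is killed. The default value of 0d indicates no limit; the minimum value is 1 minute. |
+| localhost | worker.alert-listen-host | alert listen host |
+| 50052 | worker.alert-listen-port | alert listen port |
+| 100 | worker.physical-task-config.task-executor-thread-size | Maximum task concurrency in the worker |
+| true | worker.tenant-config.auto-create-tenant-enabled | A tenant corresponds to a system user, which the worker uses to submit jobs. If the system does not have this user, it is created automatically when the parameter worker.tenant.auto.create is true. |
+| false | worker.tenant-config.default-tenant-enabled | If set to true, the user that started the worker server is used as the `default` tenant. |

 ## Alert Server related configuration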

dolphinscheduler-master/src/main/java/org/apache/dolphinscheduler/server/master/config/MasterConfig.java

Lines changed: 1 addition & 0 deletions
@@ -100,6 +100,7 @@ public void validate(Object target, Errors errors) {
 if (StringUtils.isEmpty(masterConfig.getMasterAddress())) {
     masterConfig.setMasterAddress(NetUtils.getAddr(masterConfig.getListenPort()));
 }
+serverLoadProtection.validate(errors);
 commandFetchStrategy.validate(errors);
 workerLoadBalancerConfigurationProperties.validate(errors);
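
This hunk shows only the new call to `serverLoadProtection.validate(errors)`, not the body of that method. Going by the documented rule that 0 means no limit and the minimum value is 1m, a check of that kind might look like the sketch below. `RuntimeLimitValidator` and its `validate` method are hypothetical names, and the sketch returns an `Optional<String>` message instead of using the `Errors` object seen in the diff so that it stays self-contained.

```java
import java.time.Duration;
import java.util.Optional;

// Hypothetical validator for illustration only; not code from the commit.
final class RuntimeLimitValidator {

    // Smallest non-zero limit, taken from the documented "min value is 1m".
    private static final Duration MIN_LIMIT = Duration.ofMinutes(1);

    private RuntimeLimitValidator() {
    }

    // Returns an error message when the value is invalid, or empty when it is acceptable.
    static Optional<String> validate(String propertyName, Duration value) {
        if (value == null || value.isNegative()) {
            return Optional.of(propertyName + " must be a non-negative duration");
        }
        if (!value.isZero() && value.compareTo(MIN_LIMIT) < 0) {
            return Optional.of(propertyName + " must be 0 (no limit) or at least 1m");
        }
        return Optional.empty();
    }
}
```

Running this kind of check at startup would reject a value such as 30 seconds before any instance is scheduled, rather than leaving it to fail at runtime.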
