
Commit 09d15a0

[Doc-16755] Improve architecture description with JDBC and Etcd support
Improve design doc (#17092)
1 parent 778bab0 commit 09d15a0

1 file changed

Lines changed: 56 additions & 8 deletions

docs/docs/en/architecture/design.md

@@ -22,7 +22,7 @@
2222

2323
### Architecture Description
2424

25-
* **MasterServer**
25+
- **MasterServer**
2626

2727
MasterServer adopts a distributed and decentralized design concept. MasterServer is mainly responsible for DAG task segmentation, task submission monitoring, and monitoring the health status of other MasterServer and WorkerServer at the same time.
2828
When the MasterServer service starts, it registers a temporary node with ZooKeeper and performs fault tolerance by monitoring changes to ZooKeeper's temporary nodes.
@@ -44,7 +44,7 @@
4444

4545
- **FailoverExecuteThread** is mainly responsible for the logic of Master fault tolerance and Worker fault tolerance;
4646

47-
* **WorkerServer**
47+
- **WorkerServer**
4848

4949
WorkerServer also adopts a distributed and decentralized design concept. WorkerServer is mainly responsible for task execution and providing log services.
5050

@@ -59,21 +59,70 @@
5959

6060
- **RetryReportTaskStatusThread** is mainly responsible for periodically re-reporting the task status to the Master until the Master acknowledges it, so that no task status is lost;
6161

62-
* **ZooKeeper**
62+
- **ZooKeeper**
6363

64-
ZooKeeper service, MasterServer and WorkerServer nodes in the system all use ZooKeeper for cluster management and fault tolerance. In addition, the system implements event monitoring and distributed locks based on ZooKeeper.
64+
The MasterServer and WorkerServer nodes in the system all use the ZooKeeper service for cluster management and fault tolerance. To accommodate evolving needs and modern deployment environments, DolphinScheduler now supports event monitoring and distributed locks based not only on ZooKeeper but also on **JDBC** and **Etcd** implementations.
65+
66+
- **JDBC**
67+
DolphinScheduler also provides a JDBC-based registry implementation, located in the `dolphinscheduler-registry/dolphinscheduler-registry-plugins/dolphinscheduler-registry-jdbc` module. Unlike external systems such as ZooKeeper or Etcd, the JDBC approach leverages a relational database to support event monitoring and distributed locking, making it well-suited for environments that already rely on SQL databases.
68+
69+
- **Event Monitoring**
70+
71+
- **Subscribe Method**
72+
The `subscribe(String watchedPath, SubscribeListener listener)` method in `JdbcRegistry` registers a data change listener using the `JdbcRegistryDataChangeListenerAdapter`. When changes (such as creation, update, or deletion) occur for the specified key or path in the database, the adapter converts these changes into DolphinScheduler `Event` notifications and triggers the `SubscribeListener` callback.
73+
74+
- **Polling/Trigger Mechanism**
75+
Internally, the system uses periodic polling or a trigger-based mechanism to detect changes in the registry data stored in the database, simulating a Watcher-like behavior similar to ZooKeeper.
76+
77+
- **Distributed Lock**
78+
79+
- **Lock Acquisition and Release**
80+
The JDBC registry offers both `acquireLock(String key)` and `acquireLock(String key, long timeout)` methods, which correspond to blocking and timeout-based lock acquisition respectively. These methods internally call `JdbcRegistryClient.acquireJdbcRegistryLock(...)` to manage locks via database records, ensuring mutual exclusion in a distributed environment.
81+
82+
- **Ephemeral vs. Persistent Locks**
83+
Data entries are classified as either **EPHEMERAL** or **PERSISTENT**. For ephemeral locks, if the client disconnects or fails, heartbeat mechanisms detect the lapse and clean up the lock record automatically, thus releasing the lock.
84+
85+
- **Lock Management**
86+
Under the hood, components like `JdbcRegistryLockManager` (or equivalent) use row-level locking or specific database fields to ensure atomic lock operations, maintaining consistency even when multiple masters/workers compete for the same lock.
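The lock behavior described above can be pictured with a minimal sketch. This is not the `JdbcRegistryLockManager` code: the class `RowLockSketch` and its methods are hypothetical, and an in-memory map stands in for a lock table whose key column has a unique constraint, so that `putIfAbsent` plays the role of an `INSERT` that fails on a duplicate key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical illustration only; names and structure are assumptions.
class RowLockSketch {

    // lock key -> owner id. Stands in for a lock table with a unique key column.
    private final ConcurrentMap<String, String> lockRows = new ConcurrentHashMap<>();

    // Succeeds only if no row exists for the key (like an INSERT that hits no duplicate).
    boolean tryAcquire(String key, String owner) {
        return lockRows.putIfAbsent(key, owner) == null;
    }

    // Timeout-based acquisition: retry until the row can be inserted or the timeout elapses.
    boolean acquire(String key, String owner, long timeoutMillis) {
        long start = System.nanoTime();
        long timeoutNanos = timeoutMillis * 1_000_000L;
        while (!tryAcquire(key, owner)) {
            if (System.nanoTime() - start >= timeoutNanos) {
                return false;
            }
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return true;
    }

    // Only the current owner may delete its lock row.
    boolean release(String key, String owner) {
        return lockRows.remove(key, owner);
    }
}
```

The sketch is deliberately not re-entrant and has no heartbeat; a real ephemeral lock would also need the expiry cleanup mentioned above.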
87+
88+
***
89+
90+
By leveraging JDBC for both **event monitoring** and **distributed locking**, DolphinScheduler can achieve reliable task coordination and scheduling without relying on external registry centers, making it an attractive option for environments that prefer or already have robust database infrastructure.
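To make the polling-based event monitoring more concrete, here is a minimal sketch of how two successive snapshots of registry rows could be compared to derive change notifications. It is an illustration under stated assumptions, not the `JdbcRegistry` implementation: `PollingDiffSketch`, `Change`, and `ChangeType` are hypothetical names, and plain maps of key to value stand in for the rows read from the database on each poll.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; names and structure are assumptions.
class PollingDiffSketch {

    enum ChangeType { ADD, UPDATE, REMOVE }

    static final class Change {
        final ChangeType type;
        final String key;
        final String value;

        Change(ChangeType type, String key, String value) {
            this.type = type;
            this.key = key;
            this.value = value;
        }
    }

    // Compare the previous poll result with the current one and list what changed.
    static List<Change> diff(Map<String, String> previous, Map<String, String> current) {
        List<Change> changes = new ArrayList<>();
        for (Map.Entry<String, String> e : current.entrySet()) {
            if (!previous.containsKey(e.getKey())) {
                changes.add(new Change(ChangeType.ADD, e.getKey(), e.getValue()));
            } else if (!e.getValue().equals(previous.get(e.getKey()))) {
                changes.add(new Change(ChangeType.UPDATE, e.getKey(), e.getValue()));
            }
        }
        for (Map.Entry<String, String> e : previous.entrySet()) {
            if (!current.containsKey(e.getKey())) {
                // For removals, report the last known value.
                changes.add(new Change(ChangeType.REMOVE, e.getKey(), e.getValue()));
            }
        }
        return changes;
    }
}
```

A poller would call `diff` once per interval, hand each `Change` to the registered listeners, and then keep the current snapshot as the next `previous`.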
91+
92+
- **Etcd**
93+
94+
DolphinScheduler also provides an Etcd-based registry implementation in the module `dolphinscheduler-registry/dolphinscheduler-registry-plugins/dolphinscheduler-registry-etcd`, which uses the Jetcd client library to interact with an Etcd cluster. This implementation provides several key functionalities:
95+
96+
- **Event Monitoring**
97+
- **Watch API**
98+
The `EtcdRegistry` class uses Etcd’s Watch API to observe changes (creation, update, or deletion) on specified keys or key prefixes. Low-level Etcd watch events are translated into DolphinScheduler’s `Event` objects, triggering `SubscribeListener` callbacks for real-time notifications.
99+
- **Distributed Lock**
100+
- **Lease-Based Locking**
101+
The `EtcdKeepAliveLeaseManager` grants a lease with a specified TTL, continuously kept alive via Etcd’s keep-alive mechanism. If the client disconnects, the lease expires automatically, releasing the lock without manual intervention.
102+
103+
- **Connection Health Monitoring**
104+
The `EtcdConnectionStateListener` tracks the connection state between DolphinScheduler and the Etcd cluster. Upon disconnection or reconnection, it re-establishes locks or re-registers services as needed.
105+
106+
- **Configuration**
107+
108+
- **Flexible Configuration**
109+
The behavior of the Etcd registry is controlled by `EtcdRegistryProperties`, which maps various settings (endpoints, namespace, SSL, authentication, etc.) from configuration files. These settings are integrated into the Spring Boot auto-configuration process via `EtcdRegistryAutoConfiguration`, ensuring that the Etcd registry is instantiated automatically when `registry.type` is set to `"etcd"`.
110+
111+
Together, these components ensure that DolphinScheduler can reliably use Etcd as an alternative registry center. This is especially useful in cloud-native environments where low latency, high scalability, and ease of deployment are critical.
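The lease-based locking idea can be illustrated without an Etcd cluster. The following is a minimal in-memory sketch, not the `EtcdKeepAliveLeaseManager` code and not the Jetcd API: `LeaseLockSketch` and its methods are hypothetical, and the clock is injected so that lease expiry can be demonstrated deterministically. A lock is held only while its lease is alive; a holder that stops sending keep-alives loses the lock once the TTL passes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical illustration only; names and structure are assumptions.
class LeaseLockSketch {

    private static final class Lease {
        final String owner;
        long expiresAtMillis;

        Lease(String owner, long expiresAtMillis) {
            this.owner = owner;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Lease> locks = new HashMap<>();
    private final LongSupplier clockMillis; // injected time source
    private final long ttlMillis;

    LeaseLockSketch(LongSupplier clockMillis, long ttlMillis) {
        this.clockMillis = clockMillis;
        this.ttlMillis = ttlMillis;
    }

    // Take the lock if it is free or if the previous holder's lease has expired.
    synchronized boolean tryLock(String key, String owner) {
        long now = clockMillis.getAsLong();
        Lease current = locks.get(key);
        if (current != null && current.expiresAtMillis > now) {
            return false; // lease still alive (this sketch is not re-entrant)
        }
        locks.put(key, new Lease(owner, now + ttlMillis));
        return true;
    }

    // Extend the lease; fails if the caller is not the holder or the lease already expired.
    synchronized boolean keepAlive(String key, String owner) {
        long now = clockMillis.getAsLong();
        Lease current = locks.get(key);
        if (current == null || !current.owner.equals(owner) || current.expiresAtMillis <= now) {
            return false;
        }
        current.expiresAtMillis = now + ttlMillis;
        return true;
    }
}
```

In a real deployment the TTL and keep-alive bookkeeping live on the Etcd server; the sketch only mirrors the observable behavior on the client side.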
112+
113+
***
65114

66115
We have also implemented queues based on Redis, but we hope DolphinScheduler depends on as few components as possible, so we finally removed the Redis implementation.
67116

68-
* **AlertServer**
117+
- **AlertServer**
69118

70119
Provides alarm services, and implements rich alarm methods through alarm plugins.
71120

72-
* **API**
121+
- **API**
73122

74123
The API interface layer is mainly responsible for processing requests from the front-end UI layer. The service exposes a uniform set of RESTful APIs to external callers.
75124

76-
* **UI**
125+
- **UI**
77126

78127
The front-end pages provide the system's visual operation interfaces; see the [Introduction to Functions](../guide/homepage.md) section for more.
79128

@@ -222,4 +271,3 @@ In the early schedule design, if there is no priority design and use the fair sc
222271
## Sum Up
223272

224273
From the perspective of scheduling, this article preliminarily introduces the architecture principles and implementation ideas of the big data distributed workflow scheduling system: DolphinScheduler. To be continued.
225-
