server, dispatcher: improve node liveness self fence#5106
Conversation
Signed-off-by: dongmen <414110582@qq.com>
Signed-off-by: dongmen <414110582@qq.com>
|
Skipping CI for Draft Pull Request. |
|
[FORMAT CHECKER NOTIFICATION] Notice: To remove the 📖 For more info, you can check the "Contribute Code" section in the development guide. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces a local fencing mechanism to ensure that downstream writes are stopped immediately when a capture loses its etcd session or lease. This is achieved by adding a session watchdog in the server and implementing a LocalFence method across the DispatcherOrchestrator and DispatcherManager to bypass graceful draining in failure scenarios. The review feedback identifies potential nil pointer dereferences in the DispatcherManager's shutdown logic, specifically regarding the redoSink when redo logging is enabled.
| if e.IsRedoEnabled() { | ||
| closeAllDispatchers(e.changefeedID, e.redoDispatcherMap, e.redoSink.SinkType()) |
There was a problem hiding this comment.
Potential nil pointer dereference. If redoEnabled is true but the redoSink failed to initialize or is not yet set (e.g., during a failed startup sequence), calling e.redoSink.SinkType() will cause a panic. Given that e.sink is explicitly checked for nil below, e.redoSink should also be guarded.
| if e.IsRedoEnabled() { | |
| closeAllDispatchers(e.changefeedID, e.redoDispatcherMap, e.redoSink.SinkType()) | |
| if e.IsRedoEnabled() && e.redoSink != nil { | |
| closeAllDispatchers(e.changefeedID, e.redoDispatcherMap, e.redoSink.SinkType()) |
| if e.IsRedoEnabled() { | ||
| e.redoSink.Close() |
There was a problem hiding this comment.
Potential nil pointer dereference. Similar to the pattern used for e.sink, e.redoSink should be checked for nil before calling Close() to ensure that cleanup of partially initialized managers does not result in a panic.
| if e.IsRedoEnabled() { | |
| e.redoSink.Close() | |
| if e.IsRedoEnabled() && e.redoSink != nil { | |
| e.redoSink.Close() | |
| } |
|
/test all |
|
@asddongmen: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
What problem does this PR solve?
Issue Number: close #xxx
What is changed and how it works?
Check List
Tests
Questions
Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?
Release note