Skip to content

maintainer, route: add target table registry to detect route confliction#5098

Open
3AceShowHand wants to merge 16 commits into
pingcap:masterfrom
3AceShowHand:table-route-conflict-registry
Open

maintainer, route: add target table registry to detect route confliction#5098
3AceShowHand wants to merge 16 commits into
pingcap:masterfrom
3AceShowHand:table-route-conflict-registry

Conversation

@3AceShowHand
Copy link
Copy Markdown
Collaborator

@3AceShowHand 3AceShowHand commented May 19, 2026

What problem does this PR solve?

Issue Number: close #xxx

What is changed and how it works?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Questions

Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?

Release note

Please refer to [Release Notes Language Style Guide](https://pingcap.github.io/tidb-dev-guide/contribute-to-tidb/release-notes-style-guide.html) to write a quality release note.

If you don't think this PR needs a release note then fill it with `None`.

Summary by CodeRabbit

  • Bug Fixes

    • Improved detection and prevention of table-routing conflicts so unsafe routing changes are rejected and existing state remains consistent.
    • Routing prechecks now suppress ACK/progress when routing application would fail.
  • New Features

    • DDLs that affect routing are detected earlier and can be blocked to preserve routing correctness.
    • Controller and maintainer now track and apply table-route transitions to keep routing state in sync.
  • Tests

    • Added extensive unit tests and a new end-to-end integration script validating table-route conflict scenarios.

Review Change Stack

@ti-chi-bot ti-chi-bot Bot added do-not-merge/needs-linked-issue release-note Denotes a PR that will be considered when it comes time to generate release notes. labels May 19, 2026
@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented May 19, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign charlescheung96 for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label May 19, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 19, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Exports router registry types and refactors TargetTableRegistry to dual maps with Remove/ApplyTransition; adds routeConflictDetector to validate/apply DDL-driven route transitions; integrates detector into Barrier and Controller bootstrap; updates dispatch blocking and adds unit and integration tests.

Changes

Table Route Conflict Detection Infrastructure

Layer / File(s) Summary
Routing API and registry implementation
downstreamadapter/routing/router.go, downstreamadapter/routing/registry.go
Exported TableKey/RouteBinding; Router.Route and Router.HasTableRoute; TargetTableRegistry refactored to target2Bindings and source2Target, add Remove and ApplyTransition and new constructor capacity param.
Router and registry tests / callsite updates
downstreamadapter/routing/*_test.go, downstreamadapter/routing/ddl_query_rewriter.go
Router tests updated to call Router.Route; registry tests added for Remove/idempotency/reject re-targeting and ApplyTransition outcomes; DDL rewriter calls exported API.
Dispatcher blocking for route-affecting DDLs
downstreamadapter/dispatcher/basic_dispatcher.go, downstreamadapter/dispatcher/event_dispatcher_test.go
BasicDispatcher.shouldBlock blocks route-affecting DDLs when routing is enabled; adds routeAffectingDDL helper and tests.
Route conflict detector core
maintainer/route_conflict_detector.go
New detector builds and validates route transitions from DDL events, queues pending transitions, prechecks and applies transitions atomically against the registry, and reports failures once.
Route conflict detector tests
maintainer/route_conflict_detector_test.go
Tests for conflict reporting and successful registry updates, plus deterministic test helpers and a fake schema store.
Barrier integration
maintainer/barrier.go, maintainer/barrier_event.go
Barrier accepts optional routeDetector, invokes precheck/apply at resend, event completion and block-state milestones; BarrierEvent gains routeDDLInfo and safer dispatcher handling.
Controller bootstrap and registry rebuild
maintainer/maintainer_controller_bootstrap.go, maintainer/route_registry.go, maintainer/maintainer_controller.go
Controller gains routing state (targetTableRegistry, routeDetector, reportError); bootstrap initializes detector and rebuilds the registry from dispatch rules; SetErrorReporter forwards reporter to detector.
Maintainer wiring and tests updates
maintainer/maintainer.go, maintainer/barrier_test.go, maintainer/maintainer_test.go
NewMaintainer wires controller error reporter; tests updated to the new Barrier constructor parameter.
Integration test script and CI inclusion
tests/integration_tests/table_route_conflict_detection/run.sh, tests/integration_tests/run_light_it_in_ci.sh
Adds MySQL integration script exercising route conflict scenarios and adds it to CI light test group G08.

Sequence Diagram(s)

sequenceDiagram
  participant DDLEvent
  participant Dispatcher as BasicDispatcher
  participant Barrier
  participant Detector as routeConflictDetector
  participant Registry as TargetTableRegistry
  DDLEvent->>Dispatcher: shouldBlock(event)?
  alt routing enabled and route-affecting
    Dispatcher-->>Barrier: block event
  end
  Barrier->>Detector: precheck(routeDDLInfo)
  Detector->>Detector: buildTransition(drops, adds, updates)
  Detector->>Registry: ApplyTransition(validates)
  alt validation passes
    Registry-->>Detector: nil
    Detector-->>Barrier: ready
  else validation fails
    Registry-->>Detector: ErrTableRouteConflict
    Detector-->>Barrier: error (reportError)
  end
  Barrier->>Detector: apply(routeDDLInfo)
  Detector->>Registry: ApplyTransition(removes, adds)
  Registry-->>Detector: updated
  Detector-->>Barrier: success
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • pingcap/ticdc#4659: Introduced core routing Router and early routing wiring used by this PR.
  • pingcap/ticdc#5101: Related static-route-conflict plumbing around registry.Add and ValidateNoStaticRouteConflict referenced by this change.

Suggested labels

lgtm, approved

Suggested reviewers

  • asddongmen
  • flowbehappy
  • tenfyzhong
  • wk989898

🐰 I hop through rules and registry rows,
I precheck routes where the cold wind blows,
I build transitions, then apply,
I flag conflicts and let teams sigh,
A tidy hop — routes safe as they grow.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description is largely incomplete, using the template verbatim with unfilled placeholders, no issue reference, empty 'What is changed' section, no detailed test information, and no release notes. Only checkboxes are marked without explanations. Fill in the issue number (replace '#xxx'), provide a detailed 'What is changed and how it works?' section explaining the routing conflict detection mechanism, add test and documentation update details, and provide a meaningful release note following the style guide.
Docstring Coverage ⚠️ Warning Docstring coverage is 16.36% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: adding a target table registry to detect route conflicts, which aligns with the substantial refactoring and new functionality introduced across routing and maintainer components.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a mechanism to detect and prevent table routing conflicts where multiple logical upstream tables map to the same downstream target. It implements a TargetTableRegistry to track these mappings and integrates conflict validation into the OpenAPI changefeed management and the maintainer's DDL processing workflow. Feedback was provided regarding an optimization for counting replicas per target in the registry to improve performance from O(N) to O(1).

Comment thread downstreamadapter/routing/registry.go Outdated
Comment on lines +253 to +262
func (r *TargetTableRegistry) countReplicasForTarget(excludeReplica int64, target TargetKey) int {
count := 0
for id, b := range r.bindings {
if id != excludeReplica && b.Target == target {
count++
}
}
// Add 1 for the binding we haven't inserted yet (the one being upserted).
return count
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The countReplicasForTarget method iterates over all bindings to count replicas for a target, which is O(N). This can be optimized to O(1) by maintaining a targetReplicaCount map[TargetKey]int in the TargetTableRegistry struct to track the number of replicas per target. This would also require updating Upsert, RemoveByReplicaID, and RemoveBySchemaID to maintain this map.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (5)
maintainer/barrier_event.go (1)

232-257: 💤 Low value

Repeated Snapshot() calls inside loop may degrade performance for large registries.

be.targetRegistry.Snapshot() is called inside the loop for each blocked table ID (line 240). If there are many blocked tables and a large registry, this creates O(n × m) snapshot overhead. Consider calling Snapshot() once before the loop.

♻️ Proposed optimization
+	// Handle rename (blocked tables may change name -> new route result)
+	if be.blockedDispatchers != nil && be.blockedDispatchers.InfluenceType == heartbeatpb.InfluenceType_Normal {
+		existingBindings := be.targetRegistry.Snapshot()
 		for _, tableID := range be.blockedDispatchers.TableIDs {
 			binding, err := be.buildRouteBindingForTable(schemaStore, tableID, 0)
 			if err != nil {
 				return nil, err
 			}
 			if binding != nil {
 				// Check if the route actually changed.
-				existingBindings := be.targetRegistry.Snapshot()
 				changed := true
 				for _, existing := range existingBindings {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@maintainer/barrier_event.go` around lines 232 - 257, The code calls
be.targetRegistry.Snapshot() inside the loop over
be.blockedDispatchers.TableIDs, causing repeated snapshots; move the snapshot
call outside the loop by obtaining existingBindings :=
be.targetRegistry.Snapshot() once before iterating, then reuse existingBindings
when comparing binding against existing.ReplicaTableID/Target/Source in the loop
(while keeping the existing buildRouteBindingForTable, transition.Removes and
transition.Adds logic unchanged).
maintainer/barrier_test.go (1)

17-18: 💤 Low value

Import ordering: internal package import should be grouped after standard library imports.

Same as in maintainer_test.go, the routing import is placed before testing (standard library).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@maintainer/barrier_test.go` around lines 17 - 18, The import order in
maintainer/barrier_test.go is wrong: move the standard library import testing
before the internal package import
"github.com/pingcap/ticdc/downstreamadapter/routing" so imports are grouped as
standard library first, then external/internal packages; update the import block
around the testing and routing entries accordingly.
maintainer/maintainer_controller_bootstrap.go (1)

522-530: 💤 Low value

Silently skipping tables on GetTableInfo error may cause incomplete registry initialization.

Similar to the concern in buildRouteBindingForTable, when GetTableInfo fails here (lines 523-530), the table is skipped with a warning log. During bootstrap, this could result in an incomplete registry. If a skipped table later routes to a target that conflicts with another table, the conflict won't be detected.

Consider whether bootstrap should fail if any table's info cannot be retrieved, or at least document that transient schema store errors during bootstrap may cause incomplete conflict detection.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@maintainer/maintainer_controller_bootstrap.go` around lines 522 - 530, The
loop over tables currently skips table entries when
schemaStore.GetTableInfo(c.keyspaceMeta, t.TableID, startTs) returns an error,
which can silently leave the registry incomplete; change this behavior in the
bootstrap path (the loop that calls GetTableInfo) to surface errors instead of
continuing—either propagate/return the error up so bootstrap fails (preferred)
or aggregate and return a consolidated error that includes c.changefeedID.Name()
and the failing t.TableID; ensure the same semantics are applied consistently
with buildRouteBindingForTable so route/conflict detection isn't bypassed.
maintainer/maintainer_test.go (1)

17-18: 💤 Low value

Import ordering: internal package import should be grouped after standard library imports.

The routing import is placed before the context import (standard library). Go convention groups standard library imports first, then third-party, then internal packages. However, since no static analysis tool has flagged this, consider fixing it for consistency.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@maintainer/maintainer_test.go` around lines 17 - 18, The import block in
maintainer_test.go has routing
("github.com/pingcap/ticdc/downstreamadapter/routing") placed before the
standard library import context; reorder the imports so standard library
packages (e.g., context) appear first, then third-party/internal packages (e.g.,
github.com/pingcap/ticdc/downstreamadapter/routing) to match Go import grouping
and conventions used around the test functions in this file.
downstreamadapter/routing/registry_test.go (1)

115-147: ⚡ Quick win

Add an Upsert regression case for multi-replica target ownership.

Please add a focused test where two replicas from the same source share one target, then one replica is upserted to a new target; ownership of the old target should remain and a different source adding to that old target must still fail with ErrTableRouteConflict.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@downstreamadapter/routing/registry_test.go` around lines 115 - 147, Add a new
subtest in TestTargetTableRegistryUpsert that creates a registry via
NewTargetTableRegistry seeded with two RouteBinding entries from the same source
(same SourceID but different ReplicaIDs) pointing to the same target (use
makeBinding to construct them), then call r.Upsert to move one of those replicas
to a new target; assert with r.Snapshot() that the remaining replica still owns
the original target (table name unchanged) and that attempting to Upsert a
binding from a different source to that original target returns an error equal
to errors.ErrTableRouteConflict; use the same helpers (NewTargetTableRegistry,
Upsert, makeBinding, Snapshot) and equality check for ErrTableRouteConflict as
in existing tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@downstreamadapter/routing/registry.go`:
- Around line 146-148: The ownerSources map is being cleared prematurely because
the condition uses "<= 1" when checking
r.countReplicasForTarget(binding.ReplicaTableID, oldBinding.Target); change this
to check for zero remaining replicas (== 0) so ownership is only removed when no
replica remains; update the same logic in the other block referenced (the second
occurrence around the 253-261 region) that also calls r.countReplicasForTarget
and deletes from r.ownerSources for oldBinding.Target.

In `@maintainer/barrier_event_test.go`:
- Around line 17-18: Update every call to NewBlockEvent in this test to pass the
three new routing arguments: routeRouter (type routing.Router), targetRegistry
(type *routing.TargetTableRegistry), and routeEnabled (bool). Specifically,
locate all existing NewBlockEvent(...) usages (e.g., the calls around the
previously noted lines) and append the same routing arguments pattern used where
NewBlockEvent is called from barrier.go: pass the Router instance, the
TargetTableRegistry pointer, and the appropriate routeEnabled boolean. Ensure
the final call signature matches NewBlockEvent(..., routeRouter, targetRegistry,
routeEnabled) so compilation succeeds.

In `@maintainer/barrier_event.go`:
- Around line 404-426: The precheck error path in the block using
buildRouteTransition and ValidateReplace repeatedly returns nil,"" without
setting be.routePrecheckDone, causing endless retries; update the error paths so
that on any error from buildRouteTransition or ValidateReplace you set
be.routePrecheckDone = true to avoid log spam and then surface the error to the
changefeed failure/monitoring path (e.g., record the error on the changefeed
status or invoke the existing changefeed-failure API) or implement a bounded
retry/backoff before marking the precheck done; ensure you still set
be.routeTransition when a valid transition exists and preserve the
needsRoutePrecheck() gating logic.

---

Nitpick comments:
In `@downstreamadapter/routing/registry_test.go`:
- Around line 115-147: Add a new subtest in TestTargetTableRegistryUpsert that
creates a registry via NewTargetTableRegistry seeded with two RouteBinding
entries from the same source (same SourceID but different ReplicaIDs) pointing
to the same target (use makeBinding to construct them), then call r.Upsert to
move one of those replicas to a new target; assert with r.Snapshot() that the
remaining replica still owns the original target (table name unchanged) and that
attempting to Upsert a binding from a different source to that original target
returns an error equal to errors.ErrTableRouteConflict; use the same helpers
(NewTargetTableRegistry, Upsert, makeBinding, Snapshot) and equality check for
ErrTableRouteConflict as in existing tests.

In `@maintainer/barrier_event.go`:
- Around line 232-257: The code calls be.targetRegistry.Snapshot() inside the
loop over be.blockedDispatchers.TableIDs, causing repeated snapshots; move the
snapshot call outside the loop by obtaining existingBindings :=
be.targetRegistry.Snapshot() once before iterating, then reuse existingBindings
when comparing binding against existing.ReplicaTableID/Target/Source in the loop
(while keeping the existing buildRouteBindingForTable, transition.Removes and
transition.Adds logic unchanged).

In `@maintainer/barrier_test.go`:
- Around line 17-18: The import order in maintainer/barrier_test.go is wrong:
move the standard library import testing before the internal package import
"github.com/pingcap/ticdc/downstreamadapter/routing" so imports are grouped as
standard library first, then external/internal packages; update the import block
around the testing and routing entries accordingly.

In `@maintainer/maintainer_controller_bootstrap.go`:
- Around line 522-530: The loop over tables currently skips table entries when
schemaStore.GetTableInfo(c.keyspaceMeta, t.TableID, startTs) returns an error,
which can silently leave the registry incomplete; change this behavior in the
bootstrap path (the loop that calls GetTableInfo) to surface errors instead of
continuing—either propagate/return the error up so bootstrap fails (preferred)
or aggregate and return a consolidated error that includes c.changefeedID.Name()
and the failing t.TableID; ensure the same semantics are applied consistently
with buildRouteBindingForTable so route/conflict detection isn't bypassed.

In `@maintainer/maintainer_test.go`:
- Around line 17-18: The import block in maintainer_test.go has routing
("github.com/pingcap/ticdc/downstreamadapter/routing") placed before the
standard library import context; reorder the imports so standard library
packages (e.g., context) appear first, then third-party/internal packages (e.g.,
github.com/pingcap/ticdc/downstreamadapter/routing) to match Go import grouping
and conventions used around the test functions in this file.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4f215250-3fd0-48a0-8d5a-31819f3eab5b

📥 Commits

Reviewing files that changed from the base of the PR and between 17e2cdf and e3f2478.

📒 Files selected for processing (13)
  • api/v2/changefeed.go
  • downstreamadapter/routing/registry.go
  • downstreamadapter/routing/registry_test.go
  • downstreamadapter/routing/router.go
  • maintainer/barrier.go
  • maintainer/barrier_event.go
  • maintainer/barrier_event_test.go
  • maintainer/barrier_test.go
  • maintainer/maintainer_controller.go
  • maintainer/maintainer_controller_bootstrap.go
  • maintainer/maintainer_test.go
  • pkg/errors/error.go
  • pkg/errors/helper.go

Comment thread downstreamadapter/routing/registry.go Outdated
Comment thread maintainer/barrier_event_test.go Outdated
Comment thread maintainer/barrier_event.go Outdated
Comment thread maintainer/barrier_event.go Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
maintainer/barrier_event.go (1)

392-414: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Route precheck error causes infinite retry loop without recovery.

When buildRouteTransition() (line 394) or ValidateReplace() (line 403) returns an error, the method returns nil, "" without setting routePrecheckDone = true. On subsequent resend cycles, needsRoutePrecheck() && !be.routePrecheckDone will remain true, causing the same error to be logged repeatedly with no progress.

For transient errors, this may eventually recover. For permanent conflicts (true schema conflicts), the DDL is blocked indefinitely with log spam but no actionable feedback to the user.

Consider setting routePrecheckDone = true even on error to prevent log spam, and surfacing the conflict to the changefeed error state so operators can take action.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@maintainer/barrier_event.go` around lines 392 - 414, The route precheck
currently returns early on errors from buildRouteTransition() or
targetRegistry.ValidateReplace(), leaving be.routePrecheckDone false and causing
infinite retries; update the error paths in the precheck block (the code around
needsRoutePrecheck(), buildRouteTransition(), ValidateReplace(), and where
routeTransition is set) to always set be.routePrecheckDone = true before
returning on error and also record/surface the failure to the changefeed error
state (e.g., invoke the changefeed error reporting method used elsewhere in this
file or set the changefeed status field) so permanent conflicts stop retrying
and are visible to operators.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@maintainer/barrier_event.go`:
- Around line 392-414: The route precheck currently returns early on errors from
buildRouteTransition() or targetRegistry.ValidateReplace(), leaving
be.routePrecheckDone false and causing infinite retries; update the error paths
in the precheck block (the code around needsRoutePrecheck(),
buildRouteTransition(), ValidateReplace(), and where routeTransition is set) to
always set be.routePrecheckDone = true before returning on error and also
record/surface the failure to the changefeed error state (e.g., invoke the
changefeed error reporting method used elsewhere in this file or set the
changefeed status field) so permanent conflicts stop retrying and are visible to
operators.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f238fc2a-ccdd-4081-a086-1284aba6344d

📥 Commits

Reviewing files that changed from the base of the PR and between e3f2478 and 4407045.

📒 Files selected for processing (10)
  • downstreamadapter/routing/ddl_query_rewriter.go
  • downstreamadapter/routing/registry.go
  • downstreamadapter/routing/router.go
  • downstreamadapter/routing/router_test.go
  • maintainer/barrier.go
  • maintainer/barrier_event.go
  • maintainer/barrier_event_test.go
  • maintainer/barrier_test.go
  • maintainer/maintainer_controller_bootstrap.go
  • maintainer/maintainer_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • maintainer/maintainer_test.go
  • maintainer/maintainer_controller_bootstrap.go
  • maintainer/barrier_test.go

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (5)
downstreamadapter/dispatcher/basic_dispatcher.go (2)

935-937: ⚡ Quick win

Consider adding logging when blocking route-affecting DDLs.

The new blocking path will prevent DDL execution when routing is configured, but there's no log statement explaining why the DDL was blocked. This could make troubleshooting difficult in production environments.

📝 Suggested improvement
 		ddlEvent := event.(*commonEvent.DDLEvent)
 		if d.sharedInfo.GetRouter().HasTableRoute() && routeAffectingDDL(ddlEvent) {
+			log.Info("blocking route-affecting DDL for conflict detection",
+				zap.Stringer("dispatcher", d.id),
+				zap.Uint64("commitTs", ddlEvent.FinishedTs),
+				zap.String("query", ddlEvent.Query))
 			return true
 		}

As per coding guidelines, logs are operational signals; see docs/agents/logging.md before adding, removing, or rewriting logs.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@downstreamadapter/dispatcher/basic_dispatcher.go` around lines 935 - 937, Add
an operational log when the dispatcher blocks route-affecting DDLs: in the
branch that checks d.sharedInfo.GetRouter().HasTableRoute() &&
routeAffectingDDL(ddlEvent) (just before the return true), emit a concise logger
message (e.g., d.logger.Warn or Info) that includes the DDL event
identifier/details (ddlEvent) and that routing is enabled (from
d.sharedInfo.GetRouter()) so operators can see why the DDL was blocked; keep the
message short and non-sensitive and then return true as before.

962-970: ⚡ Quick win

Simplify boolean return and add documentation.

The function uses an if-then-return-true/return-false pattern that can be simplified to a direct boolean expression. Additionally, the function lacks documentation explaining what constitutes a "route-affecting" DDL.

♻️ Proposed refactoring
+// routeAffectingDDL returns true if the DDL event requires route conflict detection.
+// This includes DDLs that drop tables, add tables, update schemas, or change table names,
+// all of which can alter table routing mappings.
 func routeAffectingDDL(event *commonEvent.DDLEvent) bool {
-	if event.GetNeedDroppedTables() != nil ||
+	return event.GetNeedDroppedTables() != nil ||
 		len(event.GetNeedAddedTables()) > 0 ||
 		len(event.GetUpdatedSchemas()) > 0 ||
-		event.TableNameChange != nil {
-		return true
-	}
-	return false
+		event.TableNameChange != nil
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@downstreamadapter/dispatcher/basic_dispatcher.go` around lines 962 - 970,
Replace the verbose if/return pattern in routeAffectingDDL with a single boolean
return expression that directly checks the four conditions
(event.GetNeedDroppedTables() != nil, len(event.GetNeedAddedTables()) > 0,
len(event.GetUpdatedSchemas()) > 0, event.TableNameChange != nil) and add a
short documentation comment above routeAffectingDDL describing that a
"route-affecting" DDL is any DDL that drops tables, adds tables, updates
schemas, or renames a table so callers understand the criteria.
downstreamadapter/dispatcher/event_dispatcher_test.go (1)

71-108: ⚡ Quick win

Consider adding test cases for all route-affecting DDL conditions.

The test covers NeedAddedTables and TableNameChange scenarios but doesn't test the NeedDroppedTables and UpdatedSchemas code paths in the routeAffectingDDL() helper. While the current coverage validates the core blocking logic, testing all four conditions would ensure complete coverage of the helper function.

🧪 Suggested additional test cases
 	require.False(t, newDispatcher(router).shouldBlock(regularSingleTableDDL))
+
+	dropTable := &commonEvent.DDLEvent{
+		NeedDroppedTables: &commonEvent.InfluencedTables{
+			InfluenceType: commonEvent.InfluenceTypeNormal,
+			TableIDs:      []int64{1},
+		},
+	}
+	require.True(t, newDispatcher(router).shouldBlock(dropTable))
+
+	schemaChange := &commonEvent.DDLEvent{
+		UpdatedSchemas: []*commonEvent.SchemaIDChange{
+			{TableID: 1, OldSchemaID: 1, NewSchemaID: 2},
+		},
+	}
+	require.True(t, newDispatcher(router).shouldBlock(schemaChange))
 }

As per coding guidelines, prefer focused deterministic tests; see docs/agents/testing.md before adding or changing tests.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@downstreamadapter/dispatcher/event_dispatcher_test.go` around lines 71 - 108,
Add two focused assertions in TestShouldBlockRouteAffectingDDL to cover the
missing route-affecting DDL paths: construct a DDLEvent with NeedDroppedTables
populated (e.g., NeedDroppedTables: []commonEvent.Table{{SchemaID:1,TableID:2}})
and assert newDispatcher(router).shouldBlock(...) returns true (and false for
newDispatcher(routing.Router{}) as with NeedAddedTables); then construct a
DDLEvent exercising UpdatedSchemas (populate UpdatedSchemas with the same kind
of entries or schema IDs used by routeAffectingDDL) and assert shouldBlock
returns true with router and false with an empty router. Reference
TestShouldBlockRouteAffectingDDL, routeAffectingDDL, NeedDroppedTables, and
UpdatedSchemas when adding these cases.
maintainer/barrier.go (1)

541-546: 💤 Low value

Consistent nil-guard returns but asymmetric semantics.

Both precheckRouteEvent (returns (true, nil)) and applyRouteEvent (returns nil) short-circuit when routeDetector is nil. While the nil guards are correct, the return-value semantics differ: precheck signals "ready" by returning true, apply signals "success" by returning nil. This is internally consistent with Go conventions (bool for readiness, error for failure), but consider adding a brief comment on line 542 noting that true, nil means "no conflict detector configured, proceed" to improve readability for future maintainers unfamiliar with the routing subsystem.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@maintainer/barrier.go` around lines 541 - 546, Add a short clarifying comment
in the precheckRouteEvent function noting that returning true, nil is
intentional and means "no routeDetector configured — proceed" to mirror
applyRouteEvent's nil short-circuit semantics; e.g., place the comment above the
nil check in precheckRouteEvent and reference that it intentionally signals
readiness when routeDetector is nil (consistent with applyRouteEvent returning
nil on the same condition).
maintainer/route_conflict_detector.go (1)

287-290: ⚡ Quick win

Use repository errors instead of raw fmt.Errorf.

These branches bypass the repo's error taxonomy, so upstream code can't classify or wrap detector failures consistently the way it can for errors.ErrTableRouteConflict above.

As per coding guidelines, "Use predefined repository errors; see docs/agents/error-handling.md before changing error creation, wrapping, or propagation."

Also applies to: 305-307, 330-334

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@maintainer/route_conflict_detector.go` around lines 287 - 290, Replace raw
fmt.Errorf usages in the route_conflict_detector loop and the other noted
branches with the repository's predefined error types (e.g.,
errors.ErrTableRouteConflict) so callers can classify failures; for each place
that currently returns fmt.Errorf("route registry binding not found for table
%d", change.TableID) (and the similar cases at the later branches around the
code handling d.tables and route lookups), return the appropriate repository
error from the errors package and include the table id as context by wrapping or
annotating the repository error using the project's error-wrapping helper
(maintain the original context but do not introduce new fmt.Errorf-based
errors). Ensure you update the return statements in the loop over
info.updatedSchema (reference variable names change.TableID, existing, d.tables)
and the similar branches mentioned so they consistently use
errors.ErrTableRouteConflict (or the exact repo error that matches "route
registry binding not found") with contextual wrapping.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@maintainer/barrier.go`:
- Around line 427-432: The code path where event.allDispatcherReported() is true
calls b.precheckRouteEvent(event) and on err || !ready returns (event, nil, "",
false) but does not remove the event from b.blockedEvents; make this behavior
symmetric with the non-blocked path by deleting the event from b.blockedEvents
when precheckRouteEvent fails or returns not-ready (and likewise ensure you
delete on applyRouteEvent failures if applicable), or explicitly document/handle
why blocked events should persist; update the branch that calls
precheckRouteEvent (and any references to blockedEvents, needACK,
precheckRouteEvent, applyRouteEvent) so failed prechecks either remove the event
from blockedEvents before returning or intentionally keep it with a clear
comment and state-machine assertion.

In `@maintainer/maintainer_controller_bootstrap.go`:
- Around line 470-474: The comment "// findHoles returns an array of Span that
are not covered in the range" is misplaced above initializeRouteRegistry; move
that doc comment so it immediately precedes the findHoles function (the function
named findHoles) or remove it entirely, ensuring initializeRouteRegistry's
godoc/comment block is directly above initializeRouteRegistry and reads cleanly;
verify no other stray comments referencing findHoles remain.

In `@maintainer/route_conflict_detector.go`:
- Around line 213-220: needsCheck currently ignores blockTables when TableIDs is
empty, causing DB-wide or global (All) block events to be skipped; update
routeConflictDetector.needsCheck to also return true when info.blockTables is
non-nil and its scope indicates DB or All (even if TableIDs is empty) by
checking the blockTables scope/level field. Also update buildTransition to not
only reevaluate heartbeatpb.InfluenceType_Normal: ensure buildTransition
handles/schema-wide and global influences emitted by blockTables (e.g., treat
DB/All scope as impacting other InfluenceType values and trigger the registry
reevaluation path), so schema/global route-affecting DDLs cause registry
updates.

In `@maintainer/route_registry_test.go`:
- Around line 27-101: newRegistryTestTable currently only sets SchemaTableName,
but rebuildTargetTableRegistry and router.Route use the top-level
SchemaName/TableName fields, so tests exercise routing with empty names; update
newRegistryTestTable to also set Table.SchemaName and Table.TableName to the
provided schema and table values (in addition to SchemaTableName) so the "builds
current target owners" and "fails on route conflict" cases exercise the actual
matcher logic.

---

Nitpick comments:
In `@downstreamadapter/dispatcher/basic_dispatcher.go`:
- Around line 935-937: Add an operational log when the dispatcher blocks
route-affecting DDLs: in the branch that checks
d.sharedInfo.GetRouter().HasTableRoute() && routeAffectingDDL(ddlEvent) (just
before the return true), emit a concise logger message (e.g., d.logger.Warn or
Info) that includes the DDL event identifier/details (ddlEvent) and that routing
is enabled (from d.sharedInfo.GetRouter()) so operators can see why the DDL was
blocked; keep the message short and non-sensitive and then return true as
before.
- Around line 962-970: Replace the verbose if/return pattern in
routeAffectingDDL with a single boolean return expression that directly checks
the four conditions (event.GetNeedDroppedTables() != nil,
len(event.GetNeedAddedTables()) > 0, len(event.GetUpdatedSchemas()) > 0,
event.TableNameChange != nil) and add a short documentation comment above
routeAffectingDDL describing that a "route-affecting" DDL is any DDL that drops
tables, adds tables, updates schemas, or renames a table so callers understand
the criteria.

In `@downstreamadapter/dispatcher/event_dispatcher_test.go`:
- Around line 71-108: Add two focused assertions in
TestShouldBlockRouteAffectingDDL to cover the missing route-affecting DDL paths:
construct a DDLEvent with NeedDroppedTables populated (e.g., NeedDroppedTables:
[]commonEvent.Table{{SchemaID:1,TableID:2}}) and assert
newDispatcher(router).shouldBlock(...) returns true (and false for
newDispatcher(routing.Router{}) as with NeedAddedTables); then construct a
DDLEvent exercising UpdatedSchemas (populate UpdatedSchemas with the same kind
of entries or schema IDs used by routeAffectingDDL) and assert shouldBlock
returns true with router and false with an empty router. Reference
TestShouldBlockRouteAffectingDDL, routeAffectingDDL, NeedDroppedTables, and
UpdatedSchemas when adding these cases.

In `@maintainer/barrier.go`:
- Around line 541-546: Add a short clarifying comment in the precheckRouteEvent
function noting that returning true, nil is intentional and means "no
routeDetector configured — proceed" to mirror applyRouteEvent's nil
short-circuit semantics; e.g., place the comment above the nil check in
precheckRouteEvent and reference that it intentionally signals readiness when
routeDetector is nil (consistent with applyRouteEvent returning nil on the same
condition).

In `@maintainer/route_conflict_detector.go`:
- Around line 287-290: Replace raw fmt.Errorf usages in the
route_conflict_detector loop and the other noted branches with the repository's
predefined error types (e.g., errors.ErrTableRouteConflict) so callers can
classify failures; for each place that currently returns fmt.Errorf("route
registry binding not found for table %d", change.TableID) (and the similar cases
at the later branches around the code handling d.tables and route lookups),
return the appropriate repository error from the errors package and include the
table id as context by wrapping or annotating the repository error using the
project's error-wrapping helper (maintain the original context but do not
introduce new fmt.Errorf-based errors). Ensure you update the return statements
in the loop over info.updatedSchema (reference variable names change.TableID,
existing, d.tables) and the similar branches mentioned so they consistently use
errors.ErrTableRouteConflict (or the exact repo error that matches "route
registry binding not found") with contextual wrapping.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ec0a5288-fa9e-498b-8cb8-cb82ad284642

📥 Commits

Reviewing files that changed from the base of the PR and between 4407045 and e27ec8a.

📒 Files selected for processing (18)
  • downstreamadapter/dispatcher/basic_dispatcher.go
  • downstreamadapter/dispatcher/event_dispatcher_test.go
  • downstreamadapter/routing/ddl_query_rewriter.go
  • downstreamadapter/routing/registry.go
  • downstreamadapter/routing/registry_test.go
  • downstreamadapter/routing/router.go
  • downstreamadapter/routing/router_test.go
  • maintainer/barrier.go
  • maintainer/barrier_event.go
  • maintainer/barrier_test.go
  • maintainer/maintainer.go
  • maintainer/maintainer_controller.go
  • maintainer/maintainer_controller_bootstrap.go
  • maintainer/maintainer_test.go
  • maintainer/route_conflict_detector.go
  • maintainer/route_conflict_detector_test.go
  • maintainer/route_registry.go
  • maintainer/route_registry_test.go

Comment thread maintainer/barrier.go
Comment on lines +427 to +432
if event.allDispatcherReported() {
ready, err := b.precheckRouteEvent(event)
if err != nil || !ready {
return event, nil, "", false
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Asymmetric error handling: precheck failures suppress ack but leave event in blockedEvents.

When precheckRouteEvent fails or returns not-ready at lines 428-430, the function returns false for needACK but does not remove the event from blockedEvents. In contrast, the non-blocked path at lines 451-456 deletes the event on precheck/apply failure. This asymmetry means blocked events that fail precheck will remain in the map and will be retried on the next dispatcher resend, while non-blocked events are immediately discarded. Confirm this difference is intentional and reflects the correct state-machine semantics for the two paths.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@maintainer/barrier.go` around lines 427 - 432, The code path where
event.allDispatcherReported() is true calls b.precheckRouteEvent(event) and on
err || !ready returns (event, nil, "", false) but does not remove the event from
b.blockedEvents; make this behavior symmetric with the non-blocked path by
deleting the event from b.blockedEvents when precheckRouteEvent fails or returns
not-ready (and likewise ensure you delete on applyRouteEvent failures if
applicable), or explicitly document/handle why blocked events should persist;
update the branch that calls precheckRouteEvent (and any references to
blockedEvents, needACK, precheckRouteEvent, applyRouteEvent) so failed prechecks
either remove the event from blockedEvents before returning or intentionally
keep it with a clear comment and state-machine assertion.

Comment on lines 470 to +474
// findHoles returns an array of Span that are not covered in the range
// initializeRouteRegistry constructs the maintainer-owned route conflict detector
// from the current set of tables. If no dispatch rules contain routing targets,
// routeDetector stays nil.
func (c *Controller) initializeRouteRegistry(tables []commonEvent.Table, startTs uint64) error {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Misplaced comment for findHoles now precedes initializeRouteRegistry.

The line 470 comment // findHoles returns an array of Span that are not covered in the range describes findHoles (defined at line 489) but is now stranded above initializeRouteRegistry. Move the orphaned findHoles doc back next to its function (or delete it) so the initializeRouteRegistry godoc reads cleanly.

📝 Proposed fix
-// findHoles returns an array of Span that are not covered in the range
 // initializeRouteRegistry constructs the maintainer-owned route conflict detector
 // from the current set of tables. If no dispatch rules contain routing targets,
 // routeDetector stays nil.
 func (c *Controller) initializeRouteRegistry(tables []commonEvent.Table, startTs uint64) error {
@@
 }

+// findHoles returns an array of Span that are not covered in the range
 func findHoles(currentSpan utils.Map[*heartbeatpb.TableSpan, *replica.SpanReplication], totalSpan *heartbeatpb.TableSpan) []*heartbeatpb.TableSpan {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@maintainer/maintainer_controller_bootstrap.go` around lines 470 - 474, The
comment "// findHoles returns an array of Span that are not covered in the
range" is misplaced above initializeRouteRegistry; move that doc comment so it
immediately precedes the findHoles function (the function named findHoles) or
remove it entirely, ensuring initializeRouteRegistry's godoc/comment block is
directly above initializeRouteRegistry and reads cleanly; verify no other stray
comments referencing findHoles remain.

Comment on lines +213 to +220
func (d *routeConflictDetector) needsCheck(info routeDDLInfo) bool {
if d == nil || d.registry == nil || info.isSyncPoint {
return false
}
if info.droppedTables != nil || len(info.addedTables) > 0 || len(info.updatedSchema) > 0 {
return true
}
return info.blockTables != nil && len(info.blockTables.TableIDs) > 0
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

blockTables events with DB/All scope are skipped.

needsCheck only admits blockTables when TableIDs is non-empty, and buildTransition only reevaluates heartbeatpb.InfluenceType_Normal. A schema-wide or global route-affecting DDL therefore becomes a no-op here, so the registry never gets updated and conflicts can be missed.

Also applies to: 300-320

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@maintainer/route_conflict_detector.go` around lines 213 - 220, needsCheck
currently ignores blockTables when TableIDs is empty, causing DB-wide or global
(All) block events to be skipped; update routeConflictDetector.needsCheck to
also return true when info.blockTables is non-nil and its scope indicates DB or
All (even if TableIDs is empty) by checking the blockTables scope/level field.
Also update buildTransition to not only reevaluate
heartbeatpb.InfluenceType_Normal: ensure buildTransition handles/schema-wide and
global influences emitted by blockTables (e.g., treat DB/All scope as impacting
other InfluenceType values and trigger the registry reevaluation path), so
schema/global route-affecting DDLs cause registry updates.

Comment thread maintainer/route_registry_test.go Outdated
@3AceShowHand
Copy link
Copy Markdown
Collaborator Author

/test all

@3AceShowHand
Copy link
Copy Markdown
Collaborator Author

/test all

@3AceShowHand
Copy link
Copy Markdown
Collaborator Author

/test next-gen

@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented May 22, 2026

@3AceShowHand: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cdc-storage-integration-light-next-gen 017242c link false /test pull-cdc-storage-integration-light-next-gen
pull-cdc-storage-integration-light 017242c link true /test pull-cdc-storage-integration-light
pull-cdc-mysql-integration-light 017242c link true /test pull-cdc-mysql-integration-light
pull-cdc-mysql-integration-light-next-gen 017242c link false /test pull-cdc-mysql-integration-light-next-gen
pull-cdc-kafka-integration-light-next-gen 017242c link false /test pull-cdc-kafka-integration-light-next-gen
pull-cdc-kafka-integration-light 017242c link true /test pull-cdc-kafka-integration-light

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented May 25, 2026

[FORMAT CHECKER NOTIFICATION]

Notice: To remove the do-not-merge/needs-linked-issue label, please provide the linked issue number on one line in the PR body, for example: Issue Number: close #123 or Issue Number: ref #456.

📖 For more info, you can check the "Contribute Code" section in the development guide.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/needs-linked-issue release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant