kafka: bump sarama version and enable the retry to fix the broken pipe and out of order (#5359) #5370

Merged
ti-chi-bot[bot] merged 9 commits into pingcap:release-8.5 from ti-chi-bot:cherry-pick-5359-to-release-8.5
Jun 15, 2026

@ti-chi-bot

Member

This is an automated cherry-pick of #5359

What problem does this PR solve?

Issue Number: close #1920

ref pingcap/tiflow#12618

The Kafka sink can hit stale broker connections and return errors such as broken pipe. The old TiCDC-side Kafka heartbeat sent ApiVersions requests periodically and ignored their errors, so it added background traffic without repairing a bad connection. Producer retry was also disabled or overridden in different places, because older Sarama retry behavior could reorder messages.

What is changed and how it works?

This PR migrates the relevant Kafka sink changes from pingcap/tiflow#12618:

  • Bump the PingCAP Sarama fork to v1.41.2-pingcap-20260508, which includes the partition-muting ordering fix.
  • Remove TiCDC's Kafka application-level heartbeat from the DML, DDL/checkpoint, admin, and topic manager paths.
  • Initialize config.Producer.Retry.Max from Kafka sink options for all Kafka producers.
  • Add a max-retry sink URI parameter. Non-negative values are accepted; negative values are ignored and keep the default.
  • Set the default Kafka producer retry budget to 5.
  • Keep Net.MaxOpenRequests = 1 as an extra ordering guard.
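
The max-retry rule in the list above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in this PR: `producerOptions` and `optionsFromSinkURI` are hypothetical names, the example URIs are made up, and returning an error for a non-numeric value is an assumption that the PR description does not cover.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// defaultMaxRetry mirrors the default retry budget stated in the PR description.
const defaultMaxRetry = 5

// producerOptions is a hypothetical stand-in for the Kafka sink options.
// In the real sink these values would feed config.Producer.Retry.Max and
// config.Net.MaxOpenRequests.
type producerOptions struct {
	MaxRetry        int
	MaxOpenRequests int
}

// optionsFromSinkURI reads max-retry from the sink URI query string.
// Non-negative values are accepted; negative values are ignored and the
// default is kept. Rejecting non-numeric values is an assumption.
func optionsFromSinkURI(raw string) (producerOptions, error) {
	opts := producerOptions{MaxRetry: defaultMaxRetry, MaxOpenRequests: 1}
	u, err := url.Parse(raw)
	if err != nil {
		return opts, err
	}
	v := u.Query().Get("max-retry")
	if v == "" {
		return opts, nil // parameter absent: keep the default
	}
	n, err := strconv.Atoi(v)
	if err != nil {
		return opts, fmt.Errorf("invalid max-retry %q: %w", v, err)
	}
	if n >= 0 {
		opts.MaxRetry = n
	}
	return opts, nil
}

func main() {
	for _, uri := range []string{
		"kafka://127.0.0.1:9092/topic",
		"kafka://127.0.0.1:9092/topic?max-retry=0",
		"kafka://127.0.0.1:9092/topic?max-retry=-1",
	} {
		opts, err := optionsFromSinkURI(uri)
		fmt.Println(opts.MaxRetry, opts.MaxOpenRequests, err)
	}
}
```

Note that under this rule max-retry=0 is a valid way to switch retries off, while max-retry=-1 silently falls back to the default of 5.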

Check List

Tests

  • Unit test:
    • go test --tags=intest ./pkg/sink/kafka ./downstreamadapter/sink/kafka ./downstreamadapter/sink/topicmanager

Questions

Will it cause performance regression or break compatibility?
  • No performance regression
  • No compatibility break, since this is an internal mechanism change.
Do you need to update user documentation, design documentation or monitoring documentation?

Release note

Kafka sink now retries transient producer send failures by default.

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot ti-chi-bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. type/cherry-pick-for-release-8.5 This PR is cherry-picked to release-8.5 from a source PR. labels Jun 12, 2026
@ti-chi-bot

Member Author

@3AceShowHand This PR has conflicts, so I have put it on hold.
Please resolve them or ask others to resolve them, then comment /unhold to remove the hold label.

@ti-chi-bot

ti-chi-bot Bot commented Jun 12, 2026


@ti-chi-bot: If you want to know how to resolve the conflicts, please read the guide in the TiDB Dev Guide.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Contributor

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request removes the heartbeat mechanisms from the Kafka sink, producers, and topic managers, and introduces a configurable max-retry option for Kafka producers via the sink URI. However, there are critical unresolved merge conflicts in go.mod, go.sum, and pkg/sink/kafka/options_test.go that must be resolved before this PR can be merged.

Comment thread go.mod Outdated
Comment thread go.sum Outdated
Comment thread pkg/sink/kafka/options_test.go Outdated
@3AceShowHand

Collaborator

/test all

@ti-chi-bot ti-chi-bot Bot added cherry-pick-approved Cherry pick PR approved by release team. and removed do-not-merge/cherry-pick-not-approved labels Jun 15, 2026
@3AceShowHand

Collaborator

/test all

@ti-chi-bot

ti-chi-bot Bot commented Jun 15, 2026


[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: 3AceShowHand

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added the approved label Jun 15, 2026
@3AceShowHand

Collaborator

/unhold

@ti-chi-bot ti-chi-bot Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 15, 2026
@3AceShowHand

Collaborator

/test all

@3AceShowHand

Collaborator

/test all

@ti-chi-bot ti-chi-bot Bot merged commit 0cddc1d into pingcap:release-8.5 Jun 15, 2026
20 checks passed
Labels

approved cherry-pick-approved Cherry pick PR approved by release team. lgtm release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. type/cherry-pick-for-release-8.5 This PR is cherry-picked to release-8.5 from a source PR.

2 participants