[ISSUE #9791] Prefer original producer when checking transaction state#10564
[ISSUE #9791] Prefer original producer when checking transaction state#10564wang-jiahua wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Pull request overview
This PR addresses incorrect transactional check-back routing when multiple producers share a producer group by persisting the original producer’s clientId on the half message and preferring that producer’s channel during transaction state checks (with round-robin fallback).
Changes:
- Added a new message property to record the originating transactional producer
clientId. - Populated the new property during
sendMessageInTransactionand updated broker-side transaction check logic to prefer the matching producer channel. - Added unit tests around preferred-channel selection and fallback behavior.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| common/src/main/java/org/apache/rocketmq/common/message/MessageConst.java | Introduces a new transaction-related message property key for producer clientId. |
| client/src/main/java/org/apache/rocketmq/client/impl/producer/DefaultMQProducerImpl.java | Writes the originating producer clientId into transactional (half) messages. |
| broker/src/main/java/org/apache/rocketmq/broker/client/ProducerManager.java | Adds preferred-clientId-based channel selection with fallback to existing round-robin logic. |
| broker/src/main/java/org/apache/rocketmq/broker/transaction/AbstractTransactionalMessageCheckListener.java | Uses the new preferred selection API when sending transaction check messages. |
| broker/src/main/java/org/apache/rocketmq/broker/transaction/rocksdb/TransactionalMessageRocksDBService.java | Uses the new preferred selection API in RocksDB-based transaction check flow. |
| broker/src/test/java/org/apache/rocketmq/broker/client/ProducerManagerTest.java | Adds unit tests for preferred selection, fallback, and null preferred id behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## develop #10564 +/- ##
=============================================
- Coverage 48.28% 48.17% -0.11%
+ Complexity 13445 13415 -30
=============================================
Files 1378 1378
Lines 100845 100833 -12
Branches 13044 13045 +1
=============================================
- Hits 48695 48579 -116
- Misses 46201 46289 +88
- Partials 5949 5965 +16 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
RockteMQ-AI
left a comment
There was a problem hiding this comment.
Review by github-manager-bot
Summary
Fixes transaction state check routing: when multiple producers in the same group handle different topics, the check-back was routed to a random producer that might not know the transaction state. The fix stores the original producer's clientId in half message properties and prefers that producer's channel during check-back, falling back to round-robin if the original producer is offline.
Findings
- [Info]
ProducerManager.java:325-346— NewgetAvailableChannel(groupId, preferredClientId)overload. Iterates the group's channel map to find the matchingclientId, checksisActive()andisWritable(), then falls back to the existing round-robin method. Clean implementation. - [Info]
DefaultMQProducerImpl.java:1447— Client-side: writesPROPERTY_TRANSACTION_PRODUCER_CLIENT_IDinto half message properties before sending. Correct placement — right afterPROPERTY_PRODUCER_GROUP. - [Info]
MessageConst.java:55— New constant__TXN_PRODUCER_CID__. Naming is concise and consistent with existing constants. - [Info]
AbstractTransactionalMessageCheckListener.java:64andTransactionalMessageRocksDBService.java:237— Both check-back paths are updated to read the property and pass it to the new overload. Consistent treatment across storage backends. - [Info] Tests cover all three cases: match, not-found (fallback), and null clientId. Good coverage.
- [Warning]
ProducerManagerTest.java:260— Missing newline at end of file. Minor style.
Suggestions
- Rolling upgrade: During upgrade, older producers won't set
PROPERTY_TRANSACTION_PRODUCER_CLIENT_ID. The broker will readnulland fall back to round-robin — which is the current behavior. This is safe. - Consider whether the
clientIdshould also be stored in the transaction record (e.g., inTransactionalMessageStore) so that even if the half message properties are lost during migration, the routing hint is preserved.
Overall: well-designed fix for a real transaction routing issue. Backward compatible. LGTM.
Automated review by github-manager-bot
163dc4e to
207979d
Compare
RockteMQ-AI
left a comment
There was a problem hiding this comment.
Review by github-manager-bot (Re-review after force-push)
Summary
Re-reviewed after force-push at 04:58 UTC. The new commit addresses the Copilot concern about STRING_HASH_SET and adds an additional test case.
Changes since last review
MessageConst.java— AddedSTRING_HASH_SET.add(PROPERTY_TRANSACTION_PRODUCER_CLIENT_ID). This resolves the hash-set consistency concern raised by Copilot. ✅ProducerManagerTest.java— AddedtestGetAvailableChannelWithNullGroupIdtest. Coverage now includes 4 scenarios: match, not-found (fallback), null clientId, and null groupId. Good.
Remaining
- [Info]
ProducerManagerTest.java:310— Still missing newline at end of file (previously noted). Trivial style issue.
Verdict
The force-push properly addresses the outstanding concern. Code is clean, well-tested, and backward compatible. LGTM 👍
Automated review by github-manager-bot
207979d to
cccf293
Compare
|
Thanks for the patch — but I'd argue this scenario is a misuse of the producer group abstraction, not a bug, and doesn't need a broker-side fix. Per the official docs, a producer group is explicitly defined as "a collection of the same type of Producer, which sends the same type of messages with consistent logic" (Concept.md). The transaction check-back mechanism is designed around exactly this invariant: "If a transaction message is sent and the original producer crashes after sending, the broker will contact other producers in the same producer group to commit or rollback the transactional message" (Design_Transaction.md). Any producer in the group must be able to answer a check for any transaction sent by that group — that fungibility is what makes the check-back fault-tolerant. If different topics have independent transaction state / Concerns with the current patch:
Suggestion: close this and instead update the docs (Design_Transaction / best-practices) to explicitly call out "one producer group per transactional business — don't multiplex unrelated topics into one group". Would be happy to see a docs PR for that. cc @wang-jiahua — no offense intended, the diagnosis in the issue is accurate, I just think the fix belongs on the user side. |
|
Thanks for the review @qianye1001! I understand your perspective — using the same producer group for different business topics is not the recommended practice. However, the current behavior is still problematic because:
Would you consider this an acceptable safety improvement, or should I close the PR? |
Review by github-manager-bot (Re-review after new commit)SummaryRe-reviewed after commit Assessment: ✅ Looks GoodChanges since initial review:
Code quality:
Well-structured fix with clean separation of concerns. |
RockteMQ-AI
left a comment
There was a problem hiding this comment.
Review by github-manager-bot (Re-review after new commit)
Changes since last review
MQClientAPIImpl.java— Now setsPROPERTY_TRANSACTION_PRODUCER_CLIENT_IDon the half message withthis.mQClientFactory.getClientId(). This ensures the producer's client ID is propagated to the broker for transaction state checks. ✅MessageConst.java— AddedPROPERTY_TRANSACTION_PRODUCER_CLIENT_ID = "__TXN_PRODUCER_CID__"constant and registered it inSTRING_HASH_SET. ✅
Analysis
This completes the end-to-end flow:
- Client side: Producer attaches its clientId to the half message (
MQClientAPIImpl.java) - Broker side:
sendCheckMessage()reads the clientId and prefers the original producer's channel (ProducerManager.getAvailableChannel(groupId, preferredClientId))
This is a well-designed improvement — transaction state checks now prefer the channel that originally sent the half message, reducing unnecessary cross-node communication and improving reliability.
Verdict
The new commit properly connects the client-side and broker-side changes. All pieces are in place:
- Client propagates clientId ✅
- Broker uses it for channel selection ✅
- Fallback to round-robin if preferred channel unavailable ✅
- Tests cover all scenarios ✅
LGTM 👍
Automated review by github-manager-bot
RongtongJin
left a comment
There was a problem hiding this comment.
I found one issue that should be addressed before this PR is merged.
| SendResult sendResult = null; | ||
| MessageAccessor.putProperty(msg, MessageConst.PROPERTY_TRANSACTION_PREPARED, "true"); | ||
| MessageAccessor.putProperty(msg, MessageConst.PROPERTY_PRODUCER_GROUP, this.defaultMQProducer.getProducerGroup()); | ||
| MessageAccessor.putProperty(msg, MessageConst.PROPERTY_TRANSACTION_PRODUCER_CLIENT_ID, this.mQClientFactory.getClientId()); |
There was a problem hiding this comment.
This adds an internal routing hint to the half message, but the committed transactional message paths copy all half-message properties into the final message and only clear the existing transaction routing fields. That means consumers can observe __TXN_PRODUCER_CID__, which changes the user-visible message property surface and may expose producer client identity. Please remove MessageConst.PROPERTY_TRANSACTION_PRODUCER_CLIENT_ID when building the final committed message, including both EndTransactionProcessor.endMessageTransaction() and TransactionalMessageUtil.buildTransactionalMessageFromHalfMessage(), and add tests that verify the half message keeps this hint while the committed message does not.
There was a problem hiding this comment.
Good catch. Fixed in the latest commit — added MessageAccessor.clearProperty(msgInner, MessageConst.PROPERTY_TRANSACTION_PRODUCER_CLIENT_ID) in EndTransactionProcessor alongside the existing PROPERTY_TRANSACTION_PREPARED clear, so the internal routing hint is stripped from the final committed message and not visible to consumers.
|
只改 broker ,不改 proxy 没什么意义,建议 close 得了,不是 bug |
…n state When multiple producers in the same group handle different topics, the transaction check may be routed to an unrelated producer that cannot determine the transaction state correctly. Fix: Record the original producer's clientId in the half message properties. During transaction check-back, prefer the channel matching that clientId. Fall back to round-robin if the original producer is offline. Changes: - MessageConst: add PROPERTY_TRANSACTION_PRODUCER_CLIENT_ID - DefaultMQProducerImpl: write clientId into half message - ProducerManager: add getAvailableChannel(groupId, preferredClientId) - AbstractTransactionalMessageCheckListener: use new method - TransactionalMessageRocksDBService: use new method - ProducerManagerTest: 3 new test cases
cccf293 to
ccc962f
Compare
Which Issue(s) This PR Fixes
Fixes #9791
Brief Description
When multiple producers in the same group handle different topics, the transaction check may be routed to an unrelated producer that cannot determine the transaction state correctly.
Fix: Record the original producer's clientId in the half message properties. During transaction check-back, prefer the channel matching that clientId. Fall back to round-robin if the original producer is offline.
How Did You Implement It
Broker side:
MessageConst: addPROPERTY_TRANSACTION_PRODUCER_CLIENT_ID+ add toSTRING_HASH_SETDefaultMQProducerImpl: write clientId into half message (withmakeSureStateOK()guard)ProducerManager: addgetAvailableChannel(groupId, preferredClientId)with null groupId guardAbstractTransactionalMessageCheckListener+TransactionalMessageRocksDBService: use new methodProxy side:
ProxyClientRemotingProcessor.checkTransactionState: extractPROPERTY_TRANSACTION_PRODUCER_CLIENT_IDfrom messageExt and use newgetAvailableChannel(group, clientId)methodHow to Verify It
Unit tests:
ProducerManagerTest(4 new test cases: match, fallback, null clientId, null groupId).Fully backward compatible: old producers without clientId property fall back to round-robin.