[ISSUE #10288] Hold MappedFile reference in ConsumeQueue write path to prevent SIGSEGV#10566
[ISSUE #10288] Hold MappedFile reference in ConsumeQueue write path to prevent SIGSEGV#10566wang-jiahua wants to merge 1 commit into
Conversation
…path to prevent SIGSEGV ConsumeQueue.putMessagePositionInfo() accesses MappedFile without holding a reference count. A concurrent cleanExpiredConsumeQueue can destroy() the MappedFile (unmapping its buffer) while the writer is still using it, causing SIGSEGV in JIT-compiled Unsafe.copyMemory. Fix: Call mappedFile.hold() before accessing the buffer and release() in a finally block, following the same pattern used in DefaultMappedFile.getData(). Also protect the fillPreBlank call in initializeWithOffset. Reported crash signatures: - Case 1 (v4.9.4): SEGV_MAPERR in ReputMessageService thread at ConsumeQueue.putMessagePositionInfo - Case 2 (v5.3.3): SEGV in AdminBrokerThread at ConsumeQueueStore.cleanExpired
There was a problem hiding this comment.
Pull request overview
This PR addresses a concurrency crash in the ConsumeQueue write path by ensuring MappedFile is protected via reference counting (hold()/release()) while its underlying buffers may be accessed, preventing concurrent destroy() from unmapping the buffer during writes.
Changes:
- Add
mappedFile.hold()/release()guarding aroundConsumeQueue.putMessagePositionInfo(...)write operations. - Guard
fillPreBlank(...)duringinitializeWithOffset(...)withhold()/release()to prevent concurrent unmap while initializing.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #10566 +/- ##
=============================================
- Coverage 48.28% 48.18% -0.11%
+ Complexity 13445 13417 -28
=============================================
Files 1378 1378
Lines 100845 100852 +7
Branches 13044 13046 +2
=============================================
- Hits 48695 48595 -100
- Misses 46201 46289 +88
- Partials 5949 5968 +19 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
RockteMQ-AI
left a comment
There was a problem hiding this comment.
Review by github-manager-bot
Summary
Fixes a SIGSEGV caused by concurrent cleanExpiredConsumeQueue destroying a MappedFile while putMessagePositionInfo() is still writing to its buffer. The fix adds mappedFile.hold() / mappedFile.release() in a try-finally pattern around all buffer access in the ConsumeQueue write path.
Findings
- [Info]
ConsumeQueue.java:862—hold()failure returnsfalsesafely, preventing use of an unmapped buffer. The warning log includes topic and queueId for diagnosis. - [Info]
ConsumeQueue.java:864-901— The try-finally correctly wraps all buffer operations includingfillPreBlank, repeated-offset check, andappendMessage/appendMessageUsingFileChannel. The earlyreturn trueat line ~882 (repeated build) still executes thefinallyblock — correct. - [Info]
ConsumeQueue.java:1282-1288—initializeWithOffsetalso gets the hold/release treatment with an added null check. Good defensive coding. - [Info] The pattern is consistent with
DefaultMappedFile.getData()— established codebase convention.
Suggestions
- Consider whether
CommitLogwrite paths have similar exposure. IfcleanExpiredMappedFiles()can race with other CommitLog writers, the same pattern may be needed there. - A concurrent stress test (e.g., write + cleanExpired in parallel threads) would help prevent regression.
Overall: critical bug fix, correctly implemented. LGTM.
Automated review by github-manager-bot
|
LGTM~ @guyinyou help to double check |
| MappedFile mappedFile = mappedFileQueue.getLastMappedFile(offset * ConsumeQueue.CQ_STORE_UNIT_SIZE, true); | ||
| fillPreBlank(mappedFile, offset * ConsumeQueue.CQ_STORE_UNIT_SIZE); | ||
| if (mappedFile != null && mappedFile.hold()) { | ||
| try { | ||
| fillPreBlank(mappedFile, offset * ConsumeQueue.CQ_STORE_UNIT_SIZE); | ||
| } finally { | ||
| mappedFile.release(); | ||
| } | ||
| } |
There was a problem hiding this comment.
initializeWithOffset() should not silently continue when the mapped file cannot be created or held. This path relies on continuous ConsumeQueue files, and skipping fillPreBlank() while still calling flush(0) makes initialization look successful even though the pre-blank area was never written. Please fail fast here, or at least log an error and abort the initialization, so a file creation/availability problem does not leave the consume queue in an inconsistent state.
There was a problem hiding this comment.
Good catch. Fixed in the latest commit — instead of silently skipping fillPreBlank(), the method now logs an error and returns early (aborts initialization) when mappedFile is null or hold() fails:
if (mappedFile == null) {
log.error("initializeWithOffset failed: mappedFile is null for offset {}", offset);
return;
}
if (!mappedFile.hold()) {
log.error("initializeWithOffset failed: mappedFile hold() failed for offset {}", offset);
return;
}This ensures the consume queue is not left in an inconsistent state (no flush(0) called) when file creation/availability fails, while keeping the hold() reference to prevent the SIGSEGV.
Which Issue(s) This PR Fixes
Fixes #10288
Brief Description
ConsumeQueue.putMessagePositionInfo() accesses MappedFile without holding a reference count. A concurrent cleanExpiredConsumeQueue can destroy() the MappedFile (unmapping its buffer) while the writer is still using it, causing SIGSEGV in JIT-compiled Unsafe.copyMemory.
Fix: Call mappedFile.hold() before accessing the buffer and release() in a finally block, following the same pattern used in DefaultMappedFile.getData(). Also protect the fillPreBlank call in initializeWithOffset.
How Did You Test This Change
Existing ConsumeQueueTest (17 tests) all pass. The fix was validated during server benchmarking where the baseline broker crashed with SIGSEGV in ReputMessageService at ConsumeQueue.putMessagePositionInfo — exactly the crash signature reported in #10288.