Skip to content

[ISSUE #10288] Hold MappedFile reference in ConsumeQueue write path to prevent SIGSEGV#10566

Open
wang-jiahua wants to merge 1 commit into
apache:developfrom
wang-jiahua:fix/consume-queue-hold-mapped-file
Open

[ISSUE #10288] Hold MappedFile reference in ConsumeQueue write path to prevent SIGSEGV#10566
wang-jiahua wants to merge 1 commit into
apache:developfrom
wang-jiahua:fix/consume-queue-hold-mapped-file

Conversation

@wang-jiahua

Copy link
Copy Markdown
Contributor

Which Issue(s) This PR Fixes

Fixes #10288

Brief Description

ConsumeQueue.putMessagePositionInfo() accesses MappedFile without holding a reference count. A concurrent cleanExpiredConsumeQueue can destroy() the MappedFile (unmapping its buffer) while the writer is still using it, causing SIGSEGV in JIT-compiled Unsafe.copyMemory.

Fix: Call mappedFile.hold() before accessing the buffer and release() in a finally block, following the same pattern used in DefaultMappedFile.getData(). Also protect the fillPreBlank call in initializeWithOffset.

How Did You Test This Change

Existing ConsumeQueueTest (17 tests) all pass. The fix was validated during server benchmarking where the baseline broker crashed with SIGSEGV in ReputMessageService at ConsumeQueue.putMessagePositionInfo — exactly the crash signature reported in #10288.

…path to prevent SIGSEGV

ConsumeQueue.putMessagePositionInfo() accesses MappedFile without
holding a reference count. A concurrent cleanExpiredConsumeQueue
can destroy() the MappedFile (unmapping its buffer) while the
writer is still using it, causing SIGSEGV in JIT-compiled
Unsafe.copyMemory.

Fix: Call mappedFile.hold() before accessing the buffer and
release() in a finally block, following the same pattern used
in DefaultMappedFile.getData(). Also protect the fillPreBlank
call in initializeWithOffset.

Reported crash signatures:
- Case 1 (v4.9.4): SEGV_MAPERR in ReputMessageService thread
  at ConsumeQueue.putMessagePositionInfo
- Case 2 (v5.3.3): SEGV in AdminBrokerThread at
  ConsumeQueueStore.cleanExpired
Copilot AI review requested due to automatic review settings July 1, 2026 03:58

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a concurrency crash in the ConsumeQueue write path by ensuring MappedFile is protected via reference counting (hold()/release()) while its underlying buffers may be accessed, preventing concurrent destroy() from unmapping the buffer during writes.

Changes:

  • Add mappedFile.hold()/release() guarding around ConsumeQueue.putMessagePositionInfo(...) write operations.
  • Guard fillPreBlank(...) during initializeWithOffset(...) with hold()/release() to prevent concurrent unmap while initializing.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread store/src/main/java/org/apache/rocketmq/store/ConsumeQueue.java
@codecov-commenter

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 25.00000% with 24 lines in your changes missing coverage. Please review.
✅ Project coverage is 48.18%. Comparing base (88709c5) to head (8a094a1).

Files with missing lines Patch % Lines
...n/java/org/apache/rocketmq/store/ConsumeQueue.java 25.00% 18 Missing and 6 partials ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##             develop   #10566      +/-   ##
=============================================
- Coverage      48.28%   48.18%   -0.11%     
+ Complexity     13445    13417      -28     
=============================================
  Files           1378     1378              
  Lines         100845   100852       +7     
  Branches       13044    13046       +2     
=============================================
- Hits           48695    48595     -100     
- Misses         46201    46289      +88     
- Partials        5949     5968      +19     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@RockteMQ-AI RockteMQ-AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review by github-manager-bot

Summary

Fixes a SIGSEGV caused by concurrent cleanExpiredConsumeQueue destroying a MappedFile while putMessagePositionInfo() is still writing to its buffer. The fix adds mappedFile.hold() / mappedFile.release() in a try-finally pattern around all buffer access in the ConsumeQueue write path.

Findings

  • [Info] ConsumeQueue.java:862hold() failure returns false safely, preventing use of an unmapped buffer. The warning log includes topic and queueId for diagnosis.
  • [Info] ConsumeQueue.java:864-901 — The try-finally correctly wraps all buffer operations including fillPreBlank, repeated-offset check, and appendMessage/appendMessageUsingFileChannel. The early return true at line ~882 (repeated build) still executes the finally block — correct.
  • [Info] ConsumeQueue.java:1282-1288initializeWithOffset also gets the hold/release treatment with an added null check. Good defensive coding.
  • [Info] The pattern is consistent with DefaultMappedFile.getData() — established codebase convention.

Suggestions

  • Consider whether CommitLog write paths have similar exposure. If cleanExpiredMappedFiles() can race with other CommitLog writers, the same pattern may be needed there.
  • A concurrent stress test (e.g., write + cleanExpired in parallel threads) would help prevent regression.

Overall: critical bug fix, correctly implemented. LGTM.


Automated review by github-manager-bot

@RongtongJin

Copy link
Copy Markdown
Contributor

LGTM~ @guyinyou help to double check

Comment on lines 1281 to +1288
MappedFile mappedFile = mappedFileQueue.getLastMappedFile(offset * ConsumeQueue.CQ_STORE_UNIT_SIZE, true);
fillPreBlank(mappedFile, offset * ConsumeQueue.CQ_STORE_UNIT_SIZE);
if (mappedFile != null && mappedFile.hold()) {
try {
fillPreBlank(mappedFile, offset * ConsumeQueue.CQ_STORE_UNIT_SIZE);
} finally {
mappedFile.release();
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

initializeWithOffset() should not silently continue when the mapped file cannot be created or held. This path relies on continuous ConsumeQueue files, and skipping fillPreBlank() while still calling flush(0) makes initialization look successful even though the pre-blank area was never written. Please fail fast here, or at least log an error and abort the initialization, so a file creation/availability problem does not leave the consume queue in an inconsistent state.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Fixed in the latest commit — instead of silently skipping fillPreBlank(), the method now logs an error and returns early (aborts initialization) when mappedFile is null or hold() fails:

if (mappedFile == null) {
    log.error("initializeWithOffset failed: mappedFile is null for offset {}", offset);
    return;
}
if (!mappedFile.hold()) {
    log.error("initializeWithOffset failed: mappedFile hold() failed for offset {}", offset);
    return;
}

This ensures the consume queue is not left in an inconsistent state (no flush(0) called) when file creation/availability fails, while keeping the hold() reference to prevent the SIGSEGV.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] JVM crashes SIGSEGV in ConsumeQueue write path due to missing reference counting when MappedFile is destroyed concurrently

5 participants