Skip to content

[ISSUE #10537] Cache RPC Attributes and use processTimerElapsedMs in NettyRemotingAbstract#10539

Open
wang-jiahua wants to merge 1 commit into
apache:developfrom
wang-jiahua:perf/remoting-metrics-cache-and-timing
Open

[ISSUE #10537] Cache RPC Attributes and use processTimerElapsedMs in NettyRemotingAbstract#10539
wang-jiahua wants to merge 1 commit into
apache:developfrom
wang-jiahua:perf/remoting-metrics-cache-and-timing

Conversation

@wang-jiahua

@wang-jiahua wang-jiahua commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Which Issue(s) This PR Fixes

Fixes #10537

Brief Description

Cache RPC Attributes objects in RemotingMetricsManager to avoid per-RPC Attributes.builder().build() allocation.

  • RemotingMetricsManager: add getOrBuildAttributes(reqCode, respCode, isLongPolling, result) with a long-key cache (35-bit packed key, no String allocation). Falls back to direct build for unknown result values.
  • NettyRemotingAbstract: use cached Attributes from getOrBuildAttributes() instead of inline attributesBuilder.build() at each metrics recording point.
  • NettyRemotingAbstract: add log.isDebugEnabled() guard for debug log.

How Did You Test This Change

Server benchmark: 8C30G broker, 1KB msg, 128 queues, sync mode, 256 threads. CI checkstyle + compilation passed.

@RockteMQ-AI RockteMQ-AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review by github-manager-bot

Summary

Caches per-request-code OpenTelemetry Attributes objects and replaces Stopwatch.elapsed(TimeUnit.MILLISECONDS) with processTimerElapsedMs() (nanoTime-based) across the remoting layer. Also adds cached Constructor lookup for custom header classes and reduces default HashMap capacity. Multiple coordinated micro-optimizations targeting the broker hot path.

Findings

  • [Warning] NettyRemotingServer.java:432-433 — Changed log.info("channel writable false. host:{}", ...) to log.debug(...). This is a behavioral change: production users who relied on INFO-level channel writability warnings for backpressure monitoring will lose this visibility. If this was intentional (to reduce log noise), it is fine, but worth confirming.

  • [Info] RemotingMetricsManager.java — The attributes cache uses a ConcurrentHashMap with a volatile fast-path (lastAttrsCacheKey / lastCachedAttrs). The non-atomic read-then-write pattern means two threads with the same cache key could both rebuild Attributes, but computeIfAbsent ensures correctness. The volatile fast-path is a reasonable optimization for the common case where the same request type dominates.

  • [Info] RemotingMetricsManager.java:214-216 — Cache key encoding: (requestCode << 19) | (responseCode << 3) | (isLongPolling ? 4 : 0) | resultIndex. This assumes requestCode fits in 16 bits (max 65535) and responseCode fits in 16 bits. Both are safe given current ResponseCode and RequestCode values. Consider adding a comment documenting these bit-width assumptions for future maintainers.

  • [Info] RemotingCommand.java — Replacing Stopwatch processTimer with long processTimeStartNanos eliminates one object allocation per request. The new processTimerElapsedMs() method correctly computes elapsed milliseconds from System.nanoTime(). The deprecated getProcessTimer() now returns a fresh Stopwatch which would give incorrect elapsed time if called — the @Deprecated annotation is appropriate.

  • [Info] RemotingCommand.java:231-256 — Cached Constructor lookup via CONSTRUCTOR_CACHE (ConcurrentHashMap with computeIfAbsent). Thread-safe. The setAccessible(true) call is necessary for non-public constructors. Clean.

  • [Info] RemotingCommand.java:341 — HashMap initial capacity reduced from 256 to 16. With default load factor 0.75, this supports 12 entries before resizing — sufficient for typical header field counts. Good optimization.

  • [Info] TopicQueueMappingContext.java — Added EMPTY static singleton to avoid allocating empty context objects. Clean.

Suggestions

  1. The log level change in NettyRemotingServer.java (channel writability) is the most notable behavioral change. If intentional, no action needed.

  2. Consider adding a brief comment on the cache key encoding in RemotingMetricsManager documenting the bit-width assumptions (requestCode ≤ 16 bits, responseCode ≤ 16 bits).

Overall: Well-structured set of micro-optimizations with consistent patterns. The Stopwatch → nanoTime replacement is a clean improvement.


Automated review by github-manager-bot

@wang-jiahua wang-jiahua force-pushed the perf/remoting-metrics-cache-and-timing branch from 5c2f5cc to bb222a5 Compare June 23, 2026 02:25

@RockteMQ-AI RockteMQ-AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review by github-manager-bot (Re-review after new commits)

Summary

PR squashed into 2 commits. Core approach unchanged: cache OpenTelemetry Attributes objects and replace Stopwatch with long processTimerNanos to eliminate per-RPC allocations. New log.isDebugEnabled() guard added for debug logging.

CI Status

CI is failing — SpotBugs reports 4 ES_COMPARING_PARAMETER_STRING_WITH_EQ violations in RemotingMetricsManager.getOrBuildAttributes() (lines 102-105). The result == RESULT_SUCCESS style comparisons on a method parameter trigger SpotBugs even though the values are interned constants.

Findings

  • [Critical] RemotingMetricsManager.java:102-105 — String == comparison on method parameter result. Fix: use .equals() or add @SuppressFBWarnings("ES_COMPARING_PARAMETER_STRING_WITH_EQ") with justification that all callers pass interned constant strings. The .equals() approach is safer and preferred.

  • [Info] RemotingMetricsManager.java — Cache key encoding (requestCode << 19) | (responseCode << 3) | (isLongPolling ? 4 : 0) | resultIdx is correct for current value ranges. The lastAttrsCacheKey/lastCachedAttrs volatile fast-path is a reasonable optimization for the dominant-case pattern.

  • [Info] RemotingCommand.javaprocessTimerElapsedMs() correctly computes elapsed ms from System.nanoTime(). The deprecated getProcessTimer() returning a fresh Stopwatch is a safe backward-compat shim.

  • [Info] NettyRemotingAbstract.javalog.isDebugEnabled() guard avoids string concatenation overhead when debug is disabled. Good.

Suggestions

  1. Fix the SpotBugs violations to unblock CI — this is the only blocker.
  2. Consider adding a brief comment on the cache key bit-width assumptions for future maintainers.

Automated re-review by github-manager-bot

@wang-jiahua wang-jiahua force-pushed the perf/remoting-metrics-cache-and-timing branch 2 times, most recently from f5a1d4b to be98520 Compare June 24, 2026 04:05

@RockteMQ-AI RockteMQ-AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review by github-manager-bot (Re-review after new commits)

Summary

New squashed commit. Core approach unchanged: cache OpenTelemetry Attributes via ConcurrentHashMap + volatile fast-path, and replace Stopwatch with long processTimerNanos.

CI Status

CI is still failing — SpotBugs reports 4 ES_COMPARING_PARAMETER_STRING_WITH_EQ violations in RemotingMetricsManager.getOrBuildAttributes():

RemotingMetricsManager.java:102 — result == RESULT_SUCCESS
RemotingMetricsManager.java:103 — result == RESULT_ONEWAY
RemotingMetricsManager.java:104 — result == RESULT_WRITE_CHANNEL_FAILED
RemotingMetricsManager.java:105 — result == RESULT_CANCELED

This was flagged in the previous review but not yet fixed. SpotBugs treats == comparison on a method parameter as a bug even when the constants are interned, because the parameter type is String and callers could pass non-interned strings.

Fix Required

Replace == with .equals() in getOrBuildAttributes():

public Attributes getOrBuildAttributes(int requestCode, int responseCode,
    boolean isLongPolling, String result) {
    int resultIdx;
    if (RESULT_SUCCESS.equals(result)) resultIdx = 0;
    else if (RESULT_ONEWAY.equals(result)) resultIdx = 1;
    else if (RESULT_WRITE_CHANNEL_FAILED.equals(result)) resultIdx = 2;
    else if (RESULT_CANCELED.equals(result)) resultIdx = 3;
    else resultIdx = -1;
    // ... rest unchanged

Findings

  • [Critical] RemotingMetricsManager.java:102-105Not fixed. Change result == CONSTANT to CONSTANT.equals(result) to resolve SpotBugs ES_COMPARING_PARAMETER_STRING_WITH_EQ. This is blocking CI.

  • [Info] The ConcurrentHashMap<Long, Attributes> + volatile fast-path caching strategy is sound. The lastAttrsCacheKey / lastCachedAttrs volatile pair avoids HashMap lookup on the hottest path. Good optimization.

  • [Info] The bit-packing key ((long) requestCode << 19) | ((responseCode & 0xFFFF) << 3) | ... is correct given the value ranges (requestCode fits in 13 bits, responseCode in 16 bits).


Automated review by github-manager-bot

@wang-jiahua wang-jiahua force-pushed the perf/remoting-metrics-cache-and-timing branch 4 times, most recently from c90c171 to 611d8ea Compare June 24, 2026 08:57
@codecov-commenter

codecov-commenter commented Jun 24, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 39.21569% with 31 lines in your changes missing coverage. Please review.
✅ Project coverage is 48.21%. Comparing base (88709c5) to head (f7bef92).

Files with missing lines Patch % Lines
...rocketmq/remoting/netty/NettyRemotingAbstract.java 8.82% 28 Missing and 3 partials ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##             develop   #10539      +/-   ##
=============================================
- Coverage      48.28%   48.21%   -0.08%     
+ Complexity     13445    13432      -13     
=============================================
  Files           1378     1378              
  Lines         100845   100860      +15     
  Branches       13044    13049       +5     
=============================================
- Hits           48695    48628      -67     
- Misses         46201    46263      +62     
- Partials        5949     5969      +20     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@RockteMQ-AI RockteMQ-AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review by github-manager-bot (Re-review after new commit 611d8ea)

Summary

New commit confirmed — OpenTelemetry Attributes are now cached per request-code/response-code combination using a ConcurrentHashMap. Debug log string concatenation is also guarded with log.isDebugEnabled().

Assessment

  • [Correctness] The bit-packing key (requestCode << 19) | (responseCode << 3) | flags correctly encodes the attribute combination. The resultIdx mapping is consistent with the string constants. Cache invalidation is not needed since the set of request/response codes is finite and small.
  • [Performance] Eliminates per-request Attributes object allocation and string formatting on the metrics hot path. The isDebugEnabled() guard avoids unnecessary string concatenation. Good optimization.
  • [Thread Safety] ConcurrentHashMap.computeIfAbsent is used correctly for thread-safe lazy initialization.
  • [Info] One minor note: the cache is unbounded. In practice the number of distinct (requestCode, responseCode, isLongPolling, result) combinations is small and bounded, so this is not a concern.

LGTM — all CI checks pass. Clean performance improvement. ✅


Automated by github-manager-bot

@wang-jiahua wang-jiahua force-pushed the perf/remoting-metrics-cache-and-timing branch from 611d8ea to 84e42d4 Compare June 30, 2026 10:13

@RockteMQ-AI RockteMQ-AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review by github-manager-bot (Re-review after new commit)

Summary

Re-review triggered by new commit 84e42d4a. This PR caches Attributes objects in RemotingMetricsManager to avoid per-RPC allocation on the hot path, and guards debug log string concatenation with isDebugEnabled().

Findings

  • [Info] remoting/.../RemotingMetricsManager.java — The attributesCache (ConcurrentHashMap<Long, Attributes>) has no eviction policy. In practice, RocketMQ has a bounded set of request/response codes so the cache should stabilize at a manageable size. However, consider documenting the expected upper bound or adding a size guard (e.g., if (cache.size() > THRESHOLD) return buildAttributes(...)) as a safety net against unexpected growth.

  • [Info] remoting/.../RemotingMetricsManager.java:112-115 — Bit-packing key construction: requestCode is shifted by 19 bits. This assumes requestCode fits in 19 bits (0–524287). RocketMQ request codes are well within this range, but a defensive comment documenting the assumption would help future maintainers.

  • [Info] remoting/.../NettyRemotingAbstract.java — Good improvement: extracting processTimerElapsedMs and using it consistently across processResponseCommand, processRequestCommand, and processResponseCommandAsync. The isDebugEnabled() guard before string concatenation is a solid optimization for the hot path.

  • [Info] remoting/.../RemotingMetricsConstant.java — Renaming labels to include _KEY suffix improves clarity. Clean migration.

Positive Aspects

  • Well-structured getOrBuildAttributes with clear result-index mapping
  • Comprehensive test coverage including cache hit verification and unknown-result bypass
  • Thread-safe via ConcurrentHashMap.computeIfAbsent
  • Good deduplication of metrics recording across writeResponse methods

LGTM 👍


Automated re-review by github-manager-bot

- RemotingMetricsManager: add getOrBuildAttributes() with long-key cache
- NettyRemotingAbstract: use cached Attributes instead of per-RPC AttributesBuilder
- NettyRemotingAbstract: add log.isDebugEnabled() guard for debug log
- RemotingMetricsConstant: add LABEL_*_KEY static AttributeKey instances
@wang-jiahua wang-jiahua force-pushed the perf/remoting-metrics-cache-and-timing branch from 84e42d4 to f7bef92 Compare July 1, 2026 03:11

@RockteMQ-AI RockteMQ-AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review by github-manager-bot

Summary

Caches metrics Attributes objects in a ConcurrentHashMap keyed by a packed long value combining requestCode, responseCode, isLongPolling, and result index. Also introduces processTimerElapsedMs() on RemotingCommand to avoid TimeUnit imports and Stopwatch overhead.

Findings

  • [Warning] Key packing in getOrBuildAttributes() — The key packs requestCode (16 bits) + responseCode (15 bits) + isLongPolling (1 bit) + resultIdx (4 bits) into a long. This assumes response codes fit in 15 bits (max 32767). If RocketMQ ever defines response codes above 32767, there will be silent key collisions.
  • [Info] ConcurrentHashMap for attributes cache — Appropriate for concurrent access. Consider adding a size bound or eviction policy to prevent unbounded growth if the cardinality of (requestCode, responseCode, isLongPolling, result) combinations is high.
  • [Info] processTimerElapsedMs() — Clean abstraction. Avoids repeated TimeUnit.MILLISECONDS import and call pattern across multiple processors.

Suggestions

  • Add a comment documenting the key bit layout and the assumption that response codes fit in 15 bits.
  • Consider using Caffeine or Guava cache with a max size if the cardinality concern is real.

Automated review by github-manager-bot

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[ISSUE #10537] Cache RPC Attributes and use processTimerElapsedMs in NettyRemotingAbstract

4 participants