Fix NPE in NettyResponseChannel when root cause exception has null message #3263

Open
crliao wants to merge 1 commit into linkedin:master from crliao:fix/netty-null-message-npe-v2

crliao (Contributor) commented May 21, 2026

Problem & Solution Overview

CompletableFuture.orTimeout() constructs a TimeoutException with no message (new TimeoutException()). When this exception is the root cause of a RestServiceException, Utils.getRootCause(cause).getMessage() returns null, causing an NPE in NettyResponseChannel.getErrorResponse() at line 593.

This fires when shouldSendFailureReason returns true — i.e. when the request has SEND_FAILURE_REASON=true (set by NamedBlobPutHandler, S3MultipartUploadPartHandler, and UndeleteHandler). Any named blob PUT whose ID conversion times out will hit this path, logging a spurious NPE stack trace on top of the 503.

Fix: null-check the root cause message before using it. If null, errReason stays null and no FAILURE_REASON_HEADER is set — same behavior as when shouldSendFailureReason returns false.
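
The null check described above can be sketched in isolation. This is a minimal illustration under stated assumptions, not the actual NettyResponseChannel code: the class `FailureReasonSketch` and the method `failureReason` are hypothetical names, the root cause is assumed to have been extracted already, and decoding the bytes back with `US_ASCII` is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-alone sketch; names are illustrative, not taken from Ambry.
class FailureReasonSketch {
  /**
   * Returns a single-line ASCII failure reason, or null when the root cause
   * carries no message. A null result means "do not set the header".
   */
  static String failureReason(Throwable rootCause) {
    String rootMessage = rootCause.getMessage();
    if (rootMessage == null) {
      // Without this guard, replaceAll() below would throw a NullPointerException.
      return null;
    }
    // Collapse line breaks and tabs so the value fits on one header line.
    String singleLine = rootMessage.replaceAll("[\n\t\r]", " ");
    // Assumption: round-trip through US-ASCII, which maps unmappable characters to '?'.
    return new String(singleLine.getBytes(StandardCharsets.US_ASCII), StandardCharsets.US_ASCII);
  }

  public static void main(String[] args) {
    System.out.println(failureReason(new TimeoutException()));
    System.out.println(failureReason(new TimeoutException("ID conversion\ntimed out")));
  }
}
```

Returning null rather than an empty string lets the caller skip the header entirely, which matches the behavior the description gives for the case where shouldSendFailureReason returns false.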

Testing Done

  • Added setFailureReasonNullMessageNoNpeTest in NettyResponseChannelTest: it wraps a bare new TimeoutException() (no message) inside a RestServiceException with SEND_FAILURE_REASON=true, sends it through an embedded channel, and asserts a 503 response with no NPE and no FAILURE_REASON_HEADER.
  • Existing setFailureReasonInResponseTest still passes.

Notes for Reviewers

Root cause of the null message: the JDK source for CompletableFuture.orTimeout() calls new TimeoutException() with no string argument, so getMessage() on that exception returns null. The NPE only fires on the shouldSendFailureReason=true code path.
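
The null message can be observed with a small stand-alone program. This is a sketch assuming Java 9 or later (where `orTimeout` exists); the class `OrTimeoutMessageDemo` and the 50 ms timeout are illustrative choices, not part of this PR.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not part of the Ambry code base.
class OrTimeoutMessageDemo {
  /** Lets a never-completed future time out and returns the exception it completed with. */
  static Throwable timeoutCause() {
    CompletableFuture<String> pending =
        new CompletableFuture<String>().orTimeout(50, TimeUnit.MILLISECONDS);
    try {
      pending.join();
      throw new IllegalStateException("future completed unexpectedly");
    } catch (CompletionException e) {
      // join() wraps the exception the future completed with in a CompletionException.
      return e.getCause();
    }
  }

  public static void main(String[] args) {
    Throwable cause = timeoutCause();
    System.out.println(cause.getClass().getName() + ", message = " + cause.getMessage());
  }
}
```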

Author Checklist

  • For significant code changes - code is LiX'ed and/or completely backwards compatible.
  • For added/changed user-exposed functionality - appropriate documentation has been added.
  • If necessary, new metrics and/or alerts have been added.
  • If a high-risk change, I've articulated what to look out for in the sections above.

…ssage

CompletableFuture.orTimeout() creates a TimeoutException with no message.
Utils.getRootCause() returns that exception, and calling getMessage() on it
returns null, causing an NPE at getErrorResponse() line 593.

Add null check before using the root cause message to set FAILURE_REASON_HEADER.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
codecov-commenter commented May 21, 2026

Codecov Report

❌ Patch coverage is 75.00000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 51.24%. Comparing base (52ba813) to head (bff62f4).
⚠️ Report is 392 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...va/com/github/ambry/rest/NettyResponseChannel.java | 75.00% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@              Coverage Diff              @@
##             master    #3263       +/-   ##
=============================================
- Coverage     64.24%   51.24%   -13.00%     
+ Complexity    10398     8687     -1711     
=============================================
  Files           840      931       +91     
  Lines         71755    79542     +7787     
  Branches       8611     9526      +915     
=============================================
- Hits          46099    40764     -5335     
- Misses        23004    35400    +12396     
- Partials       2652     3378      +726     


beijxu self-requested a review May 21, 2026 03:40
String rootMessage = Utils.getRootCause(cause).getMessage();
if (rootMessage != null) {
errReason = new String(
rootMessage.replaceAll("[\n\t\r]", " ").getBytes(StandardCharsets.US_ASCII),
Contributor

Will Utils.getRootCause(cause) return null?

Contributor Author

No — getRootCause(cause) cannot return null here. If the input is non-null, the loop starts with throwable = t (non-null) and only advances while throwable.getCause() != null; when the chain ends it returns the current throwable, still non-null. It only returns null when the input is null. At this call site, cause has already passed the instanceof RestServiceException check on the enclosing if, so it is guaranteed non-null.
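
The loop described in this reply can be sketched as follows. This is a stand-in written from the description above, not a copy of Ambry's Utils.getRootCause; the class name `RootCauseSketch` is hypothetical.

```java
// Hypothetical stand-in for the helper discussed in this thread.
class RootCauseSketch {
  /**
   * Walks the cause chain to its end.
   * Returns null only when the input itself is null.
   */
  static Throwable getRootCause(Throwable t) {
    Throwable throwable = t;
    // Advance only while there is a further cause, so a non-null input
    // can never turn into a null result.
    while (throwable != null && throwable.getCause() != null) {
      throwable = throwable.getCause();
    }
    return throwable;
  }

  public static void main(String[] args) {
    Throwable root = new IllegalStateException("root");
    Throwable wrapped = new RuntimeException("outer", new RuntimeException("middle", root));
    System.out.println(getRootCause(wrapped) == root);
    System.out.println(getRootCause(null));
  }
}
```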

Contributor Author

The check is at line 586: if (cause instanceof RestServiceException)
