Skip to content

Validate upstream JSON-RPC responses in transparent proxy#5288

Open
bishnubista wants to merge 7 commits into
stacklok:mainfrom
bishnubista:fix/validate-upstream-jsonrpc-responses
Open

Validate upstream JSON-RPC responses in transparent proxy#5288
bishnubista wants to merge 7 commits into
stacklok:mainfrom
bishnubista:fix/validate-upstream-jsonrpc-responses

Conversation

@bishnubista
Copy link
Copy Markdown

@bishnubista bishnubista commented May 14, 2026

Summary

The transparent proxy (thv run) currently forwards upstream MCP server responses to clients with HTTP 200 even when the JSON-RPC frame is structurally invalid — missing jsonrpc:"2.0", invalid id type, non-object body, etc. The proxy sits between potentially untrusted upstream MCP servers and trusted clients, so a compromised or misconfigured upstream can push malformed frames straight through, where they may crash strict JSON-RPC parsers, be misinterpreted as legitimate responses, or trigger undefined behaviour in client-side parsers.

This PR validates upstream JSON-RPC frames at the proxy boundary and rewrites invalid responses into a structured JSON-RPC error so the proxy stops being a silent amplifier for malformed (or adversarial) upstream servers.

  • Validation runs in NoOpResponseProcessor.ProcessResponse (streamable-http / default transport). SSE is unaffected.
  • Structural rules: single JSON value (trailing JSON rejected); object or non-empty array of objects; per object jsonrpc=="2.0", id ∈ {string, number, null}, exactly one of result/error; error.code must be an integer, error.message must be a string.
  • Gating: POST + 200, application/json(-rpc) media type, non-identity Content-Encoding passed through untouched, request must carry an MCP streamable-HTTP signal (MCP-Protocol-Version or Mcp-Session-Id) so non-MCP JSON traffic flowing through the catch-all is not rewritten.
  • Invalid frame → HTTP 502 with body {"jsonrpc":"2.0","id":null,"error":{"code":-32000,"message":"Invalid upstream JSON-RPC response","data":"<detail>"}}. -32000 is the JSON-RPC implementation-defined server-error code; -32603 is reserved for internal JSON-RPC errors. Response headers are replaced wholesale so upstream Mcp-Session-Id, Set-Cookie, ETag, Cache-Control, etc. are not smuggled into the proxy-generated error.
  • Body reads bounded by io.LimitReader at maxJSONRPCResponseBytes = 100 << 20 (100 MiB), matching streamable-HTTP precedent in pkg/vmcp/client and pkg/vmcp/session/internal/backend. Oversized responses are rejected with the same 502 shape so the proxy cannot be amplified into a memory DoS.

Closes #5247

Type of change

  • Bug fix
  • New feature
  • Refactoring (no behavior change)
  • Dependency update
  • Documentation
  • Other (describe):

Test plan

  • Unit tests (task test)
  • E2E tests (task test-e2e)
  • Linting (task lint-fix)
  • Manual testing (describe below)

Local verification:

  • go test ./pkg/transport/proxy/transparent/... -count=1 — pass.
  • golangci-lint run --timeout=5m ./pkg/transport/proxy/transparent/... — 0 issues.

New test cases in pkg/transport/proxy/transparent/response_processor_test.go:

  • Valid pass-through: result, error, batch, result: null, application/json; charset=utf-8.
  • Invalid → 502: missing jsonrpc, invalid id type, non-object body, result+error both present, trailing JSON, fractional error.code.
  • Non-validated pass-through: non-POST, non-200, non-JSON, text/event-stream, application/jsonsomethingelse.
  • Compressed responses (Content-Encoding: gzip) passed through unchanged for both valid and malformed payloads; explicit Content-Encoding: identity still validates.
  • MCP-signal gate: no MCP request headers → pass through even with malformed body; MCP-Protocol-Version or Mcp-Session-Id → validate.
  • Rewritten 502 strips Mcp-Session-Id, Set-Cookie, Etag, Cache-Control from the response.
  • Oversized response (> maxJSONRPCResponseBytes) is rejected with a size-limit error.

API Compatibility

  • This PR does not break the v1beta1 API, OR the api-break-allowed label is applied and the migration guidance is described above.

No CRD or operator API surface is touched.

Does this introduce a user-facing change?

Yes. Clients of the transparent proxy that previously received malformed upstream responses verbatim with HTTP 200 will now receive HTTP 502 with a synthetic JSON-RPC error body. Conformant upstream MCP servers are unaffected — only structurally invalid responses are rewritten.

Special notes for reviewers

Two scope decisions called out explicitly so reviewers can push back if either is wrong for the project:

  1. Validation runs in ModifyResponse, after tracingTransport.RoundTrip may have observed an upstream Mcp-Session-Id and registered proxy-side session state. This PR strips upstream session headers from the proxy-generated 502 so clients never receive a session id derived from a malformed initialize response, but it does not roll back server-side proxy session state created before validation. Moving validation earlier (or adding a rollback path for invalid initialize responses) touches tracingTransport and felt outside the scope of "structurally validate the upstream frame." Happy to open a follow-up — or fold it into this PR if maintainers prefer.

  2. Backward-compat clients that POST without MCP-Protocol-Version on the very first initialize will not trigger validation — same as today's behaviour, so no regression. If broader coverage is preferred, tracingTransport.RoundTrip already buffers and parses the request body to detect initialize; a small context-propagated marker would let ProcessResponse validate those frames too. Easy follow-up.

Open to feedback on:

  • The 100 MiB body cap (matched pkg/vmcp precedent; can tighten to 10 MiB if maintainers prefer an explicit security default).
  • Whether the MCP-signal narrowing should be widened to also accept the buffered-request-body sniff above.

The transparent proxy forwarded malformed upstream MCP frames to clients with
HTTP 200 even when the response violated JSON-RPC 2.0 structure. This adds a
boundary check in NoOpResponseProcessor that rejects structurally invalid
upstream frames and returns a synthetic 502 carrying a JSON-RPC error to the
client, so the proxy stops being a silent amplifier for malformed (or
adversarial) upstream servers.

Validation runs only for streamable-http POST/200 responses that carry an MCP
request signal (MCP-Protocol-Version or Mcp-Session-Id) and an application/json
content type, with non-identity Content-Encoding traffic passed through
untouched. Body reads are bounded to 100 MiB to match existing streamable-HTTP
limits in pkg/vmcp. Rewritten error responses replace headers wholesale so
upstream session/cookie/cache metadata is not smuggled into the proxy-generated
error. SSE traffic is unaffected.

Closes stacklok#5247

Signed-off-by: bishnubista <bista.developer@gmail.com>
@bishnubista bishnubista force-pushed the fix/validate-upstream-jsonrpc-responses branch from 4fcdb4b to a7badd9 Compare May 14, 2026 23:58
if dec.More() {
return fmt.Errorf("JSON-RPC response must contain a single JSON value")
}
if err := dec.Decode(&struct{}{}); err != io.EOF {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second Decode here is unreachable. When dec.More() returns false there is nothing left in the stream, so the subsequent Decode will always return io.EOF and the condition never fires. The More() check above already covers the trailing-value case completely. Nit, but worth removing the dead branch.


// NoOpResponseProcessor is a processor that does nothing.
// Used for transports that don't require response processing (e.g., streamable-http).
// NoOpResponseProcessor is the default processor for non-SSE transports.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scoping to non-SSE makes sense for now — SSE validation would need a fundamentally different approach. By the time ProcessResponse runs on an SSE response, the 200 OK and Content-Type: text/event-stream headers are already committed downstream, so there is no way to rewrite to 502 for a malformed mid-stream event. A proper fix would need a per-event streaming interceptor inside the SSE processor that can either synthesize an error event or close the stream — a meaningfully larger scope than this PR.

Worth opening a follow-up issue to track it explicitly, so it is clear the gap is deferred rather than forgotten.

Remove the top-level json.Decoder.More check from upstream JSON-RPC response validation. More is intended for array and object iteration, so using it as a top-level single-value guard can miss malformed trailing delimiter bytes.

Keep the second Decode call as the exact-single-value check and compare its result with io.EOF via errors.Is. This rejects trailing JSON values and trailing syntax junk while accepting clean EOF after the first decoded response.

Add a regression case for a valid JSON-RPC response followed by a stray closing delimiter so malformed trailing bytes cannot pass validation.
@github-actions github-actions Bot added the size/M Medium PR: 300-599 lines changed label May 26, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented May 26, 2026

Codecov Report

❌ Patch coverage is 81.30841% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.79%. Comparing base (2043cf6) to head (f70c56c).

Files with missing lines Patch % Lines
.../transport/proxy/transparent/response_processor.go 81.30% 10 Missing and 10 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5288      +/-   ##
==========================================
- Coverage   68.81%   68.79%   -0.03%     
==========================================
  Files         627      627              
  Lines       63594    63700     +106     
==========================================
+ Hits        43762    43822      +60     
- Misses      16583    16618      +35     
- Partials     3249     3260      +11     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@JAORMX
Copy link
Copy Markdown
Collaborator

JAORMX commented May 26, 2026

We have middleware that does the validation and parsing. This happens outside of the transparent proxy, but still in the path. Why not use the existing middleware instead?

Move strict JSON-RPC response body validation into pkg/mcp so the transparent proxy reuses shared MCP parsing code instead of carrying a private copy.

Use parsed MCP request context from the parser middleware as the primary signal for response validation, while preserving protocol/session header fallback for batch and compatibility cases that the request parser does not currently cover.

Keep the transparent proxy's ModifyResponse path responsible for rejecting malformed upstream responses with the existing bounded read and 502 rewrite.
@bishnubista
Copy link
Copy Markdown
Author

We have middleware that does the validation and parsing. This happens outside of the transparent proxy, but still in the path. Why not use the existing middleware instead?

Good point. I agree the JSON-RPC validation logic should not live as bespoke protocol code in pkg/transport/proxy/transparent.

I traced the path again: the existing MCP parser middleware runs before the transparent proxy and parses inbound requests into context, but I didn’t find an existing generic response validator that can inspect upstream response bodies and rewrite them before they reach the client. The response-side parsing I found is authz-specific or audit-only/lenient.

I’ll adjust this by moving the response validation helper into pkg/mcp, using mcp.GetParsedMCPRequest(resp.Request.Context()) as the primary MCP signal, and keeping the current header fallback for batch/backcompat cases. I think ModifyResponse should remain the enforcement point because it can still rewrite the upstream response to 502 before the reverse proxy commits it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/M Medium PR: 300-599 lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

security: proxy forwards malformed upstream JSON-RPC frames to clients without validation

3 participants