Configure rate limits on VirtualMCPServer PR B#5276
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #5276 +/- ##
==========================================
+ Coverage 68.83% 68.85% +0.02%
==========================================
Files 627 628 +1
Lines 63590 63678 +88
==========================================
+ Hits 43772 43846 +74
- Misses 16568 16581 +13
- Partials 3250 3251 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
695a96c to
480152f
Compare
There was a problem hiding this comment.
Large PR Detected
This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.
How to unblock this PR:
Add a section to your PR description with the following format:
## Large PR Justification
[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformationAlternative:
Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.
See our Contributing Guidelines for more details.
This review will be automatically dismissed once you add the justification section.
480152f to
619264d
Compare
619264d to
f3cf1e4
Compare
Large PR justification has been provided. Thank you!
|
✅ Large PR justification has been provided. The size review has been dismissed and this PR can now proceed with normal review. |
There was a problem hiding this comment.
Pull request overview
This PR wires the VirtualMCPServer.spec.config.rateLimiting field (introduced in PR A) into the vMCP runtime so rate limits are actually enforced. It builds a Redis-backed limiter during server construction, refactors the MCP middleware chain so rate limiting runs after auth + MCP parsing (allowing per-user/per-tool keying), and adds optimizer-aware tool-name resolution for call_tool.
Changes:
- Pass namespace and
RateLimitingconfig throughvmcpserver.Configand initialize a Redis-backed rate-limit middleware (with cleanup on Stop) inNew(). - Refactor the vMCP MCP middleware composition into small
applyXxxhelpers and place rate limiting between MCP parser and audit/discovery; resolvecall_toolto its innertool_namewhen optimizer is active. - Generalize
pkg/ratelimit/middleware.goto exposeNewMiddlewarewith an optionalToolNameResolver, and add unit + focused E2E coverage.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
pkg/vmcp/cli/serve.go |
Plumbs VMCP_NAMESPACE env (defaulting to local) and RateLimiting into vmcpserver.Config. |
pkg/vmcp/cli/serve_test.go |
Adds unit tests for vmcpNamespace() default and env override. |
pkg/vmcp/server/server.go |
Adds Namespace + RateLimiting config fields, initializes rate-limit middleware in New(), and splits MCP middleware chain into helpers with new execution order. |
pkg/vmcp/server/ratelimit.go |
New file: builds Redis client + limiter, validates Redis session storage, returns middleware + cleanup, and resolves optimizer call_tool inner tool name. |
pkg/vmcp/server/ratelimit_test.go |
New unit tests for config validation, per-user/per-tool/optimizer behavior, and chain helpers. |
pkg/ratelimit/middleware.go |
Introduces ToolNameResolver/DefaultToolNameResolver and exported NewMiddleware; CreateMiddleware now uses it. |
pkg/ratelimit/middleware_test.go |
Adds tests for custom resolver, default resolver nil-handling, and end-to-end CreateMiddleware registration. |
test/e2e/thv-operator/virtualmcp/virtualmcp_rate_limiting_test.go |
New focused E2E: deploys Redis + OIDC + backend + vMCP with perUser.maxTokens=1 and asserts second tools/call is rejected. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>
Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>
f3cf1e4 to
589dfe9
Compare
Summary
This PR wires the
VirtualMCPServerrate-limiting config added in PR A into the vMCP runtime so configured limits are actually enforced before requests reach aggregated backends.spec.config.rateLimitingfrom the vMCP runtime config intovmcpserver.Config.call_toolrequests are rate-limited using the innerarguments.tool_name.call_toolbehavior.perUser.maxTokens=1, then verifies the second authenticatedtools/callis rejected.Fixes: #4552
Type of change
Test plan
Changes
pkg/vmcp/cli/serve.goRateLimitingintovmcpserver.Config.pkg/vmcp/server/server.goNew(), and refactors MCP middleware composition.pkg/vmcp/server/ratelimit.gopkg/ratelimit/middleware.gopkg/vmcp/server/ratelimit_test.gotest/e2e/thv-operator/virtualmcp/virtualmcp_rate_limiting_test.gomanual-testing-ratelimit-vmcp.mdDoes this introduce a user-facing change?
Yes. Cluster admins can now configure
VirtualMCPServer.spec.config.rateLimitingand vMCP will enforce the configured shared, per-user, and per-tool limits fortools/callrequests at runtime.Implementation plan
Approved implementation plan
sessionStorageandrateLimitingconfig.tools/callforcall_toolto the innerarguments.tool_namevalue.Special notes for reviewers
tools/callparameters.kubectl port-forwardfor vMCP/OIDC access instead of assuming NodePorts are reachable onlocalhost, which makes it work reliably with local kind clusters and CI environments.Manual testing
If the cluster is not already running with local images:
1. PR A: Validate CRD Admission Rules
1.1. Reject rate limiting without Redis session storage
Expected: the API server rejects the object with:
1.2. Reject per-user rate limiting without OIDC auth
Expected: the API server rejects the object with:
2. PR A: Verify Runtime ConfigMap Serialization
Create Redis first and wait for it to accept traffic. This avoids a transient vMCP restart while Redis is still starting.
Create a valid shared-limit vMCP. This uses anonymous auth so it can be inspected without setting up OIDC manually.
Wait for resources:
Expected: the
VirtualMCPServerreachesReady. It may briefly showDegradedwhile the backend health check warms up.Inspect the generated vMCP ConfigMap:
Expected:
config.yamlcontains:This confirms PR A’s field survives CRD -> converter -> ConfigMap serialization.
3. PR B: Verify Runtime Enforcement
The most reliable manual runtime check is the focused E2E test. It deploys Redis, a parameterized OIDC server, a backend MCPServer, and a VirtualMCPServer with
perUser.maxTokens=1. Then it sends two authenticatedtools/callrequests from the same user and expects the second request to be rejected.Expected:
This confirms PR B:
config.rateLimiting.sessionStorage.tools/callis rejected with rate-limit error behavior.4. Optional: Inspect Runtime Logs
Expected log line:
If Redis cannot be reached at startup, vMCP should fail to build the handler and logs should show:
5. Cleanup
The focused E2E test creates timestamped resources and cleans them up automatically.
Large PR Justification
This is a new feature package with a large test suite, and it needs to land as one coherent phase.