Skip to content

fix: tolerate empty toolNames on RemoteMCPServer refs#1798

Draft
reilly3000 wants to merge 1 commit intokagent-dev:mainfrom
reilly3000:fix/remote-mcp-server-null-tools
Draft

fix: tolerate empty toolNames on RemoteMCPServer refs#1798
reilly3000 wants to merge 1 commit intokagent-dev:mainfrom
reilly3000:fix/remote-mcp-server-null-tools

Conversation

@reilly3000
Copy link
Copy Markdown

@reilly3000 reilly3000 commented May 4, 2026

Summary

Fixes #1797. An Agent that references a RemoteMCPServer without an
explicit toolNames filter currently CrashLoops with a pydantic
ValidationError because the controller emits http_tools[*].tools = null
and the runtime rejects None. Whether toolNames is meant to be required
or optional is a call for maintainers — either way, the controller
shouldn't produce a config the runtime can't load.

This PR is intentionally defensive on both sides; happy to rework toward
a different direction (e.g. CRD validation making toolNames required, or
runtime-only fix) if maintainers prefer.

  • Controller (Go): when toolNames is empty, expand to
    RemoteMCPServer.status.discoveredTools so the rendered config is
    explicit and observable in the agent's config.json Secret. Explicit
    toolNames continue to take precedence. Falls back to an empty array
    rather than null if status has no discovered tools yet.
  • Runtime (Python): coerce tools: None[] via a pydantic
    BeforeValidator on HttpMcpServerConfig / SseMcpServerConfig, so
    older controllers don't crash newer runtimes.

See #1797 for the full repro and root-cause analysis.

Test plan

  • New Go translator regression test
    (Test_AdkApiTranslator_RemoteMCPServer_NoToolNames) asserts the
    rendered AgentConfig.HttpTools[0].Tools is non-nil, expanded from
    discovered tools, and serializes as a JSON array (not null).
  • New Python pydantic test
    (test_http_tool_null_tools.py) covers tools: None, missing
    tools, and explicit tools: [...] for both Http and Sse
    variants.
  • Full Go translator suite (go test ./core/internal/controller/translator/agent/...) — pass.
  • Full Python kagent-adk unit suite (uv run pytest packages/kagent-adk/tests/unittests/) — pass (9 unrelated TLS e2e failures from missing local fixtures).
  • Runtime-level e2e: invoked the actual kagent-adk static --filepath ... startup path (the same command the runtime container runs) against a config.json containing http_tools[*].tools = null.
    - Pre-fix: reproduces the exact ValidationError ... type=list_type, input_value=None and the process exits.
    - Post-fix: Application startup complete. / Uvicorn running on http://127.0.0.1:....
  • In-cluster e2e against a real RemoteMCPServer: built an overlay image of the v0.9.1 runtime with only the patched types.py, swapped it in via controller.agentImage, and deployed against a live RemoteMCPServer that fronts ~50 federated MCP tools.
    - Pre-fix: a fresh Agent with no toolNames CrashLoops with the exact ValidationError ... input_value=None from the issue.
    - Post-fix: same Agent reaches Application startup complete / Uvicorn running on http://0.0.0.0:8080 and serves 200 on /.well-known/agent-card.json.
  • Controller-side e2e: built an overlay image of the v0.9.1 controller with only the patched binary, swapped it in via controller.image, and applied a fresh Agent referencing the same RemoteMCPServer (23 discoveredTools) with no toolNames specified.
    - Resulting agent pod reaches Application startup complete / Uvicorn running cleanly.

Open questions for reviewers

  1. Is toolNames intended to be required, or optional with omission
    meaning "all discovered tools"? If the former, this PR can be reduced
    to CRD validation + a clear admission error.
  2. Is the controller-side expansion to discoveredTools desirable for
    observability, or would you prefer the controller emit [] (or
    omit the field entirely) and have the runtime handle the "all tools"
    semantics on its own?

Notes

  • Same nil-slice → JSON null defect existed for sse_tools[*].tools;
    fixed via the same resolveToolNames helper.
  • Commit is DCO-signed.

@github-actions github-actions Bot added the bug Something isn't working label May 4, 2026
@reilly3000
Copy link
Copy Markdown
Author

Hi folks, I hate to be the ai slop PR guy, but ran into this issue and thought others may make the same assumption as well. I have a couple tweaks inbound before I raise it for review.

…dev#1797)

When an Agent's `tools[*].mcpServer` references a RemoteMCPServer
without an explicit `toolNames` filter, the controller serializes
`http_tools[*].tools` (and `sse_tools[*].tools`) as JSON `null`,
because Go's `encoding/json` marshals a nil `[]string` slice without
`omitempty` as `null`. The Python ADK runtime then crashes on startup
with a pydantic ValidationError, leaving the agent pod in
CrashLoopBackOff:

    http_tools.0.tools
      Input should be a valid list [type=list_type, input_value=None,
      input_type=NoneType]

Whether `toolNames` is intended to be required or optional is a call
for maintainers (see kagent-dev#1797 for discussion); either way, the controller
should not produce a config the runtime cannot load. This change is
defensive on both sides:

  * Controller (Go): when `toolNames` is empty, expand to the
    RemoteMCPServer's `.status.discoveredTools` list so the rendered
    config is explicit and observable. Explicit `toolNames` continue
    to take precedence. Falls back to an empty array (rather than
    `null`) if the status has no discovered tools yet.

  * Runtime (Python): coerce `tools: None` to `[]` via a pydantic
    BeforeValidator on `HttpMcpServerConfig` / `SseMcpServerConfig`,
    so older controller versions don't crash newer runtimes.

Adds a translator regression test asserting the rendered AgentConfig
contains a non-null `tools` array, and a pydantic-level test for the
runtime coercion.

Happy to rework toward a different direction (e.g. CRD validation
making `toolNames` required) if that better matches the intended
semantics.

Signed-off-by: Chris <chris@chris-reilly.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@reilly3000 reilly3000 force-pushed the fix/remote-mcp-server-null-tools branch from 4448624 to 0aea970 Compare May 4, 2026 21:49
@github-actions github-actions Bot added bug Something isn't working and removed bug Something isn't working labels May 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Agent pod CrashLoops with pydantic ValidationError when RemoteMCPServer ref omits toolNames (http_tools[*].tools=null)

1 participant