Skip to content

Fixes #27796: enforce entityType filter in MCP search_metadata#27821

Draft
ecerulm wants to merge 1 commit intoopen-metadata:mainfrom
ecerulm:fix/27796-mcp-search-metadata-entity-type-filter
Draft

Fixes #27796: enforce entityType filter in MCP search_metadata#27821
ecerulm wants to merge 1 commit intoopen-metadata:mainfrom
ecerulm:fix/27796-mcp-search-metadata-entity-type-filter

Conversation

@ecerulm
Copy link
Copy Markdown

@ecerulm ecerulm commented Apr 29, 2026

Summary

  • Always inject an explicit term filter on entityType into the OpenSearch query when the MCP search_metadata tool is called with a specific entityType, instead of relying solely on the index alias to scope results.
  • The filter is composed (AND) with any user-supplied queryFilter, so advanced callers keep full control while still getting the entityType constraint enforced.
  • Refactor the search execution so the request goes through the regular search() path when only entityType (no queryFilter) is provided — that path honors both query (keyword) and queryFilter (the new entityType term).

Why

Reported in #27796: calling the MCP tool with entityType=metric returned results whose entityType was dashboard (and one table). Targeting the index alias by itself is not always sufficient — when the alias resolves to a multi-entity index or otherwise fans out, documents of other types leak into the response. Adding an explicit term filter makes the contract independent of how the index alias resolves.

Test plan

  • mvn -pl openmetadata-mcp -Dtest=SearchMetadataToolTest test (10 tests, all green; 7 new)
  • mvn -pl openmetadata-mcp test — full module suite (161 tests, all green)
  • mvn -pl openmetadata-mcp,openmetadata-integration-tests spotless:check
  • mvn -pl openmetadata-integration-tests test-compile (new IT compiles cleanly)
  • CI: McpToolsValidationIT.testSearchMetadataEntityTypeFilterIsHonored — regression test that searches with entityType=table and entityType=dashboard and asserts every returned entityType matches.

Notes

  • New unit tests cover: explicit term filter is added, no filter when entityType is omitted, helper merges with user-provided queryFilter, helper is a no-op when entityType is null/blank, and the helper correctly wraps both bool and non-bool existing query shapes.
  • validateSearchMetadataResponse in McpToolsValidationIT now optionally asserts the result entityType — closing the gap that allowed this bug to ship.

…tadata

The MCP search_metadata tool relied solely on the OpenSearch index alias
derived from the requested entityType to scope results. When the alias
fans out to multiple indices (e.g. dataAsset) or otherwise leaks
documents of other types, the response can include unrelated entities —
for example, requesting entityType=metric returned dashboard documents.

Always inject an explicit `term` filter on entityType into the query
filter so the constraint is enforced regardless of how the index alias
resolves. The filter is composed with any user-supplied queryFilter so
both apply.
@github-actions
Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Comment on lines +391 to +405
@VisibleForTesting
static String addEntityTypeFilter(String existingFilter, String entityType) {
if (entityType == null || entityType.isBlank()) {
return existingFilter;
}
ObjectNode termFilter = JsonUtils.getObjectMapper().createObjectNode();
termFilter.putObject("term").put("entityType", entityType);
if (nullOrEmpty(existingFilter)) {
ObjectNode bool = JsonUtils.getObjectMapper().createObjectNode();
ArrayNode filterArray = bool.putObject("bool").putArray("filter");
filterArray.add(termFilter);
ObjectNode wrapper = JsonUtils.getObjectMapper().createObjectNode();
wrapper.set("query", bool);
return JsonUtils.pojoToJson(wrapper);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Quality: addEntityTypeFilter exceeds 15-line method limit

The new addEntityTypeFilter method is ~45 lines with 4 levels of nesting, exceeding the project's complexity limits (max 15 lines/method, max 3 nesting levels). The method handles three distinct concerns — creating a fresh filter, merging into an existing bool query, and wrapping a non-bool query — each of which could be a separate private helper.

Suggested fix:

Extract into three small methods:
- `createEntityTypeFilter(entityType)` for the null/empty case
- `mergeBoolFilter(boolNode, termFilter)` for adding to existing bool
- `wrapNonBoolQuery(queryObject, termFilter)` for wrapping non-bool

The public method becomes a short dispatcher:
  if (nullOrEmpty(existingFilter)) return createEntityTypeFilter(entityType);
  // parse, then dispatch to merge or wrap

Was this helpful? React with 👍 / 👎 | Reply gitar fix to apply this suggestion

Comment on lines +422 to +428
ArrayNode filterArray;
if (boolNode.has("filter") && boolNode.get("filter").isArray()) {
filterArray = (ArrayNode) boolNode.get("filter");
} else {
filterArray = boolNode.putArray("filter");
}
filterArray.add(termFilter);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Edge Case: Duplicate entityType filter if user's queryFilter already has one

If a caller passes a queryFilter that already includes a term filter on entityType, addEntityTypeFilter will blindly append a second identical term filter. While this won't produce incorrect results (two identical AND'd terms are logically equivalent to one), it adds unnecessary noise to the OpenSearch query. A defensive check for an existing entityType term in the filter array would be cleaner.

Was this helpful? React with 👍 / 👎 | Reply gitar fix to apply this suggestion

@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented Apr 29, 2026

Code Review 👍 Approved with suggestions 0 resolved / 2 findings

Enforces the entityType filter in MCP search_metadata as requested. Refactor addEntityTypeFilter to reduce method length and nesting, and address the potential for redundant duplicate filters.

💡 Quality: addEntityTypeFilter exceeds 15-line method limit

📄 openmetadata-mcp/src/main/java/org/openmetadata/mcp/tools/SearchMetadataTool.java:391-405

The new addEntityTypeFilter method is ~45 lines with 4 levels of nesting, exceeding the project's complexity limits (max 15 lines/method, max 3 nesting levels). The method handles three distinct concerns — creating a fresh filter, merging into an existing bool query, and wrapping a non-bool query — each of which could be a separate private helper.

Suggested fix
Extract into three small methods:
- `createEntityTypeFilter(entityType)` for the null/empty case
- `mergeBoolFilter(boolNode, termFilter)` for adding to existing bool
- `wrapNonBoolQuery(queryObject, termFilter)` for wrapping non-bool

The public method becomes a short dispatcher:
  if (nullOrEmpty(existingFilter)) return createEntityTypeFilter(entityType);
  // parse, then dispatch to merge or wrap
💡 Edge Case: Duplicate entityType filter if user's queryFilter already has one

📄 openmetadata-mcp/src/main/java/org/openmetadata/mcp/tools/SearchMetadataTool.java:422-428

If a caller passes a queryFilter that already includes a term filter on entityType, addEntityTypeFilter will blindly append a second identical term filter. While this won't produce incorrect results (two identical AND'd terms are logically equivalent to one), it adds unnecessary noise to the OpenSearch query. A defensive check for an existing entityType term in the filter array would be cleaner.

🤖 Prompt for agents
Code Review: Enforces the entityType filter in MCP search_metadata as requested. Refactor addEntityTypeFilter to reduce method length and nesting, and address the potential for redundant duplicate filters.

1. 💡 Quality: addEntityTypeFilter exceeds 15-line method limit
   Files: openmetadata-mcp/src/main/java/org/openmetadata/mcp/tools/SearchMetadataTool.java:391-405

   The new `addEntityTypeFilter` method is ~45 lines with 4 levels of nesting, exceeding the project's complexity limits (max 15 lines/method, max 3 nesting levels). The method handles three distinct concerns — creating a fresh filter, merging into an existing bool query, and wrapping a non-bool query — each of which could be a separate private helper.

   Suggested fix:
   Extract into three small methods:
   - `createEntityTypeFilter(entityType)` for the null/empty case
   - `mergeBoolFilter(boolNode, termFilter)` for adding to existing bool
   - `wrapNonBoolQuery(queryObject, termFilter)` for wrapping non-bool
   
   The public method becomes a short dispatcher:
     if (nullOrEmpty(existingFilter)) return createEntityTypeFilter(entityType);
     // parse, then dispatch to merge or wrap

2. 💡 Edge Case: Duplicate entityType filter if user's queryFilter already has one
   Files: openmetadata-mcp/src/main/java/org/openmetadata/mcp/tools/SearchMetadataTool.java:422-428

   If a caller passes a `queryFilter` that already includes a `term` filter on `entityType`, `addEntityTypeFilter` will blindly append a second identical term filter. While this won't produce incorrect results (two identical AND'd terms are logically equivalent to one), it adds unnecessary noise to the OpenSearch query. A defensive check for an existing `entityType` term in the filter array would be cleaner.

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant