Fixes #27796: enforce entityType filter in MCP search_metadata#27821
Fixes #27796: enforce entityType filter in MCP search_metadata#27821ecerulm wants to merge 1 commit intoopen-metadata:mainfrom
Conversation
…tadata The MCP search_metadata tool relied solely on the OpenSearch index alias derived from the requested entityType to scope results. When the alias fans out to multiple indices (e.g. dataAsset) or otherwise leaks documents of other types, the response can include unrelated entities — for example, requesting entityType=metric returned dashboard documents. Always inject an explicit `term` filter on entityType into the query filter so the constraint is enforced regardless of how the index alias resolves. The filter is composed with any user-supplied queryFilter so both apply.
|
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help! |
| @VisibleForTesting | ||
| static String addEntityTypeFilter(String existingFilter, String entityType) { | ||
| if (entityType == null || entityType.isBlank()) { | ||
| return existingFilter; | ||
| } | ||
| ObjectNode termFilter = JsonUtils.getObjectMapper().createObjectNode(); | ||
| termFilter.putObject("term").put("entityType", entityType); | ||
| if (nullOrEmpty(existingFilter)) { | ||
| ObjectNode bool = JsonUtils.getObjectMapper().createObjectNode(); | ||
| ArrayNode filterArray = bool.putObject("bool").putArray("filter"); | ||
| filterArray.add(termFilter); | ||
| ObjectNode wrapper = JsonUtils.getObjectMapper().createObjectNode(); | ||
| wrapper.set("query", bool); | ||
| return JsonUtils.pojoToJson(wrapper); | ||
| } |
There was a problem hiding this comment.
💡 Quality: addEntityTypeFilter exceeds 15-line method limit
The new addEntityTypeFilter method is ~45 lines with 4 levels of nesting, exceeding the project's complexity limits (max 15 lines/method, max 3 nesting levels). The method handles three distinct concerns — creating a fresh filter, merging into an existing bool query, and wrapping a non-bool query — each of which could be a separate private helper.
Suggested fix:
Extract into three small methods:
- `createEntityTypeFilter(entityType)` for the null/empty case
- `mergeBoolFilter(boolNode, termFilter)` for adding to existing bool
- `wrapNonBoolQuery(queryObject, termFilter)` for wrapping non-bool
The public method becomes a short dispatcher:
if (nullOrEmpty(existingFilter)) return createEntityTypeFilter(entityType);
// parse, then dispatch to merge or wrap
Was this helpful? React with 👍 / 👎 | Reply gitar fix to apply this suggestion
| ArrayNode filterArray; | ||
| if (boolNode.has("filter") && boolNode.get("filter").isArray()) { | ||
| filterArray = (ArrayNode) boolNode.get("filter"); | ||
| } else { | ||
| filterArray = boolNode.putArray("filter"); | ||
| } | ||
| filterArray.add(termFilter); |
There was a problem hiding this comment.
💡 Edge Case: Duplicate entityType filter if user's queryFilter already has one
If a caller passes a queryFilter that already includes a term filter on entityType, addEntityTypeFilter will blindly append a second identical term filter. While this won't produce incorrect results (two identical AND'd terms are logically equivalent to one), it adds unnecessary noise to the OpenSearch query. A defensive check for an existing entityType term in the filter array would be cleaner.
Was this helpful? React with 👍 / 👎 | Reply gitar fix to apply this suggestion
Code Review 👍 Approved with suggestions 0 resolved / 2 findingsEnforces the entityType filter in MCP search_metadata as requested. Refactor addEntityTypeFilter to reduce method length and nesting, and address the potential for redundant duplicate filters. 💡 Quality: addEntityTypeFilter exceeds 15-line method limit📄 openmetadata-mcp/src/main/java/org/openmetadata/mcp/tools/SearchMetadataTool.java:391-405 The new Suggested fix💡 Edge Case: Duplicate entityType filter if user's queryFilter already has one📄 openmetadata-mcp/src/main/java/org/openmetadata/mcp/tools/SearchMetadataTool.java:422-428 If a caller passes a 🤖 Prompt for agentsOptionsDisplay: compact → Showing less information. Comment with these commands to change:
Was this helpful? React with 👍 / 👎 | Gitar |
Summary
termfilter onentityTypeinto the OpenSearch query when the MCPsearch_metadatatool is called with a specificentityType, instead of relying solely on the index alias to scope results.queryFilter, so advanced callers keep full control while still getting the entityType constraint enforced.search()path when onlyentityType(noqueryFilter) is provided — that path honors bothquery(keyword) andqueryFilter(the new entityType term).Why
Reported in #27796: calling the MCP tool with
entityType=metricreturned results whoseentityTypewasdashboard(and onetable). Targeting the index alias by itself is not always sufficient — when the alias resolves to a multi-entity index or otherwise fans out, documents of other types leak into the response. Adding an explicittermfilter makes the contract independent of how the index alias resolves.Test plan
mvn -pl openmetadata-mcp -Dtest=SearchMetadataToolTest test(10 tests, all green; 7 new)mvn -pl openmetadata-mcp test— full module suite (161 tests, all green)mvn -pl openmetadata-mcp,openmetadata-integration-tests spotless:checkmvn -pl openmetadata-integration-tests test-compile(new IT compiles cleanly)McpToolsValidationIT.testSearchMetadataEntityTypeFilterIsHonored— regression test that searches withentityType=tableandentityType=dashboardand asserts every returnedentityTypematches.Notes
queryFilter, helper is a no-op when entityType is null/blank, and the helper correctly wraps both bool and non-bool existing query shapes.validateSearchMetadataResponseinMcpToolsValidationITnow optionally asserts the resultentityType— closing the gap that allowed this bug to ship.