Commit 8b0f4a7

fix(search): resolve entity-specific aliases to canonical index names (#27813)
Closes #27761.

Background. When the UI started passing the alias `index=table` (instead of the legacy `index=table_search_index`), every search for tables also returned column docs. Root cause: ES alias expansion is bidirectional through the parent/child graph in indexMapping.json. column_search_index is created with `table` as one of its aliases (because tableColumn lists "table" in parentAliases), so when ES sees alias `table` it expands it to both `table_search_index` and `column_search_index`. The same shape exists for every alias whose entities also act as a parent for some other entity (testCase, testSuite, database, …).

Fix. Resolve entity-specific aliases at the API boundary into their canonical `*_search_index` names so we send literal index names to ES, bypassing alias expansion entirely. Compound aliases (`all`, `dataAsset`) have no canonical index, so they pass through with only the cluster prefix applied and ES expands them natively. This preserves the "everything under the data-asset umbrella" use case that the UI's MyData / CuratedAssets / search-bar widgets rely on.

The change is one method, `SearchRepository.getIndexOrAliasName(String)`:

* entity-specific alias (`"table"`) → `"<cluster>_table_search_index"`
* compound alias (`"dataAsset"`, `"all"`) → `"<cluster>_dataAsset"` (passes through)
* canonical name (`"table_search_index"`) → `"<cluster>_table_search_index"` (legacy callers)
* already cluster-prefixed → returned unchanged (idempotent)
* empty token from `"table,"` / `","` → dropped; if every token is empty, the original input is returned unchanged

All four search/export/preview/NLQ resource paths already call this method; `searchByField`, `aggregate`, and `getEntityTypeCounts` already call it inside the ES/OS managers. So the fix takes effect across every endpoint that accepts an `index` parameter without changing the public API surface: no new query params, no schema changes, no signature churn.

No caller passes an entity-specific alias and expects child entity types back. UI sites with `SearchIndex.TABLE`/`TOPIC`/etc. all want only that type (asset-type filter chips, advanced-search builders, lineage selection, alert rule scoping). UI sites that DO want mixed entity types use `SearchIndex.ALL` or `SearchIndex.DATA_ASSET`, which are compound aliases that this change leaves alone. Internal Java callers (RBAC, propagation, DataInsightSystemChartRepository) pass entity-specific aliases for entity-specific operations, so no leakage is expected there either.

Tests pin: entity-alias → canonical resolution; compound-alias pass-through; idempotent prefix; comma-separated input; empty-token handling; existing canonical-name behavior unchanged.
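To make the root cause concrete, here is a toy model of alias expansion. It is only a sketch, not how Elasticsearch implements aliases: the class name, the `expand` helper, and the alias lists are hypothetical. The one detail taken from the description above is that `column_search_index` carries `table` as an alias.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: mimics "a name matches every index that carries it as an alias".
class AliasExpansionSketch {
  // index name -> aliases attached to it (illustrative subset, assumed for this sketch)
  static final Map<String, List<String>> INDEX_ALIASES = new LinkedHashMap<>();

  static {
    INDEX_ALIASES.put("table_search_index", List.of("table", "dataAsset", "all"));
    // "table" appears here because tableColumn lists it as a parent alias
    INDEX_ALIASES.put("column_search_index", List.of("tableColumn", "table"));
    INDEX_ALIASES.put("topic_search_index", List.of("topic", "dataAsset", "all"));
  }

  // Returns every index hit by a target that is either a literal index name or an alias.
  static Set<String> expand(String target) {
    Set<String> hits = new TreeSet<>();
    for (Map.Entry<String, List<String>> entry : INDEX_ALIASES.entrySet()) {
      if (entry.getKey().equals(target) || entry.getValue().contains(target)) {
        hits.add(entry.getKey());
      }
    }
    return hits;
  }
}
```

In this model `expand("table")` reaches both the table and column indexes, while `expand("table_search_index")` reaches only the table index, which is why sending the literal index name avoids the leak.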
1 parent 6adbe28 commit 8b0f4a7

2 files changed

Lines changed: 112 additions & 4 deletions

openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java

Lines changed: 50 additions & 4 deletions
@@ -641,13 +641,59 @@ private boolean isKnownCanonicalIndex(String name) {
     return false;
   }
 
+  /**
+   * Resolve the supplied index alias into the actual Elasticsearch / OpenSearch index name to
+   * query. Handles four shapes:
+   *
+   * <ul>
+   *   <li><b>Entity-specific alias</b> (e.g. {@code "table"}): looked up in
+   *       {@code entityIndexMap} and resolved to the canonical {@code *_search_index} name.
+   *       This is the bug fix — without resolving, ES would treat {@code "table"} as an alias
+   *       and expand it to every index that has that alias attached, including
+   *       {@code column_search_index} (because {@code tableColumn} declares {@code "table"} as
+   *       a {@code parentAlias}). Resolving here bypasses ES's alias expansion entirely so a
+   *       query for tables only hits the table index.
+   *   <li><b>Compound alias</b> (e.g. {@code "all"}, {@code "dataAsset"}): no entry in
+   *       {@code entityIndexMap}, no canonical index, so the alias passes through and ES
+   *       resolves it natively across the entities that have registered the alias. This is the
+   *       intended behavior — searching {@code dataAsset} should surface every data-asset
+   *       entity.
+   *   <li><b>Canonical / legacy index name</b> (e.g. {@code "table_search_index"}): not a key
+   *       in {@code entityIndexMap}, falls through to the prefix-and-pass branch, identical to
+   *       the legacy behavior.
+   *   <li><b>Already cluster-prefixed token</b>: idempotent — returned unchanged so that
+   *       internal code paths that hand back a resolved value don't double-prefix.
+   * </ul>
+   *
+   * Comma-separated tokens are resolved independently. Empty tokens (from {@code "table,"} or
+   * {@code ","}) are dropped instead of materializing as a bare cluster prefix; if every token
+   * is empty the original input is returned unchanged so downstream ES surfaces a normal
+   * "unknown index" error instead of an empty-target failure.
+   */
   public String getIndexOrAliasName(String name) {
-    if (clusterAlias == null || clusterAlias.isEmpty()) {
+    if (nullOrEmpty(name)) {
       return name;
     }
-    return Arrays.stream(name.split(","))
-        .map(index -> clusterAlias + INDEX_NAME_SEPARATOR + index.trim())
-        .collect(Collectors.joining(","));
+    String prefix =
+        clusterAlias == null || clusterAlias.isEmpty() ? null : clusterAlias + INDEX_NAME_SEPARATOR;
+    String resolved =
+        Arrays.stream(name.split(","))
+            .map(String::trim)
+            .filter(t -> !t.isEmpty())
+            .map(t -> resolveSingleAliasToken(t, prefix))
+            .collect(Collectors.joining(","));
+    return resolved.isEmpty() ? name : resolved;
+  }
+
+  private String resolveSingleAliasToken(String token, String clusterPrefix) {
+    if (clusterPrefix != null && token.startsWith(clusterPrefix)) {
+      return token;
+    }
+    IndexMapping mapping = entityIndexMap == null ? null : entityIndexMap.get(token);
+    if (mapping != null) {
+      return mapping.getIndexName(clusterAlias);
+    }
+    return clusterPrefix == null ? token : clusterPrefix + token;
   }
 
   private static final Map<String, Set<String>> RBAC_CHILD_TYPES =
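
For readers who want to try the resolution rules outside the repository, the following standalone sketch mirrors the logic of the hunk above under a few assumptions: `ENTITY_INDEX` is a hypothetical stand-in for `entityIndexMap`, the separator is assumed to be `_`, and a plain string lookup replaces `IndexMapping.getIndexName(clusterAlias)`, which is assumed to return the prefixed canonical name. It is not the project's code.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

// Standalone sketch of the alias-to-index resolution; names here are illustrative.
class AliasResolverSketch {
  static final String SEPARATOR = "_"; // assumed value of INDEX_NAME_SEPARATOR

  // Hypothetical stand-in for entityIndexMap: entity alias -> canonical index name.
  static final Map<String, String> ENTITY_INDEX =
      Map.of("table", "table_search_index", "domain", "domain_search_index");

  static String resolve(String name, String clusterAlias) {
    if (name == null || name.isEmpty()) {
      return name;
    }
    String prefix =
        clusterAlias == null || clusterAlias.isEmpty() ? null : clusterAlias + SEPARATOR;
    String resolved =
        Arrays.stream(name.split(","))
            .map(String::trim)
            .filter(token -> !token.isEmpty()) // drop empty tokens such as the tail of "table,"
            .map(token -> resolveToken(token, prefix))
            .collect(Collectors.joining(","));
    // All tokens empty: hand the original input back unchanged.
    return resolved.isEmpty() ? name : resolved;
  }

  static String resolveToken(String token, String prefix) {
    if (prefix != null && token.startsWith(prefix)) {
      return token; // already prefixed: idempotent
    }
    // Entity-specific aliases map to a canonical name; everything else passes through.
    String canonical = ENTITY_INDEX.getOrDefault(token, token);
    return prefix == null ? canonical : prefix + canonical;
  }
}
```

With a cluster alias of `cluster`, `resolve("table,dataAsset", "cluster")` yields `cluster_table_search_index,cluster_dataAsset`, matching the expectations pinned in the test file below.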

openmetadata-service/src/test/java/org/openmetadata/service/search/SearchRepositoryBehaviorTest.java

Lines changed: 62 additions & 0 deletions
@@ -275,6 +275,68 @@ void indexNameHelpersRespectClusterAlias() {
         "table_search_index", repository.getIndexNameWithoutAlias("cluster_table_search_index"));
   }
 
+  /**
+   * Bug regression for issue #27761: passing the entity-specific alias {@code "table"} used to
+   * leak into ES alias expansion and surface tableColumn docs (because column_search_index is
+   * registered with {@code "table"} as one of its aliases). Resolving the alias to its canonical
+   * index name here bypasses ES's alias resolution, so the search hits exactly the table index.
+   */
+  @Test
+  void getIndexOrAliasNameResolvesEntitySpecificAliasToCanonicalIndex() {
+    assertEquals("cluster_table_search_index", repository.getIndexOrAliasName("table"));
+    assertEquals("cluster_domain_search_index", repository.getIndexOrAliasName("domain"));
+  }
+
+  /**
+   * Compound aliases like {@code "all"} and {@code "dataAsset"} have no entry in
+   * {@code entityIndexMap} (they're meta-aliases registered against many entities at index
+   * creation time). The resolver passes them through with the cluster prefix so ES expands them
+   * natively — searching {@code dataAsset} should still surface every data-asset entity.
+   */
+  @Test
+  void getIndexOrAliasNamePassesCompoundAliasesThroughForNativeESExpansion() {
+    assertEquals("cluster_dataAsset", repository.getIndexOrAliasName("dataAsset"));
+    assertEquals("cluster_all", repository.getIndexOrAliasName("all"));
+  }
+
+  /**
+   * Defense-in-depth: a token that already carries the cluster prefix must not get prefixed
+   * again. Otherwise multi-tenant deployments would 404 on
+   * {@code cluster_cluster_table_search_index} if any internal code accidentally hands a
+   * resolved value back to this method.
+   */
+  @Test
+  void getIndexOrAliasNameIsIdempotentForAlreadyPrefixedTokens() {
+    assertEquals(
+        "cluster_table_search_index", repository.getIndexOrAliasName("cluster_table_search_index"));
+  }
+
+  /**
+   * Mixed input: each comma-separated token is resolved independently. Entity-specific aliases
+   * resolve to canonical names; compound aliases pass through.
+   */
+  @Test
+  void getIndexOrAliasNameResolvesEachCommaSeparatedTokenIndependently() {
+    assertEquals(
+        "cluster_table_search_index,cluster_dataAsset",
+        repository.getIndexOrAliasName("table,dataAsset"));
+  }
+
+  /**
+   * Stray-comma / empty-token input must not produce bare cluster prefixes such as
+   * {@code "cluster_"}. Empty tokens are dropped; if every token is empty the original string
+   * is returned unchanged so downstream ES surfaces a normal "unknown index" error instead of
+   * a confusing empty-target failure.
+   */
+  @Test
+  void getIndexOrAliasNameDropsEmptyTokensAndPreservesAllEmptyInput() {
+    assertEquals("cluster_table_search_index", repository.getIndexOrAliasName("table,"));
+    assertEquals(
+        "cluster_table_search_index,cluster_domain_search_index",
+        repository.getIndexOrAliasName("table, ,domain"));
+    assertEquals(", ,", repository.getIndexOrAliasName(", ,"));
+  }
+
   @Test
   void indexExistsFallsBackToAliasLookup() {
     when(searchClient.indexExists("cluster_table_search_index")).thenReturn(false);
