Skip to content

Refactor/efficient union type memory#148043

Draft
GalLalouche wants to merge 22 commits intoelastic:mainfrom
GalLalouche:refactor/efficient_union_type_memory
Draft

Refactor/efficient union type memory#148043
GalLalouche wants to merge 22 commits intoelastic:mainfrom
GalLalouche:refactor/efficient_union_type_memory

Conversation

@GalLalouche
Copy link
Copy Markdown
Contributor

TODO

Resolves #147600.

`EsIndex#fieldToUnmappedIndices` was an `O(numFields * numIndices)`
map tracking which indices each field was unmapped in. For indices
with many fields (e.g. `.ds-metrics-*`) this caused OutOfMemory
errors, as reported in elastic#145920.

Partially-unmapped field wrapping (into
`PotentiallyUnmappedKeywordEsField` /
`InvalidMappedField.potentiallyUnmapped`) happened in the `Analyzer`,
using `fieldToUnmappedIndices` as a separate side-channel.

Move the wrapping into `IndexResolver#mergedMappings`, encoding the
partial-mapping signal directly into the `EsField` hierarchy. Only
applied when `trackUnmappedFieldIndices` is on (i.e.
`unmapped_fields="load"` or an `INSIST` clause). `EsIndex` no longer
carries `fieldToUnmappedIndices`.

The `Analyzer` no longer wraps fields. The `Verifier` now inspects
the `IndexResolution` mapping directly to detect partially-unmapped
non-keyword fields, since `UnionTypesCleanup.cleanTypeConflicts`
reverts single-type `potentiallyUnmapped` wrappers in the plan.

Additional fixes uncovered along the way:

- `wrapPartiallyUnmappedField` skips `OBJECT` fields, which are
  containers rather than leaves — wrapping them triggered
  `illegal data type [object]` downstream.
- `PotentiallyUnmappedKeywordEsField` is constructed with the full
  dotted path (e.g. `city.country.continent.name`) so that
  `DefaultShardContextForUnmappedField#fieldType` matches against
  the path Lucene passes in. Without this, deeply-nested unmapped
  keyword subfields returned `null` instead of loading from
  `_source`.
- `GoldenTestCase#mergeMappings` applies the same wrapping as the
  production path (gated on the same flag), so golden outputs for
  `unmapped_fields="load"` stay stable without any plan-shape
  changes.

Made-with: Cursor
Two related fixes after moving partially-mapped keyword wrapping from
the Analyzer into IndexResolver:

- multiIndexAnalyzer test helper used to mark partial_type_keyword via
  the (now removed) fieldToUnmappedIndices map. Wrap it via
  IndexResolver.wrapPartiallyUnmappedField, mirroring production.

- PropgateUnmappedFields had an all-or-nothing short-circuit: if the
  EsRelation already contained any PUK, the rule skipped propagation
  entirely. That worked when wrapping happened in the Analyzer for
  load mode only, but now any partially-mapped keyword is pre-wrapped,
  so the short-circuit dropped new PUKs introduced by INSIST on a
  field absent from the index. Filter to PUKs not already in the
  relation's output and only merge the missing ones.

Made-with: Cursor
Introduce two memory-frugal variants of the EsField subtypes used for
union-typed fields and route the analyzer to them on capable transport
versions:

* MultiTypeEsField2 keys conversion expressions by data type (plus an
  optional unmapped conversion) instead of per-index, dropping the
  conversion map from O(num_indices) to O(num_types) per field.
* InvalidMappedField2 keeps at most three indices per type plus a
  "..." sentinel; the error message is rendered from the full input
  before truncation, so users still get "[a, b, c] and [N] other
  indices".
* A common UnionTypeEsField interface lets call sites work with either
  flavor without instanceof chains.
* The analyzer threads the minimum transport version through and
  produces the new representations when ESQL_MULTI_TYPE_ES_FIELD_2 is
  supported, falling back to the legacy classes otherwise.

For a plan with 50 union-typed fields each conflicting across 5,000
indices, retained memory drops ~35x (14.4 MB -> 407 KB), as exercised
by the new MultiTypeEsFieldMemoryTests.

Made-with: Cursor
CompactMultiTypeEsField now keys its conversion map by DataType
instead of typeName(); the wire format is unchanged (DataType.writeTo
already routes through PlanStreamOutput cached strings).

CompactInvalidMappedField is gone, replaced by static factories
InvalidMappedField.compact / compactPotentiallyUnmapped. The subclass
added no state and overrode no behavior; it only forced widening
InvalidMappedField's constructor to protected and makeErrorMessage to
package-private. Both are private again.

While here, switch the truncation tests to Strings.format so they
stop breaking under tests.locale=as.

Made-with: Cursor
CompactInvalidMappedField is now a peer of InvalidMappedField rather
than a subclass: both implement a new TypeConflictField interface that
holds the shared IMF-specific surface plus a static makeErrorMessage
helper. Production analyzer/verifier/type-resolution sites branch on
the interface so they remain oblivious to which flavor backs a field.

Made-with: Cursor
The truncated typesToIndices map now uses DataType keys (mirroring
CompactMultiTypeEsField), so the type-safe key flows down to
construction. The TypeConflictField#getTypesToIndices contract is
preserved by converting to string keys on access; that's cheap because
the map is post-truncation and the accessor sits on a cold path.

Made-with: Cursor
Pull the shared per-type validation loop into UnionTypeEsField.resolve,
returning a (resolvedDataType, typeToExpr) Resolution. Both resolveFrom
methods now consume that and only differ in their post-processing:
the compact form keeps the per-type map and applies the unmapped
fallback for the resolved type, while the legacy form expands it to
per-index entries.

Made-with: Cursor
@GalLalouche GalLalouche added >refactoring Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL labels Apr 30, 2026
CompactMultiTypeEsField's per-source-type conversion map is keyed by
what field-caps reports on the coordinator, which is the family type
(e.g. "keyword" for a constant_keyword field). The data-node lookup
in EsPhysicalOperationProviders was using mft.typeName() instead, so
constant_keyword (and any other family-collapsed type) missed the
lookup, fell through to the unmapped path, and produced all-nulls.

Fixes the constant_keyword yaml rest test on this branch.

Made-with: Cursor
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >refactoring Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.5.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ESQL: union types + load need to use more memory-efficient MultiTypeEsFields

2 participants