Refactor/efficient union type memory#148043
Draft
GalLalouche wants to merge 22 commits intoelastic:mainfrom
Draft
Refactor/efficient union type memory#148043GalLalouche wants to merge 22 commits intoelastic:mainfrom
GalLalouche wants to merge 22 commits intoelastic:mainfrom
Conversation
`EsIndex#fieldToUnmappedIndices` was an `O(numFields * numIndices)` map tracking which indices each field was unmapped in. For indices with many fields (e.g. `.ds-metrics-*`) this caused OutOfMemory errors, as reported in elastic#145920. Partially-unmapped field wrapping (into `PotentiallyUnmappedKeywordEsField` / `InvalidMappedField.potentiallyUnmapped`) happened in the `Analyzer`, using `fieldToUnmappedIndices` as a separate side-channel. Move the wrapping into `IndexResolver#mergedMappings`, encoding the partial-mapping signal directly into the `EsField` hierarchy. Only applied when `trackUnmappedFieldIndices` is on (i.e. `unmapped_fields="load"` or an `INSIST` clause). `EsIndex` no longer carries `fieldToUnmappedIndices`. The `Analyzer` no longer wraps fields. The `Verifier` now inspects the `IndexResolution` mapping directly to detect partially-unmapped non-keyword fields, since `UnionTypesCleanup.cleanTypeConflicts` reverts single-type `potentiallyUnmapped` wrappers in the plan. Additional fixes uncovered along the way: - `wrapPartiallyUnmappedField` skips `OBJECT` fields, which are containers rather than leaves — wrapping them triggered `illegal data type [object]` downstream. - `PotentiallyUnmappedKeywordEsField` is constructed with the full dotted path (e.g. `city.country.continent.name`) so that `DefaultShardContextForUnmappedField#fieldType` matches against the path Lucene passes in. Without this, deeply-nested unmapped keyword subfields returned `null` instead of loading from `_source`. - `GoldenTestCase#mergeMappings` applies the same wrapping as the production path (gated on the same flag), so golden outputs for `unmapped_fields="load"` stay stable without any plan-shape changes. Made-with: Cursor
Two related fixes after moving partially-mapped keyword wrapping from the Analyzer into IndexResolver: - multiIndexAnalyzer test helper used to mark partial_type_keyword via the (now removed) fieldToUnmappedIndices map. Wrap it via IndexResolver.wrapPartiallyUnmappedField, mirroring production. - PropgateUnmappedFields had an all-or-nothing short-circuit: if the EsRelation already contained any PUK, the rule skipped propagation entirely. That worked when wrapping happened in the Analyzer for load mode only, but now any partially-mapped keyword is pre-wrapped, so the short-circuit dropped new PUKs introduced by INSIST on a field absent from the index. Filter to PUKs not already in the relation's output and only merge the missing ones. Made-with: Cursor
Introduce two memory-frugal variants of the EsField subtypes used for union-typed fields and route the analyzer to them on capable transport versions: * MultiTypeEsField2 keys conversion expressions by data type (plus an optional unmapped conversion) instead of per-index, dropping the conversion map from O(num_indices) to O(num_types) per field. * InvalidMappedField2 keeps at most three indices per type plus a "..." sentinel; the error message is rendered from the full input before truncation, so users still get "[a, b, c] and [N] other indices". * A common UnionTypeEsField interface lets call sites work with either flavor without instanceof chains. * The analyzer threads the minimum transport version through and produces the new representations when ESQL_MULTI_TYPE_ES_FIELD_2 is supported, falling back to the legacy classes otherwise. For a plan with 50 union-typed fields each conflicting across 5,000 indices, retained memory drops ~35x (14.4 MB -> 407 KB), as exercised by the new MultiTypeEsFieldMemoryTests. Made-with: Cursor
CompactMultiTypeEsField now keys its conversion map by DataType instead of typeName(); the wire format is unchanged (DataType.writeTo already routes through PlanStreamOutput cached strings). CompactInvalidMappedField is gone, replaced by static factories InvalidMappedField.compact / compactPotentiallyUnmapped. The subclass added no state and overrode no behavior; it only forced widening InvalidMappedField's constructor to protected and makeErrorMessage to package-private. Both are private again. While here, switch the truncation tests to Strings.format so they stop breaking under tests.locale=as. Made-with: Cursor
CompactInvalidMappedField is now a peer of InvalidMappedField rather than a subclass: both implement a new TypeConflictField interface that holds the shared IMF-specific surface plus a static makeErrorMessage helper. Production analyzer/verifier/type-resolution sites branch on the interface so they remain oblivious to which flavor backs a field. Made-with: Cursor
The truncated typesToIndices map now uses DataType keys (mirroring CompactMultiTypeEsField), so the type-safe key flows down to construction. The TypeConflictField#getTypesToIndices contract is preserved by converting to string keys on access; that's cheap because the map is post-truncation and the accessor sits on a cold path. Made-with: Cursor
Pull the shared per-type validation loop into UnionTypeEsField.resolve, returning a (resolvedDataType, typeToExpr) Resolution. Both resolveFrom methods now consume that and only differ in their post-processing: the compact form keeps the per-type map and applies the unmapped fallback for the resolved type, while the legacy form expands it to per-index entries. Made-with: Cursor
CompactMultiTypeEsField's per-source-type conversion map is keyed by what field-caps reports on the coordinator, which is the family type (e.g. "keyword" for a constant_keyword field). The data-node lookup in EsPhysicalOperationProviders was using mft.typeName() instead, so constant_keyword (and any other family-collapsed type) missed the lookup, fell through to the unmapped path, and produced all-nulls. Fixes the constant_keyword yaml rest test on this branch. Made-with: Cursor
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
TODO
Resolves #147600.