Refactor(ingestion): introduce ClassifiableEntityAdapter to eliminate scattered isinstance checks#27716
Refactor(ingestion): introduce ClassifiableEntityAdapter to eliminate scattered isinstance checks#27716
Conversation
| from metadata.sampler.entity_adapters import adapter_for | ||
|
|
||
| adapter = adapter_for(table) | ||
| fields = adapter.patch_fields if adapter else ["tags"] |
There was a problem hiding this comment.
⚠️ Bug: patch_column_tags can fetch entity with wrong fields when adapter is None
In patch_column_tags, when adapter_for(table) returns None (unsupported entity type), fields defaults to ["tags"]. The method then proceeds to fetch the entity and call _prepare_destination_for_column_tags, which independently calls adapter_for(table) again and returns None, causing a silent no-op. While not a crash, this is wasteful (fetches entity from API for nothing) and the redundant adapter_for lookup in _prepare_destination_for_column_tags is unnecessary — the adapter should be resolved once and passed through, or the method should bail out early when adapter is None.
Suggested fix:
adapter = adapter_for(table)
if adapter is None:
logger.warning(
"Unsupported entity type for column tag patching: %s",
type(table).__name__,
)
return None
fields = adapter.patch_fields
# ...then pass adapter to _prepare_destination_for_column_tags
# instead of looking it up again inside that method
Was this helpful? React with 👍 / 👎 | Reply gitar fix to apply this suggestion
| _BY_ENTITY: Dict[type, EntityAdapter] = { | ||
| Table: TableAdapter(), | ||
| Container: ContainerAdapter(), | ||
| } | ||
|
|
||
| _BY_PIPELINE: Dict[type, EntityAdapter] = { | ||
| DatabaseServiceAutoClassificationPipeline: TableAdapter(), | ||
| StorageServiceAutoClassificationPipeline: ContainerAdapter(), | ||
| } |
There was a problem hiding this comment.
💡 Quality: Duplicate adapter instances in _BY_ENTITY and _BY_PIPELINE
The module-level registries _BY_ENTITY and _BY_PIPELINE each instantiate their own TableAdapter() and ContainerAdapter(), resulting in 4 adapter objects when 2 would suffice. Since adapters are stateless singletons by design, consider creating a single instance of each and referencing it in both dictionaries. This also makes the relationship between the two registries more explicit.
Suggested fix:
_TABLE_ADAPTER = TableAdapter()
_CONTAINER_ADAPTER = ContainerAdapter()
_BY_ENTITY: Dict[type, EntityAdapter] = {
Table: _TABLE_ADAPTER,
Container: _CONTAINER_ADAPTER,
}
_BY_PIPELINE: Dict[type, EntityAdapter] = {
DatabaseServiceAutoClassificationPipeline: _TABLE_ADAPTER,
StorageServiceAutoClassificationPipeline: _CONTAINER_ADAPTER,
}
Was this helpful? React with 👍 / 👎 | Reply gitar fix to apply this suggestion
Code Review
|
| Compact |
|
Was this helpful? React with 👍 / 👎 | Gitar
|
🔴 Playwright Results — 3 failure(s), 22 flaky✅ 3947 passed · ❌ 3 failed · 🟡 22 flaky · ⏭️ 86 skipped
Genuine Failures (failed on all attempts)❌
|



Describe your changes:
Refactors the auto-classification pipeline to eliminate 6 scattered
isinstance(entity, Table/Container)checks introduced in #26495. Replaces them with aClassifiableEntityAdapterstrategy pattern backed by a single O(1) registry.What changed:
sampler/entity_adapters.py—EntityAdapterABC +TableAdapter,ContainerAdapter, and two dict registries (_BY_ENTITY,_BY_PIPELINE) withadapter_for()/adapter_for_pipeline()lookup functionssampler/config_utils.py—build_database_service_conn_config()free function extracted fromSamplerProcessor._copy_service_config()sampler/processor.py— deleted_run_for_table,_run_for_container,_copy_service_config; replaced with single_run_for_entity()using adapter dispatch (~120 lines removed, ~30 added)pii/base_processor.py—_get_entity_columnsnow delegates toadapter_for(entity).get_columns(entity)ometa/mixins/patch_mixin.py— deleted 4 isinstance helpers (_get_fields_for_entity,_prepare_table_destination,_prepare_container_destination,_prepare_destination_for_column_tags); replaced with single method using adapteringestion/sink/metadata_rest.py— deleted dead_patch_entity_column_tagssingledispatch (both registered handlers were byte-identical); callsite replaced with directmetadata.patch_column_tags()docs/auto-classification/add-support-for-another-entity.md— updated to reflect the adapter patternWhy:
Adding
DashboardDataModel(already referenced inpii/types.py) would have required touching all 6 sites. With this refactor it requires: one newEntityAdaptersubclass + two registry entries + extendClassifiableEntityTypeunion + extendworkflow/classification.pyisinstance tuple. All sampling, PII, and patch logic picks it up automatically.Type of change:
Checklist:
Refactor(ingestion): introduce ClassifiableEntityAdapter to eliminate scattered isinstance checks