[engine] fix UDF→Druid KeyError via stable mapping key#291
Conversation
…158) Changed UDF node mapping from keying by Python id(node) to a stable tuple key (source_path, start_line, start_pos). This prevents KeyError when the same query is processed multiple times or when Python reuses object IDs after garbage collection. Applied the same fix to all validators that use the UDF node mapping: - ValidateCallKwargs (populates the mapping) - DruidQueryTransformer (consumes in Druid translation) - ImportsMustNotHaveCycles - FeatureNameToEntityTypeMapping - ValidateLabels - ValidateDynamicCallsHaveAnnotatedRValue - ValidateStaticTypes
Add udf_mapping_key() helper to avoid copy-pasting the span-based key construction across 7 consumer sites. Update all consumers to use it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace 6 instances of manual span-key construction with calls to the new udf_mapping_key() helper function. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Update test_udf_node_mapping_with_missing_entry to explicitly verify that the mapping uses stable (source_path, line, pos) tuple keys rather than id()-based integer keys. This confirms the fix for issue #158. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…efined (#158) Cycle-2 refactored UDF mapping from id(node) to stable (source_path, line, pos) keys. Two consumers were missed in that update, causing KeyError on every UDF execution. This broke 360 executor and stdlib UDF tests. - CallExecutor line 29: lookup now uses udf_mapping_key(node) - VariablesMustBeDefined line 134: lookup now uses udf_mapping_key(node) All other consumers were already updated in cycle-2.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (10)
📝 WalkthroughWalkthroughThe PR replaces unstable Python object identity ( ChangesStable UDF node mapping keys throughout validation and execution
🎯 2 (Simple) | ⏱️ ~12 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Pre-commit ruff-format normalizes double-quoted strings to single quotes and unwraps the multi-line assert messages onto single lines under the project's printWidth config. The new test_udf_node_mapping_uses_stable_keys in cycle-2 was missed by local verification (only ran ruff check, not ruff format --check). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Reviewed, and tested. This is a great long term replacement for object identity references, it doesn't fix the issue though, I still get the error. I suspect this is supposed to be ontop of #233, which is also dependent on your write-output fix, I'll update that PR once the write-output branch is merged. |
Summary
DruidQueryTransformer.transform_Call(and several validators / the executor) looked up entries in_udf_node_mappingbyid(node)— Python's object-identity int. That key was unstable across the validator-pipeline → consumer boundary, so any UDF with a Druid translation (e.g.RegexMatch(item=UserName, regex='^jake')) blew up withKeyErrorand returned a 500 from/queries/validate.Replace the unstable
id(node)key with a stable source-position tuple(source.path, start_line, start_pos)and route every reader through a newudf_mapping_key(node)helper.Related Issues/Tasks
Closes #158
Changes Made
validate_call_kwargs.py: changedUDFNodeMappingfromDict[int, ...]toDict[Tuple[str, int, int], ...]. Addedudf_mapping_key(node) -> Tuple[str, int, int]helper next to the alias. Producer uses the helper.udf_mapping_key(node):query_language/ast_druid_translator.py(the reported failure site)executor/node_executor/call_executor.pyast_validator/validators/feature_name_to_entity_type_mapping.pyast_validator/validators/imports_must_not_have_cycles.pyast_validator/validators/validate_dynamic_calls_have_annotated_rvalue.pyast_validator/validators/validate_labels.pyast_validator/validators/validate_static_types.pyast_validator/validators/variables_must_be_defined.pytest_udf_node_mapping_uses_stable_keysintest_ast_druid_translator.py. The test exercises the producer/consumer contract viaparse_query_to_validated_astand asserts the keys are(str, int, int)tuples. With the producer reverted toid(node), the test fails inside the validator pipeline with the original-shapeKeyError.Confidence Level
Confidence Level: Claude
Testing
KeyError: 'POSTGRES_HOSTS'collection errors unrelated to this change).main: 804 passed (the +2 are the new regression tests).test_ast_druid_translator.py: 16 passed.mypyon all touched files: clean.Cycle notes
This took three review cycles. Cycle 1 shipped a fix without a real red→green test. Cycle 2 added the helper + cleaned PR-context comments but missed two orphan consumers (
CallExecutorandVariablesMustBeDefined) — 360 engine tests were broken until cycle 3 caught them.Summary by CodeRabbit
Bug Fixes
Tests