Skip to content

[engine] fix UDF→Druid KeyError via stable mapping key#291

Draft
haileyok wants to merge 7 commits into
mainfrom
claude/bug-158-udf-druid
Draft

[engine] fix UDF→Druid KeyError via stable mapping key#291
haileyok wants to merge 7 commits into
mainfrom
claude/bug-158-udf-druid

Conversation

@haileyok
Copy link
Copy Markdown
Member

@haileyok haileyok commented May 23, 2026

Summary

DruidQueryTransformer.transform_Call (and several validators / the executor) looked up entries in _udf_node_mapping by id(node) — Python's object-identity int. That key was unstable across the validator-pipeline → consumer boundary, so any UDF with a Druid translation (e.g. RegexMatch(item=UserName, regex='^jake')) blew up with KeyError and returned a 500 from /queries/validate.

Replace the unstable id(node) key with a stable source-position tuple (source.path, start_line, start_pos) and route every reader through a new udf_mapping_key(node) helper.

Related Issues/Tasks

Closes #158

Changes Made

  • validate_call_kwargs.py: changed UDFNodeMapping from Dict[int, ...] to Dict[Tuple[str, int, int], ...]. Added udf_mapping_key(node) -> Tuple[str, int, int] helper next to the alias. Producer uses the helper.
  • Updated all eight consumers to look up via udf_mapping_key(node):
    • query_language/ast_druid_translator.py (the reported failure site)
    • executor/node_executor/call_executor.py
    • ast_validator/validators/feature_name_to_entity_type_mapping.py
    • ast_validator/validators/imports_must_not_have_cycles.py
    • ast_validator/validators/validate_dynamic_calls_have_annotated_rvalue.py
    • ast_validator/validators/validate_labels.py
    • ast_validator/validators/validate_static_types.py
    • ast_validator/validators/variables_must_be_defined.py
  • Added regression test test_udf_node_mapping_uses_stable_keys in test_ast_druid_translator.py. The test exercises the producer/consumer contract via parse_query_to_validated_ast and asserts the keys are (str, int, int) tuples. With the producer reverted to id(node), the test fails inside the validator pipeline with the original-shape KeyError.

Confidence Level

Confidence Level: Claude

Testing

  • Full engine sweep (this branch): 806 passed, 1 skipped, 4 xfailed (63 errors are pre-existing KeyError: 'POSTGRES_HOSTS' collection errors unrelated to this change).
  • Same on main: 804 passed (the +2 are the new regression tests).
  • test_ast_druid_translator.py: 16 passed.
  • mypy on all touched files: clean.

Cycle notes

This took three review cycles. Cycle 1 shipped a fix without a real red→green test. Cycle 2 added the helper + cleaned PR-context comments but missed two orphan consumers (CallExecutor and VariablesMustBeDefined) — 360 engine tests were broken until cycle 3 caught them.

Summary by CodeRabbit

  • Bug Fixes

    • Improved stability of internal node mapping lookup mechanism across validation and execution layers to use consistent, position-based keys instead of unreliable object identity references.
  • Tests

    • Added regression test validating stable node mapping behavior in query translation.

Review Change Stack

haileyok and others added 6 commits May 23, 2026 00:10
…158)

Tests that Druid query transformer can handle UDF-containing queries without KeyError.
Verifies the fix for issue #158 where UDF node mapping was keyed by Python id()
which could be reused after garbage collection.
…158)

Changed UDF node mapping from keying by Python id(node) to a stable tuple key
(source_path, start_line, start_pos). This prevents KeyError when the same query
is processed multiple times or when Python reuses object IDs after garbage
collection.

Applied the same fix to all validators that use the UDF node mapping:
- ValidateCallKwargs (populates the mapping)
- DruidQueryTransformer (consumes in Druid translation)
- ImportsMustNotHaveCycles
- FeatureNameToEntityTypeMapping
- ValidateLabels
- ValidateDynamicCallsHaveAnnotatedRValue
- ValidateStaticTypes
Add udf_mapping_key() helper to avoid copy-pasting the span-based key
construction across 7 consumer sites. Update all consumers to use it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace 6 instances of manual span-key construction with calls to the
new udf_mapping_key() helper function.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Update test_udf_node_mapping_with_missing_entry to explicitly verify
that the mapping uses stable (source_path, line, pos) tuple keys rather
than id()-based integer keys. This confirms the fix for issue #158.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…efined (#158)

Cycle-2 refactored UDF mapping from id(node) to stable (source_path, line, pos)
keys. Two consumers were missed in that update, causing KeyError on every UDF
execution. This broke 360 executor and stdlib UDF tests.

- CallExecutor line 29: lookup now uses udf_mapping_key(node)
- VariablesMustBeDefined line 134: lookup now uses udf_mapping_key(node)

All other consumers were already updated in cycle-2.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 23, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: eda0d77a-9637-4349-8795-14b7aa5a768f

📥 Commits

Reviewing files that changed from the base of the PR and between 615ac01 and 63b8cf2.

📒 Files selected for processing (10)
  • osprey_worker/src/osprey/engine/ast_validator/validators/feature_name_to_entity_type_mapping.py
  • osprey_worker/src/osprey/engine/ast_validator/validators/imports_must_not_have_cycles.py
  • osprey_worker/src/osprey/engine/ast_validator/validators/validate_call_kwargs.py
  • osprey_worker/src/osprey/engine/ast_validator/validators/validate_dynamic_calls_have_annotated_rvalue.py
  • osprey_worker/src/osprey/engine/ast_validator/validators/validate_labels.py
  • osprey_worker/src/osprey/engine/ast_validator/validators/validate_static_types.py
  • osprey_worker/src/osprey/engine/ast_validator/validators/variables_must_be_defined.py
  • osprey_worker/src/osprey/engine/executor/node_executor/call_executor.py
  • osprey_worker/src/osprey/engine/query_language/ast_druid_translator.py
  • osprey_worker/src/osprey/engine/query_language/tests/test_ast_druid_translator.py

📝 Walkthrough

Walkthrough

The PR replaces unstable Python object identity (id()) with stable source location tuples for UDF node mapping keys, fixing query transformation failures where the same Call node received different ids in separate phases. All validators, the executor, and the query translator are updated to use the new udf_mapping_key(node) helper.

Changes

Stable UDF node mapping keys throughout validation and execution

Layer / File(s) Summary
Stable UDF node mapping key definition
osprey_worker/src/osprey/engine/ast_validator/validators/validate_call_kwargs.py
New udf_mapping_key(Call) -> Tuple[str, int, int] helper computes stable keys from source path and span location. Type alias UDFNodeMapping updated from Dict[int, ...] to Dict[Tuple[str, int, int], ...]. When recording validated UDFs, the mapping now uses the stable key instead of id(call_node).
Validator imports and lookups updated to stable keys
osprey_worker/src/osprey/engine/ast_validator/validators/feature_name_to_entity_type_mapping.py, imports_must_not_have_cycles.py, validate_dynamic_calls_have_annotated_rvalue.py, validate_labels.py, validate_static_types.py, variables_must_be_defined.py
Six validators import udf_mapping_key and update _udf_node_mapping lookups from id(call_node) to udf_mapping_key(call_node) to retrieve cached UDF and argument metadata during validation.
Executor and query translator use stable keys
osprey_worker/src/osprey/engine/executor/node_executor/call_executor.py, osprey_worker/src/osprey/engine/query_language/ast_druid_translator.py
CallExecutor and DruidQueryTransformer import udf_mapping_key and update their lookups into the validated UDFNodeMapping from id(node) to udf_mapping_key(node), enabling correct UDF and arguments retrieval at execution and query translation time.
Test coverage for stable keying and RegexMatch
osprey_worker/src/osprey/engine/query_language/tests/test_ast_druid_translator.py
Two new tests: test_parses_query_with_regex_match_on_username validates RegexMatch query transformation to Druid regex filters. test_udf_node_mapping_uses_stable_keys is a regression test confirming stable tuple keys are used instead of id(), and verifies DruidQueryTransformer successfully transforms the AST without relying on unstable keying.

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 12.50% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and concisely summarizes the main change: fixing a KeyError via switching to stable mapping keys for UDF node lookups.
Linked Issues check ✅ Passed All coding requirements from issue #158 are met: stable mapping keys replace object IDs, udf_mapping_key helper is introduced, and all 8 consumers are updated to use the new keying scheme.
Out of Scope Changes check ✅ Passed All changes are directly scoped to addressing the KeyError issue by replacing unstable id()-based keys with stable (source_path, start_line, start_pos) tuple keys across validators and consumers.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch claude/bug-158-udf-druid

Comment @coderabbitai help to get the list of available commands and usage tips.

Pre-commit ruff-format normalizes double-quoted strings to single quotes
and unwraps the multi-line assert messages onto single lines under the
project's printWidth config. The new test_udf_node_mapping_uses_stable_keys
in cycle-2 was missed by local verification (only ran ruff check, not
ruff format --check).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@haileyok haileyok marked this pull request as ready for review May 23, 2026 01:00
@haileyok haileyok requested review from a team, EXBreder, ayubun and vinaysrao1 as code owners May 23, 2026 01:00
@chimosky
Copy link
Copy Markdown
Contributor

Reviewed, and tested.

This is a great long term replacement for object identity references, it doesn't fix the issue though, I still get the error.

I suspect this is supposed to be ontop of #233, which is also dependent on your write-output fix, I'll update that PR once the write-output branch is merged.

@haileyok haileyok marked this pull request as draft May 29, 2026 17:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Failed UDF -> Druid query filter translation

2 participants