fix: resolve path-based lineage for Databricks external tables (#27561) #27648

Open
ShivamChavan01 wants to merge 2 commits into open-metadata:main from ShivamChavan01:fix/databricks-external-table-path-lineage-27561

Conversation

@ShivamChavan01

Describe your changes:

Fixes #27561

External tables in Databricks are referenced using cloud storage paths (e.g. delta.`abfss://...`) instead of table names. In this case, Databricks system tables populate source_path/target_path and leave source_table_full_name/target_table_full_name as null. The lineage processor was filtering out these rows entirely, resulting in missing lineage for all external tables.

Changes:

  • databricks/queries.py + unitycatalog/queries.py: Added source_path and target_path to SELECT; relaxed WHERE filter from hard IS NOT NULL on name columns to (name IS NOT NULL OR path IS NOT NULL)
  • databricks/client.py: Pass source_path and target_path through the lineage cache dict
  • unitycatalog/lineage.py: Build a reverse path → table_fqn map from the external locations cache; fall back to path resolution when full_name is null; ensure _cache_external_locations() runs before _cache_lineage() so the reverse map is available
  • test_unity_catalog_lineage.py: Updated mock row definitions to include path fields; added tests for path resolution, unresolvable path skipping, and reverse map construction
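
The path fallback described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: the function names `build_path_to_fqn` and `resolve_table`, the assumed shape of the external locations cache (table FQN mapped to storage path), and the trailing-slash normalization are all assumptions made for the example.

```python
from typing import Dict, Optional


def build_path_to_fqn(external_locations: Dict[str, str]) -> Dict[str, str]:
    """Invert a {table_fqn: storage_path} cache into {storage_path: table_fqn}.

    Hypothetical helper. Trailing slashes are stripped so that
    'abfss://c@a/x/' and 'abfss://c@a/x' resolve to the same table;
    that normalization is an assumption of this sketch.
    """
    return {
        path.rstrip("/"): fqn
        for fqn, path in external_locations.items()
        if path
    }


def resolve_table(
    full_name: Optional[str],
    path: Optional[str],
    path_to_fqn: Dict[str, str],
) -> Optional[str]:
    """Prefer the table name, fall back to the storage path.

    Returns None when neither can be resolved, so the caller can skip the row.
    """
    if full_name:
        return full_name
    if path:
        return path_to_fqn.get(path.rstrip("/"))
    return None
```

Returning None for an unresolvable path lets the caller drop the row instead of emitting a lineage edge with a missing endpoint, which matches the "unresolvable path skipping" behavior the tests are said to cover.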

Type of change:

  • Bug fix

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes #27561: resolve path-based lineage for Databricks external tables
  • I have commented on my code, particularly in hard-to-understand areas.
  • I have added a test that covers the exact scenario we are fixing.

@ShivamChavan01 requested a review from a team as a code owner April 23, 2026 02:40
@github-actions
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Comment thread ingestion/src/metadata/ingestion/source/database/databricks/queries.py Outdated
…-FQN resolution

Reverts the path-based fallback in DATABRICKS_GET_TABLE_LINEAGE and
DATABRICKS_GET_COLUMN_LINEAGE queries since DatabricksClient lacks
the external_path_to_fqn map needed to resolve paths to FQNs.

Without this map, relaxing the IS NOT NULL constraints creates dict keys
containing None values that never match downstream lookups.
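
To illustrate why such keys are dead weight, here is a small sketch using plain dictionaries. The row shape and the variable names are hypothetical and stand in for whatever the client actually caches.

```python
# Hypothetical lineage rows: the second one comes from a path-based query,
# so its source table name is NULL (None) in the system table.
rows = [
    {"source": "cat.sch.src", "target": "cat.sch.dst"},
    {"source": None, "target": "cat.sch.ext_dst"},
]

# Cache keyed by (source, target), as a column lineage cache might be.
column_lineage = {
    (row["source"], row["target"]): [("src_col", "dst_col")] for row in rows
}

# A lookup with real table names finds the first entry.
hit = column_lineage.get(("cat.sch.src", "cat.sch.dst"))

# The second entry is stored under (None, 'cat.sch.ext_dst'). Downstream code
# always looks up with a concrete source name, so this entry is never reached.
miss = column_lineage.get(("cat.sch.ext_src", "cat.sch.ext_dst"))
```

Here `hit` is the stored column pair list and `miss` is None, while the `(None, 'cat.sch.ext_dst')` key remains in the cache unused.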

@gitar-bot

gitar-bot Bot commented Apr 23, 2026

Code Review ✅ Approved 1 resolved / 1 findings

Resolves path-based lineage for Databricks external tables by enabling path fallback during column lineage caching. No issues found.

✅ 1 resolved
Bug: DatabricksClient column lineage caching ignores path fallback

  • 📄 ingestion/src/metadata/ingestion/source/database/databricks/client.py:370-379
  • 📄 ingestion/src/metadata/ingestion/source/database/databricks/queries.py:107-121
  • 📄 ingestion/src/metadata/ingestion/source/database/databricks/client.py:348-355
  • 📄 ingestion/src/metadata/ingestion/source/database/databricks/queries.py:90-104

The DATABRICKS_GET_COLUMN_LINEAGE query was relaxed to allow rows where source_table_full_name or target_table_full_name is NULL (as long as the corresponding path is not null). However, the cache_lineage() method in client.py (lines 370-379) still directly uses row.source_table_full_name and row.target_table_full_name without any path-based fallback. This means:

  1. Column lineage rows for external tables will create dict keys containing None (e.g., (None, 'cat.schema.target')), which won't match any downstream lookup.
  2. These phantom entries silently pollute entity_column_lineage and will never produce useful lineage.

The same path-resolution logic added to unitycatalog/lineage.py should be applied here, or the column lineage query's WHERE clause should retain the IS NOT NULL filter on table name columns (as done before this PR) since there's no external_path_to_fqn map available in DatabricksClient.
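
The second option, keeping the null filter, could also be enforced on the Python side as a guard. The sketch below is only an illustration of that idea: `ColumnLineageRow` and `cache_column_lineage` are hypothetical names, and the row fields other than source_table_full_name and target_table_full_name are assumed.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple


class ColumnLineageRow(NamedTuple):
    """Hypothetical stand-in for one row of the column lineage query."""

    source_table_full_name: Optional[str]
    source_column_name: str
    target_table_full_name: Optional[str]
    target_column_name: str


def cache_column_lineage(
    rows: Iterable[ColumnLineageRow],
) -> Dict[Tuple[str, str], List[Tuple[str, str]]]:
    """Group column pairs by (source table, target table).

    Rows with a missing table name are skipped, which mirrors an
    IS NOT NULL filter in SQL and keeps None out of the cache keys.
    """
    cache: Dict[Tuple[str, str], List[Tuple[str, str]]] = {}
    for row in rows:
        if row.source_table_full_name is None or row.target_table_full_name is None:
            continue  # no path-to-FQN map here, so the row cannot be resolved
        key = (row.source_table_full_name, row.target_table_full_name)
        cache.setdefault(key, []).append(
            (row.source_column_name, row.target_column_name)
        )
    return cache
```

A guard like this trades completeness for correctness: external-table rows are dropped in this code path, but every key left in the cache can be matched by a downstream lookup.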


@ulixius9 added the safe to test label Apr 25, 2026
@github-actions
Contributor

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

@ulixius9
Member

@ShivamChavan01

#27561 (comment)
Can you attach screenshots showing how the lineage looked before your fix and how it looks after the fix?

@ShivamChavan01
Author

Sure, I'll attach it.

@github-actions
Contributor

🔴 Playwright Results — 1 failure(s), 28 flaky

✅ 3652 passed · ❌ 1 failed · 🟡 28 flaky · ⏭️ 89 skipped

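| Shard | Passed | Failed | Flaky | Skipped |
|---|---|---|---|---|
| 🟡 Shard 1 | 476 | 0 | 4 | 4 |
| 🟡 Shard 2 | 648 | 0 | 3 | 7 |
| 🔴 Shard 3 | 651 | 1 | 4 | 1 |
| 🟡 Shard 4 | 625 | 0 | 9 | 27 |
| 🟡 Shard 5 | 610 | 0 | 1 | 42 |
| 🟡 Shard 6 | 642 | 0 | 7 | 8 |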

Genuine Failures (failed on all attempts)

Features/RestoreEntityInheritedFields.spec.ts › Validate restore with Inherited domain and data products assigned (shard 3)
Error: expect(received).toBe(expected) // Object.is equality

Expected: 200
Received: 400

🟡 28 flaky test(s) (passed on retry)
  • Flow/Tour.spec.ts › Tour should work from help section (shard 1, 1 retry)
  • Flow/Tour.spec.ts › Tour should work from welcome screen (shard 1, 1 retry)
  • Flow/Tour.spec.ts › Tour should work from URL directly (shard 1, 1 retry)
  • Pages/UserCreationWithPersona.spec.ts › Create user with persona and verify on profile (shard 1, 1 retry)
  • Features/BulkEditEntity.spec.ts › Glossary (shard 2, 1 retry)
  • Features/ChangeSummaryBadge.spec.ts › Automated badge should appear on entity description with Automated source (shard 2, 1 retry)
  • Features/Glossary/GlossaryHierarchy.spec.ts › should cancel move operation (shard 2, 1 retry)
  • Features/RestoreEntityInheritedFields.spec.ts › Validate restore with Inherited domain and data products assigned (shard 3, 2 retries)
  • Features/RestoreEntityInheritedFields.spec.ts › Validate restore with Inherited domain and data products assigned (shard 3, 1 retry)
  • Features/RestoreEntityInheritedFields.spec.ts › Validate restore with Inherited domain and data products assigned (shard 3, 2 retries)
  • Flow/SchemaTable.spec.ts › schema table test (shard 3, 1 retry)
  • Pages/Customproperties-part2.spec.ts › entityReferenceList shows item count, scrollable list, no expand toggle (shard 4, 1 retry)
  • Pages/DataContractsSemanticRules.spec.ts › Validate Owner Rule Is_Set (shard 4, 1 retry)
  • Pages/DataProducts.spec.ts › Pagination (shard 4, 1 retry)
  • Pages/DescriptionVisibility.spec.ts › Customized Table detail page Description widget shows long description (shard 4, 1 retry)
  • Pages/Domains.spec.ts › Rename domain with assets (tables, topics, dashboards) preserves associations (shard 4, 1 retry)
  • Pages/Domains.spec.ts › Multiple consecutive domain renames preserve all associations (shard 4, 1 retry)
  • Pages/Entity.spec.ts › Column detail panel data type display and nested column navigation (shard 4, 1 retry)
  • Pages/Entity.spec.ts › Tag Add, Update and Remove (shard 4, 1 retry)
  • Pages/Entity.spec.ts › Glossary Term Add, Update and Remove (shard 4, 1 retry)
  • Pages/Glossary.spec.ts › Add and Remove Assets (shard 5, 1 retry)
  • Features/AutoPilot.spec.ts › Create Service and check the AutoPilot status (shard 6, 1 retry)
  • Pages/HyperlinkCustomProperty.spec.ts › should accept valid http and https URLs (shard 6, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/Lineage/LineageRightPanel.spec.ts › Verify custom properties tab IS visible for supported type: searchIndex (shard 6, 1 retry)
  • Pages/ODCSImportExport.spec.ts › Multi-object ODCS contract - object selector shows all schema objects (shard 6, 1 retry)
  • Pages/UserDetails.spec.ts › Create team with domain and verify visibility of inherited domain in user profile after team removal (shard 6, 1 retry)
  • Pages/Users.spec.ts › Permissions for table details page for Data Consumer (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

Labels

safe to test

Development

Successfully merging this pull request may close these issues.

[BUG] Lineage Databricks is not performed for external tables using path-based queries.

2 participants