Improve SSRS Connector - Lineage #27652

Merged
ulixius9 merged 7 commits into main from ssrs_lineage on Apr 23, 2026

@harshach harshach commented Apr 23, 2026

Describe your changes:

Fixes

I worked on ... because ...

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

Summary by Gitar

  • Resilience improvements:
    • Updated SSRS client to stream RDL downloads and abort if they exceed MAX_RDL_BYTES.
    • Refined error handling to propagate SourceConnectionException during outages, preventing accidental entity deletion.
  • Cache optimization:
    • Replaced dictionary-based cache with a memory-efficient single-entry _current_rdl cache for $O(1)$ state management.
  • Security enhancements:
    • Strengthened XML parsing security by case-insensitively blocking <!DOCTYPE and <!ENTITY declarations in RDL content.
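The single-entry cache from the summary can be sketched roughly as follows. This is an illustration, not the connector's code: `SingleEntryCache`, `get_or_load`, and the `loader` callback are hypothetical names, and only the `_current_rdl` attribute name comes from the summary. It assumes reports are processed one at a time, so only the most recent definition needs to stay in memory.

```python
from typing import Callable, Generic, Optional, Tuple, TypeVar

T = TypeVar("T")


class SingleEntryCache(Generic[T]):
    """Hypothetical helper: remembers only the most recently loaded value."""

    def __init__(self) -> None:
        # (key, value) of the last load, or None when nothing is cached yet.
        self._current_rdl: Optional[Tuple[str, Optional[T]]] = None

    def get_or_load(
        self, key: str, loader: Callable[[str], Optional[T]]
    ) -> Optional[T]:
        # Hit: the requested key is the one we loaded last (even if it was None).
        if self._current_rdl is not None and self._current_rdl[0] == key:
            return self._current_rdl[1]
        value = loader(key)
        # Overwrite the previous entry so memory stays bounded to one item.
        self._current_rdl = (key, value)
        return value
```

Compared with a dictionary keyed by report id, this trades repeat loads (when the same report is revisited after another one) for a fixed memory footprint.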

This will update automatically on new commits.

Copilot AI review requested due to automatic review settings April 23, 2026 06:45
@harshach harshach requested a review from a team as a code owner April 23, 2026 06:45
@github-actions github-actions Bot added the backend and safe to test labels Apr 23, 2026
Comment thread ingestion/src/metadata/ingestion/source/dashboard/ssrs/metadata.py Outdated
Comment thread .claude/scheduled_tasks.lock Outdated
@github-actions
Contributor

✅ TypeScript Types Auto-Updated

The generated TypeScript types have been automatically updated based on JSON schema changes in this PR.

@github-actions github-actions Bot requested a review from a team as a code owner April 23, 2026 06:49
Copilot AI left a comment

Pull request overview

This PR enhances the SSRS dashboard connector to extract richer metadata from SSRS reports by fetching and parsing RDL content, emitting DashboardDataModels per dataset, and generating table lineage from dataset SQL.

Changes:

  • Added SsrsDataModel as a supported DataModelType in the DashboardDataModel schema.
  • Implemented SSRS RDL fetching + decoding in the SSRS client and an XML RDL parser to extract data sources/datasets/fields.
  • Extended the SSRS source to cache parsed RDL, emit data models, and compute lineage using the SQL lineage parser; added unit/integration tests and fixtures.
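As a rough illustration of what extracting datasets from an RDL document involves, here is a minimal standard-library sketch. It is not the PR's `rdl_parser.py`: the function names and the returned dictionary shape are assumptions, and it omits the DOCTYPE/ENTITY guard described in the PR summary. Tags are matched by local name so the same code copes with the different RDL namespace versions.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict, List


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def extract_datasets(rdl_xml: bytes) -> List[Dict[str, Any]]:
    """Hypothetical helper: collect name, data source, SQL and fields per DataSet."""
    root = ET.fromstring(rdl_xml)
    datasets: List[Dict[str, Any]] = []
    for elem in root.iter():
        if local_name(elem.tag) != "DataSet":
            continue
        entry: Dict[str, Any] = {
            "name": elem.get("Name"),
            "data_source": None,
            "sql": None,
            "fields": [],
        }
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "DataSourceName":
                entry["data_source"] = (child.text or "").strip()
            elif name == "CommandText":
                entry["sql"] = (child.text or "").strip()
            elif name == "Field":
                entry["fields"].append(child.get("Name"))
        datasets.append(entry)
    return datasets


# Small hand-written RDL fragment used only for this example.
SAMPLE = b"""<Report xmlns="http://schemas.microsoft.com/sqlserver/reporting/2016/01/reportdefinition">
  <DataSets>
    <DataSet Name="Sales">
      <Query>
        <DataSourceName>Warehouse</DataSourceName>
        <CommandText>SELECT id, amount FROM dbo.sales</CommandText>
      </Query>
      <Fields>
        <Field Name="id"><DataField>id</DataField></Field>
        <Field Name="amount"><DataField>amount</DataField></Field>
      </Fields>
    </DataSet>
  </DataSets>
</Report>"""

datasets = extract_datasets(SAMPLE)
```

The `sql` value is what a SQL lineage parser would then receive to find source tables.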

Reviewed changes

Copilot reviewed 14 out of 17 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| openmetadata-spec/src/main/resources/json/schema/entity/data/dashboardDataModel.json | Adds SsrsDataModel to the enum of supported dashboard data model types. |
| ingestion/src/metadata/ingestion/source/dashboard/ssrs/client.py | Fetches report RDL bytes from SSRS endpoints and decodes XML/JSON(base64) responses. |
| ingestion/src/metadata/ingestion/source/dashboard/ssrs/rdl_parser.py | New RDL XML parser to extract data sources, datasets, commands, and fields. |
| ingestion/src/metadata/ingestion/source/dashboard/ssrs/metadata.py | Loads/caches RDL per report, emits datamodels, and yields lineage from parsed SQL tables. |
| ingestion/tests/unit/topology/dashboard/test_ssrs_rdl_parser.py | Unit tests for RDL parsing and connection-string parsing. |
| ingestion/tests/unit/topology/dashboard/test_ssrs.py | Unit tests for SSRS datamodel emission and lineage behavior. |
| ingestion/tests/unit/topology/dashboard/fixtures/ssrs/*.rdl | RDL fixtures used by the unit tests. |
| ingestion/tests/integration/ssrs/conftest.py | Extends mock SSRS server to serve RDL endpoints. |
| ingestion/tests/integration/ssrs/test_metadata.py | Integration tests validating RDL fetch + parse via mock server. |
| .claude/scheduled_tasks.lock | Adds a lock file to the repo. |

Comment thread ingestion/src/metadata/ingestion/source/dashboard/ssrs/metadata.py Outdated
Copilot AI review requested due to automatic review settings April 23, 2026 07:08
Comment thread ingestion/src/metadata/ingestion/source/dashboard/ssrs/rdl_parser.py Outdated
Comment thread ingestion/src/metadata/ingestion/source/dashboard/ssrs/client.py Outdated
Copilot AI left a comment

Pull request overview

This PR enhances the SSRS dashboard connector to extract SSRS RDL metadata (datasets/data sources/SQL) and use it to emit Dashboard Data Models and table lineage, while also adding the new SsrsDataModel enum value to the shared DashboardDataModel schema and generated UI types.

Changes:

  • Add an SSRS RDL XML parser and integrate it into the SSRS ingestion flow for datamodel + lineage extraction.
  • Extend the SSRS client to retrieve report definitions from SSRS content endpoints (including base64-decoded JSON payloads).
  • Update schema + generated UI types to support SsrsDataModel, and add unit/integration test coverage with RDL fixtures.
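The "base64-decoded JSON payloads" handling could look something like the sketch below. The envelope shape `{"value": "<base64>"}` is an assumption made for illustration (the PR text does not show it), and `decode_report_body` is a hypothetical name rather than the client's actual method.

```python
import base64
import json
from typing import Optional

UTF8_BOM = b"\xef\xbb\xbf"


def decode_report_body(body: bytes) -> Optional[bytes]:
    """Return RDL XML bytes from a raw XML body or an assumed JSON envelope."""
    if body.startswith(UTF8_BOM):
        body = body[len(UTF8_BOM):]
    stripped = body.lstrip()
    # Raw XML: hand it back unchanged for the parser.
    if stripped.startswith(b"<"):
        return stripped
    try:
        payload = json.loads(stripped)
        # Assumed envelope: {"value": "<base64-encoded RDL>"}.
        return base64.b64decode(payload["value"], validate=True)
    except (ValueError, KeyError, TypeError):
        # Invalid JSON, missing key, or bad base64: treat as unavailable.
        return None
```

Returning `None` for undecodable bodies keeps the "no RDL available" case distinct from transport failures, which the client reports separately.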

Reviewed changes

Copilot reviewed 15 out of 18 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| openmetadata-ui/src/main/resources/ui/src/generated/entity/data/dashboardDataModel.ts | Adds SsrsDataModel to the generated DataModelType enum for UI usage. |
| openmetadata-ui/src/main/resources/ui/src/generated/api/data/createDashboardDataModel.ts | Adds SsrsDataModel to the generated API enum used by UI create requests. |
| openmetadata-spec/src/main/resources/json/schema/entity/data/dashboardDataModel.json | Extends the schema enum and javaEnums to include SsrsDataModel. |
| ingestion/src/metadata/ingestion/source/dashboard/ssrs/rdl_parser.py | New module to parse RDL into structured datasets/data sources and extract connection info. |
| ingestion/src/metadata/ingestion/source/dashboard/ssrs/models.py | Adds created_by field mapping (CreatedBy) to the SSRS report model. |
| ingestion/src/metadata/ingestion/source/dashboard/ssrs/metadata.py | Implements RDL caching, datamodel emission, owner resolution, and dataset SQL lineage extraction. |
| ingestion/src/metadata/ingestion/source/dashboard/ssrs/client.py | Adds report-definition fetch + decode logic (multiple endpoints, base64 JSON decoding, size limiting). |
| ingestion/tests/unit/topology/dashboard/test_ssrs_rdl_parser.py | Unit tests for parsing RDL variants, security rejection, and connection string parsing. |
| ingestion/tests/unit/topology/dashboard/test_ssrs.py | Unit tests for SSRS ownership, datamodel emission, lineage behavior, and hidden-report status filtering. |
| ingestion/tests/unit/topology/dashboard/fixtures/ssrs/shared_datasource.rdl | RDL fixture for shared datasource reference. |
| ingestion/tests/unit/topology/dashboard/fixtures/ssrs/no_datasource.rdl | RDL fixture with empty sources/datasets. |
| ingestion/tests/unit/topology/dashboard/fixtures/ssrs/malformed.rdl | RDL fixture for malformed XML. |
| ingestion/tests/unit/topology/dashboard/fixtures/ssrs/inline_single_dataset_2016.rdl | RDL fixture for single dataset with inline SQL and fields. |
| ingestion/tests/unit/topology/dashboard/fixtures/ssrs/inline_multi_dataset_2010.rdl | RDL fixture for multiple datasets with inline SQL. |
| ingestion/tests/unit/topology/dashboard/fixtures/ssrs/expression_commandtype.rdl | RDL fixture for CommandType=Expression. |
| ingestion/tests/integration/ssrs/test_metadata.py | Integration tests for report-definition fetching and end-to-end RDL parsing via mock server. |
| ingestion/tests/integration/ssrs/conftest.py | Extends mock SSRS server to serve RDL content endpoints used by the client. |

Comment on lines +146 to +167
def get_report_definition(self, report_id: str) -> Optional[bytes]:
    """Return the RDL XML bytes for a report, or ``None`` if unavailable.

    Tries ``/Reports({id})/Content/$value`` first, then ``/CatalogItems({id})/Content``.
    Not-found responses (404/400) trigger fallback silently; transport errors
    propagate so operators see outages instead of empty catalogs."""
    last_err: Optional[Exception] = None
    for template in RDL_CONTENT_PATHS:
        path = template.format(id=report_id)
        try:
            body = self._fetch_report_content(path)
        except requests.RequestException as exc:
            last_err = exc
            logger.warning("RDL fetch transport error for %s: %s", path, exc)
            continue
        if body is not None:
            return body
    if last_err is not None:
        raise SourceConnectionException(
            f"Failed to fetch RDL content for report [{report_id}]: {last_err}"
        ) from last_err
    return None
Comment on lines +158 to +175
    self, dashboard: SsrsReport
) -> Optional[SsrsReportDefinition]:
    """Fetch and cache RDL lazily. Returns ``None`` when the report has no
    sources or the RDL cannot be fetched/parsed."""
    cached = self._report_definitions.get(dashboard.id)
    if cached is not None:
        return cached
    if dashboard.has_data_sources is False:
        return None
    try:
        rdl_bytes = self.client.get_report_definition(dashboard.id)
    except Exception as exc:
        logger.debug(traceback.format_exc())
        logger.warning(
            "Could not fetch RDL for report [%s]: %s", dashboard.name, exc
        )
        return None
    if not rdl_bytes:
Comment on lines +325 to +347
if not rdl:
    return
for dataset in rdl.data_sets:
    try:
        datamodel_request = self._build_datamodel_request(
            dashboard_details, dataset
        )
        if datamodel_request is None:
            continue
        yield Either(right=datamodel_request)
        self.register_record_datamodel(datamodel_request=datamodel_request)
    except Exception as exc:
        yield Either(
            left=StackTraceError(
                name=f"{dashboard_details.name}.{dataset.name}",
                error=(
                    f"Error yielding DataModel [{dataset.name}] for report "
                    f"[{dashboard_details.name}]: {exc}"
                ),
                stackTrace=traceback.format_exc(),
            )
        )

@gitar-bot

gitar-bot Bot commented Apr 23, 2026

Code Review ✅ Approved 4 resolved / 4 findings

SSRS Connector lineage improvements now correctly preserve source table identity, prevent XXE bypasses, and implement memory-efficient response sizing. The accidental .claude configuration file has been removed.

✅ 4 resolved
Bug: prefix_table overrides every source table, collapsing lineage

📄 ingestion/src/metadata/ingestion/source/dashboard/ssrs/metadata.py:503-504
In _yield_table_to_target_lineage, table_name = context.prefix_table or split.get("table") means that if the db_service_prefix config string happens to have 4+ dot-separated parts (e.g. my_service.db.schema.table), every source table discovered by the lineage parser is replaced by that single prefix table name. This collapses all lineage edges to a single target table.

The prefix_table from parse_db_service_prefix is a legacy artefact of the base class's generic splitting — other dashboard connectors (e.g. Looker, Tableau) don't use it for the same reason. Here, prefix_table should not override the parsed table name since the lineage parser already extracts the correct table name from the SQL.
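The collapsing effect is easy to reproduce in isolation. The sketch below uses hypothetical helper names and plain strings instead of the connector's context object; it only demonstrates why `prefix_table or parsed_table` maps every parsed table to the same name, and shows one possible fix (the PR's actual change is not reproduced here).

```python
from typing import Optional


def pick_table_buggy(prefix_table: Optional[str], parsed_table: str) -> str:
    # Mirrors the pattern called out in the review: a non-empty prefix wins.
    return prefix_table or parsed_table


def pick_table_fixed(prefix_table: Optional[str], parsed_table: str) -> str:
    # Ignore the prefix's table part and trust the name parsed from the SQL.
    return parsed_table


# Tables a lineage parser might find in one dataset query (made-up names).
parsed = ["orders", "customers", "payments"]

# With a prefix such as "my_service.db.schema.table", prefix_table is "table".
buggy = {pick_table_buggy("table", name) for name in parsed}
fixed = {pick_table_fixed("table", name) for name in parsed}
```

Here `buggy` contains a single name while `fixed` keeps all three, which is the difference between one collapsed lineage edge and one edge per source table.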

Quality: Accidentally committed .claude/scheduled_tasks.lock

📄 .claude/scheduled_tasks.lock:1
The file .claude/scheduled_tasks.lock contains a session-specific lock with a PID and timestamp. This is a local tooling artifact and should not be checked into the repository.

Security: XXE guard is case-sensitive and can be bypassed with mixed-case

📄 ingestion/src/metadata/ingestion/source/dashboard/ssrs/rdl_parser.py:23 📄 ingestion/src/metadata/ingestion/source/dashboard/ssrs/rdl_parser.py:64-65
The FORBIDDEN_XML_TOKENS check uses exact case-sensitive byte matching for only two variants (<!DOCTYPE/<!doctype, <!ENTITY/<!entity). An attacker serving a crafted RDL payload with mixed-case like <!DocType or <!Entity would bypass this guard entirely, and Python's xml.etree.ElementTree is vulnerable to billion-laughs entity expansion.

A more robust approach is to do a case-insensitive check, or better yet, use defusedxml.ElementTree which handles all XML bomb variants.
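A case-insensitive version of the token check might look like the sketch below. `FORBIDDEN_DECL` and `has_forbidden_declaration` are hypothetical names, not the identifiers used in `rdl_parser.py`, and the byte-level match assumes an ASCII-compatible encoding such as UTF-8.

```python
import re

# Hypothetical pattern: matches "<!DOCTYPE" or "<!ENTITY" in any letter case.
# Assumes an ASCII-compatible encoding; UTF-16 input would need decoding first.
FORBIDDEN_DECL = re.compile(rb"<!(DOCTYPE|ENTITY)", re.IGNORECASE)


def has_forbidden_declaration(rdl_bytes: bytes) -> bool:
    """Return True when the payload contains a DOCTYPE or ENTITY declaration."""
    return FORBIDDEN_DECL.search(rdl_bytes) is not None
```

A caller would reject the payload before handing it to the XML parser whenever this returns `True`. Swapping in `defusedxml.ElementTree`, as the review suggests, would remove the need for a hand-written check.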

Performance: Size-limit check runs after full response is already in memory

📄 ingestion/src/metadata/ingestion/source/dashboard/ssrs/client.py:181 📄 ingestion/src/metadata/ingestion/source/dashboard/ssrs/client.py:186-200
_exceeds_size_limit checks the Content-Length header, but the response is fetched without stream=True, so the entire body is already downloaded into memory by the time the check executes. For genuinely large responses (>50 MB), the OOM has already occurred. The _truncate_to_limit fallback on actual body length is the real safety net here, but it too operates after the full body is in memory.

To truly prevent OOM, pass stream=True to the request and check Content-Length before reading the body.
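The streaming approach can be sketched independently of the HTTP library by capping an iterator of chunks. `read_capped` is a hypothetical helper, and the 50 MB value for `MAX_RDL_BYTES` is an assumption based on the figure mentioned in the review, not a value confirmed by the PR.

```python
from typing import Iterable, Optional

MAX_RDL_BYTES = 50 * 1024 * 1024  # assumed limit; the review mentions ~50 MB


def read_capped(
    chunks: Iterable[bytes], max_bytes: int = MAX_RDL_BYTES
) -> Optional[bytes]:
    """Accumulate chunks, giving up as soon as the total would exceed max_bytes."""
    buffer = bytearray()
    for chunk in chunks:
        if len(buffer) + len(chunk) > max_bytes:
            # Abort early: remaining chunks are never pulled from the source.
            return None
        buffer.extend(chunk)
    return bytes(buffer)


# With requests, the chunks would come from a streamed response, e.g.:
#   resp = session.get(url, stream=True)
#   body = read_capped(resp.iter_content(chunk_size=64 * 1024))
#   resp.close()
```

Because the check runs per chunk, memory use is bounded by `max_bytes` plus one chunk, even when the server omits or misreports `Content-Length`.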


@github-actions
Contributor

Jest test Coverage

UI tests summary

| Lines | Statements | Branches | Functions |
| --- | --- | --- | --- |
| 61% | 62.01% (60371/97352) | 42% (31636/75306) | 45% (9502/21111) |

@sonarqubecloud

Quality Gate failed for 'open-metadata-ingestion'

Failed conditions
Security Review Rating on New Code: E (required ≥ A)

See analysis details on SonarQube Cloud

@github-actions
Contributor

🟡 Playwright Results — all passed (13 flaky)

✅ 3698 passed · ❌ 0 failed · 🟡 13 flaky · ⏭️ 89 skipped

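| Shard | Passed | Failed | Flaky | Skipped |
| --- | --- | --- | --- | --- |
| ✅ Shard 1 | 481 | 0 | 0 | 4 |
| 🟡 Shard 2 | 655 | 0 | 1 | 7 |
| 🟡 Shard 3 | 665 | 0 | 1 | 1 |
| 🟡 Shard 4 | 645 | 0 | 3 | 27 |
| 🟡 Shard 5 | 610 | 0 | 1 | 42 |
| 🟡 Shard 6 | 642 | 0 | 7 | 8 |
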
🟡 13 flaky test(s) (passed on retry)
  • Features/BulkEditEntity.spec.ts › Glossary (shard 2, 1 retry)
  • Flow/PersonaFlow.spec.ts › Set default persona for team should work properly (shard 3, 1 retry)
  • Pages/Customproperties-part2.spec.ts › entityReferenceList shows item count, scrollable list, no expand toggle (shard 4, 1 retry)
  • Pages/DataContracts.spec.ts › Create Data Contract and validate for Dashboard (shard 4, 1 retry)
  • Pages/Entity.spec.ts › Announcement create, edit & delete (shard 4, 1 retry)
  • Pages/Glossary.spec.ts › Add and Remove Assets (shard 5, 1 retry)
  • Features/AutoPilot.spec.ts › Agents created by AutoPilot should be deleted (shard 6, 1 retry)
  • Pages/Lineage/DataAssetLineage.spec.ts › verify create lineage for entity - Container (shard 6, 1 retry)
  • Pages/Lineage/DataAssetLineage.spec.ts › verify create lineage for entity - Search Index (shard 6, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/Lineage/LineageRightPanel.spec.ts › Verify custom properties tab IS visible for supported type: searchIndex (shard 6, 1 retry)
  • Pages/Users.spec.ts › Create and Delete user (shard 6, 1 retry)
  • Pages/Users.spec.ts › Permissions for table details page for Data Consumer (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

@ulixius9 ulixius9 merged commit edf6373 into main Apr 23, 2026
71 of 74 checks passed
@ulixius9 ulixius9 deleted the ssrs_lineage branch April 23, 2026 09:45
ulixius9 pushed a commit that referenced this pull request Apr 23, 2026
* Improve SSRS Connector - Lineage

* Update generated TypeScript types

* Add ownership extraction

* remove claude file

* Address comments

* address comments

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
ulixius9 pushed a commit that referenced this pull request Apr 27, 2026
jatinmasaram pushed a commit to jatinmasaram/OpenMetadata that referenced this pull request May 2, 2026