Skip to content

fix(ssrs): streamline test connection, stream reports, retry transient failures#27637

Merged
harshach merged 3 commits intomainfrom
ssrs_fix_test_conn
Apr 22, 2026
Merged

fix(ssrs): streamline test connection, stream reports, retry transient failures#27637
harshach merged 3 commits intomainfrom
ssrs_fix_test_conn

Conversation

@ulixius9
Copy link
Copy Markdown
Member

Summary

  • Test connection no longer paginates all reports. The GetDashboards step was calling client.get_reports() which walked every page of /Reports — on a large SSRS instance that's a huge, unnecessary cost. It now issues a single $top=1 probe (client.test_get_reports), mirroring the existing CheckAccess probe against /Folders.
  • SsrsClient.get_reports now streams. Converted from list-accumulating to an Iterator[SsrsReport] that yield froms each page as it's fetched, and SsrsSource.get_dashboards_list yields non-hidden reports one at a time. Ingestion memory stays flat regardless of report count.
  • Transient failures are now retried. Session mounts an HTTPAdapter with a urllib3 Retry (2 retries, exponential backoff, covers connect errors, read timeouts, 5xx). Motivated by a production trace where a single 30s ReadTimeout against /Reports aborted the fetch and left the pipeline reporting success with 0 records.

Test plan

  • Unit tests (tests/unit/topology/dashboard/test_ssrs.py) — 22 pass, includes new test_get_reports_streams_lazily asserting the generator only fetches the first page until the consumer advances.
  • Integration tests (tests/integration/ssrs/, tests/integration/connections/test_ssrs_connection.py) — 10 pass, includes new test_get_connection_test_get_reports, test_connection_bad_host_get_reports, and test_get_reports_retries_transient_failures (flaky mock server returns 503 twice then 200; asserts exactly 3 requests).
  • End-to-end against a real SSRS instance — not feasible without a Windows host; manual verification recommended.

🤖 Generated with Claude Code

…sient failures

- Test connection's GetDashboards step no longer paginates every report; it
  now issues a single $top=1 probe to /Reports, matching the existing
  CheckAccess probe against /Folders.
- SsrsClient.get_reports is now a generator that yields each page as it is
  fetched, so ingestion memory stays flat regardless of report count.
- Session retries transient failures (connect errors, read timeouts, 5xx)
  up to 2 times with exponential backoff to survive flaky SSRS endpoints.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 22, 2026 16:52
@ulixius9 ulixius9 requested a review from a team as a code owner April 22, 2026 16:52
@github-actions github-actions Bot added Ingestion safe to test Add this label to run secure Github workflows on PRs labels Apr 22, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves SSRS ingestion reliability and scalability by making the SSRS “reports” listing lazy/streamed, reducing test-connection cost, and adding automatic retries for transient HTTP failures.

Changes:

  • Replace SSRS connection test “GetDashboards” step from a full /Reports pagination to a $top=1 probe via SsrsClient.test_get_reports.
  • Convert SsrsClient.get_reports() to stream results page-by-page (iterator) and update SsrsSource.get_dashboards_list() to yield visible reports lazily.
  • Add requests session retry behavior (urllib3 Retry via HTTPAdapter) and extend unit/integration tests accordingly.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

Show a summary per file
File Description
ingestion/src/metadata/ingestion/source/dashboard/ssrs/client.py Adds session-level retries and changes get_reports() to a streaming iterator; introduces test_get_reports() probe.
ingestion/src/metadata/ingestion/source/dashboard/ssrs/metadata.py Changes dashboard listing to yield non-hidden reports lazily from the client iterator.
ingestion/src/metadata/ingestion/source/dashboard/ssrs/connection.py Updates test-connection step mapping to use the lightweight test_get_reports() probe.
ingestion/tests/unit/topology/dashboard/test_ssrs.py Updates tests for iterator-based APIs and adds a laziness test verifying only the first page is fetched initially.
ingestion/tests/integration/ssrs/test_metadata.py Updates integration tests to materialize the iterator with list(...).
ingestion/tests/integration/connections/test_ssrs_connection.py Adds integration coverage for test_get_reports() and for retry behavior using a flaky HTTP handler.

harshach
harshach previously approved these changes Apr 22, 2026
if verify_ssl is not None:
self.session.verify = verify_ssl
retry = Retry(
total=MAX_RETRIES,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure retries is hte reason we are getting timed out?

Copilot AI review requested due to automatic review settings April 22, 2026 18:21
@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented Apr 22, 2026

Code Review ✅ Approved

Streamlines SSRS test connections, implements report streaming, and adds retry logic for transient failures. No issues found.

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Comment on lines +37 to +42
CONNECT_TIMEOUT = 10
READ_TIMEOUT = 120
PAGE_SIZE = 100
MAX_RETRIES = 2
BACKOFF_FACTOR = 1
RETRY_STATUS_CODES = (500, 502, 503, 504)
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CONNECT_TIMEOUT/READ_TIMEOUT combined with MAX_RETRIES can exceed the 3-minute test-connection timeout used by SSRS (THREE_MIN=180s). Worst case for a single GET is roughly (CONNECT_TIMEOUT+READ_TIMEOUT)*(MAX_RETRIES+1) plus backoff, which can cause test_get_reports / connection testing to time out even though retries are still in progress. Consider lowering the probe timeouts (or making _get accept an override timeout for test probes), reducing retries for test probes, or increasing the SSRS test-connection timeout to keep the worst-case under the configured limit.

Copilot uses AI. Check for mistakes.
@harshach harshach merged commit d373a14 into main Apr 22, 2026
48 checks passed
@harshach harshach deleted the ssrs_fix_test_conn branch April 22, 2026 18:39
@sonarqubecloud
Copy link
Copy Markdown

Quality Gate Failed Quality Gate failed for 'open-metadata-ingestion'

Failed conditions
E Security Review Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud

@github-actions
Copy link
Copy Markdown
Contributor

🟡 Playwright Results — all passed (10 flaky)

✅ 3701 passed · ❌ 0 failed · 🟡 10 flaky · ⏭️ 89 skipped

Shard Passed Failed Flaky Skipped
✅ Shard 1 481 0 0 4
🟡 Shard 2 655 0 1 7
🟡 Shard 3 665 0 1 1
🟡 Shard 4 646 0 2 27
🟡 Shard 5 609 0 2 42
🟡 Shard 6 645 0 4 8
🟡 10 flaky test(s) (passed on retry)
  • Features/BulkEditEntity.spec.ts › Glossary (shard 2, 1 retry)
  • Flow/PersonaFlow.spec.ts › Set default persona for team should work properly (shard 3, 1 retry)
  • Pages/Customproperties-part2.spec.ts › entityReferenceList shows item count, scrollable list, no expand toggle (shard 4, 1 retry)
  • Pages/DataContractsSemanticRules.spec.ts › Validate Description Rule Is_Not_Set (shard 4, 1 retry)
  • Pages/EntityDataSteward.spec.ts › User as Owner Add, Update and Remove (shard 5, 1 retry)
  • Pages/Glossary.spec.ts › Add and Remove Assets (shard 5, 2 retries)
  • Pages/Lineage/DataAssetLineage.spec.ts › verify create lineage for entity - Search Index (shard 6, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/Lineage/LineageRightPanel.spec.ts › Verify custom properties tab IS visible for supported type: searchIndex (shard 6, 1 retry)
  • Pages/Users.spec.ts › Permissions for table details page for Data Consumer (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

ulixius9 added a commit that referenced this pull request Apr 23, 2026
…t failures (#27637)

* fix(ssrs): streamline test connection, stream reports, and retry transient failures

- Test connection's GetDashboards step no longer paginates every report; it
  now issues a single $top=1 probe to /Reports, matching the existing
  CheckAccess probe against /Folders.
- SsrsClient.get_reports is now a generator that yields each page as it is
  fetched, so ingestion memory stays flat regardless of report count.
- Session retries transient failures (connect errors, read timeouts, 5xx)
  up to 2 times with exponential backoff to survive flaky SSRS endpoints.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fixing handling of pagination

* fix test

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Sriharsha Chintalapani <harsha@getcollate.io>
ulixius9 added a commit that referenced this pull request Apr 23, 2026
…t failures (#27637)

* fix(ssrs): streamline test connection, stream reports, and retry transient failures

- Test connection's GetDashboards step no longer paginates every report; it
  now issues a single $top=1 probe to /Reports, matching the existing
  CheckAccess probe against /Folders.
- SsrsClient.get_reports is now a generator that yields each page as it is
  fetched, so ingestion memory stays flat regardless of report count.
- Session retries transient failures (connect errors, read timeouts, 5xx)
  up to 2 times with exponential backoff to survive flaky SSRS endpoints.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fixing handling of pagination

* fix test

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Sriharsha Chintalapani <harsha@getcollate.io>
ulixius9 added a commit that referenced this pull request Apr 27, 2026
…t failures (#27637)

* fix(ssrs): streamline test connection, stream reports, and retry transient failures

- Test connection's GetDashboards step no longer paginates every report; it
  now issues a single $top=1 probe to /Reports, matching the existing
  CheckAccess probe against /Folders.
- SsrsClient.get_reports is now a generator that yields each page as it is
  fetched, so ingestion memory stays flat regardless of report count.
- Session retries transient failures (connect errors, read timeouts, 5xx)
  up to 2 times with exponential backoff to survive flaky SSRS endpoints.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fixing handling of pagination

* fix test

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Sriharsha Chintalapani <harsha@getcollate.io>
jatinmasaram pushed a commit to jatinmasaram/OpenMetadata that referenced this pull request May 2, 2026
…t failures (open-metadata#27637)

* fix(ssrs): streamline test connection, stream reports, and retry transient failures

- Test connection's GetDashboards step no longer paginates every report; it
  now issues a single $top=1 probe to /Reports, matching the existing
  CheckAccess probe against /Folders.
- SsrsClient.get_reports is now a generator that yields each page as it is
  fetched, so ingestion memory stays flat regardless of report count.
- Session retries transient failures (connect errors, read timeouts, 5xx)
  up to 2 times with exponential backoff to survive flaky SSRS endpoints.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fixing handling of pagination

* fix test

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Sriharsha Chintalapani <harsha@getcollate.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Ingestion safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants