feat: support for relative (percent) metrics in data contract checks.#1248

Closed
HenrikSpiegel wants to merge 10 commits into datacontract:main from HenrikSpiegel:main

@HenrikSpiegel

  • Tests pass (uv run pytest)
  • Code formatted (uv run ruff check --fix && uv run ruff format)
  • README.md updated (if relevant)
  • CHANGELOG.md entry added

#1228

Collaborator

@jschoedl jschoedl left a comment

Thank you for the PR! A few notes, and please add test cases for this in the tests directory - e.g.:

  • test that percent really appends %
  • test that no % exists for the default case or rows
  • test that e.g. check_property_null_values with percent produces the correct sodacl check
  • test that a percent sign in check_row_count behaves correctly
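
A minimal sketch of what such tests could look like. The helper `format_threshold` and the `COUNT_ONLY_METRICS` set are hypothetical stand-ins for illustration only, not the actual functions in data_contract_checks.py; the real tests would call the check builders directly and assert on the generated SodaCL.

```python
from typing import Optional

# Hypothetical: metrics where a percentage of rows has no meaning.
COUNT_ONLY_METRICS = {"rowCount"}


def format_threshold(value: float, unit: Optional[str], metric: str) -> str:
    """Render a threshold, appending '%' only for percent units
    on metrics that can be expressed as a fraction of rows.

    Hypothetical helper, not part of datacontract-cli.
    """
    is_percent = unit == "percent" and metric not in COUNT_ONLY_METRICS
    text = f"{value:g}"  # 5 -> "5", 2.5 -> "2.5"
    return f"{text}%" if is_percent else text


def test_percent_appends_suffix():
    assert format_threshold(5, "percent", "nullValues") == "5%"


def test_default_unit_has_no_suffix():
    assert format_threshold(5, None, "nullValues") == "5"
    assert format_threshold(5, "rows", "nullValues") == "5"


def test_row_count_ignores_percent():
    # Assumed behavior: percent on rowCount falls back to the plain count.
    assert format_threshold(100, "percent", "rowCount") == "100"
```

With pytest, these would run as part of `uv run pytest`, as in the checklist above.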

@HenrikSpiegel HenrikSpiegel requested a review from jschoedl June 2, 2026 07:03
@jochenchrist
Contributor

Thanks for this, and apologies it sat open long enough to be overtaken by events.

This is now superseded by the v1.0.0 refactor in #1279 ("Replace Soda Core with ibis test engine"), which landed the same feature through a different path:

  • The target file is gone. datacontract/engines/data_contract_checks.py was removed in #1279, and the check-building code moved to datacontract/export/sodacl_check_builder.py, so this PR no longer applies cleanly.
  • Percent thresholds are now supported natively. datacontract test honors ODCS quality.unit: percent on the count-of-bad-rows metrics (nullValues, missingValues, invalidValues), comparing the threshold against the failed fraction of the row count. Percent on metrics where a row fraction has no meaning (rowCount, duplicateValues) logs a warning and falls back to the absolute count, matching the behavior in this PR. See is_percent_unit() in datacontract/engines/checks/create_checks.py, threshold_is_percent in check_spec.py, the comparison logic in ibis/ibis_check_execute.py, and tests/test_test_quality_percent_severity.py.
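
For readers who don't want to open those files, here is a rough sketch of the percent handling described above. It is not the code in ibis_check_execute.py: the function name `observed_value`, the `ROW_FRACTION_METRICS` set, the 0 to 100 percent scale, and the handling of an empty table are all assumptions made for illustration.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)

# Metrics that count "bad" rows and so can be expressed as a share of all rows.
ROW_FRACTION_METRICS = {"nullValues", "missingValues", "invalidValues"}


def observed_value(
    metric: str, count: int, row_count: int, unit: Optional[str]
) -> float:
    """Return the value a threshold is compared against.

    Hypothetical helper: assumes percent thresholds are on a 0-100 scale.
    """
    if unit != "percent":
        return float(count)
    if metric not in ROW_FRACTION_METRICS:
        # A row fraction has no meaning here (e.g. rowCount, duplicateValues).
        logger.warning(
            "unit 'percent' is not supported for %s; using absolute count", metric
        )
        return float(count)
    if row_count == 0:
        # Assumption: an empty table has no failing rows, so 0 percent.
        return 0.0
    return 100.0 * count / row_count


# 5 null values out of 200 rows is 2.5 percent, so "mustBeLessThan: 3" passes.
assert observed_value("nullValues", 5, 200, "percent") < 3
# Percent on rowCount falls back to the absolute count.
assert observed_value("rowCount", 200, 200, "percent") == 200.0
```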

That resolves #1228, so I'm closing this as already implemented.

One thing not covered by #1279: the SodaCL export (sodacl_check_builder.py) still emits count-only metrics and doesn't apply a % suffix. If emitting percent thresholds in the SodaCL export is still useful to you, a smaller follow-up scoped to that file would be welcome.

Linked issue: Add support for inbuilt quality metrics with relative thresholds as per ODCS 3.1.