feat(python/sedonadb): normalize geometry WKT expected values in harness #845
oglego wants to merge 1 commit into
paleolimbot
left a comment
Thank you for working on this!
Before looking at this closely I want to double check that our existing WKT normalization is working (you've clearly hit a case where it isn't!):
sedona-db/python/sedonadb/python/sedonadb/testing.py
Lines 204 to 222 in 4daf73b
This is using geoarrow-c's writer, which we did at the time because shapely didn't yet support M values. In theory all geometry and geography should be getting normalized by the time it gets to assert_result(). There is an unfortunate bug in geoarrow-c where POINT EMPTY is formatted as POINT (nan nan).
Thank you for looking into this! If I need to make any modifications or need to close out the PR just let me know, thanks again!
Summary
Fixes #815: Python tests that use WKT as an expected value could fail on formatting alone. This PR normalizes geometry WKT in the Python test harness before comparison.
Problem
Some Python tests compare query results against expected WKT strings using assert_query_result(). When the query returns a geometry value, the Python harness converts that geometry to WKT before comparing tuple results.
The issue is that equivalent WKT can be formatted differently depending on which writer produced it. In practice, this showed up as whitespace-only differences, such as whether spaces appear after commas:
Expected:
POLYGON Z ((0 0 5,0 1 5,1 1 5,1 0 5,0 0 5))
Got:
POLYGON Z ((0 0 5, 0 1 5, 1 1 5, 1 0 5, 0 0 5))
These failures are not about the geometry result itself; they come from string formatting differences.
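To make the mismatch concrete, here is a minimal sketch of a whitespace-only WKT normalizer applied to the two strings above. The helper name normalize_wkt_whitespace is hypothetical and is not the harness's actual implementation (the review comment notes that the existing normalization goes through geoarrow-c's writer). It only illustrates why the two strings should compare equal once formatting is ignored.

```python
import re


def normalize_wkt_whitespace(wkt: str) -> str:
    """Hypothetical helper: remove formatting-only differences from a WKT string."""
    # Collapse any run of whitespace into a single space.
    wkt = re.sub(r"\s+", " ", wkt.strip())
    # Use exactly one space after each comma and none before it.
    wkt = re.sub(r"\s*,\s*", ", ", wkt)
    # Drop spaces just inside parentheses.
    wkt = re.sub(r"\(\s+", "(", wkt)
    wkt = re.sub(r"\s+\)", ")", wkt)
    return wkt


expected = "POLYGON Z ((0 0 5,0 1 5,1 1 5,1 0 5,0 0 5))"
got = "POLYGON Z ((0 0 5, 0 1 5, 1 1 5, 1 0 5, 0 0 5))"

# The raw strings differ, but only in spacing after commas.
assert expected != got
assert normalize_wkt_whitespace(expected) == normalize_wkt_whitespace(got)
```

A text-level pass like this does not reconcile numeric formatting (for example 1 versus 1.0). Round-tripping both sides through a single WKT writer would be needed to cover that case.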
What changed
Why this approach
This keeps geometry tests focused on geometry semantics rather than renderer-specific text formatting.
The important detail is scope: we do not normalize all strings globally. Only columns that are actually typed as geometry are canonicalized. That avoids masking real regressions in plain-text outputs while still fixing the formatting-only failures that motivated this change.
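The scoping rule can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: normalize_rows, _normalize_wkt, and the is_geometry flag list are hypothetical names, and the real harness presumably derives the geometry flags from the result's column types.

```python
import re


def _normalize_wkt(wkt: str) -> str:
    # Hypothetical stand-in for the harness's WKT canonicalization:
    # here it only enforces a single space after each comma.
    return re.sub(r"\s*,\s*", ", ", wkt.strip())


def normalize_rows(rows, is_geometry):
    """Normalize only the values in columns flagged as geometry.

    rows: list of result tuples.
    is_geometry: one bool per column (assumed to come from the column types).
    """
    return [
        tuple(
            _normalize_wkt(value) if geom and isinstance(value, str) else value
            for value, geom in zip(row, is_geometry)
        )
        for row in rows
    ]


rows = [("POINT (0 1)", "a,b"), ("LINESTRING (0 0,1 1)", "c,d")]
normalized = normalize_rows(rows, is_geometry=[True, False])

# The geometry column is canonicalized; the plain-text column is left as-is,
# so a real formatting regression in a string column would still be caught.
assert normalized == [("POINT (0 1)", "a,b"), ("LINESTRING (0 0, 1 1)", "c,d")]
```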
Testing
pytest -q python/sedonadb/tests/test_testing.py
pytest -q python/sedonadb/tests/functions/test_functions.py