feat(python/sedonadb): normalize geometry WKT expected values in harness by oglego · Pull Request #845 · apache/sedona-db

oglego · 2026-05-15T02:05:46Z

Summary

Fixes #815: formatting-only failures in Python tests that use WKT as an expected value. The fix normalizes geometry WKT in the Python test harness before comparison.

Problem

Some Python tests compare query results against expected WKT strings using assert_query_result(). When the query returns a geometry value, the Python harness converts that geometry to WKT before comparing tuple results.

The issue is that equivalent WKT can be formatted differently depending on which writer produced it. In practice, this showed up as whitespace-only differences, such as whether spaces appear after commas:

Expected:
POLYGON Z ((0 0 5,0 1 5,1 1 5,1 0 5,0 0 5))

Got:
POLYGON Z ((0 0 5, 0 1 5, 1 1 5, 1 0 5, 0 0 5))

These failures are not about the geometry result itself; they come from differences in string formatting.
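To make the spacing-only difference concrete, here is a minimal sketch that uses only the Python standard library. The helper name `normalize_wkt_spacing` is hypothetical and this is not the harness's actual implementation; it only standardizes whitespace, so the two strings above compare equal after normalization while coordinate values stay untouched.

```python
import re


def normalize_wkt_spacing(wkt: str) -> str:
    """Hypothetical helper: canonicalize whitespace in a WKT string.

    Only spacing is touched; coordinate values and ordering are left alone.
    """
    # Collapse runs of whitespace into a single space
    out = re.sub(r"\s+", " ", wkt.strip())
    # Remove spaces just inside parentheses: "( 0 0 )" -> "(0 0)"
    out = re.sub(r"\(\s+", "(", out)
    out = re.sub(r"\s+\)", ")", out)
    # Use exactly one space after each comma and none before it
    out = re.sub(r"\s*,\s*", ", ", out)
    return out


expected = "POLYGON Z ((0 0 5,0 1 5,1 1 5,1 0 5,0 0 5))"
actual = "POLYGON Z ((0 0 5, 0 1 5, 1 1 5, 1 0 5, 0 0 5))"

# The raw strings differ, but the normalized forms match
assert expected != actual
assert normalize_wkt_spacing(expected) == normalize_wkt_spacing(actual)
```

A text-level normalizer like this cannot catch differences in number formatting (for example `1` versus `1.0`); handling those would require parsing the geometry with a real WKT reader.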

What changed

- Updated the Python test harness to canonicalize WKT only for geometry-typed result columns during tuple-based comparisons
- Left non-geometry string columns on exact string comparison
- Added a regression test that verifies this distinction:
  - geometry results with spacing-only WKT differences compare equal
  - plain string results with the same spacing differences still fail
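The column-scoped behavior described above can be sketched roughly as follows. The function names and the `is_geometry` flag list are assumptions made for illustration, not the harness's real API; in the actual harness the geometry check would come from the Arrow column type.

```python
import re
from typing import List, Optional, Sequence, Tuple

Row = Tuple[Optional[str], ...]


def _normalize_spacing(wkt: str) -> str:
    # Hypothetical canonical form: single spaces, ", " between coordinates
    return re.sub(r"\s*,\s*", ", ", re.sub(r"\s+", " ", wkt.strip()))


def normalize_rows(rows: Sequence[Row], is_geometry: Sequence[bool]) -> List[Row]:
    """Normalize only the columns flagged as geometry; leave the rest as-is."""
    return [
        tuple(
            _normalize_spacing(value) if geom and value is not None else value
            for value, geom in zip(row, is_geometry)
        )
        for row in rows
    ]


# Column 0 is geometry-typed; column 1 is a plain string column
is_geometry = [True, False]
expected = [("LINESTRING (0 0,1 1)", "LINESTRING (0 0,1 1)")]
actual = [("LINESTRING (0 0, 1 1)", "LINESTRING (0 0, 1 1)")]

norm_expected = normalize_rows(expected, is_geometry)
norm_actual = normalize_rows(actual, is_geometry)

# Geometry column: spacing-only difference is ignored
assert norm_expected[0][0] == norm_actual[0][0]
# String column: the same difference is still reported
assert norm_expected[0][1] != norm_actual[0][1]
```

Keeping the flag per column rather than per value means a plain string that happens to look like WKT is never rewritten.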

Why this approach

This keeps geometry tests focused on geometry semantics rather than renderer-specific text formatting.

The important detail is scope: we do not normalize all strings globally. Only columns that are actually typed as geometry are canonicalized. That avoids masking real regressions in plain-text outputs while still fixing the formatting-only failures that motivated this change.

Testing

pytest -q python/sedonadb/tests/test_testing.py
pytest -q python/sedonadb/tests/functions/test_functions.py

paleolimbot

Thank you for working on this!

Before looking at this closely I want to double check that our existing WKT normalization is working (you've clearly hit a case where it isn't!):

sedona-db/python/sedonadb/python/sedonadb/testing.py

Lines 204 to 222 in 4daf73b

    
    def result_to_tuples(
        self, result, *, wkt_precision=None, **kwargs
    ) -> List[Tuple[str]]:
        """Convert a query result into row tuples

        This option strips away fine-grained type information but is helpful for
        generally asserting a query result or verifying results between engines
        that have (e.g.) differing integer handling.
        """
        tab = self.result_to_table(result)
        columns = []
        for col in tab.columns:
            # isinstance() does not always work with pyarrow in pytest
            if _type_is_geoarrow(col.type):
                columns.append(ga.format_wkt(col, precision=wkt_precision).to_pylist())
            else:
                columns.append(col.cast(pa.string()).to_pylist())

        return list(zip(*columns))

This uses geoarrow-c's writer, which we chose at the time because shapely didn't yet support M values. In theory, all geometry and geography values should already be normalized by the time they reach assert_result(). There is also an unfortunate bug in geoarrow-c where POINT EMPTY is formatted as POINT (nan nan).
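For illustration only, a test helper could paper over the POINT EMPTY formatting with a small post-processing step like the sketch below. The helper name `patch_empty_point` is hypothetical, and only the 2D form `POINT (nan nan)` is confirmed by the comment above; the handling of Z, M, and ZM variants is an assumption about how such output might look.

```python
import re

# Matches e.g. "POINT (nan nan)"; the Z/M/ZM variants are an assumption
_NAN_POINT = re.compile(r"^POINT( ZM| Z| M)? \(nan( nan)*\)$")


def patch_empty_point(wkt):
    """Hypothetical helper: rewrite an all-nan POINT as POINT EMPTY."""
    if wkt is None:
        return None
    match = _NAN_POINT.match(wkt)
    if match is None:
        # Anything that is not an all-nan point is returned unchanged
        return wkt
    # Preserve the dimension suffix (if any) captured by the first group
    return f"POINT{match.group(1) or ''} EMPTY"


assert patch_empty_point("POINT (nan nan)") == "POINT EMPTY"
assert patch_empty_point("POINT (1 2)") == "POINT (1 2)"
```

A string-level patch like this is a workaround at best; fixing the writer itself would be the more robust option.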

oglego · 2026-05-21T23:37:54Z

Thank you for looking into this! If I need to make any modifications or need to close out the PR just let me know, thanks again!

feat(python/sedonadb): normalize geometry WKT expected values in harness

1a182d6

paleolimbot reviewed May 18, 2026

oglego wants to merge 1 commit into apache:main from oglego:feat/wkt-formatting
