[GH-2979] ST_Split: support puntal input (Point, MultiPoint) #2982
Previously ST_Split returned null when the input was Point or MultiPoint,
because GeometrySplitter.split() only dispatched lineal and polygonal
inputs. PostGIS 3.2+ supports splitting puntal inputs:
- Puntal input by polygonal blade -> GEOMETRYCOLLECTION of two
MULTIPOINTs (points inside the polygon, points outside).
- Puntal input by puntal blade -> MULTIPOINT with the blade
coordinates removed from the input.
Adds tests in sedona-common, sedona-spark, and sedona-flink, and
documents the new behaviour in the SQL and Snowflake reference docs.
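The polygonal-blade rule above can be illustrated with a minimal, self-contained sketch. It is not Sedona's implementation: the class `PuntalSplitSketch`, the `Box` type (an axis-aligned rectangle standing in for a polygonal blade), and the helper `pt` are hypothetical names introduced here, and the sketch assumes that points on the blade boundary count as covered.

```java
import java.awt.geom.Point2D;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: models "puntal input by polygonal blade" without JTS or Sedona.
final class PuntalSplitSketch {

    /** Hypothetical stand-in for a polygonal blade: an axis-aligned rectangle. */
    static final class Box {
        final double minX, minY, maxX, maxY;

        Box(double minX, double minY, double maxX, double maxY) {
            this.minX = minX;
            this.minY = minY;
            this.maxX = maxX;
            this.maxY = maxY;
        }

        // Assumption: a point on the boundary counts as covered.
        boolean covers(Point2D p) {
            return p.getX() >= minX && p.getX() <= maxX
                && p.getY() >= minY && p.getY() <= maxY;
        }
    }

    static Point2D pt(double x, double y) {
        return new Point2D.Double(x, y);
    }

    /** Returns the covered group first, then the rest; an empty group is omitted. */
    static List<List<Point2D>> split(List<Point2D> input, Box blade) {
        List<Point2D> inside = new ArrayList<>();
        List<Point2D> outside = new ArrayList<>();
        for (Point2D p : input) {
            (blade.covers(p) ? inside : outside).add(p);
        }
        List<List<Point2D>> groups = new ArrayList<>();
        if (!inside.isEmpty()) {
            groups.add(inside);
        }
        if (!outside.isEmpty()) {
            groups.add(outside);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Point2D> input = Arrays.asList(pt(1, 1), pt(2, 0), pt(5, 5));
        System.out.println(split(input, new Box(0, 0, 2, 2)));
    }
}
```

A single point can be passed as a one-element list, which mirrors the "Point is treated as a one-element MultiPoint" behaviour described in the PR body.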
Pull request overview
This PR updates Sedona’s ST_Split implementation to match PostGIS 3.2+ behavior for puntal inputs (Point / MultiPoint), which previously returned null due to GeometrySplitter.split() only dispatching lineal and polygonal inputs.
Changes:
- Add puntal dispatch in `GeometrySplitter` and implement splitting rules for puntal-by-polygonal (inside/outside partition) and puntal-by-puntal (coordinate set-difference).
- Add unit tests across common, Spark, and Flink to cover the new puntal split behaviors.
- Update SQL and Snowflake docs to describe puntal `ST_Split` semantics and add an example.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| common/src/main/java/org/apache/sedona/common/utils/GeometrySplitter.java | Adds puntal dispatch and implements splitPoints logic for polygonal/puntal blades. |
| common/src/test/java/org/apache/sedona/common/FunctionsTest.java | Adds unit tests for point/multipoint split scenarios and boundary semantics. |
| spark/common/src/test/scala/org/apache/sedona/sql/functionTestScala.scala | Extends ST_Split table-driven Spark test with new puntal cases and normalizes output. |
| flink/src/test/java/org/apache/sedona/flink/FunctionTest.java | Adds Flink SQL tests for MultiPoint-by-Polygon and MultiPoint-by-MultiPoint. |
| docs/api/sql/Overlay-Functions/ST_Split.md | Documents puntal ST_Split behavior and adds an example. |
| docs/api/snowflake/vector-data/Overlay-Functions/ST_Split.md | Mirrors the SQL docs update for Snowflake docs. |
    public GeometryCollection split(Geometry input, Geometry blade) {
      GeometryCollection result = null;

      if (GeomUtils.geometryIsLineal(input)) {
        result = splitLines(input, blade);
      } else if (GeomUtils.geometryIsPolygonal(input)) {
        result = splitPolygons(input, blade);
      } else if (GeomUtils.geometryIsPuntal(input)) {
        result = splitPoints(input, blade);
Updated in 0a2feb1. The Javadoc now mentions puntal input and the GeometryCollection-of-MultiPoints return shape.
    private MultiPoint splitPointsByPoints(Geometry input, Geometry blade) {
      java.util.Set<Coordinate> bladeCoords = new java.util.HashSet<>();
      for (Coordinate c : blade.getCoordinates()) {
        bladeCoords.add(c);
      }
Moot — splitPointsByPoints was removed in 1a3b149 as part of a separate review fix (puntal-by-puntal had no PostGIS analog and broke the union-reconstructs-input invariant).
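A short sketch may help show why a coordinate set-difference breaks the union-reconstructs-input invariant while an inside/outside partition keeps it. The class `UnionInvariantSketch` and its methods are hypothetical and model points with `java.awt.geom.Point2D` rather than JTS geometries; this is an illustration of the reasoning, not the removed Sedona code.

```java
import java.awt.geom.Point2D;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: compares set-difference with a partition against the union invariant.
final class UnionInvariantSketch {

    static Point2D pt(double x, double y) {
        return new Point2D.Double(x, y);
    }

    /** Puntal-by-puntal as originally proposed: input points minus blade points. */
    static Set<Point2D> difference(Collection<Point2D> input, Collection<Point2D> blade) {
        Set<Point2D> result = new LinkedHashSet<>(input);
        result.removeAll(new HashSet<>(blade));
        return result;
    }

    /** True when the union of all pieces contains exactly the input points. */
    static boolean reconstructsInput(
            Collection<Point2D> input, Collection<? extends Collection<Point2D>> pieces) {
        Set<Point2D> union = new HashSet<>();
        for (Collection<Point2D> piece : pieces) {
            union.addAll(piece);
        }
        return union.equals(new HashSet<>(input));
    }

    public static void main(String[] args) {
        List<Point2D> input = Arrays.asList(pt(0, 0), pt(1, 1), pt(2, 2));
        List<Point2D> blade = Arrays.asList(pt(1, 1));
        // The difference drops (1, 1), so the pieces no longer add up to the input.
        System.out.println(reconstructsInput(input, Arrays.asList(difference(input, blade))));
    }
}
```

A partition never discards a point, so the same check passes for any inside/outside split of the input.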
Linear (LineString or MultiLineString) geometry can be split by a Point, MultiPoint, LineString, MultiLineString, Polygon, or MultiPolygon.
Polygonal (Polygon or MultiPolygon) geometry can be split by a LineString, MultiLineString, Polygon, or MultiPolygon.
Puntal (Point or MultiPoint) geometry can be split by a Polygon, MultiPolygon, Point, or MultiPoint.
In either case, when a polygonal blade is used then the boundary of the blade is what is actually split by.
Updated in 0a2feb1. The sentence now scopes the boundary rule to lineal and polygonal inputs and notes that puntal inputs are partitioned by polygon interior coverage.
Linear (LineString or MultiLineString) geometry can be split by a Point, MultiPoint, LineString, MultiLineString, Polygon, or MultiPolygon.
Polygonal (Polygon or MultiPolygon) geometry can be split by a LineString, MultiLineString, Polygon, or MultiPolygon.
Puntal (Point or MultiPoint) geometry can be split by a Polygon, MultiPolygon, Point, or MultiPoint.
In either case, when a polygonal blade is used then the boundary of the blade is what is actually split by.
Updated in 0a2feb1. Same wording change applied here.
…llections

Review fixes:
- Drop the MultiPoint-by-MultiPoint subcase. It had no PostGIS analog and broke the invariant that the union of split pieces reconstructs the input. Only polygonal blades are accepted for puntal input now.
- Flatten puntal input by walking coordinates instead of casting each child to Point. A homogeneous GeometryCollection like GEOMETRYCOLLECTION(MULTIPOINT(...), POINT(...)) satisfies geometryIsPuntal and previously hit a ClassCastException when its MultiPoint child was cast to Point.
- Reframe the docs as a Sedona-specific extension rather than PostGIS parity, since PostGIS itself only supports lineal/polygonal input.
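The flattening fix can be sketched with a toy geometry model. Everything below is hypothetical (`FlattenSketch`, `Geom`, `Pt`, `Multi`) and only mimics the shape of the problem: a collection whose child is itself a multi-point cannot be flattened by casting each child to a point, but it can be flattened by walking coordinates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: a toy model, not the JTS or Sedona type hierarchy.
final class FlattenSketch {

    interface Geom {
        List<double[]> coordinates();
    }

    static final class Pt implements Geom {
        final double x, y;

        Pt(double x, double y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public List<double[]> coordinates() {
            List<double[]> out = new ArrayList<>();
            out.add(new double[] {x, y});
            return out;
        }
    }

    /** Stands in for both MultiPoint and a puntal GeometryCollection. */
    static final class Multi implements Geom {
        final List<Geom> children;

        Multi(Geom... children) {
            this.children = Arrays.asList(children);
        }

        @Override
        public List<double[]> coordinates() {
            List<double[]> out = new ArrayList<>();
            for (Geom child : children) {
                out.addAll(child.coordinates());
            }
            return out;
        }
    }

    // Fragile: throws ClassCastException when a child is itself a Multi.
    static List<Pt> flattenByCast(Multi input) {
        List<Pt> out = new ArrayList<>();
        for (Geom child : input.children) {
            out.add((Pt) child);
        }
        return out;
    }

    // Robust: nesting depth does not matter because only coordinates are read.
    static List<Pt> flattenByCoordinates(Geom input) {
        List<Pt> out = new ArrayList<>();
        for (double[] c : input.coordinates()) {
            out.add(new Pt(c[0], c[1]));
        }
        return out;
    }
}
```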
The Flink module ships ST_Split but had no doc page. Add one mirroring the SQL doc, including the Sedona-specific puntal-input extension.
…lade docs

- GeometrySplitter.split() Javadoc now covers puntal input and the GeometryCollection-of-MultiPoints return shape.
- SQL / Snowflake / Flink ST_Split docs clarify that the "polygonal blade -> boundary" rule applies to lineal and polygonal inputs only; puntal inputs are partitioned by polygon interior coverage.
Did you read the Contributor Guide?
Is this PR related to a JIRA ticket?
What changes were proposed in this PR?
`ST_Split` returned `null` when the input geometry was a `Point` or `MultiPoint`, because `GeometrySplitter.split()` only dispatched lineal and polygonal inputs. PostGIS 3.2+ supports splitting puntal inputs, which is the most likely scenario behind the empty result reported in #2979.

This PR adds parity with PostGIS:
- Puntal input by polygonal blade: `GEOMETRYCOLLECTION(MULTIPOINT(<inside>), MULTIPOINT(<outside>))`. Points covered by the polygon land in the first MultiPoint, the rest in the second. An empty group is omitted.
- Puntal input by puntal blade: `MULTIPOINT` with the blade coordinates removed from the input (set-difference of coordinates).
- `Point` is treated as a one-element MultiPoint and handled by the same code path.
- `Point`/`MultiPoint` with an unsupported blade type (e.g. `LineString`) continues to return `null`, matching PostGIS's no-op behaviour for that case.

Example matching the PostGIS docs:
How was this patch tested?
- Added unit tests in `sedona-common` (`FunctionsTest`) covering MultiPoint-by-Polygon, all-inside / all-outside partitions, boundary semantics, single-Point input, MultiPoint-by-MultiPoint, and the unsupported-blade null case. All 26 split-related tests pass.
- Extended the `ST_Split` test in Spark `functionTestScala` with the two new cases. Full suite (233 tests) passes.
- Added `testSplitMultiPointByPolygon` and `testSplitMultiPointByMultiPoint` to Flink `FunctionTest`. All 3 split tests pass.

Did this PR include necessary documentation updates?
`docs/api/sql/Overlay-Functions/ST_Split.md` and `docs/api/snowflake/vector-data/Overlay-Functions/ST_Split.md` describe the new puntal-input behaviour and include a MultiPoint-by-Polygon SQL example.