[GH-2979] ST_Split: support puntal input (Point, MultiPoint) #2982

Merged: jiayuasu merged 4 commits into apache:master from jiayuasu:fix/st-split-postgis-parity on May 22, 2026

@jiayuasu (Member)

Did you read the Contributor Guide?

Is this PR related to a JIRA ticket?

What changes were proposed in this PR?

ST_Split returned null when the input geometry was a Point or MultiPoint, because GeometrySplitter.split() only dispatched lineal and polygonal inputs. PostGIS 3.2+ supports splitting puntal inputs, which is the most likely scenario behind the empty result reported in #2979.

This PR adds parity with PostGIS:

  • Puntal input by polygonal blade -> GEOMETRYCOLLECTION(MULTIPOINT(<inside>), MULTIPOINT(<outside>)). Points covered by the polygon land in the first MultiPoint, the rest in the second. An empty group is omitted.
  • Puntal input by puntal blade -> MULTIPOINT with the blade coordinates removed from the input (set-difference of coordinates).
  • A single Point is treated as a one-element MultiPoint and handled by the same code path.
  • Point / MultiPoint with an unsupported blade type (e.g. LineString) continues to return null, matching PostGIS's no-op behaviour for that case.

Example matching the PostGIS docs:

SELECT ST_Split(
    ST_GeomFromWKT('MULTIPOINT ((1 1), (5 5), (15 15))'),
    ST_GeomFromWKT('POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))'))
-- GEOMETRYCOLLECTION (MULTIPOINT ((1 1), (5 5)), MULTIPOINT ((15 15)))
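
The inside/outside partition in the example above can be sketched without any geometry library. This is only an illustration under stated assumptions: the class `PuntalSplitSketch` and its methods are hypothetical names, the polygon is a single closed ring without holes, and coverage is tested with exact arithmetic on doubles. Sedona's `GeometrySplitter` works on JTS geometries and is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, dependency-free sketch; not Sedona's GeometrySplitter. */
final class PuntalSplitSketch {

  /** True when (px, py) lies on the closed segment (ax, ay)-(bx, by). */
  static boolean onSegment(double px, double py, double ax, double ay, double bx, double by) {
    double cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax);
    if (cross != 0.0) {
      return false; // not collinear with the segment
    }
    return Math.min(ax, bx) <= px && px <= Math.max(ax, bx)
        && Math.min(ay, by) <= py && py <= Math.max(ay, by);
  }

  /** Coverage test for a closed ring (first vertex repeated last): interior or boundary. */
  static boolean covers(double[][] ring, double px, double py) {
    boolean inside = false;
    for (int i = 0, j = ring.length - 1; i < ring.length; j = i++) {
      double ax = ring[j][0], ay = ring[j][1];
      double bx = ring[i][0], by = ring[i][1];
      if (onSegment(px, py, ax, ay, bx, by)) {
        return true; // boundary points count as covered
      }
      // Even-odd ray casting: toggle when a rightward ray from the point crosses this edge.
      if ((ay > py) != (by > py) && px < (bx - ax) * (py - ay) / (by - ay) + ax) {
        inside = !inside;
      }
    }
    return inside;
  }

  /** Returns the non-empty groups in order: covered points first, then the rest. */
  static List<List<double[]>> split(double[][] points, double[][] ring) {
    List<double[]> inside = new ArrayList<>();
    List<double[]> outside = new ArrayList<>();
    for (double[] p : points) {
      (covers(ring, p[0], p[1]) ? inside : outside).add(p);
    }
    List<List<double[]>> groups = new ArrayList<>();
    if (!inside.isEmpty()) {
      groups.add(inside);
    }
    if (!outside.isEmpty()) {
      groups.add(outside); // an empty group is omitted, as in the description
    }
    return groups;
  }

  public static void main(String[] args) {
    double[][] ring = {{0, 0}, {10, 0}, {10, 10}, {0, 10}, {0, 0}};
    double[][] points = {{1, 1}, {5, 5}, {15, 15}};
    List<List<double[]>> groups = split(points, ring);
    System.out.println(groups.get(0).size() + " inside, " + groups.get(1).size() + " outside");
  }
}
```

With the inputs from the SQL example, the first group holds (1 1) and (5 5) and the second holds (15 15).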

How was this patch tested?

  • 7 new unit tests in sedona-common (FunctionsTest) covering MultiPoint-by-Polygon, all-inside / all-outside partitions, boundary semantics, single-Point input, MultiPoint-by-MultiPoint, and the unsupported-blade null case. All 26 split-related tests pass.
  • Extended the table-driven ST_Split test in Spark functionTestScala with the two new cases. Full suite (233 tests) passes.
  • Added testSplitMultiPointByPolygon and testSplitMultiPointByMultiPoint to Flink FunctionTest. All 3 split tests pass.

Did this PR include necessary documentation updates?

  • Yes, I have updated the documentation. Both docs/api/sql/Overlay-Functions/ST_Split.md and docs/api/snowflake/vector-data/Overlay-Functions/ST_Split.md describe the new puntal-input behaviour and include a MultiPoint-by-Polygon SQL example.

Previously ST_Split returned null when the input was Point or MultiPoint,
because GeometrySplitter.split() only dispatched lineal and polygonal
inputs. PostGIS 3.2+ supports splitting puntal inputs:

  - Puntal input by polygonal blade -> GEOMETRYCOLLECTION of two
    MULTIPOINTs (points inside the polygon, points outside).
  - Puntal input by puntal blade -> MULTIPOINT with the blade
    coordinates removed from the input.

Adds tests in sedona-common, sedona-spark, and sedona-flink, and
documents the new behaviour in the SQL and Snowflake reference docs.

Copilot AI (Contributor) left a comment

Pull request overview

This PR updates Sedona’s ST_Split implementation to match PostGIS 3.2+ behavior for puntal inputs (Point / MultiPoint), which previously returned null due to GeometrySplitter.split() only dispatching lineal and polygonal inputs.

Changes:

  • Add puntal dispatch in GeometrySplitter and implement splitting rules for puntal-by-polygonal (inside/outside partition) and puntal-by-puntal (coordinate set-difference).
  • Add unit tests across common, Spark, and Flink to cover the new puntal split behaviors.
  • Update SQL and Snowflake docs to describe puntal ST_Split semantics and add an example.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| common/src/main/java/org/apache/sedona/common/utils/GeometrySplitter.java | Adds puntal dispatch and implements splitPoints logic for polygonal/puntal blades. |
| common/src/test/java/org/apache/sedona/common/FunctionsTest.java | Adds unit tests for point/multipoint split scenarios and boundary semantics. |
| spark/common/src/test/scala/org/apache/sedona/sql/functionTestScala.scala | Extends ST_Split table-driven Spark test with new puntal cases and normalizes output. |
| flink/src/test/java/org/apache/sedona/flink/FunctionTest.java | Adds Flink SQL tests for MultiPoint-by-Polygon and MultiPoint-by-MultiPoint. |
| docs/api/sql/Overlay-Functions/ST_Split.md | Documents puntal ST_Split behavior and adds an example. |
| docs/api/snowflake/vector-data/Overlay-Functions/ST_Split.md | Mirrors the SQL docs update for Snowflake docs. |

Comment on lines 64 to +72
public GeometryCollection split(Geometry input, Geometry blade) {
  GeometryCollection result = null;

  if (GeomUtils.geometryIsLineal(input)) {
    result = splitLines(input, blade);
  } else if (GeomUtils.geometryIsPolygonal(input)) {
    result = splitPolygons(input, blade);
  } else if (GeomUtils.geometryIsPuntal(input)) {
    result = splitPoints(input, blade);
@jiayuasu (Member, Author)

Updated in 0a2feb1. The Javadoc now mentions puntal input and the GeometryCollection-of-MultiPoints return shape.

Comment on lines +125 to +129
private MultiPoint splitPointsByPoints(Geometry input, Geometry blade) {
  java.util.Set<Coordinate> bladeCoords = new java.util.HashSet<>();
  for (Coordinate c : blade.getCoordinates()) {
    bladeCoords.add(c);
  }
@jiayuasu (Member, Author)

Moot — splitPointsByPoints was removed in 1a3b149 as part of a separate review fix (puntal-by-puntal had no PostGIS analog and broke the union-reconstructs-input invariant).
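
The invariant mentioned in this reply can be illustrated with plain sets. The sketch below is hypothetical (`SplitInvariantSketch` is not part of Sedona, and points are modelled as coordinate strings rather than geometries): a set-difference discards the coordinates shared with the blade, so its output no longer reconstructs the input, whereas an inside/outside partition always does.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical illustration of the union-reconstructs-input invariant. */
final class SplitInvariantSketch {

  /** Puntal-by-puntal as first proposed: drop every input point that the blade also contains. */
  static Set<String> difference(Set<String> input, Set<String> blade) {
    Set<String> out = new LinkedHashSet<>(input);
    out.removeAll(blade);
    return out;
  }

  /** Partition by a coverage predicate and check that the two groups union back to the input. */
  static boolean partitionReconstructs(Set<String> input, Predicate<String> covered) {
    Set<String> inside = new LinkedHashSet<>();
    Set<String> outside = new LinkedHashSet<>();
    for (String p : input) {
      (covered.test(p) ? inside : outside).add(p);
    }
    Set<String> union = new LinkedHashSet<>(inside);
    union.addAll(outside);
    return union.equals(input);
  }

  public static void main(String[] args) {
    Set<String> input = new LinkedHashSet<>(Arrays.asList("1 1", "5 5", "15 15"));
    Set<String> blade = new LinkedHashSet<>(Arrays.asList("5 5"));
    // The difference loses "5 5", so it cannot equal the input.
    System.out.println(difference(input, blade).equals(input));
    // Every point lands in exactly one group, so the union equals the input.
    System.out.println(partitionReconstructs(input, p -> !p.equals("15 15")));
  }
}
```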

Linear (LineString or MultiLineString) geometry can be split by a Point, MultiPoint, LineString, MultiLineString, Polygon, or MultiPolygon.
Polygonal (Polygon or MultiPolygon) geometry can be split by a LineString, MultiLineString, Polygon, or MultiPolygon.
Puntal (Point or MultiPoint) geometry can be split by a Polygon, MultiPolygon, Point, or MultiPoint.
In either case, when a polygonal blade is used then the boundary of the blade is what is actually split by.
@jiayuasu (Member, Author)

Updated in 0a2feb1. The sentence now scopes the boundary rule to lineal and polygonal inputs and notes that puntal inputs are partitioned by polygon interior coverage.

Linear (LineString or MultiLineString) geometry can be split by a Point, MultiPoint, LineString, MultiLineString, Polygon, or MultiPolygon.
Polygonal (Polygon or MultiPolygon) geometry can be split by a LineString, MultiLineString, Polygon, or MultiPolygon.
Puntal (Point or MultiPoint) geometry can be split by a Polygon, MultiPolygon, Point, or MultiPoint.
In either case, when a polygonal blade is used then the boundary of the blade is what is actually split by.
@jiayuasu (Member, Author)

Updated in 0a2feb1. Same wording change applied here.

jiayuasu added 3 commits May 21, 2026 23:36
…llections

Review fixes:

- Drop the MultiPoint-by-MultiPoint subcase. It had no PostGIS analog
  and broke the invariant that the union of split pieces reconstructs
  the input. Only polygonal blades are accepted for puntal input now.

- Flatten puntal input by walking coordinates instead of casting each
  child to Point. A homogeneous GeometryCollection like
  GEOMETRYCOLLECTION(MULTIPOINT(...), POINT(...)) satisfies
  geometryIsPuntal and previously hit a ClassCastException when its
  MultiPoint child was cast to Point.

- Reframe the docs as a Sedona-specific extension rather than PostGIS
  parity, since PostGIS itself only supports lineal/polygonal input.
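
The second review fix above, flattening by coordinates rather than by casting children, can be sketched with a toy geometry model. Everything here is an assumption made for illustration: `Geom`, `Pt`, and `Multi` are hypothetical stand-ins rather than JTS or Sedona types, and `Multi` plays the role of both a MultiPoint and a homogeneous GeometryCollection. With JTS geometries, the coordinate walk would presumably rely on `getCoordinates()`, the call already visible in the reviewed snippet.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of the cast-versus-coordinates fix; not JTS or Sedona code. */
final class FlattenPuntalSketch {

  abstract static class Geom {
    /** All coordinates in this geometry, at any nesting depth. */
    abstract List<double[]> coordinates();
  }

  static final class Pt extends Geom {
    final double x;
    final double y;

    Pt(double x, double y) {
      this.x = x;
      this.y = y;
    }

    @Override
    List<double[]> coordinates() {
      List<double[]> out = new ArrayList<>();
      out.add(new double[] {x, y});
      return out;
    }
  }

  static final class Multi extends Geom {
    final List<Geom> children;

    Multi(Geom... children) {
      this.children = Arrays.asList(children);
    }

    @Override
    List<double[]> coordinates() {
      List<double[]> out = new ArrayList<>();
      for (Geom child : children) {
        out.addAll(child.coordinates());
      }
      return out;
    }
  }

  /** Fragile: assumes every direct child is a point. */
  static List<Pt> flattenByCast(Multi input) {
    List<Pt> out = new ArrayList<>();
    for (Geom child : input.children) {
      out.add((Pt) child); // throws ClassCastException when a child is itself a Multi
    }
    return out;
  }

  /** Robust: rebuilds points from coordinates, whatever the nesting. */
  static List<Pt> flattenByCoordinates(Geom input) {
    List<Pt> out = new ArrayList<>();
    for (double[] c : input.coordinates()) {
      out.add(new Pt(c[0], c[1]));
    }
    return out;
  }

  public static void main(String[] args) {
    // Mirrors GEOMETRYCOLLECTION(MULTIPOINT(...), POINT(...)) from the commit message.
    Multi nested = new Multi(new Multi(new Pt(1, 1), new Pt(5, 5)), new Pt(15, 15));
    System.out.println(flattenByCoordinates(nested).size());
    try {
      flattenByCast(nested);
    } catch (ClassCastException e) {
      System.out.println("cast failed on nested child");
    }
  }
}
```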

The Flink module ships ST_Split but had no doc page. Add one mirroring
the SQL doc, including the Sedona-specific puntal-input extension.

…lade docs

- GeometrySplitter.split() Javadoc now covers puntal input and the
  GeometryCollection-of-MultiPoints return shape.
- SQL / Snowflake / Flink ST_Split docs clarify that the "polygonal
  blade -> boundary" rule applies to lineal and polygonal inputs only;
  puntal inputs are partitioned by polygon interior coverage.
jiayuasu added this to the sedona-1.9.1 milestone on May 22, 2026
jiayuasu merged commit 49fe8ec into apache:master on May 22, 2026
44 checks passed

Linked issue that merging this pull request may close: ST_Split makes different results in sedona and postgis
