[GH-2809] Support distance joins for raster predicates #2980

Draft
jiayuasu wants to merge 1 commit into apache:master from jiayuasu:feature/raster-distance-join
@jiayuasu (Member) commented May 20, 2026

Did you read the Contributor Guide?

Is this PR related to a ticket?

What changes were proposed in this PR?

Adds an RS_DWithin(left, right, distance) predicate so distance joins can use raster operands, and routes the join planner through the same spatial-index machinery used for RS_Intersects / ST_DWithin.

  • New RS_DWithin SQL function with three overloads (raster + geom, geom + raster, raster + raster), backed by RasterPredicates.rsDWithin (CRS conversion via the existing convertCRSIfNeeded, JTS isWithinDistance for the per-row check).
  • JoinQueryDetector and OptimizableJoinCondition treat RS_DWithin as a distance-join predicate. Broadcast plans go through BroadcastIndexJoinExec; non-broadcast plans go through DistanceJoinExec.
  • BroadcastIndexJoinExec.createStreamShapes, SpatialIndexExec, and DistanceJoinExec now project raster shapes to WGS84 envelopes (the same path RS_Intersects already uses) and expand those envelopes by the distance before the R-tree filter. The new helper lives in TraitJoinQueryBase.toExpandedWGS84EnvelopeRDD.
  • Drops the placeholder UnsupportedOperationException for distance + raster combinations. Geography + raster + distance remains guarded since the geography refiner doesn't accept raster shapes yet.
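The expand-then-filter flow described in the list above follows the classic filter-and-refine pattern. The block below is a minimal, dependency-free Java sketch under stated assumptions, not the PR's implementation: shapes are reduced to axis-aligned rectangles (`Env` is a hypothetical stand-in for a JTS envelope), a nested loop stands in for the R-tree query, and rectangle-to-rectangle distance stands in for JTS `isWithinDistance`. CRS handling (`convertCRSIfNeeded`, the WGS84 projection) is omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of filter-and-refine for an RS_DWithin-style distance join.
// Shapes are simplified to axis-aligned rectangles; the PR's per-row check uses
// JTS isWithinDistance on full geometries instead.
public class DistanceJoinSketch {

    // Axis-aligned envelope (stand-in for a raster's WGS84 envelope or a geometry's bounding box).
    static final class Env {
        final double minX, minY, maxX, maxY;

        Env(double minX, double minY, double maxX, double maxY) {
            this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY;
        }

        // Grow the envelope by the join distance on every side.
        Env expandBy(double d) {
            return new Env(minX - d, minY - d, maxX + d, maxY + d);
        }

        boolean intersects(Env o) {
            return minX <= o.maxX && o.minX <= maxX && minY <= o.maxY && o.minY <= maxY;
        }

        // Exact Euclidean distance between two rectangles (0 if they overlap).
        double distance(Env o) {
            double dx = Math.max(0.0, Math.max(minX - o.maxX, o.minX - maxX));
            double dy = Math.max(0.0, Math.max(minY - o.maxY, o.minY - maxY));
            return Math.hypot(dx, dy);
        }
    }

    // Returns index pairs {streamIdx, buildIdx} whose shapes lie within distance d.
    static List<int[]> distanceJoin(List<Env> stream, List<Env> build, double d) {
        List<int[]> result = new ArrayList<>();
        for (int i = 0; i < stream.size(); i++) {
            // Filter: the expanded envelope contains the true d-buffer, so this test
            // never drops a real match. A real plan would query an R-tree here.
            Env probe = stream.get(i).expandBy(d);
            for (int j = 0; j < build.size(); j++) {
                Env candidate = build.get(j);
                if (!probe.intersects(candidate)) {
                    continue;
                }
                // Refine: an exact distance test removes corner false positives.
                if (stream.get(i).distance(candidate) <= d) {
                    result.add(new int[] {i, j});
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Env> rasters = List.of(new Env(0, 0, 1, 1));
        List<Env> points = List.of(
            new Env(1.5, 0.5, 1.5, 0.5),  // 0.5 away: kept
            new Env(1.8, 1.8, 1.8, 1.8),  // passes the envelope filter, but about 1.13 away: refined out
            new Env(5, 5, 5, 5));         // rejected by the envelope filter
        for (int[] pair : distanceJoin(rasters, points, 1.0)) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
    }
}
```

Running `main` prints `0 -> 0`. Expanding by the distance on every side produces a square that contains the circular buffer, so the cheap envelope test is safe to use as a pre-filter; the second point shows why the exact refinement step is still needed.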

How was this patch tested?

  • BroadcastIndexJoinSuite: a new "Passed RS_DWithin" test exercises stream-raster, broadcast-raster, and swapped-operand forms.
  • RasterJoinSuite: a new "RS_DWithin distance join" describe block covers DistanceJoinExec with both partition-side configs, swapped operands, and raster-raster joins.
  • All 122 tests across the two suites pass locally under -Dspark=3.4 -Pscala2.12.

Did this PR include necessary documentation updates?

  • Yes, I am adding a new API. I am using the current SNAPSHOT version number v1.9.1 in the Since field.
  • Yes, I have updated the documentation:
    • New docs/api/sql/Raster-Predicates/RS_DWithin.md (intro, CRS rules, all three signatures, SQL example, join-planning note).
    • Raster-Functions.md: predicate table row for RS_DWithin.
    • Optimizer.md: new "Raster distance join" subsection with broadcast and non-broadcast SQL examples.

Add `RS_DWithin(raster|geom, raster|geom, distance)` so distance joins
can use raster operands, and route the join planner through the existing
spatial-index machinery.

- `RS_DWithin` expression in `RasterPredicates.scala`, backed by new
  `RasterPredicates.rsDWithin` overloads (raster-geom, raster-raster)
  that reuse `convertCRSIfNeeded` and JTS `isWithinDistance`.
- `JoinQueryDetector` and `OptimizableJoinCondition` recognise
  `RS_DWithin` as a distance-join predicate; the relationship label
  collapses to `RS_DWithin` for all raster + distance cases.
- `BroadcastIndexJoinExec.createStreamShapes` and the new
  `TraitJoinQueryBase.toExpandedWGS84EnvelopeRDD` handle the raster
  stream and build sides for broadcast-index joins; `SpatialIndexExec`
  and `DistanceJoinExec` route to the same helper so non-broadcast
  distance joins work too.
- Drop the placeholder `UnsupportedOperationException` guards for
  distance + raster combinations; geography + raster + distance remains
  guarded since the geography refiner does not handle raster shapes.
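The planning rules above reduce to a small decision table. The Java sketch below is hypothetical: `RasterDistanceJoinRouting`, `JoinExec`, and `plan` are illustrative names, not Sedona classes, and the sketch covers only the raster-related branches described in this PR rather than the full logic of `JoinQueryDetector`.

```java
// Hypothetical sketch of how a raster distance join is routed after this change.
// Names are illustrative only; they are not part of Sedona's API.
public class RasterDistanceJoinRouting {

    enum JoinExec { BROADCAST_INDEX_JOIN, DISTANCE_JOIN }

    /**
     * @param anyRaster     true if either RS_DWithin operand is a raster
     * @param anyGeography  true if either operand is a geography value
     * @param broadcastSide true if one side is small enough to broadcast
     */
    static JoinExec plan(boolean anyRaster, boolean anyGeography, boolean broadcastSide) {
        // Still guarded: the geography refiner cannot consume raster shapes yet.
        if (anyRaster && anyGeography) {
            throw new UnsupportedOperationException(
                "Distance joins between geography and raster operands are not supported");
        }
        // Both execs share the raster -> WGS84 envelope -> expand-by-distance path,
        // so the remaining choice is whether to broadcast the index.
        return broadcastSide ? JoinExec.BROADCAST_INDEX_JOIN : JoinExec.DISTANCE_JOIN;
    }

    public static void main(String[] args) {
        System.out.println(plan(true, false, true));   // BROADCAST_INDEX_JOIN
        System.out.println(plan(true, false, false));  // DISTANCE_JOIN
        try {
            plan(true, true, false);
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Keeping the geography check first mirrors the PR's intent: the only raster + distance case that still fails fast is the one whose refiner cannot handle raster shapes.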

Tests
- `BroadcastIndexJoinSuite`: a new `RS_DWithin` test covers stream-raster,
  broadcast-raster, and swapped-operand forms.
- `RasterJoinSuite`: new `RS_DWithin distance join` describe block
  covers `DistanceJoinExec` with both partition-side configs, swapped
  operands, and raster-raster.

Docs
- New `docs/api/sql/Raster-Predicates/RS_DWithin.md` page.
- `Raster-Functions.md` predicate table row.
- `Optimizer.md` raster-distance-join subsection.
@jiayuasu force-pushed the feature/raster-distance-join branch from 53602f9 to 1e23c81 (May 22, 2026 06:25)
@jiayuasu marked this pull request as draft (May 22, 2026 07:05)
Development: merging this pull request may close the issue "Support distance joins for raster predicates".