perf: add sort-serving segments index for KillUnusedSegments query by jtuglu1 · Pull Request #19645 · apache/druid

jtuglu1 · 2026-06-30T22:25:57Z

Description

KillUnusedSegments' per-datasource find-interval query
(SqlSegmentsMetadataQuery#retrieveUnusedSegmentIntervals) runs:

WHERE dataSource=? AND used=? AND end<=? [AND start>=?]
  AND used_status_last_updated<=? ORDER BY start, end LIMIT n

The existing (dataSource, used, end, start) index orders by end before start, so it
cannot serve ORDER BY start, end. MySQL instead reads every matching row and filesorts
the result just to return LIMIT n; EXPLAIN ANALYZE measured ~11s on a large datasource.
With ~50 datasources this duty runs at ~43s/cycle, bound almost entirely by this SQL call.

Baseline plan (no new index, `ORDER BY start, end`)

Query:

EXPLAIN ANALYZE
SELECT start, `end` FROM druid_segments
WHERE dataSource = '<datasource>' AND used = false
  AND `end` <= '<max-end>'
  AND used_status_last_updated IS NOT NULL
  AND used_status_last_updated <= '<buffer-cutoff>'
ORDER BY start, `end` LIMIT 1000;

-> Limit: 1000 row(s)  (actual time=10959..10959 rows=1000)
  -> Sort: start, `end`, limit input to 1000 row(s) per chunk  (actual time=10959..10959 rows=1000)
    -> Filter: (used = false AND `end` <= '<max-end>' AND used_status_last_updated <= '<buffer-cutoff>')
              (rows=418976)  (actual time=0.365..9765)
      -> Index lookup using IDX_<dataSource-only>  (rows=806095)  (actual time=0.362..9632)

Breakdown: ~9.6s scanning, ~1.2s sorting, ~11s total.

Options Considered

Create a new sort-serving index (this PR)

Add an index on (dataSource, used, start, end, used_status_last_updated). The
(dataSource, used) equality prefix followed by (start, end) matches the ORDER BY, so the
filesort is removed and the LIMIT short-circuits; the trailing used_status_last_updated
column makes the index covering for this query. The ORDER BY start, end is preserved, so
kill semantics are unchanged.
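
The index-ordering argument above can be reproduced in miniature. The following is a minimal sketch using Python's built-in sqlite3 module rather than MySQL, so the planner, the plan text, and any timings differ from the EXPLAIN ANALYZE output in this PR. It only illustrates that an index ordered (..., end, start) forces a sort for ORDER BY start, end, while one ordered (..., start, end, ...) does not. The index name `idx_demo`, the simplified table schema, and the parameter values are assumptions made for illustration and are not taken from the Druid code.

```python
import sqlite3

# Same shape as the find-interval query; identifiers are quoted because
# "end" is a reserved word.
QUERY = """
SELECT "start", "end" FROM druid_segments
WHERE dataSource = ? AND used = ?
  AND "end" <= ?
  AND used_status_last_updated IS NOT NULL
  AND used_status_last_updated <= ?
ORDER BY "start", "end" LIMIT 1000
"""
# Hypothetical parameter values, only needed so the statement can be planned.
PARAMS = ("wiki", 0, "2026-01-01", "2026-06-01")


def plan_details(index_columns):
    """Return SQLite's query-plan detail strings with a single index present."""
    conn = sqlite3.connect(":memory:")
    # Simplified stand-in for the real segments table (assumption).
    conn.execute(
        """CREATE TABLE druid_segments (
               id TEXT PRIMARY KEY, dataSource TEXT,
               "start" TEXT, "end" TEXT,
               used INTEGER, used_status_last_updated TEXT)"""
    )
    conn.execute(f"CREATE INDEX idx_demo ON druid_segments ({index_columns})")
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, PARAMS).fetchall()
    conn.close()
    # The human-readable detail text is the last column of each plan row.
    return [row[-1] for row in rows]


def needs_sort(plan):
    """SQLite reports an explicit sort step as a temp B-tree."""
    return any("TEMP B-TREE" in detail for detail in plan)


old_plan = plan_details('dataSource, used, "end", "start"')
new_plan = plan_details('dataSource, used, "start", "end", used_status_last_updated')

print("existing index order needs sort:", needs_sort(old_plan))
print("proposed index order needs sort:", needs_sort(new_plan))
```

Checking only for the presence of a sort step, rather than comparing full plan strings, keeps the sketch independent of the exact wording SQLite uses across versions.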

The optimized plan was confirmed by running the symmetric query against the existing
(dataSource, used, end, start) index with ORDER BY end, start:

EXPLAIN ANALYZE
SELECT start, `end` FROM druid_segments
WHERE dataSource = '<datasource>' AND used = false
  AND `end` <= '<max-end>'
  AND used_status_last_updated IS NOT NULL
  AND used_status_last_updated <= '<buffer-cutoff>'
ORDER BY `end`, start LIMIT 1000;

-> Limit: 1000 row(s)  (actual time=0.382..15.5 rows=1000)
  -> Filter: (used_status_last_updated IS NOT NULL AND used_status_last_updated <= '<buffer-cutoff>')
            (rows=1000)  (actual time=0.381..15.4)
    -> Index range scan using IDX_<dataSource,used,end,start>
       over (dataSource = '<datasource>' AND used = 0 AND `end` <= '<max-end>')  (rows=1000)  (actual time=0.38..15.2)

Result: ~11,000ms → ~15ms. The new index is expected to produce the same plan shape for
the unchanged ORDER BY start, end.

With this change, we may also be able to drop the old (dataSource, used, end, start) index, since no other queries use it.

Reformat the query to reuse an existing index

Flip the query to ORDER BY end, start so the existing (dataSource, used, end, start)
index serves it. This is the ~15ms plan measured above.

I opted against this because it changes kill semantics in a way that breaks existing behavior.
KillUnusedSegments drains earliest-start-first behind a start-based cursor
(datasourceToLastKillIntervalEnd; the next query filters start >= cursor), and limitToPeriod
always retains the segments at the earliest start. Making this safe requires reworking the drain
to be end-consistent (cursor, ordering, and limitToPeriod all keyed on end), which seemed like
more work. Open to opinions.
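
To make the ordering concern concrete, here is a toy model in Python. It is a sketch under stated assumptions, not the KillUnusedSegments logic: intervals are integer (start, end) pairs, and the cursor is taken to be the largest start in the returned batch, which is a hypothetical simplification of the real datasourceToLastKillIntervalEnd cursor. The function names are invented for illustration. It shows that a batch ordered by start never leaves an unreturned row behind a start-based cursor, while a batch ordered by end can.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end); integers stand in for timestamps


def first_batch(intervals: List[Interval], order_by: str, limit: int) -> List[Interval]:
    """Emulate ORDER BY start, end or ORDER BY end, start with LIMIT."""
    if order_by == "start":
        key = lambda iv: (iv[0], iv[1])
    elif order_by == "end":
        key = lambda iv: (iv[1], iv[0])
    else:
        raise ValueError(f"unknown ordering: {order_by}")
    return sorted(intervals, key=key)[:limit]


def hidden_by_start_cursor(intervals: List[Interval], batch: List[Interval]) -> List[Interval]:
    """Rows left out of the batch that a later `start >= cursor` filter would skip.

    Assumption: the cursor advances to the largest start seen in the batch.
    """
    cursor = max(start for start, _ in batch)
    return [iv for iv in intervals if iv not in batch and iv[0] < cursor]


# One long interval that starts first but ends last, plus three short ones.
unused = [(0, 100), (1, 2), (3, 4), (5, 6)]

by_start = first_batch(unused, "start", limit=2)
by_end = first_batch(unused, "end", limit=2)

print(by_start, hidden_by_start_cursor(unused, by_start))
# [(0, 100), (1, 2)] []
print(by_end, hidden_by_start_cursor(unused, by_end))
# [(1, 2), (3, 4)] [(0, 100)]
```

In this model the end-ordered batch skips the long early-starting interval, and a cursor that only moves forward in start never returns to it, which is why the ordering, cursor, and retention logic would all need to be keyed on the same column.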


…erval query KillUnusedSegments' per-datasource find-interval query (SqlSegmentsMetadataQuery#retrieveUnusedSegmentIntervals) runs `WHERE dataSource=? AND used=? AND end<=? [AND start>=?] AND used_status_last_updated<=? ORDER BY start, end LIMIT n`. The existing (dataSource, used, end, start) index orders by end before start, so it cannot serve `ORDER BY start, end`. On a large datasource (~470k unused segments) MySQL materializes all matching rows and filesorts them just to return LIMIT n; EXPLAIN ANALYZE measured ~11s, with the scan dominating. With ~50 datasources this drove the duty to ~43s/cycle, almost all in the find-interval SQL.

jtuglu1 added the Performance label Jun 30, 2026

jtuglu1 force-pushed the kill-unused-segments-sort-index branch from a3d5b17 to c343d33 Compare June 30, 2026 22:28

jtuglu1 requested a review from kfaraz June 30, 2026 22:49

jtuglu1 added this to the 38.0.0 milestone Jun 30, 2026

jtuglu1 wants to merge 1 commit into apache:master from jtuglu1:kill-unused-segments-sort-index
