perf: add sort-serving segments index for KillUnusedSegments query #19645
Draft
jtuglu1 wants to merge 1 commit into
…erval query KillUnusedSegments' per-datasource find-interval query (SqlSegmentsMetadataQuery#retrieveUnusedSegmentIntervals) runs `WHERE dataSource=? AND used=? AND end<=? [AND start>=?] AND used_status_last_updated<=? ORDER BY start, end LIMIT n`. The existing (dataSource, used, end, start) index orders by end before start, so it cannot serve `ORDER BY start, end`. On a large datasource (~470k unused segments) MySQL materializes all matching rows and filesorts them just to return LIMIT n; EXPLAIN ANALYZE measured ~11s, with the scan dominating. With ~50 datasources this drove the duty to ~43s/cycle, almost all in the find-interval SQL.
a3d5b17 to
c343d33
Compare
Description
KillUnusedSegments' per-datasource find-interval query (`SqlSegmentsMetadataQuery#retrieveUnusedSegmentIntervals`) runs `WHERE dataSource=? AND used=? AND end<=? [AND start>=?] AND used_status_last_updated<=? ORDER BY start, end LIMIT n`.

The existing `(dataSource, used, end, start)` index orders by `end` before `start`, so it cannot serve `ORDER BY start, end`. `EXPLAIN ANALYZE` measured ~11s. With ~50 datasources this duty runs at ~43s/cycle, bound almost entirely by this SQL call.
Baseline plan (no new index, `ORDER BY start, end`)

Query:
Options Considered
Add (dataSource, used, start, end, used_status_last_updated). The (dataSource, used)
equality prefix + (start, end) matches the ORDER BY, so the filesort is removed and the
LIMIT short-circuits; used_status_last_updated trailing makes the query covering. The
ORDER BY start, end is preserved, so kill semantics are unchanged.
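
The effect of column order on the sort can be reproduced in miniature. The sketch below is an illustration only: it uses SQLite (via Python's `sqlite3`) rather than MySQL, a simplified `segments` table, and a hypothetical index name `idx_demo`, none of which come from this PR. The plan text differs from MySQL's `EXPLAIN ANALYZE`, but the same principle applies: an index whose columns after the equality prefix match the `ORDER BY` lets the planner skip the sort step.

```python
import sqlite3

# Simplified stand-in for the find-interval query (table and values are assumptions).
QUERY = '''
SELECT "start", "end" FROM segments
WHERE dataSource = ? AND used = ? AND "end" <= ?
  AND used_status_last_updated <= ?
ORDER BY "start", "end"
LIMIT ?
'''
PARAMS = ("wiki", 0, "2024-01-01", "2024-06-01", 100)


def plan_for(index_columns):
    """Return SQLite's query plan steps when only the given index exists."""
    conn = sqlite3.connect(":memory:")
    conn.execute('''
        CREATE TABLE segments (
            id INTEGER PRIMARY KEY,
            dataSource TEXT,
            used INTEGER,
            "start" TEXT,
            "end" TEXT,
            used_status_last_updated TEXT
        )
    ''')
    conn.execute(f"CREATE INDEX idx_demo ON segments ({index_columns})")
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, PARAMS).fetchall()
    conn.close()
    return [row[-1] for row in rows]  # last column holds the plan detail text


def needs_sort(plan):
    """True if the plan contains an explicit sort step for the ORDER BY."""
    return any("TEMP B-TREE FOR ORDER BY" in step for step in plan)


# Existing column order: end before start.
old_plan = plan_for('dataSource, used, "end", "start"')
# Proposed column order: start, end, then used_status_last_updated.
new_plan = plan_for('dataSource, used, "start", "end", used_status_last_updated')

print("old index needs sort:", needs_sort(old_plan))
print("new index needs sort:", needs_sort(new_plan))
```

Each index is built on a fresh table so the planner has no alternative index to pick, which keeps the comparison about column order alone.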
The optimized plan was confirmed by running the symmetric query against the existing
(dataSource, used, end, start) index with ORDER BY end, start:
With this change, we might also be able to delete the old
(dataSource, used, end, start) index, as no other queries use it.

Flip the query to ORDER BY end, start so the existing (dataSource, used, end, start)
index serves it; the expected runtime is that of the 15ms plan measured above.
I opted against this because it changes kill semantics in a way that breaks some existing behavior. KillUnusedSegments
drains earliest-start-first behind a start-based cursor (datasourceToLastKillIntervalEnd,
so the next query filters start >= cursor), and limitToPeriod always retains the segments at the
earliest start. Making this safe requires reworking the drain to be end-consistent (cursor, ordering,
and limitToPeriod all keyed on end), which seemed like more work. Open to opinions.
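
As a rough illustration of why the ordering and the cursor need to agree, the toy model below compares the two orderings on three made-up intervals. Everything here is an assumption for illustration: the `(start, end)` tuples, the `first_batch` helper, and the choice to advance the cursor to the largest end in the returned batch. It is not the actual KillUnusedSegments logic, only one way the mismatch could show up.

```python
# Hypothetical unused-segment intervals as (start, end) pairs.
segments = [(0, 30), (5, 6), (7, 8)]
LIMIT = 2


def first_batch(order_key, cursor=None):
    """Rows a find-interval query would return for one ordering.

    cursor models the start-based filter (start >= cursor); None means no filter.
    """
    eligible = [s for s in segments if cursor is None or s[0] >= cursor]
    return sorted(eligible, key=order_key)[:LIMIT]


by_start = first_batch(lambda s: (s[0], s[1]))  # ORDER BY start, end
by_end = first_batch(lambda s: (s[1], s[0]))    # ORDER BY end, start

earliest = min(segments)  # the interval with the smallest start

print("start-ordered batch:", by_start, "contains earliest:", earliest in by_start)
print("end-ordered batch:  ", by_end, "contains earliest:", earliest in by_end)

# Assumed cursor advance: largest end seen in the end-ordered batch.
cursor = max(end for _, end in by_end)
followup = first_batch(lambda s: (s[1], s[0]), cursor=cursor)
print("follow-up with start >=", cursor, "returns:", followup)
```

In this model the end-ordered batch never returns the long interval that starts earliest, and the follow-up query filtered on `start >= cursor` cannot return it either, which is why the cursor, ordering, and limitToPeriod would all have to be keyed on end together.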
Release note