perf: optimize coordinator HistoricalManagement duty runtime #19532
Open
jtuglu1 wants to merge 2 commits into
Comment on lines +118 to +128

    return new DataSegment(
        DATASOURCE,
        interval,
        "v1",
        Collections.emptyMap(),
        Collections.emptyList(),
        Collections.emptyList(),
        null,
        0,
        1
    );

    private static DataSegment segment(String datasource, Interval interval, String version)
    {
      return DataSegment.builder()
FrankChen021 (Member) reviewed on Jul 3, 2026 and left a comment:
I have reviewed the code for correctness, edge cases, concurrency, and integration risks; no issues found.
Reviewed 15 of 15 changed files.
This is an automated review by Codex GPT-5.5
Description
High-level changes
- BalanceSegments: computes the cluster shape (number of historicals and segments) once per run and reuses it for both the segment-move budget and the summary log, instead of calling getNumHistoricalsAndSegments() twice.
- MarkOvershadowedSegmentsAsUnused: first builds the set of datasources that actually have overshadowed segments, then skips timeline construction entirely for every other datasource, both when scanning servers and when checking zero-replica segments.
- MarkEternityTombstonesAsUnused: groups overshadowed segments by datasource up front, then checks each candidate tombstone only against its own datasource's group, instead of scanning the full overshadowed-segment list per candidate. Also swaps the timeline-based findNonOvershadowedObjectsInInterval scan for a direct per-segment isOvershadowed check.
- RunRules: caches ruleHandler.getRulesWithDefault(datasource) per datasource in a map built during the segment loop, and reuses that same map for the later broadcast-datasource check. Previously, rules were looked up once per segment and then again per datasource during the broadcast scan.
- PrepareBalancerAndLoadQueues: collectUsedSegmentStats now iterates ImmutableDruidDataSource objects directly, using getTotalSizeOfSegments() and getSegments().size(), instead of streaming and summing over every segment in each datasource's timeline.
- SegmentReplicationStatus: the constructor no longer makes two full ImmutableMap.copyOf defensive copies. It holds the caller's map directly (safe because the map is rebuilt fresh each cycle and not mutated after this point) and computes the total-replica-count map in one pass instead of a separate forEach. Also adds computeSegmentStats(), which counts unavailable, under-replicated, and deep-storage-only segments in one pass over used segments instead of three.
- StrategicSegmentAssigner: allTiersInCluster is now computed once in the constructor and stored as a field, instead of being rebuilt with Sets.newHashSet(cluster.getTierNames()) on every replicateSegment/replicateSegmentPartially call.

I've kept the logging changes in, in case we decide some of them are worth keeping at debug level. Otherwise, please ignore those diffs during review, as they will be removed before merge. Initial results show a decrease in total HistoricalManagement duty runtime of about 35% on a cluster with 4M segments.
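For readers who want a concrete picture of two of the patterns listed above, here is a minimal, self-contained sketch of per-datasource rule caching (as described for RunRules) and single-pass counting (as described for computeSegmentStats()). It is not the code from this PR: the class `CoordinatorDutySketch`, the `Segment` and `Stats` types, the `lookup` function, and the exact definitions of "unavailable", "under-replicated", and "deep-storage-only" are all hypothetical stand-ins chosen for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class CoordinatorDutySketch
{
  /** Hypothetical stand-in for a used segment; this is not Druid's DataSegment. */
  static final class Segment
  {
    final String datasource;
    final int loadedReplicas;
    final int requiredReplicas;

    Segment(String datasource, int loadedReplicas, int requiredReplicas)
    {
      this.datasource = datasource;
      this.loadedReplicas = loadedReplicas;
      this.requiredReplicas = requiredReplicas;
    }
  }

  /** Hypothetical holder for the three counters gathered in one pass. */
  static final class Stats
  {
    final int unavailable;
    final int underReplicated;
    final int deepStorageOnly;

    Stats(int unavailable, int underReplicated, int deepStorageOnly)
    {
      this.unavailable = unavailable;
      this.underReplicated = underReplicated;
      this.deepStorageOnly = deepStorageOnly;
    }
  }

  /**
   * Runs the (assumed expensive) rule lookup at most once per datasource while
   * walking the segments. The returned map can be reused by later per-datasource checks.
   */
  static Map<String, List<String>> rulesPerDatasource(
      List<Segment> segments,
      Function<String, List<String>> lookup
  )
  {
    final Map<String, List<String>> cache = new HashMap<>();
    for (Segment segment : segments) {
      cache.computeIfAbsent(segment.datasource, lookup);
    }
    return cache;
  }

  /** Gathers all three counters in a single loop instead of three separate scans. */
  static Stats computeStats(List<Segment> segments)
  {
    int unavailable = 0;
    int underReplicated = 0;
    int deepStorageOnly = 0;
    for (Segment s : segments) {
      // Assumed definitions, for illustration only.
      if (s.requiredReplicas > 0 && s.loadedReplicas == 0) {
        unavailable++;
      }
      if (s.loadedReplicas < s.requiredReplicas) {
        underReplicated++;
      }
      if (s.requiredReplicas == 0 && s.loadedReplicas == 0) {
        deepStorageOnly++;
      }
    }
    return new Stats(unavailable, underReplicated, deepStorageOnly);
  }

  public static void main(String[] args)
  {
    final List<Segment> segments = List.of(
        new Segment("wiki", 0, 2),
        new Segment("wiki", 1, 2),
        new Segment("logs", 0, 0)
    );
    final Map<String, List<String>> rules =
        rulesPerDatasource(segments, ds -> List.of("hypotheticalRule:" + ds));
    final Stats stats = computeStats(segments);
    System.out.println(rules.keySet() + " unavailable=" + stats.unavailable);
  }
}
```

In this sketch the lookup function runs once per distinct datasource rather than once per segment, which is the same shape of saving the RunRules item describes. Whether the real gain is significant depends on how costly the actual lookup is.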
Release note
Speed up the coordinator's HistoricalManagementDuties duty group.
This PR has: