Skip to content

perf: Interval tree for managing segment metadata in memory#19138

Open
pirvtech wants to merge 31 commits into
apache:masterfrom
RivianVW-tech:segment-interval-tree
Open

perf: Interval tree for managing segment metadata in memory#19138
pirvtech wants to merge 31 commits into
apache:masterfrom
RivianVW-tech:segment-interval-tree

Conversation

@pirvtech

Copy link
Copy Markdown

Segment metadata stored in memory of the Historicals, is used when looking up segments that match an interval for query and segment loading purposes. Currently this is a serial scan that goes through all segments metadata in ascending start time order to find the right matching segments.

This changes introduces an Interval Tree as a more efficient way to store segment metadata in memory, to speed up searches for segments, that can potentially cut down search times from O(n) to O(logn).

Core changes to file processing/src/main/java/org/apache/druid/timeline/VersionedIntervalTimeline.java
Interval tree implementation in file processing/src/main/java/org/apache/druid/timeline/IntervalTree.java
Documentation comments have been included in important files and sections of code. Unit tests added.

This has been reviewed internally and is in a production Druid cluster.

@jtuglu1

jtuglu1 commented Mar 11, 2026

Copy link
Copy Markdown
Contributor

Hi – thanks. Do you have benchmarks for this?

}

@Override
public T remove(Object key)

Check notice

Code scanning / CodeQL

Confusing overloading of methods Note

Method IntervalTree.remove(..) could be confused with overloaded method
remove
, since dispatch depends on static types.
@pirvtech

Copy link
Copy Markdown
Author

Hi – thanks. Do you have benchmarks for this?

I will add the benchmarks.

@kfaraz kfaraz left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting this together, @pirvtech!
I have often felt the need for such a datastructure too, even outside the VIT (VersionedIntervalTimeline).

Thoughts on perf

  • I doubt if a single query is really going to benefit from this change since doing a contains() or an overlaps() check on say 25k intervals (which is a fairly large number of intervals for a typical Druid cluster) would not be very compute intensive.
    • You can think of 25k intervals as roughly 3 years worth of HOUR-granularity data.
  • But in high concurrency, this would still be beneficial since the VIT does all of its computations inside a giant lock. The shorter we hold the lock, the better.
  • Either way, we should add benchmarks as @jtuglu1 suggested for queries as well as VIT itself.

Notes on the implementation

I have left some inline comments.

Along with that, the approach would be much cleaner if you do something like this instead:

  • Add an IntervalNavigableMap<T> interface which extends NavigableMap<Interval, T>. This interface should have the following new methods:
    • entriesContaining(Interval interval)
    • entriesOverlapping(Interval interval)
    • entriesMatching(Interval interval) (Is this really needed?)
    • Alternatively, instead of methods that return a sub map, you could have methods that return a matching entry set.
  • Instead of HashMap and NavigableMap, the VIT class should use this new interface.
  • Add a class IntervalTreeMap<T> implements IntervalNavigableMap<T> extends TreeMap<Interval, T> and provides default implementations for the new methods.
    • For example, for the entriesContaining() method, we return the whole map.
  • Add a new class FastSearchIntervalMap<T> which performs the optimised search.
  • Based on value of the fastSearch flag passed to constructor of VIT, choose the implementation of the map.
  • This would ensure that there are minimal changes to VIT and we can easily swap out different map implementations.

Comment thread processing/src/main/java/org/apache/druid/timeline/VersionedIntervalTimeline.java Outdated
Comment thread processing/src/main/java/org/apache/druid/timeline/VersionedIntervalTimeline.java Outdated
@pirvtech

Copy link
Copy Markdown
Author

@kfaraz thanks for the comments. We were facing a scenario where the number of segments were in the order of millions, due to the amount of parallelization we have in ingestion tasks, the time spread of data being ingested, and relative time offset when a time interval gets compacted. This change helped our historicals be able to load and serve segments faster. I had made another change to make the scanning and loading of local on-disk segments during historical startup multi-threaded from a single threa that it was, but looks like someone beat us to the punch in submission :). I will review your code feedback and provide my responses.

@jtuglu1

jtuglu1 commented Apr 8, 2026

Copy link
Copy Markdown
Contributor

@kfaraz I think this PR will have much more impact than just Historical. See #19278. Broker timeline operations become terribly slow under high # of intervals. We can use the interval tree here to speed up those operations drastically and speed up queries under heavy segment load/remove callback load.

@jtuglu1

jtuglu1 commented Apr 8, 2026

Copy link
Copy Markdown
Contributor

@pirvtech – are you still working on this patch? We'd love to get this in and experimenting on Broker-side to fix some issues we've run into with large #s of intervals in the VIT. Happy to pick this up otherwise.

@kfaraz

kfaraz commented Apr 11, 2026

Copy link
Copy Markdown
Contributor

Thanks for the clarification, @jtuglu1 !
Could you share an estimate of the typical number of intervals that causes the slowness? Were you able to capture flamegraphs?

Regardless, I agree that this feature would be generally useful. We just need to clean up the changes a bit.

@jtuglu1

jtuglu1 commented Apr 13, 2026

Copy link
Copy Markdown
Contributor

Thanks for the clarification, @jtuglu1 ! Could you share an estimate of the typical number of intervals that causes the slowness? Were you able to capture flamegraphs?

Regardless, I agree that this feature would be generally useful. We just need to clean up the changes a bit.

I can get a flamegraph, but see the description of #19278 for explanation. Basically for a full year of MINUTE granularity you get ~500k intervals.

@pirvtech

Copy link
Copy Markdown
Author

@pirvtech – are you still working on this patch? We'd love to get this in and experimenting on Broker-side to fix some issues we've run into with large #s of intervals in the VIT. Happy to pick this up otherwise.

@jtuglu1 yes was planning to send out updates to review comments this week. Let me know if you have any feedback as well.

@jtuglu1

jtuglu1 commented Apr 13, 2026

Copy link
Copy Markdown
Contributor

@pirvtech – are you still working on this patch? We'd love to get this in and experimenting on Broker-side to fix some issues we've run into with large #s of intervals in the VIT. Happy to pick this up otherwise.

@jtuglu1 yes was planning to send out updates to review comments this week. Let me know if you have any feedback as well.

Great, thanks! Feel free to use my benchmarks in #19278. I think it'd be great to extend on those to summarize/benchmark the common operations with a variable size of intervals to ensure we're seeing the expected throughput/latency.

@jtuglu1

jtuglu1 commented Apr 22, 2026

Copy link
Copy Markdown
Contributor

@pirvtech any updates here? I'm happy to get this merged if you're busy

@pirvtech

Copy link
Copy Markdown
Author

@jtuglu1 thanks for the patience, submitting updates today.

@jtuglu1 jtuglu1 self-requested a review April 23, 2026 16:08
@pirvtech

Copy link
Copy Markdown
Author

Please see commend above on pushed config related changes @kfaraz

Comment thread processing/src/main/java/org/apache/druid/timeline/IntervalTree.java Outdated
@pirvtech

pirvtech commented May 6, 2026

Copy link
Copy Markdown
Author

Addressed comments from @FrankChen021 in new commit

@FrankChen021 FrankChen021 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Severity Findings
P0 0
P1 0
P2 0
P3 1
Total 1

This is an automated review by Codex GPT-5

Comment thread processing/src/main/java/org/apache/druid/timeline/IntervalTree.java Outdated

@FrankChen021 FrankChen021 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 10 of 10 changed files. The follow-up adds full-traversal APIs and documents the range-pruning contract; current PR call sites use range-monotonic predicates, so no inline reply is needed.


This is an automated review by Codex GPT-5

@pirvtech

Copy link
Copy Markdown
Author

Merged latest master into this branch and resolved conflicts.

@pirvtech

Copy link
Copy Markdown
Author

What are the next steps?

@pirvtech pirvtech changed the title Interval tree for managing segment metadata in memory perf: Interval tree for managing segment metadata in memory May 11, 2026

@FrankChen021 FrankChen021 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have reviewed the code for correctness, edge cases, concurrency, and integration risks; no issues found.

Reviewed 10 of 10 changed files.


This is an automated review by Codex GPT-5.5

@pirvtech

Copy link
Copy Markdown
Author

Any other comments @FrankChen021 @kfaraz @jtuglu1 or is it ready to merge.

@pirvtech

Copy link
Copy Markdown
Author

Any updates from reviewers if it is ready to merge?

@kfaraz kfaraz left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating the patch and for your patience, @pirvtech !

Overall, the changes look good. I have left some minor suggestions. A couple of files remain for review, will finish those soon as well.

Please also update the PR description to include a Release note section which describes the new config.

* Not thread safe.
* <p>
*/
public class IntervalTree<T> extends AbstractMap<Interval, T> implements NavigableMap<Interval, T>

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Rename since this is essentially a navigable map inspired by Interval Trees.

Suggested change
public class IntervalTree<T> extends AbstractMap<Interval, T> implements NavigableMap<Interval, T>
public class IntervalTreeMap<T> extends AbstractMap<Interval, T> implements NavigableMap<Interval, T>

Comment thread processing/src/main/java/org/apache/druid/timeline/IntervalTree.java Outdated
Comment thread processing/src/main/java/org/apache/druid/timeline/IntervalTree.java Outdated
}

@VisibleForTesting
public String print()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename this to toTreeString (or use toString itself), add a javadoc, remove the @VisibleForTesting annotation.

import java.util.function.Consumer;
import java.util.stream.Collectors;

public class IntervalTreeTest

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use JUnit5 for the test.

Interval interval = pair.lhs;
tree.findEncompassing(interval);
}
System.out.println("Tree find time " + (System.currentTimeMillis() - start));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove System.out statements from the test.


import javax.annotation.Nullable;

public class TimelineConfig

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of a new class, we should probably add the new config fastIntervalSearch in class SegmentMetadataQueryConfig which is bound as druid.query.segmentMetadata.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered this earlier but the interval tree is not only used during query processing but also in other cases when loading segments from disk.

@kfaraz kfaraz Jul 2, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I suppose that's fair as we might want to use this on non-queryable nodes as well.
Please add a short javadoc though, and maybe rename to SegmentTimelineConfig.

Comment thread server/src/main/java/org/apache/druid/server/SegmentManager.java Outdated
if (fastIntervalSearch) {
Map<Interval, TreeMap<VersionType, TimelineEntry>> possibleMatches = allTimeIntervals.findEncompassing(interval);
for (Entry<Interval, TreeMap<VersionType, TimelineEntry>> entry : possibleMatches.entrySet()) {
Interval eninterval = entry.getKey();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Interval eninterval = entry.getKey();
Interval candidateInterval = entry.getKey();

@jtuglu1 jtuglu1 added this to the 38.0.0 milestone Jun 24, 2026
@pirvtech

pirvtech commented Jul 1, 2026

Copy link
Copy Markdown
Author

Thanks for updating the patch and for your patience, @pirvtech !

Overall, the changes look good. I have left some minor suggestions. A couple of files remain for review, will finish those soon as well.

Please also update the PR description to include a Release note section which describes the new config.

@kfaraz will look into your comments and address them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants