perf: Interval tree for managing segment metadata in memory#19138
perf: Interval tree for managing segment metadata in memory#19138pirvtech wants to merge 31 commits into
Conversation
…g a given interval. Using it for finding segments loaded from cache.
…n finding lower and higher entries
|
Hi – thanks. Do you have benchmarks for this? |
I will add the benchmarks. |
kfaraz
left a comment
There was a problem hiding this comment.
Thanks for putting this together, @pirvtech!
I have often felt the need for such a datastructure too, even outside the VIT (VersionedIntervalTimeline).
Thoughts on perf
- I doubt if a single query is really going to benefit from this change since doing a
contains()or anoverlaps()check on say 25k intervals (which is a fairly large number of intervals for a typical Druid cluster) would not be very compute intensive.- You can think of 25k intervals as roughly 3 years worth of HOUR-granularity data.
- But in high concurrency, this would still be beneficial since the VIT does all of its computations inside a giant lock. The shorter we hold the lock, the better.
- Either way, we should add benchmarks as @jtuglu1 suggested for queries as well as VIT itself.
Notes on the implementation
I have left some inline comments.
Along with that, the approach would be much cleaner if you do something like this instead:
- Add an
IntervalNavigableMap<T>interface which extendsNavigableMap<Interval, T>. This interface should have the following new methods:entriesContaining(Interval interval)entriesOverlapping(Interval interval)entriesMatching(Interval interval)(Is this really needed?)- Alternatively, instead of methods that return a sub map, you could have methods that return a matching entry set.
- Instead of
HashMapandNavigableMap, the VIT class should use this new interface. - Add a class
IntervalTreeMap<T> implements IntervalNavigableMap<T> extends TreeMap<Interval, T>and provides default implementations for the new methods.- For example, for the
entriesContaining()method, we return the whole map.
- For example, for the
- Add a new class
FastSearchIntervalMap<T>which performs the optimised search. - Based on value of the
fastSearchflag passed to constructor of VIT, choose the implementation of the map. - This would ensure that there are minimal changes to
VITand we can easily swap out different map implementations.
|
@kfaraz thanks for the comments. We were facing a scenario where the number of segments were in the order of millions, due to the amount of parallelization we have in ingestion tasks, the time spread of data being ingested, and relative time offset when a time interval gets compacted. This change helped our historicals be able to load and serve segments faster. I had made another change to make the scanning and loading of local on-disk segments during historical startup multi-threaded from a single threa that it was, but looks like someone beat us to the punch in submission :). I will review your code feedback and provide my responses. |
|
@kfaraz I think this PR will have much more impact than just Historical. See #19278. Broker timeline operations become terribly slow under high # of intervals. We can use the interval tree here to speed up those operations drastically and speed up queries under heavy segment load/remove callback load. |
|
@pirvtech – are you still working on this patch? We'd love to get this in and experimenting on Broker-side to fix some issues we've run into with large #s of intervals in the VIT. Happy to pick this up otherwise. |
|
Thanks for the clarification, @jtuglu1 ! Regardless, I agree that this feature would be generally useful. We just need to clean up the changes a bit. |
I can get a flamegraph, but see the description of #19278 for explanation. Basically for a full year of MINUTE granularity you get ~500k intervals. |
@jtuglu1 yes was planning to send out updates to review comments this week. Let me know if you have any feedback as well. |
Great, thanks! Feel free to use my benchmarks in #19278. I think it'd be great to extend on those to summarize/benchmark the common operations with a variable size of intervals to ensure we're seeing the expected throughput/latency. |
|
@pirvtech any updates here? I'm happy to get this merged if you're busy |
|
@jtuglu1 thanks for the patience, submitting updates today. |
|
Please see commend above on pushed config related changes @kfaraz |
|
Addressed comments from @FrankChen021 in new commit |
FrankChen021
left a comment
There was a problem hiding this comment.
| Severity | Findings |
|---|---|
| P0 | 0 |
| P1 | 0 |
| P2 | 0 |
| P3 | 1 |
| Total | 1 |
This is an automated review by Codex GPT-5
FrankChen021
left a comment
There was a problem hiding this comment.
Reviewed 10 of 10 changed files. The follow-up adds full-traversal APIs and documents the range-pruning contract; current PR call sites use range-monotonic predicates, so no inline reply is needed.
This is an automated review by Codex GPT-5
|
Merged latest master into this branch and resolved conflicts. |
|
What are the next steps? |
FrankChen021
left a comment
There was a problem hiding this comment.
I have reviewed the code for correctness, edge cases, concurrency, and integration risks; no issues found.
Reviewed 10 of 10 changed files.
This is an automated review by Codex GPT-5.5
|
Any other comments @FrankChen021 @kfaraz @jtuglu1 or is it ready to merge. |
|
Any updates from reviewers if it is ready to merge? |
There was a problem hiding this comment.
Thanks for updating the patch and for your patience, @pirvtech !
Overall, the changes look good. I have left some minor suggestions. A couple of files remain for review, will finish those soon as well.
Please also update the PR description to include a Release note section which describes the new config.
| * Not thread safe. | ||
| * <p> | ||
| */ | ||
| public class IntervalTree<T> extends AbstractMap<Interval, T> implements NavigableMap<Interval, T> |
There was a problem hiding this comment.
Nit: Rename since this is essentially a navigable map inspired by Interval Trees.
| public class IntervalTree<T> extends AbstractMap<Interval, T> implements NavigableMap<Interval, T> | |
| public class IntervalTreeMap<T> extends AbstractMap<Interval, T> implements NavigableMap<Interval, T> |
| } | ||
|
|
||
| @VisibleForTesting | ||
| public String print() |
There was a problem hiding this comment.
Rename this to toTreeString (or use toString itself), add a javadoc, remove the @VisibleForTesting annotation.
| import java.util.function.Consumer; | ||
| import java.util.stream.Collectors; | ||
|
|
||
| public class IntervalTreeTest |
There was a problem hiding this comment.
Please use JUnit5 for the test.
| Interval interval = pair.lhs; | ||
| tree.findEncompassing(interval); | ||
| } | ||
| System.out.println("Tree find time " + (System.currentTimeMillis() - start)); |
There was a problem hiding this comment.
Please remove System.out statements from the test.
|
|
||
| import javax.annotation.Nullable; | ||
|
|
||
| public class TimelineConfig |
There was a problem hiding this comment.
Instead of a new class, we should probably add the new config fastIntervalSearch in class SegmentMetadataQueryConfig which is bound as druid.query.segmentMetadata.
There was a problem hiding this comment.
I considered this earlier but the interval tree is not only used during query processing but also in other cases when loading segments from disk.
There was a problem hiding this comment.
Yeah, I suppose that's fair as we might want to use this on non-queryable nodes as well.
Please add a short javadoc though, and maybe rename to SegmentTimelineConfig.
| if (fastIntervalSearch) { | ||
| Map<Interval, TreeMap<VersionType, TimelineEntry>> possibleMatches = allTimeIntervals.findEncompassing(interval); | ||
| for (Entry<Interval, TreeMap<VersionType, TimelineEntry>> entry : possibleMatches.entrySet()) { | ||
| Interval eninterval = entry.getKey(); |
There was a problem hiding this comment.
| Interval eninterval = entry.getKey(); | |
| Interval candidateInterval = entry.getKey(); |
@kfaraz will look into your comments and address them. |
Segment metadata stored in memory of the Historicals, is used when looking up segments that match an interval for query and segment loading purposes. Currently this is a serial scan that goes through all segments metadata in ascending start time order to find the right matching segments.
This changes introduces an Interval Tree as a more efficient way to store segment metadata in memory, to speed up searches for segments, that can potentially cut down search times from O(n) to O(logn).
Core changes to file processing/src/main/java/org/apache/druid/timeline/VersionedIntervalTimeline.java
Interval tree implementation in file processing/src/main/java/org/apache/druid/timeline/IntervalTree.java
Documentation comments have been included in important files and sections of code. Unit tests added.
This has been reviewed internally and is in a production Druid cluster.