Interval tree
Encyclopedia
In computer science
, an interval tree is an ordered tree data structure
to hold intervals
. Specifically, it allows one to efficiently find all intervals that overlap with any given interval or point. It is often used for windowing queries, for example, to find all roads on a computerized map inside a rectangular viewport, or to find all visible elements inside a three-dimensional scene. A similar data structure is the segment tree
.
The trivial solution is to visit each interval and test whether it intersects the given point or interval, which requires Θ(n) time, where n is the number of intervals in the collection. Since a query may return all intervals, for example if the query is a large interval intersecting all intervals in the collection, this is asymptotically optimal
; however, we can do better by considering output-sensitive algorithm
s, where the runtime is expressed in terms of m, the number of intervals produced by the query. Interval trees are dynamic, i.e., they allow insertion and deletion of intervals. They obtain a query time of O(log n) while the preprocessing time to construct the data structure is O(n log n) (but the space consumption is O(n)).
and queried in O(log n) time. However, with arbitrarily overlapping intervals, there is no way to compare two intervals for insertion into the tree since orderings sorted by the beginning points or the ending points may be different. A naive approach might be to build two parallel trees, one ordered by the beginning point, and one ordered by the ending point of each interval. This allows discarding half of each tree in O(log n) time, but the results must be merged, requiring O(n) time. This gives us queries in O(n + log n) = O(n), which is no better than brute-force.
Interval trees solve this problem. This article describes two alternative designs for an interval tree, dubbed the centered interval tree and the augmented tree.
We start by taking the entire range of all the intervals and dividing it in half at x_center (in practice, x_center should be picked to keep the tree relatively balanced). This gives three sets of intervals, those completely to the left of x_center which we'll call S_left, those completely to the right of x_center which we'll call S_right, and those overlapping x_center which we'll call S_center.
The intervals in S_left and S_right are recursively divided in the same manner until there are no intervals left.
The intervals in S_center that overlap the center point are stored in a separate data structure linked to the node in the interval tree. This data structure consists of two lists, one containing all the intervals sorted by their beginning points, and another containing all the intervals sorted by their ending points.
The result is a binary tree with each node storing:
A binary search in O(log n) time for the beginning and end of R reveals the minimum and maximum points to consider. Each point within this range references an interval that overlaps our range and is added to the result list. Care must be taken to avoid duplicates, since an interval might both begin and end within R. This can be done using a binary flag on each interval to mark whether or not it has been added to the result set.
The only intervals not yet considered are those overlapping R that do not have an endpoint inside R, in other words, intervals that enclose it. To find these, we pick any point inside R and use the algorithm below to find all intervals intersecting that point (again, being careful to remove duplicates).
For each tree node, x is compared to x_center, the midpoint used in node construction above. If x is less than x_center, the leftmost set of intervals, S_left, is considered. If x is greater than x_center, the rightmost set of intervals, S_right, is considered.
As each node is processed as we traverse the tree from the root to a leaf, the ranges in its S_center are processed. If x is less than x_center, we know that all intervals in S_center end after x, or they could not also overlap x_center. Therefore, we need only find those intervals in S_center that begin before x. We can consult the lists of S_center that have already been constructed. Since we only care about the interval beginnings in this scenario, we can consult the list sorted by beginnings. Suppose we find the closest number no greater than x in this list. All ranges from the beginning of the list to that found point overlap x because they begin before x and end after x (as we know because they overlap x_center which is larger than x). Thus, we can simply start enumerating intervals in the list until the endpoint value exceeds x.
Likewise, if x is greater than x_center, we know that all intervals in S_center must begin before x, so we find those intervals that end after x using the list sorted by interval endings.
If x exactly matches x_center, all intervals in S_center can be added to the results without further processing and tree traversal can be stopped.
First, a range tree
in N dimensions is constructed that allows efficient retrieval of all intervals with beginning and end points inside the query region R. Once the corresponding ranges are found, the only thing that is left are those ranges that enclose the region in some dimension. To find these overlaps, N interval trees are created, and one axis intersecting R is queried for each. For example, in two dimensions, the bottom of the square R (or any other horizontal line intersecting R) would be queried against the interval tree constructed for the horizontal axis. Likewise, the left (or any other vertical line intersecting R) would be queried against the interval tree constructed on the vertical axis.
Each interval tree also needs an addition for higher dimensions. At each node we traverse in the tree, x is compared with S_center to find overlaps. Instead of two sorted lists of points as was used in the one-dimensional case, a range tree is constructed. This allows efficient retrieval of all points in S_center that overlap region R.
An interval may overlap the center point of several nodes in the tree. Since each node stores the intervals that overlap it, with all intervals completely to the left of its center point in the left subtree, similarly for the right subtree, it follows that each interval is stored in the node closest to the root from the set of nodes whose center point it overlaps.
Normal deletion operations in a binary tree (for the case where the node being deleted has two children) involve promoting a node further from the root to the position of the node being deleted (usually the leftmost child of the right subtree, or the rightmost child of the left subtree). As a result of this promotion, some nodes that were above the promoted node will become descendents of it; it is necessary to search these nodes for intervals that also overlap the promoted node, and move those intervals into the promoted node. As a consequence, this may result in new empty nodes, which must be deleted, following the same algorithm again.
Both insertion and deletion require O(log n) time, with n being the total number of intervals.
Use a simple ordered tree, for example a binary search tree
or self-balancing binary search tree
, where the tree is ordered by the 'low' values of the intervals, and an extra annotation is added to every node recording the maximum high value of both its subtrees. It is simple to maintain this attribute in only O(h) steps during each addition or removal of a node, where h is the height of the node added or removed in the tree, by updating all ancestors of the node from the bottom up. Additionally, the tree rotation
s used during insertion and deletion may require updating the high value of the affected nodes.
Now, it's known that two intervals A and B overlap only when both A.low ≤ B.high and A.high ≥ B.low. When searching the trees for nodes overlapping with a given interval, you can immediately skip:
A total order can be defined on the intervals by ordering them first by their 'low' value and finally by their
'high' value. This ordering can be used to prevent duplicate intervals from being inserted into the tree in O(log n) time, versus the O(k + log n) time required to find duplicates if k intervals overlap a new interval.
The code to search for an interval is exactly the same except for the check in the middle:
overlapsWith is defined as:
A much simpler solution is to use nested interval trees. First, create a tree using the ranges for the y coordinate. Now, for each node in the tree, add another interval tree on the x ranges, for all elements whose y range intersect that node's y range.
The advantage of this solution is that it can be extended to an arbitrary amount of dimensions using the same code base.
At first, the cost for the additional trees might seem prohibitive but that is usually not the case. As with the solution above, you need one node per x coordinate, so this cost is the same in both solutions. The only difference is that you need an additional tree structure per vertical interval. This structure is typically very small (a pointer to the root node plus maybe the number of nodes and the height of the tree).
External links
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...
, an interval tree is an ordered tree data structure
Data structure
In computer science, a data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently.Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks...
to hold intervals
Interval (mathematics)
In mathematics, a interval is a set of real numbers with the property that any number that lies between two numbers in the set is also included in the set. For example, the set of all numbers satisfying is an interval which contains and , as well as all numbers between them...
. Specifically, it allows one to efficiently find all intervals that overlap with any given interval or point. It is often used for windowing queries, for example, to find all roads on a computerized map inside a rectangular viewport, or to find all visible elements inside a three-dimensional scene. A similar data structure is the segment tree
Segment tree
In computer science, a segment tree is a tree data structure for storing intervals, or segments. It allows querying which of the stored segments contain a given point. It is, in principle, a static structure; that is, its content cannot be modified once the structure is built...
.
The trivial solution is to visit each interval and test whether it intersects the given point or interval, which requires Θ(n) time, where n is the number of intervals in the collection. Since a query may return all intervals, for example if the query is a large interval intersecting all intervals in the collection, this is asymptotically optimal
Asymptotically optimal
In computer science, an algorithm is said to be asymptotically optimal if, roughly speaking, for large inputs it performs at worst a constant factor worse than the best possible algorithm...
; however, we can do better by considering output-sensitive algorithm
Output-sensitive algorithm
In computer science, an output-sensitive algorithm is an algorithm whose running time depends not only on the size of the input but also on the size of the output...
s, where the runtime is expressed in terms of m, the number of intervals produced by the query. Interval trees are dynamic, i.e., they allow insertion and deletion of intervals. They obtain a query time of O(log n) while the preprocessing time to construct the data structure is O(n log n) (but the space consumption is O(n)).
Naive approach
In a simple case, the intervals do not overlap and they can be inserted into a simple binary treeBinary search tree
In computer science, a binary search tree , which may sometimes also be called an ordered or sorted binary tree, is a node-based binary tree data structurewhich has the following properties:...
and queried in O(log n) time. However, with arbitrarily overlapping intervals, there is no way to compare two intervals for insertion into the tree since orderings sorted by the beginning points or the ending points may be different. A naive approach might be to build two parallel trees, one ordered by the beginning point, and one ordered by the ending point of each interval. This allows discarding half of each tree in O(log n) time, but the results must be merged, requiring O(n) time. This gives us queries in O(n + log n) = O(n), which is no better than brute-force.
Interval trees solve this problem. This article describes two alternative designs for an interval tree, dubbed the centered interval tree and the augmented tree.
Centered interval tree
Queries require O(log n + m) time, with n being the total number of intervals and m being the number of reported results. Construction requires O(n log n) time, and storage requires O(n) space.Construction
Given a set of n intervals on the number line, we want to construct a data structure so that we can efficiently retrieve all intervals overlapping another interval or point.We start by taking the entire range of all the intervals and dividing it in half at x_center (in practice, x_center should be picked to keep the tree relatively balanced). This gives three sets of intervals, those completely to the left of x_center which we'll call S_left, those completely to the right of x_center which we'll call S_right, and those overlapping x_center which we'll call S_center.
The intervals in S_left and S_right are recursively divided in the same manner until there are no intervals left.
The intervals in S_center that overlap the center point are stored in a separate data structure linked to the node in the interval tree. This data structure consists of two lists, one containing all the intervals sorted by their beginning points, and another containing all the intervals sorted by their ending points.
The result is a binary tree with each node storing:
- A center point
- A pointer to another node containing all intervals completely to the left of the center point
- A pointer to another node containing all intervals completely to the right of the center point
- All intervals overlapping the center point sorted by their beginning point
- All intervals overlapping the center point sorted by their ending point
Intersecting
Given the data structure constructed above, we receive queries consisting of ranges or points, and return all the ranges in the original set overlapping this input.With an Interval
First, we can reduce the case where an interval R is given as input to the simpler case where a single point is given as input. We first find all ranges with beginning or end points inside the input interval R using a separately constructed tree. In the one-dimensional case, we can use a simple tree containing all the beginning and ending points in the interval set, each with a pointer to its corresponding interval.A binary search in O(log n) time for the beginning and end of R reveals the minimum and maximum points to consider. Each point within this range references an interval that overlaps our range and is added to the result list. Care must be taken to avoid duplicates, since an interval might both begin and end within R. This can be done using a binary flag on each interval to mark whether or not it has been added to the result set.
The only intervals not yet considered are those overlapping R that do not have an endpoint inside R, in other words, intervals that enclose it. To find these, we pick any point inside R and use the algorithm below to find all intervals intersecting that point (again, being careful to remove duplicates).
With a Point
The task is to find all intervals in the tree that overlap a given point x. The tree is walked with a similar recursive algorithm as would be used to traverse a traditional binary tree, but with extra affordance for the intervals overlapping the "center" point at each node.For each tree node, x is compared to x_center, the midpoint used in node construction above. If x is less than x_center, the leftmost set of intervals, S_left, is considered. If x is greater than x_center, the rightmost set of intervals, S_right, is considered.
As each node is processed as we traverse the tree from the root to a leaf, the ranges in its S_center are processed. If x is less than x_center, we know that all intervals in S_center end after x, or they could not also overlap x_center. Therefore, we need only find those intervals in S_center that begin before x. We can consult the lists of S_center that have already been constructed. Since we only care about the interval beginnings in this scenario, we can consult the list sorted by beginnings. Suppose we find the closest number no greater than x in this list. All ranges from the beginning of the list to that found point overlap x because they begin before x and end after x (as we know because they overlap x_center which is larger than x). Thus, we can simply start enumerating intervals in the list until the endpoint value exceeds x.
Likewise, if x is greater than x_center, we know that all intervals in S_center must begin before x, so we find those intervals that end after x using the list sorted by interval endings.
If x exactly matches x_center, all intervals in S_center can be added to the results without further processing and tree traversal can be stopped.
Higher Dimensions
The interval tree data structure can be generalized to a higher dimension N with identical query and construction time and O(n log n) space.First, a range tree
Range tree
In computer science, a range tree is an ordered tree data structure to hold a list of points. It allows all points within a given range to be efficiently retrieved, and is typically used in two or higher dimensions....
in N dimensions is constructed that allows efficient retrieval of all intervals with beginning and end points inside the query region R. Once the corresponding ranges are found, the only thing that is left are those ranges that enclose the region in some dimension. To find these overlaps, N interval trees are created, and one axis intersecting R is queried for each. For example, in two dimensions, the bottom of the square R (or any other horizontal line intersecting R) would be queried against the interval tree constructed for the horizontal axis. Likewise, the left (or any other vertical line intersecting R) would be queried against the interval tree constructed on the vertical axis.
Each interval tree also needs an addition for higher dimensions. At each node we traverse in the tree, x is compared with S_center to find overlaps. Instead of two sorted lists of points as was used in the one-dimensional case, a range tree is constructed. This allows efficient retrieval of all points in S_center that overlap region R.
Deletion
If after deleting an interval from the tree, the node containing that interval contains no more intervals, that node may be deleted from the tree. This is more complex than a normal binary tree deletion operation.An interval may overlap the center point of several nodes in the tree. Since each node stores the intervals that overlap it, with all intervals completely to the left of its center point in the left subtree, similarly for the right subtree, it follows that each interval is stored in the node closest to the root from the set of nodes whose center point it overlaps.
Normal deletion operations in a binary tree (for the case where the node being deleted has two children) involve promoting a node further from the root to the position of the node being deleted (usually the leftmost child of the right subtree, or the rightmost child of the left subtree). As a result of this promotion, some nodes that were above the promoted node will become descendents of it; it is necessary to search these nodes for intervals that also overlap the promoted node, and move those intervals into the promoted node. As a consequence, this may result in new empty nodes, which must be deleted, following the same algorithm again.
Balancing
The same issues that affect deletion also affect rotation operations; rotation must preserve the invariant that intervals are stored as close to the root as possible.Augmented tree
Another way to represent intervals is described in .Both insertion and deletion require O(log n) time, with n being the total number of intervals.
Use a simple ordered tree, for example a binary search tree
Binary search tree
In computer science, a binary search tree , which may sometimes also be called an ordered or sorted binary tree, is a node-based binary tree data structurewhich has the following properties:...
or self-balancing binary search tree
Self-balancing binary search tree
In computer science, a self-balancing binary search tree is any node based binary search tree that automatically keeps its height small in the face of arbitrary item insertions and deletions....
, where the tree is ordered by the 'low' values of the intervals, and an extra annotation is added to every node recording the maximum high value of both its subtrees. It is simple to maintain this attribute in only O(h) steps during each addition or removal of a node, where h is the height of the node added or removed in the tree, by updating all ancestors of the node from the bottom up. Additionally, the tree rotation
Tree rotation
A tree rotation is an operation on a binary tree that changes the structure without interfering with the order of the elements. A tree rotation moves one node up in the tree and one node down...
s used during insertion and deletion may require updating the high value of the affected nodes.
Now, it's known that two intervals A and B overlap only when both A.low ≤ B.high and A.high ≥ B.low. When searching the trees for nodes overlapping with a given interval, you can immediately skip:
- all nodes to the right of nodes whose low value is past the end of the given interval.
- all nodes that have their maximum 'high' value below the start of the given interval.
A total order can be defined on the intervals by ordering them first by their 'low' value and finally by their
'high' value. This ordering can be used to prevent duplicate intervals from being inserted into the tree in O(log n) time, versus the O(k + log n) time required to find duplicates if k intervals overlap a new interval.
Java Example: Adding a new interval to the tree
The key of each node is the interval itself and the value of each node is the end point of the interval:Java Example: Searching a point or an interval in the tree
To search for an interval, you walk the tree, omitting those branches which can't contain what you're looking for. The simple case is looking for a point:The code to search for an interval is exactly the same except for the check in the middle:
overlapsWith is defined as:
Higher dimension
This can be extended to higher dimensions by cycling through the dimensions at each level of the tree. For example, for two dimensions, the odd levels of the tree might contain ranges for the x coordinate, while the even levels contain ranges for the y coordinate. However, it is not quite obvious how the rotation logic will have to be extended for such cases to keep the tree balanced.A much simpler solution is to use nested interval trees. First, create a tree using the ranges for the y coordinate. Now, for each node in the tree, add another interval tree on the x ranges, for all elements whose y range intersect that node's y range.
The advantage of this solution is that it can be extended to an arbitrary amount of dimensions using the same code base.
At first, the cost for the additional trees might seem prohibitive but that is usually not the case. As with the solution above, you need one node per x coordinate, so this cost is the same in both solutions. The only difference is that you need an additional tree structure per vertical interval. This structure is typically very small (a pointer to the root node plus maybe the number of nodes and the height of the tree).
External links
- CGAL : Computational Geometry Algorithms Library in C++ contains a robust implementation of Range Trees