Cartesian tree
Encyclopedia
In computer science
, a Cartesian tree is a binary tree
derived from a sequence of numbers; it can be uniquely defined from the properties that it is heap
-ordered and that a symmetric (in-order) traversal
of the tree returns the original sequence. Introduced by in the context of geometric range searching
data structure
s, Cartesian trees have also been used in the definition of the treap
and randomized binary search tree data structures for binary search problems. The Cartesian tree for a sequence may be constructed in linear time using a stack
-based algorithm for finding all nearest smaller values
in a sequence.
Based on the heap property, the root of the tree must be the smallest number in the sequence. From this, the tree itself may also be defined recursively: the root is the minimum value of the sequence, and the left and right subtrees are the Cartesian trees for the subsequences to the left and right of the root value. Therefore, the three properties above uniquely define the Cartesian tree.
If a sequence of numbers contains repetitions, the Cartesian tree may be defined by determining a consistent tie-breaking rule (for instance, determining that the first of two equal elements is treated as the smaller of the two) before applying the above rules.
An example of a Cartesian tree is shown in the figure above.
for range minimum queries
, a range searching
problem involving queries that ask for the minimum value in a contiguous subsequence of the original sequence. In a Cartesian tree, this minimum value may be found at the lowest common ancestor
of the leftmost and rightmost values in the subsequence. For instance, in the subsequence (12,10,20,15) of the sequence shown in the first illustration, the minimum value of the subsequence (10) forms the lowest common ancestor of the leftmost and rightmost values (12 and 15). Because lowest common ancestors may be found in constant time per query, using a data structure that takes linear space to store and that may be constructed in linear time, the same bounds hold for the range minimization problem.
reversed this relationship between the two data structure problems by showing that lowest common ancestors in an input tree could be solved efficiently applying a non-tree-based technique for range minimization. Their data structure uses an Euler tour technique to transform the input tree into a sequence and then finds range minima in the resulting sequence. The sequence resulting from this transformation has a special form (adjacent numbers, representing heights of adjacent nodes in the tree, differ by ±1) which they take advantage of in their data structure; to solve the range minimization problem for sequences that do not have this special form, they use Cartesian trees to transform the range minimization problem into a lowest common ancestor problem, and then apply the Euler tour technique to transform the problem again into one of range minimization for sequences with this special form.
The same range minimization problem may also be given an alternative interpretation in terms of two dimensional range searching. A collection of finitely many points in the Cartesian plane may be used to form a Cartesian tree, by sorting the points by their x-coordinates and using the y-coordinates in this order as the sequence of values from which this tree is formed. If S is the subset of the input points within some vertical slab defined by the inequalities L ≤ x ≤ R, p is the leftmost point in S (the one with minimum x-coordinate), and q is the rightmost point in S (the one with maximum x-coordinate) then the lowest common ancestor of p and q in the Cartesian tree is the bottommost point in the slab. A three-sided range query, in which the task is to list all points within a region bounded by the three inequalities L ≤ x ≤ R and y ≤ T, may be answered by finding this bottommost point b, comparing its y-coordinate to T, and (if the point lies within the three-sided region) continuing recursively in the two slabs bounded between p and b and between b and q. In this way, after the leftmost and rightmost points in the slab are identified, all points within the three-sided region may be listed in constant time per point.
The same construction, of lowest common ancestors in a Cartesian tree, makes it possible to construct a data structure with linear space that allows the distances between pairs of points in any ultrametric space to be queried in constant time per query. The distance within an ultrametric is the same as the minimax path
weight in the minimum spanning tree
of the metric. From the minimum spanning tree, one can construct a Cartesian tree, the root node of which represents the heaviest edge of the minimum spanning tree. Removing this edge partitions the minimum spanning tree into two subtrees, and Cartesian trees recursively constructed for these two subtrees form the children of the root node of the Cartesian tree. The leaves of the Cartesian tree represent points of the metric space, and the lowest common ancestor of two leaves in the Cartesian tree is the heaviest edge between those two points in the minimum spanning tree, which has weight equal to the distance between the two points. Once the minimum spanning tree has been found and its edge weights sorted, the Cartesian tree may be constructed in linear time.
Because a Cartesian tree is a binary tree, it is natural to use it as a binary search tree
for an ordered sequence of values. However, defining a Cartesian tree based on the same values that form the search keys of a binary search tree does not work well: the Cartesian tree of a sorted sequence is just a path
, rooted at its leftmost endpoint, and binary searching in this tree degenerates to sequential search in the path. However, it is possible to generate more-balanced search trees by generating priority values for each search key that are different than the key itself, sorting the inputs by their key values, and using the corresponding sequence of priorities to generate a Cartesian tree. This construction may equivalently be viewed in the geometric framework described above, in which the x-coordinates of a set of points are the search keys and the y-coordinates are the priorities.
This idea was applied by , who suggested the use of random numbers as priorities. The data structure resulting from this random choice is called a treap
, due to its combination of binary search tree and binary heap features. An insertion into a treap may be performed by inserting the new key as a leaf of an existing tree, choosing a priority for it, and then performing tree rotation
operations along a path from the node to the root of the tree to repair any violations of the heap property caused by this insertion; a deletion may similarly be performed by a constant amount of change to the tree followed by a sequence of rotations along a single path in the tree.
If the priorities of each key are chosen randomly and independently once whenever the key is inserted into the tree, the resulting Cartesian tree will have the same properties as a random binary search tree, a tree computed by inserting the keys in a randomly chosen permutation
starting from an empty tree, with each insertion leaving the previous tree structure unchanged and inserting the new node as a leaf of the tree. Random binary search trees had been studied for much longer, and are known to behave well as search trees (they have logarithm
ic depth with high probability); the same good behavior carries over to treaps. It is also possible, as suggested by Aragon and Seidel, to reprioritize frequently-accessed nodes, causing them to move towards the root of the treap and speeding up future accesses for the same keys.
One method is to simply process the sequence values in left-to-right order, maintaining the Cartesian tree of the nodes processed so far, in a structure that allows both upwards and downwards traversal of the tree. To process each new value x, start at the node representing the value prior to x in the sequence and follow the path from this node to the root of the tree until finding a value y smaller than x. This node y is the parent of x, and the previous right child of y becomes the new left child of x. The total time for this procedure is linear, because the time spent searching for the parent y of each new node x can be charged
against the number of nodes that are removed from the rightmost path in the tree.
An alternative linear-time construction algorithm is based on the all nearest smaller values
problem. In the input sequence, one may define the left neighbor of a value x to be the value that occurs prior to x, is smaller than x, and is closer in position to x than any other smaller value. The right neighbor is defined symmetrically. The sequence of left neighbors may be found by an algorithm that maintains a stack
containing a subsequence of the input. For each new sequence value x, the stack is popped until it is empty or its top element is smaller than x, and then x is pushed onto the stack. The left neighbor of x is the top element at the time x is pushed. The right neighbors may be found by applying the same stack algorithm to the reverse of the sequence. The parent of x in the Cartesian tree is either the left neighbor of x or the right neighbor of x, whichever exists and has a larger value. The left and right neighbors may also be constructed efficiently by parallel algorithm
s, so this formulation may be used to develop efficient parallel algorithms for Cartesian tree construction.
based on Cartesian trees. They describe the algorithm as based on a tree with the maximum at the root, but it may be modified straightforwardly to support a Cartesian tree with the convention that the minimum value is at the root. For consistency, it is this modified version of the algorithm that is described below.
The Levcopoulos–Petersson algorithm can be viewed as a version of selection sort
or heap sort that maintains a priority queue
of candidate minima, and that at each step finds and removes the minimum value in this queue, moving this value to the end of an output sequence. In their algorithm, the priority queue consists only of elements whose parent in the Cartesian tree has already been found and removed. Thus, the algorithm consists of the following steps:
As Levcopoulos and Petersson show, for input sequences that are already nearly sorted, the size of the priority queue will remain small, allowing this method to take advantage of the nearly-sorted input and run more quickly. Specifically, the worst-case running time of this algorithm is O(n log k), where k is the average, over all values x in the sequence, of the number of consecutive pairs of sequence values that bracket x. They also prove a lower bound stating that, for any n and k = ω(1), any comparison-based sorting algorithm must use Ω(n log k) comparisons for some inputs.
and subsequent authors followed the definition here in which a Cartesian tree is defined from a sequence; this change generalizes the geometric setting of Vuillemin to allow sequences other than the sorted order of x-coordinates, and allows the Cartesian tree to be applied to non-geometric problems as well.
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...
, a Cartesian tree is a binary tree
Binary tree
In computer science, a binary tree is a tree data structure in which each node has at most two child nodes, usually distinguished as "left" and "right". Nodes with children are parent nodes, and child nodes may contain references to their parents. Outside the tree, there is often a reference to...
derived from a sequence of numbers; it can be uniquely defined from the properties that it is heap
Heap (data structure)
In computer science, a heap is a specialized tree-based data structure that satisfies the heap property: if B is a child node of A, then key ≥ key. This implies that an element with the greatest key is always in the root node, and so such a heap is sometimes called a max-heap...
-ordered and that a symmetric (in-order) traversal
Tree traversal
In computer science, tree-traversal refers to the process of visiting each node in a tree data structure, exactly once, in a systematic way. Such traversals are classified by the order in which the nodes are visited...
of the tree returns the original sequence. Introduced by in the context of geometric range searching
Range searching
In its most general form, the range searching problem consists of preprocessing a set S of objects, in order to determine which objects from S a query object, called a range, intersects...
data structure
Data structure
In computer science, a data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently.Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks...
s, Cartesian trees have also been used in the definition of the treap
Treap
In computer science, the treap and the randomized binary search tree are two closely related forms of binary search tree data structures that maintain a dynamic set of ordered keys and allow binary searches among the keys...
and randomized binary search tree data structures for binary search problems. The Cartesian tree for a sequence may be constructed in linear time using a stack
Stack (data structure)
In computer science, a stack is a last in, first out abstract data type and linear data structure. A stack can have any abstract data type as an element, but is characterized by only three fundamental operations: push, pop and stack top. The push operation adds a new item to the top of the stack,...
-based algorithm for finding all nearest smaller values
All nearest smaller values
In computer science, the all nearest smaller values problem involves computing, for each value in a sequence of numbers, the previous smaller value that has the closest position in the sequence to the given value...
in a sequence.
Definition
The Cartesian tree for a sequence of distinct numbers can be uniquely defined by the following properties:- The Cartesian tree for a sequence has one node for each number in the sequence. Each node is associated with a single sequence value.
- A symmetric (in-order) traversalTree traversalIn computer science, tree-traversal refers to the process of visiting each node in a tree data structure, exactly once, in a systematic way. Such traversals are classified by the order in which the nodes are visited...
of the tree results in the original sequence. That is, the left subtree consists of the values earlier than the root in the sequence order, while the right subtree consists of the values later than the root, and a similar ordering constraint holds at each lower node of the tree. - The tree has the heap propertyBinary heapA binary heap is a heap data structure created using a binary tree. It can be seen as a binary tree with two additional constraints:*The shape property: the tree is a complete binary tree; that is, all levels of the tree, except possibly the last one are fully filled, and, if the last level of...
: the parent of any non-root node has a smaller value than the node itself.
Based on the heap property, the root of the tree must be the smallest number in the sequence. From this, the tree itself may also be defined recursively: the root is the minimum value of the sequence, and the left and right subtrees are the Cartesian trees for the subsequences to the left and right of the root value. Therefore, the three properties above uniquely define the Cartesian tree.
If a sequence of numbers contains repetitions, the Cartesian tree may be defined by determining a consistent tie-breaking rule (for instance, determining that the first of two equal elements is treated as the smaller of the two) before applying the above rules.
An example of a Cartesian tree is shown in the figure above.
Range searching and lowest common ancestors
Cartesian trees may be used as part of an efficient data structureData structure
In computer science, a data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently.Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks...
for range minimum queries
Range Minimum Query
Given an array A[1,n] of n ordered objects , a Range Minimum Query from i to j asks for the position of a minimum element in the sub-array A[i,j]....
, a range searching
Range searching
In its most general form, the range searching problem consists of preprocessing a set S of objects, in order to determine which objects from S a query object, called a range, intersects...
problem involving queries that ask for the minimum value in a contiguous subsequence of the original sequence. In a Cartesian tree, this minimum value may be found at the lowest common ancestor
Lowest common ancestor
The lowest common ancestor is a concept in graph theory and computer science. Let T be a rooted tree with n nodes. The lowest common ancestor is defined between two nodes v and w as the lowest node in T that has both v and w as descendants .The LCA of v and w in T is the shared ancestor of v...
of the leftmost and rightmost values in the subsequence. For instance, in the subsequence (12,10,20,15) of the sequence shown in the first illustration, the minimum value of the subsequence (10) forms the lowest common ancestor of the leftmost and rightmost values (12 and 15). Because lowest common ancestors may be found in constant time per query, using a data structure that takes linear space to store and that may be constructed in linear time, the same bounds hold for the range minimization problem.
reversed this relationship between the two data structure problems by showing that lowest common ancestors in an input tree could be solved efficiently applying a non-tree-based technique for range minimization. Their data structure uses an Euler tour technique to transform the input tree into a sequence and then finds range minima in the resulting sequence. The sequence resulting from this transformation has a special form (adjacent numbers, representing heights of adjacent nodes in the tree, differ by ±1) which they take advantage of in their data structure; to solve the range minimization problem for sequences that do not have this special form, they use Cartesian trees to transform the range minimization problem into a lowest common ancestor problem, and then apply the Euler tour technique to transform the problem again into one of range minimization for sequences with this special form.
The same range minimization problem may also be given an alternative interpretation in terms of two dimensional range searching. A collection of finitely many points in the Cartesian plane may be used to form a Cartesian tree, by sorting the points by their x-coordinates and using the y-coordinates in this order as the sequence of values from which this tree is formed. If S is the subset of the input points within some vertical slab defined by the inequalities L ≤ x ≤ R, p is the leftmost point in S (the one with minimum x-coordinate), and q is the rightmost point in S (the one with maximum x-coordinate) then the lowest common ancestor of p and q in the Cartesian tree is the bottommost point in the slab. A three-sided range query, in which the task is to list all points within a region bounded by the three inequalities L ≤ x ≤ R and y ≤ T, may be answered by finding this bottommost point b, comparing its y-coordinate to T, and (if the point lies within the three-sided region) continuing recursively in the two slabs bounded between p and b and between b and q. In this way, after the leftmost and rightmost points in the slab are identified, all points within the three-sided region may be listed in constant time per point.
The same construction, of lowest common ancestors in a Cartesian tree, makes it possible to construct a data structure with linear space that allows the distances between pairs of points in any ultrametric space to be queried in constant time per query. The distance within an ultrametric is the same as the minimax path
Widest path problem
In graph algorithms, the widest path problem, also known as the bottleneck shortest path problem or the maximum capacity path problem, is the problem of finding a path between two designated vertices in a weighted directed graph, maximizing the weight of the minimum-weight edge in the path.For...
weight in the minimum spanning tree
Minimum spanning tree
Given a connected, undirected graph, a spanning tree of that graph is a subgraph that is a tree and connects all the vertices together. A single graph can have many different spanning trees...
of the metric. From the minimum spanning tree, one can construct a Cartesian tree, the root node of which represents the heaviest edge of the minimum spanning tree. Removing this edge partitions the minimum spanning tree into two subtrees, and Cartesian trees recursively constructed for these two subtrees form the children of the root node of the Cartesian tree. The leaves of the Cartesian tree represent points of the metric space, and the lowest common ancestor of two leaves in the Cartesian tree is the heaviest edge between those two points in the minimum spanning tree, which has weight equal to the distance between the two points. Once the minimum spanning tree has been found and its edge weights sorted, the Cartesian tree may be constructed in linear time.
Treaps
- Main article: TreapTreapIn computer science, the treap and the randomized binary search tree are two closely related forms of binary search tree data structures that maintain a dynamic set of ordered keys and allow binary searches among the keys...
Because a Cartesian tree is a binary tree, it is natural to use it as a binary search tree
Binary search tree
In computer science, a binary search tree , which may sometimes also be called an ordered or sorted binary tree, is a node-based binary tree data structurewhich has the following properties:...
for an ordered sequence of values. However, defining a Cartesian tree based on the same values that form the search keys of a binary search tree does not work well: the Cartesian tree of a sorted sequence is just a path
Path graph
In the mathematical field of graph theory, a path graph or linear graph is a particularly simple example of a tree, namely a tree with two or more vertices that is not branched at all, that is, contains only vertices of degree 2 and 1...
, rooted at its leftmost endpoint, and binary searching in this tree degenerates to sequential search in the path. However, it is possible to generate more-balanced search trees by generating priority values for each search key that are different than the key itself, sorting the inputs by their key values, and using the corresponding sequence of priorities to generate a Cartesian tree. This construction may equivalently be viewed in the geometric framework described above, in which the x-coordinates of a set of points are the search keys and the y-coordinates are the priorities.
This idea was applied by , who suggested the use of random numbers as priorities. The data structure resulting from this random choice is called a treap
Treap
In computer science, the treap and the randomized binary search tree are two closely related forms of binary search tree data structures that maintain a dynamic set of ordered keys and allow binary searches among the keys...
, due to its combination of binary search tree and binary heap features. An insertion into a treap may be performed by inserting the new key as a leaf of an existing tree, choosing a priority for it, and then performing tree rotation
Tree rotation
A tree rotation is an operation on a binary tree that changes the structure without interfering with the order of the elements. A tree rotation moves one node up in the tree and one node down...
operations along a path from the node to the root of the tree to repair any violations of the heap property caused by this insertion; a deletion may similarly be performed by a constant amount of change to the tree followed by a sequence of rotations along a single path in the tree.
If the priorities of each key are chosen randomly and independently once whenever the key is inserted into the tree, the resulting Cartesian tree will have the same properties as a random binary search tree, a tree computed by inserting the keys in a randomly chosen permutation
Permutation
In mathematics, the notion of permutation is used with several slightly different meanings, all related to the act of permuting objects or values. Informally, a permutation of a set of objects is an arrangement of those objects into a particular order...
starting from an empty tree, with each insertion leaving the previous tree structure unchanged and inserting the new node as a leaf of the tree. Random binary search trees had been studied for much longer, and are known to behave well as search trees (they have logarithm
Logarithm
The logarithm of a number is the exponent by which another fixed value, the base, has to be raised to produce that number. For example, the logarithm of 1000 to base 10 is 3, because 1000 is 10 to the power 3: More generally, if x = by, then y is the logarithm of x to base b, and is written...
ic depth with high probability); the same good behavior carries over to treaps. It is also possible, as suggested by Aragon and Seidel, to reprioritize frequently-accessed nodes, causing them to move towards the root of the treap and speeding up future accesses for the same keys.
Efficient construction
A Cartesian tree may be constructed in linear time from its input sequence.One method is to simply process the sequence values in left-to-right order, maintaining the Cartesian tree of the nodes processed so far, in a structure that allows both upwards and downwards traversal of the tree. To process each new value x, start at the node representing the value prior to x in the sequence and follow the path from this node to the root of the tree until finding a value y smaller than x. This node y is the parent of x, and the previous right child of y becomes the new left child of x. The total time for this procedure is linear, because the time spent searching for the parent y of each new node x can be charged
Potential method
In computational complexity theory, the potential method is a method used to analyze the amortized time and space complexity of a data structure, a measure of its performance over sequences of operations that smooths out the cost of infrequent but expensive operations.-Definition of amortized...
against the number of nodes that are removed from the rightmost path in the tree.
An alternative linear-time construction algorithm is based on the all nearest smaller values
All nearest smaller values
In computer science, the all nearest smaller values problem involves computing, for each value in a sequence of numbers, the previous smaller value that has the closest position in the sequence to the given value...
problem. In the input sequence, one may define the left neighbor of a value x to be the value that occurs prior to x, is smaller than x, and is closer in position to x than any other smaller value. The right neighbor is defined symmetrically. The sequence of left neighbors may be found by an algorithm that maintains a stack
Stack (data structure)
In computer science, a stack is a last in, first out abstract data type and linear data structure. A stack can have any abstract data type as an element, but is characterized by only three fundamental operations: push, pop and stack top. The push operation adds a new item to the top of the stack,...
containing a subsequence of the input. For each new sequence value x, the stack is popped until it is empty or its top element is smaller than x, and then x is pushed onto the stack. The left neighbor of x is the top element at the time x is pushed. The right neighbors may be found by applying the same stack algorithm to the reverse of the sequence. The parent of x in the Cartesian tree is either the left neighbor of x or the right neighbor of x, whichever exists and has a larger value. The left and right neighbors may also be constructed efficiently by parallel algorithm
Parallel algorithm
In computer science, a parallel algorithm or concurrent algorithm, as opposed to a traditional sequential algorithm, is an algorithm which can be executed a piece at a time on many different processing devices, and then put back together again at the end to get the correct result.Some algorithms...
s, so this formulation may be used to develop efficient parallel algorithms for Cartesian tree construction.
Application in sorting
describe a sorting algorithmSorting algorithm
In computer science, a sorting algorithm is an algorithm that puts elements of a list in a certain order. The most-used orders are numerical order and lexicographical order...
based on Cartesian trees. They describe the algorithm as based on a tree with the maximum at the root, but it may be modified straightforwardly to support a Cartesian tree with the convention that the minimum value is at the root. For consistency, it is this modified version of the algorithm that is described below.
The Levcopoulos–Petersson algorithm can be viewed as a version of selection sort
Selection sort
Selection sort is a sorting algorithm, specifically an in-place comparison sort. It has O time complexity, making it inefficient on large lists, and generally performs worse than the similar insertion sort...
or heap sort that maintains a priority queue
Priority queue
A priority queue is an abstract data type in computer programming.It is exactly like a regular queue or stack data structure, but additionally, each element is associated with a "priority"....
of candidate minima, and that at each step finds and removes the minimum value in this queue, moving this value to the end of an output sequence. In their algorithm, the priority queue consists only of elements whose parent in the Cartesian tree has already been found and removed. Thus, the algorithm consists of the following steps:
- Construct a Cartesian tree for the input sequence
- Initialize a priority queue, initially containing only the tree root
- While the priority queue is non-empty:
- Find and remove the minimum value x in the priority queue
- Add x to the output sequence
- Add the Cartesian tree children of x to the priority queue
As Levcopoulos and Petersson show, for input sequences that are already nearly sorted, the size of the priority queue will remain small, allowing this method to take advantage of the nearly-sorted input and run more quickly. Specifically, the worst-case running time of this algorithm is O(n log k), where k is the average, over all values x in the sequence, of the number of consecutive pairs of sequence values that bracket x. They also prove a lower bound stating that, for any n and k = ω(1), any comparison-based sorting algorithm must use Ω(n log k) comparisons for some inputs.
History
Cartesian trees were introduced and named by . The name is derived from the Cartesian coordinate system for the plane: in Vuillemin's version of this structure, as in the two-dimensional range searching application discussed above, a Cartesian tree for a point set has the sorted order of the points by their x-coordinates as its symmetric traversal order, and it has the heap property according to the y-coordinates of the points.and subsequent authors followed the definition here in which a Cartesian tree is defined from a sequence; this change generalizes the geometric setting of Vuillemin to allow sequences other than the sorted order of x-coordinates, and allows the Cartesian tree to be applied to non-geometric problems as well.