X-fast trie
Encyclopedia
X-fast trie | |
---|---|
Type | Trie Trie In computer science, a trie, or prefix tree, is an ordered tree data structure that is used to store an associative array where the keys are usually strings. Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree defines the... |
Invented | 1982 |
Invented by | Dan Willard |
Asymptotic complexity in big O notation Big O notation In mathematics, big O notation is used to describe the limiting behavior of a function when the argument tends towards a particular value or infinity, usually in terms of simpler functions. It is a member of a larger family of notations that is called Landau notation, Bachmann-Landau notation, or... |
|
Space | O(n log M) |
Search | O(log log M) |
Insert | O(log M) amortized Amortized analysis In computer science, amortized analysis is a method of analyzing algorithms that considers the entire sequence of operations of the program. It allows for the establishment of a worst-case bound for the performance of an algorithm irrespective of the inputs by looking at all of the operations... |
Delete | O(log M) amortized |
In computer science
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...
, an x-fast trie is a data structure
Data structure
In computer science, a data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently.Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks...
for storing integer
Integer
The integers are formed by the natural numbers together with the negatives of the non-zero natural numbers .They are known as Positive and Negative Integers respectively...
s from a bounded domain. It supports exact and predecessor or successor queries in time O(log log M), using O(n log M) space, where n is the number of stored values and M is the maximum value in the domain. The structure was proposed by Dan Willard in 1982, along with the more complicated y-fast trie
Y-fast trie
In computer science, a y-fast trie is a data structure for storing integers from a bounded domain. It supports exact and predecessor or successor queries in time O, using O space, where n is the number of stored values and M is the maximum value in the domain...
, as a way to improve the space usage of van Emde Boas tree
Van Emde Boas tree
A van Emde Boas tree , also known as a vEB tree, is a tree data structure which implements an associative array with m-bit integer keys. It performs all operations in O time...
s, while retaining the O(log log M) query time.
Structure
An x-fast trie is a bitwise trie: a binary treeBinary tree
In computer science, a binary tree is a tree data structure in which each node has at most two child nodes, usually distinguished as "left" and "right". Nodes with children are parent nodes, and child nodes may contain references to their parents. Outside the tree, there is often a reference to...
where each subtree stores values whose binary representations
Binary numeral system
The binary numeral system, or base-2 number system, represents numeric values using two symbols, 0 and 1. More specifically, the usual base-2 system is a positional notation with a radix of 2...
start with a common prefix. Each internal node is labeled with the common prefix of the values in its subtree and typically, the left child adds a 0 to the end of the prefix, while the right child adds a 1. The binary representation of an integer between 0 and M − 1 uses ⌈log2 M⌉ bits, so the height of the trie is O(log M).
All values in the x-fast trie are stored at the leaves. Internal nodes are stored only if they have leaves in their subtree. If an internal node would have no left child, it stores a pointer to the smallest leaf in its right subtree instead, called a descendant pointer. Likewise, if it would have no right child, it stores a pointer to the largest leaf in its left subtree. Each leaf stores a pointer to its predecessor and successor, thereby forming a double linked list. Finally, there is a hash table
Hash table
In computer science, a hash table or hash map is a data structure that uses a hash function to map identifying values, known as keys , to their associated values . Thus, a hash table implements an associative array...
for each level that contains all the nodes on that level. Together, these hash tables form the level-search structure (LSS). To guarantee the worst-case query times, these hash tables should use dynamic perfect hashing
Dynamic perfect hashing
In computer science, dynamic perfect hashing is a programming technique for resolving collisions in a hash table data structure. This technique is useful for situations where fast queries, insertions, and deletions must be made on a large set, S, of elements.-Details:In this method, the entries...
or cuckoo hashing
Cuckoo hashing
Cuckoo hashing is a scheme in computer programming for resolving hash collisions of values of hash functions in a table. Cuckoo hashing was first described by Rasmus Pagh and Flemming Friche Rodler in 2001...
.
The total space usage is O(n log M), since each element has a root-to-leaf path of length O(log M).
Operations
Like van Emde Boas treeVan Emde Boas tree
A van Emde Boas tree , also known as a vEB tree, is a tree data structure which implements an associative array with m-bit integer keys. It performs all operations in O time...
s, x-fast tries support the operations of an ordered associative array
Associative array
In computer science, an associative array is an abstract data type composed of a collection of pairs, such that each possible key appears at most once in the collection....
. This includes the usual associative array operations, along with two more order operations, Successor and Predecessor:
- Find(k): find the value associated with the given key
- Successor(k): find the key/value pair with the smallest key larger than or equal to the given key
- Predecessor(k): find the key/value pair with the largest key less than or equal to the given key
- Insert(k, v): insert the given key/value pair
- Delete(k): remove the key/value pair with the given key
Find
Finding the value associated with a key k that is in the data structure can be done in constant time by looking up k in LSS[0], which is a hash table on all the leaves.Successor and Predecessor
To find the successor or predecessor of a key k, we first find Ak, the lowest ancestor of k. This is the node in the trie that has the longest common prefix with k. To find Ak, we perform a binary search on the levels. We start at level h/2, where h is the height of the trie. On each level, we query the corresponding hash table in the level-search structure with the prefix of k of the right length. If a node with that prefix does not exist, we know that Ak must be at a higher level and we restrict our search to those. If a node with that prefix does exist, Ak can not be at a higher level, so we restrict our search to the current and lower levels.Once we find the lowest ancestor of k, we know that it has leaves in one of its subtrees (otherwise it wouldn't be in the trie) and k should be in the other subtree. Therefore the descendant pointer points to the successor or the predecessor of k. Depending on which one we are looking for, we might have to take one step in the linked list to the next or previous leaf.
Since the trie has height O(log M), the binary search for the lowest ancestor takes O(log log M) time. After that, the successor or predecessor can be found in constant time, so the total query time is O(log log M).
Insert
To insert a key-value pair (k, v), we first find the predecessor and successor of k. Then we create a new leaf for k, insert it in the linked list of leaves between the successor and predecessor, and give it a pointer to v. Next, we walk from the root to the new leaf, creating the necessary nodes on the way down, inserting them into the respective hash tables and updating descendant pointers where necessary.Since we have to walk down the entire height of the trie, this process takes O(log M) time.
Delete
To delete a key k, we find its leaf using the hash table on the leaves. We remove it from the linked list, but remember which were the successor and predecessor. Then we walk from the leaf to the root of the trie, removing all nodes whose subtree only contained k and updating the descendant pointers where necessary. Descendant pointers that used to point to k will now point to either the successor or predecessor of k, depending on which subtree is missing.Like insertion, this takes O(log M) time, as we have to walk through every level of the trie.
Discussion
Willard introduced x-fast tries largely as an introduction to y-fast trieY-fast trie
In computer science, a y-fast trie is a data structure for storing integers from a bounded domain. It supports exact and predecessor or successor queries in time O, using O space, where n is the number of stored values and M is the maximum value in the domain...
s, which provide the same query time, while using only O(n) space and allowing insertions and deletions in O(log log M) time.
A compression technique similar to patricia tries can be used to significantly reduce the space usage of x-fast tries in practice.
By using an exponential search before the binary search over the levels and by querying not only the current prefix x, but also its successor x + 1, x-fast tries can answer predecessor and successor queries in time O(log log Δ), where Δ is the difference between the query value and its predecessor or successor.