T-tree
Encyclopedia
In computer science
a T-tree is a type of binary tree
data structure
that is used by main-memory databases
, such as
Datablitz
, eXtremeDB, MySQL Cluster
, Oracle TimesTen
and MobileLite.
A T-tree is a balanced index tree data structure optimized for cases
where both the index and the actual data are fully kept in memory,
just as a B-tree
is an index structure optimized for storage on block
oriented secondary storage devices like hard disks. T-trees seek to gain the performance benefits
of in-memory tree structures such as AVL trees while avoiding the large storage space overhead which
is common to them.
T-trees do not keep copies of the indexed data fields within the index tree nodes themselves. Instead, they take advantage of the fact that the actual data is always in main memory together with the index so that they just contain pointers to the actual data fields.
The 'T' in T-tree refers to the shape of the node data structures
in the original paper that first described this type of index.
The main reason seems to be that the traditional assumption of memory references having uniform cost is no longer valid given the current speed gap between cache access and main memory access.
an ordered array of data pointers and some extra control data. Nodes with two subtrees
are called internal nodes, nodes without subtrees are called leaf nodes
and nodes with only one subtree are named half-leaf nodes.
A node is called the bounding node for a value if the value is between the node's current minimum and maximum value, inclusively.
For each internal node leaf or half leaf nodes exist that contain the predecessor of its smallest
data value (called the greatest lower bound) and one that contains the successor of its largest
data value (called the least upper bound). Leaf and half-leaf nodes can contain any number of
data elements from one to the maximum size of the data array. Internal nodes keep their occupancy
between predefined minimum and maximum numbers of elements
If a new node was added then the tree might need to be rebalanced, as described below.
Now we have to distinguish by node type:
If the node's data array now has less than the minimum number of elements then move the greatest lower bound value of this node to its data value. Proceed with one of the following two steps for the half leaf or leaf node the value was removed from.
If this was the only element in the data array then delete the node. Rebalance the tree if needed.
If the node's data array can be merged with its leaf's data array without overflow then do so and remove the leaf node. Rebalance the tree if needed.
.
Specifically, Lehman and Carey's article describes a T-tree balanced like an AVL tree
: it becomes out of balance when a node's child trees differ in height by at least two levels.
This can happen after an insertion or deletion of a node.
After an insertion or deletion, the tree is scanned from the leaf to the root.
If an imbalance is found, one tree rotation
or pair of rotations is performed, which is guaranteed to balance the whole tree.
When the rotation results in an internal node having fewer than the minimum number of items, items from the node's new child(ren) are moved into the internal node.
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...
a T-tree is a type of binary tree
Binary tree
In computer science, a binary tree is a tree data structure in which each node has at most two child nodes, usually distinguished as "left" and "right". Nodes with children are parent nodes, and child nodes may contain references to their parents. Outside the tree, there is often a reference to...
data structure
Data structure
In computer science, a data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently.Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks...
that is used by main-memory databases
Main Memory database
An in-memory database is a database management system that primarily relies on main memory for computer data storage. It is contrasted with database management systems which employ a disk storage mechanism...
, such as
Datablitz
Datablitz
DataBlitz is a general purpose main memory database management system, developed by Lucent Bell Labs Research in 1995. It replaced various home-grown database products used throughout Lucent beginning in 1997....
, eXtremeDB, MySQL Cluster
MySQL Cluster
MySQL Cluster is a technology which provides shared-nothing clustering capabilities for the MySQL database management system. It was first included in the production release of MySQL 4.1 in November 2004. It is designed to provide high availability and high performance, while allowing for nearly...
, Oracle TimesTen
TimesTen
TimesTen is an in-memory relational database software product from Oracle Corporation. TimesTen is designed for low latency, high-volume data, event and transaction management. Unlike disk-optimized relational databases such as the Oracle database, DB2, Informix, and SQL Server, TimesTen's data is...
and MobileLite.
A T-tree is a balanced index tree data structure optimized for cases
where both the index and the actual data are fully kept in memory,
just as a B-tree
B-tree
In computer science, a B-tree is a tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree is a generalization of a binary search tree in that a node can have more than two children...
is an index structure optimized for storage on block
oriented secondary storage devices like hard disks. T-trees seek to gain the performance benefits
of in-memory tree structures such as AVL trees while avoiding the large storage space overhead which
is common to them.
T-trees do not keep copies of the indexed data fields within the index tree nodes themselves. Instead, they take advantage of the fact that the actual data is always in main memory together with the index so that they just contain pointers to the actual data fields.
The 'T' in T-tree refers to the shape of the node data structures
in the original paper that first described this type of index.
Performance
Although T-trees seem to be widely used for main-memory databases, recent research indicates that they actually do not perform better than B-trees on modern hardware:The main reason seems to be that the traditional assumption of memory references having uniform cost is no longer valid given the current speed gap between cache access and main memory access.
Node structures
A T-tree node usually consists of pointers to the parent node, the left and right child node,an ordered array of data pointers and some extra control data. Nodes with two subtrees
are called internal nodes, nodes without subtrees are called leaf nodes
and nodes with only one subtree are named half-leaf nodes.
A node is called the bounding node for a value if the value is between the node's current minimum and maximum value, inclusively.
For each internal node leaf or half leaf nodes exist that contain the predecessor of its smallest
data value (called the greatest lower bound) and one that contains the successor of its largest
data value (called the least upper bound). Leaf and half-leaf nodes can contain any number of
data elements from one to the maximum size of the data array. Internal nodes keep their occupancy
between predefined minimum and maximum numbers of elements
Search
- Search starts at the root node
- If the current node is the bounding node for the search value then search its data array. Search fails if the value is not found in the data array.
- If the search value is less than the minimum value of the current node then continue search in its left subtree. Search fails if there is no left subtree.
- If the search value is greater than the maximum value of the current node then continue search in its right subtree. Search failes if there is no right subtree.
Insertion
- Search for a bounding node for the new value. If such a node exist then
- check whether there is still space in its data array, if so then insert the new value and finish
- if no space is available then remove the minimum value from the node's data array and insert the new value. Now proceed to the node holding the greatest lower bound for the node that the new value was inserted to. If the removed minimum value still fits in there then add it as the new maximum value of the node, else create a new right subnode for this node.
- If no bounding node was found then insert the value into the last node searched if it still fits into it. In this case the new value will either become the new minimum or maximum value. If the value doesn't fit anymore then create a new left or right subtree.
If a new node was added then the tree might need to be rebalanced, as described below.
Deletion
- Search for bounding node of the value to be deleted. If no bounding node is found then finish.
- If the bounding node does not contain the value then finish.
- delete the value from the node's data array
Now we have to distinguish by node type:
- Internal node:
If the node's data array now has less than the minimum number of elements then move the greatest lower bound value of this node to its data value. Proceed with one of the following two steps for the half leaf or leaf node the value was removed from.
- Leaf node:
If this was the only element in the data array then delete the node. Rebalance the tree if needed.
- Half leaf node:
If the node's data array can be merged with its leaf's data array without overflow then do so and remove the leaf node. Rebalance the tree if needed.
Rotation and balancing
A T-tree is implemented on top of an underlying self-balancing binary search treeSelf-balancing binary search tree
In computer science, a self-balancing binary search tree is any node based binary search tree that automatically keeps its height small in the face of arbitrary item insertions and deletions....
.
Specifically, Lehman and Carey's article describes a T-tree balanced like an AVL tree
AVL tree
In computer science, an AVL tree is a self-balancing binary search tree, and it was the first such data structure to be invented. In an AVL tree, the heights of the two child subtrees of any node differ by at most one. Lookup, insertion, and deletion all take O time in both the average and worst...
: it becomes out of balance when a node's child trees differ in height by at least two levels.
This can happen after an insertion or deletion of a node.
After an insertion or deletion, the tree is scanned from the leaf to the root.
If an imbalance is found, one tree rotation
Tree rotation
A tree rotation is an operation on a binary tree that changes the structure without interfering with the order of the elements. A tree rotation moves one node up in the tree and one node down...
or pair of rotations is performed, which is guaranteed to balance the whole tree.
When the rotation results in an internal node having fewer than the minimum number of items, items from the node's new child(ren) are moved into the internal node.
See also
- Tree (graph theory)Tree (graph theory)In mathematics, more specifically graph theory, a tree is an undirected graph in which any two vertices are connected by exactly one simple path. In other words, any connected graph without cycles is a tree...
- Tree (set theory)Tree (set theory)In set theory, a tree is a partially ordered set In set theory, a tree is a partially ordered set (poset) In set theory, a tree is a partially ordered set (poset) (T, In set theory, a tree is a partially ordered set (poset) (T, ...
- Tree structureTree structureA tree structure is a way of representing the hierarchical nature of a structure in a graphical form. It is named a "tree structure" because the classic representation resembles a tree, even though the chart is generally upside down compared to an actual tree, with the "root" at the top and the...
- Exponential treeExponential treeAn exponential tree is almost identical to a binary search tree, with the exception that the dimension of the tree is not the same at all levels. In a normal binary search tree, each node has a dimension of 1, and has 2d children. In an exponential tree, the dimension equals the depth of the node,...
Other trees
- B-treeB-treeIn computer science, a B-tree is a tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree is a generalization of a binary search tree in that a node can have more than two children...
(2-3 tree2-3 treeIn computer science, a 2-3 tree is a type of data structure, a tree where every node with children has either two children and one data element or three children and two data elements...
, B+ treeB+ treeIn computer science, a B+ tree or B plus tree is a type of tree which represents sorted data in a way that allows for efficient insertion, retrieval and removal of records, each of which is identified by a key. It is a dynamic, multilevel index, with maximum and minimum bounds on the number of...
, B*-tree, UB-treeUB-treeThe UB-tree as proposed by Rudolf Bayer and Volker Markl is a balanced tree for storing and efficiently retrieving multidimensional data. It is basically a B+ tree with records stored according to Z-order, also called Morton order...
) - DSW algorithmDSW algorithmThe DSW algorithm, or in full Day/Stout/Warren algorithm, is a method for efficiently balancing binary search trees — that is, decreasing their height to O nodes, where n is the total number of nodes...
- Dancing treeDancing treeFor the film Dancing tree, see Dancing tree In computer science, a dancing tree is a tree data structure, which is similar to B+ tree. Invented by Hans Reiser, for use by the Reiser4 file system...
- Fusion treeFusion treeA fusion tree is a type of tree data structure in computer science. It implements an associative array with integer keys up to a fixed size; by exploiting the constant-time machine word multiplication operation available on many real processors, it is able to achieve all operations inO\lefttime ,...
- kd-treeKd-treeIn computer science, a k-d tree is a space-partitioning data structure for organizing points in a k-dimensional space. k-d trees are a useful data structure for several applications, such as searches involving a multidimensional search key...
- OctreeOctreeAn octree is a tree data structure in which each internal node has exactly eight children. Octrees are most often used to partition a three dimensional space by recursively subdividing it into eight octants. Octrees are the three-dimensional analog of quadtrees. The name is formed from oct + tree,...
- QuadtreeQuadtreeA quadtree is a tree data structure in which each internal node has exactly four children. Quadtrees are most often used to partition a two dimensional space by recursively subdividing it into four quadrants or regions. The regions may be square or rectangular, or may have arbitrary shapes. This...
- R-treeR-treeR-trees are tree data structures used for spatial access methods, i.e., for indexing multi-dimensional information such as geographical coordinates, rectangles or polygons. The R-tree was proposed by Antonin Guttman in 1984 and has found significant use in both research and real-world applications...
- Radix treeRadix treeIn computer science, a radix tree is a space-optimized trie data structure where each node with only one child is merged with its child. The result is that every internal node has at least two children. Unlike in regular tries, edges can be labeled with sequences of characters as well as single...
- T-tree
- T-pyramid
- Top Trees