Nested set model
Encyclopedia
The nested set model is a particular technique for representing nested sets (also known as tree
s or hierarchies
) in relational database
s.
The term was apparently introduced by Joe Celko
; others describe the same technique without naming it or using different terms.
and relational calculus
, and the SQL
operations based on them, are unable to express all desirable operations on hierarchies directly. A hierarchy can be expressed in terms of a parent-child relation - Celko calls this the adjacency list model - but if it can have arbitrary depth, this does not allow the expression of operations such as comparing the contents of hierarchies of two elements, or determining whether an element is somewhere in the subhierarchy of another element. When the hierarchy is of fixed or bounded depth, the operations are possible, but expensive, due to the necessity of performing one relational join per level. This is often known as the bill of materials
problem.
Several resolutions exist and are available in some relational database management system
s:
When these solutions are not available or not feasible, another approach must be taken.
, which visits each node twice, assigning numbers in the order of visiting, and at both visits. This leaves two numbers for each node, which are stored as two attributes. Querying becomes inexpensive: hierarchy membership can be tested by comparing these numbers. Updating requires renumbering and is therefore expensive. Refinements which use rational number
s instead of integers can avoid renumbering, and so are faster to update, although much more complicated .
The "Clothing" category, with the highest position in the hierarchy, encompasses all subordinating categories. It is therefore given left and right domain values of 1 and 22, the latter value being the double of the total number of nodes being represented. The next hierarchical level contains "Men's" and "Women's", both containing levels within themselves that must be accounted for. Each level's data node is assigned left and right domain values according to the number of sublevels contained within, as shown in the table data.
. However, recursive SQL queries can be expected to perform comparably for 'find immediate descendants' queries, and much faster for other depth search queries, and so are the faster option for databases which provide them, such as PostgreSQL
, Oracle
, and Microsoft SQL Server
.
The Nested interval model does not suffer from this problem, but is more complex to implement, and is not as well known. The nested interval model stores the position of the nodes as rational numbers expressed as quotients (n/d). http://www.sigmod.org/publications/sigmod-record/0506/p47-article-tropashko.pdf
code example:
The query gets even more complicated when searching for children more than one level deep. To overcome this limitation and simplify Tree traversal
an additional column is added to the model to maintain the depth of a node within a tree.
In this model, finding the immediate children given a parent node can be accomplished with the following SQL
code:
Tree (data structure)
In computer science, a tree is a widely-used data structure that emulates a hierarchical tree structure with a set of linked nodes.Mathematically, it is an ordered directed tree, more specifically an arborescence: an acyclic connected graph where each node has zero or more children nodes and at...
s or hierarchies
Hierarchy
A hierarchy is an arrangement of items in which the items are represented as being "above," "below," or "at the same level as" one another...
) in relational database
Relational database
A relational database is a database that conforms to relational model theory. The software used in a relational database is called a relational database management system . Colloquial use of the term "relational database" may refer to the RDBMS software, or the relational database itself...
s.
The term was apparently introduced by Joe Celko
Joe Celko
Joe Celko is an American relational database expert from Austin, Texas. He has participated on the ANSI X3H2 Database Standards Committee, and helped write the SQL-89 and SQL-92 standards. He is the author of a Morgan-Kaufmann series of books on SQL, and over 1000 published articles on SQL and...
; others describe the same technique without naming it or using different terms.
Motivation
The technique is an answer to the problem that the standard relational algebraRelational algebra
Relational algebra, an offshoot of first-order logic , deals with a set of finitary relations that is closed under certain operators. These operators operate on one or more relations to yield a relation...
and relational calculus
Relational calculus
Relational calculus consists of two calculi, the tuple relational calculus and the domain relational calculus, that are part of the relational model for databases and provide a declarative way to specify database queries...
, and the SQL
SQL
SQL is a programming language designed for managing data in relational database management systems ....
operations based on them, are unable to express all desirable operations on hierarchies directly. A hierarchy can be expressed in terms of a parent-child relation - Celko calls this the adjacency list model - but if it can have arbitrary depth, this does not allow the expression of operations such as comparing the contents of hierarchies of two elements, or determining whether an element is somewhere in the subhierarchy of another element. When the hierarchy is of fixed or bounded depth, the operations are possible, but expensive, due to the necessity of performing one relational join per level. This is often known as the bill of materials
Bill of materials
A bill of materials is a list of the raw materials, sub-assemblies, intermediate assemblies, sub-components, components, parts and the quantities of each needed to manufacture an end product...
problem.
Several resolutions exist and are available in some relational database management system
Relational database management system
A relational database management system is a database management system that is based on the relational model as introduced by E. F. Codd. Most popular databases currently in use are based on the relational database model....
s:
- support for a dedicated hierarchy data type, such as in SQL's hierarchical queryHierarchical queryA hierarchical query is a type of SQL query that handles hierarchical model data.Standard SQL specifies hierarchical queries by way of recursive common table expressions...
facility; - extending the relational language with hierarchy manipulations, such as in the nested relational algebra.
- extending the relational language with transitive closureTransitive closureIn mathematics, the transitive closure of a binary relation R on a set X is the transitive relation R+ on set X such that R+ contains R and R+ is minimal . If the binary relation itself is transitive, then the transitive closure will be that same binary relation; otherwise, the transitive closure...
, such as SQL's CONNECT statement; this allows a parent-child relation to be used, but execution remains expensive; - the queries can be expressed in a language that supports iteration and is wrapped around the relational operations, such as PL/SQLPL/SQLPL/SQL is Oracle Corporation's procedural extension language for SQL and the Oracle relational database...
, T-SQL or a general-purpose programming languageGeneral-purpose programming languageIn computer software a general-purpose programming language is a programming language designed to be used for writing software in a wide variety of application domains...
When these solutions are not available or not feasible, another approach must be taken.
The technique
The nested set model is to number the nodes according to a tree traversalTree traversal
In computer science, tree-traversal refers to the process of visiting each node in a tree data structure, exactly once, in a systematic way. Such traversals are classified by the order in which the nodes are visited...
, which visits each node twice, assigning numbers in the order of visiting, and at both visits. This leaves two numbers for each node, which are stored as two attributes. Querying becomes inexpensive: hierarchy membership can be tested by comparing these numbers. Updating requires renumbering and is therefore expensive. Refinements which use rational number
Rational number
In mathematics, a rational number is any number that can be expressed as the quotient or fraction a/b of two integers, with the denominator b not equal to zero. Since b may be equal to 1, every integer is a rational number...
s instead of integers can avoid renumbering, and so are faster to update, although much more complicated .
Example
In a clothing store catalog, clothing may be categorized according to the hierarchy given on the left:Node | Left | Right |
Clothing | 1 | 22 |
Men's | 2 | 9 |
Women's | 10 | 21 |
Suits | 3 | 8 |
Slacks | 4 | 5 |
Jackets | 6 | 7 |
Dresses | 11 | 16 |
Skirts | 17 | 18 |
Blouses | 19 | 20 |
Evening Gowns | 12 | 13 |
Sun Dresses | 14 | 15 |
The "Clothing" category, with the highest position in the hierarchy, encompasses all subordinating categories. It is therefore given left and right domain values of 1 and 22, the latter value being the double of the total number of nodes being represented. The next hierarchical level contains "Men's" and "Women's", both containing levels within themselves that must be accounted for. Each level's data node is assigned left and right domain values according to the number of sublevels contained within, as shown in the table data.
Performance
Queries using nested sets can be expected to be faster than queries using a stored procedure to traverse an adjacency list, and so are the faster option for databases which lack native recursive query constructs, such as MySQLMySQL
MySQL officially, but also commonly "My Sequel") is a relational database management system that runs as a server providing multi-user access to a number of databases. It is named after developer Michael Widenius' daughter, My...
. However, recursive SQL queries can be expected to perform comparably for 'find immediate descendants' queries, and much faster for other depth search queries, and so are the faster option for databases which provide them, such as PostgreSQL
PostgreSQL
PostgreSQL, often simply Postgres, is an object-relational database management system available for many platforms including Linux, FreeBSD, Solaris, MS Windows and Mac OS X. It is released under the PostgreSQL License, which is an MIT-style license, and is thus free and open source software...
, Oracle
Oracle Database
The Oracle Database is an object-relational database management system produced and marketed by Oracle Corporation....
, and Microsoft SQL Server
Microsoft SQL Server
Microsoft SQL Server is a relational database server, developed by Microsoft: It is a software product whose primary function is to store and retrieve data as requested by other software applications, be it those on the same computer or those running on another computer across a network...
.
Drawbacks
Nested set are very slow for inserts because it requires updating lft and rgt for all records in the table after the insert. This can cause a lot of database thrash as many rows are rewritten and indexes rebuilt.The Nested interval model does not suffer from this problem, but is more complex to implement, and is not as well known. The nested interval model stores the position of the nodes as rational numbers expressed as quotients (n/d). http://www.sigmod.org/publications/sigmod-record/0506/p47-article-tropashko.pdf
Variations
Using the nested set model as described above has some performance limitations during certain tree traversal operations. For example, trying to find the immediate child nodes given a parent node requires pruning the subtree to a specific level as in the following SQLSQL
SQL is a programming language designed for managing data in relational database management systems ....
code example:
The query gets even more complicated when searching for children more than one level deep. To overcome this limitation and simplify Tree traversal
Tree traversal
In computer science, tree-traversal refers to the process of visiting each node in a tree data structure, exactly once, in a systematic way. Such traversals are classified by the order in which the nodes are visited...
an additional column is added to the model to maintain the depth of a node within a tree.
Node | Left | Right | Depth |
Clothing | 1 | 22 | 0 |
Men's | 2 | 9 | 1 |
Women's | 10 | 21 | 1 |
Suits | 3 | 8 | 2 |
Slacks | 4 | 5 | 3 |
Jackets | 6 | 7 | 3 |
Dresses | 11 | 16 | 2 |
Skirts | 17 | 18 | 2 |
Blouses | 19 | 20 | 2 |
Evening Gowns | 12 | 13 | 3 |
Sun Dresses | 14 | 15 | 3 |
In this model, finding the immediate children given a parent node can be accomplished with the following SQL
SQL
SQL is a programming language designed for managing data in relational database management systems ....
code: