Sedna (database)
Encyclopedia
Sedna is an open source
database management system
that provides native
storage for XML
data.
The distinctive design decisions employed in Sedna are (i) schema
-based clustering storage strategy for XML data and (ii) memory management
based on layered address space
.
organization in Sedna is designed with the goal of providing a balance in performance
between XML queries and updates execution .
The 2 primary design decisions in data organization in Sedna are:
The following figure illustrates the overall principles of data organization in Sedna.
The descriptive schema represented as a tree
of schema nodes is the central component in the data organization.
Each schema node is labeled with an XML node kind (e.g. element, attribute, text, etc.) and has a pointer to data blocks
that store XML nodes
corresponding to the given schema node.
Depending on their node kind, some schema nodes are also labeled with names (e.g., element nodes, attribute nodes).
Data blocks related to a common schema node are linked via pointers into a bidirectional list.
Node descriptors in a list of blocks are partly ordered
according to document order .
Open source
The term open source describes practices in production and development that promote access to the end product's source materials. Some consider open source a philosophy, others consider it a pragmatic methodology...
database management system
Database management system
A database management system is a software package with computer programs that control the creation, maintenance, and use of a database. It allows organizations to conveniently develop databases for various applications by database administrators and other specialists. A database is an integrated...
that provides native
XML database
An XML database is a data persistence software system that allows data to be stored in XML format. This data can then be queried, exported and serialized into the desired format.Two major classes of XML database exist:...
storage for XML
XML
Extensible Markup Language is a set of rules for encoding documents in machine-readable form. It is defined in the XML 1.0 Specification produced by the W3C, and several other related specifications, all gratis open standards....
data.
The distinctive design decisions employed in Sedna are (i) schema
Database schema
A database schema of a database system is its structure described in a formal language supported by the database management system and refers to the organization of data to create a blueprint of how a database will be constructed...
-based clustering storage strategy for XML data and (ii) memory management
Memory management
Memory management is the act of managing computer memory. The essential requirement of memory management is to provide ways to dynamically allocate portions of memory to programs at their request, and freeing it for reuse when no longer needed. This is critical to the computer system.Several...
based on layered address space
Address space
In computing, an address space defines a range of discrete addresses, each of which may correspond to a network host, peripheral device, disk sector, a memory cell or other logical or physical entity.- Overview :...
.
Data Organization
DataData (computing)
In computer science, data is information in a form suitable for use with a computer. Data is often distinguished from programs. A program is a sequence of instructions that detail a task for the computer to perform...
organization in Sedna is designed with the goal of providing a balance in performance
Computer performance
Computer performance is characterized by the amount of useful work accomplished by a computer system compared to the time and resources used.Depending on the context, good computer performance may involve one or more of the following:...
between XML queries and updates execution .
The 2 primary design decisions in data organization in Sedna are:
- Direct pointers are used to represent XML node relationships such as parent, child, and sibling ones. Unlike relationalRelational modelThe relational model for database management is a database model based on first-order predicate logic, first formulated and proposed in 1969 by Edgar F...
-based approaches that require performing joinsJoin (SQL)An SQL join clause combines records from two or more tables in a database. It creates a set that can be saved as a table or used as is. A JOIN is a means for combining fields from two tables by using values common to each. ANSI standard SQL specifies four types of JOINs: INNER, OUTER, LEFT, and RIGHT...
for traversing an XML document, traversing in Sedna is performed by simply following a direct pointer. - A descriptive schemaXML schemaAn XML schema is a description of a type of XML document, typically expressed in terms of constraints on the structure and content of documents of that type, above and beyond the basic syntactical constraints imposed by XML itself...
-driven storage strategy is developed which consists of clustering nodesVertex (graph theory)In graph theory, a vertex or node is the fundamental unit out of which graphs are formed: an undirected graph consists of a set of vertices and a set of edges , while a directed graph consists of a set of vertices and a set of arcs...
of an XML document according to their positions in the descriptive schema of the document. In contrast to a prescriptive schema that is known in advance and is usually specified in DTDDocument Type DefinitionDocument Type Definition is a set of markup declarations that define a document type for SGML-family markup languages...
or XML Schema, the descriptive schema is generated from data dynamically (and is maintained incrementallyIncremental computingIncremental computing, also known as incremental computation, is a software feature which, whenever a piece of data changes, attempts to save time by only recomputing those outputs which "depend on" the changed data....
) and represents a concise and an accurate structureData structureIn computer science, a data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently.Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks...
summary for data. Using the descriptive schema instead of the prescriptive one makes the storage strategy applicable to any XML document, even a one that comes with no prescriptive schema.
The following figure illustrates the overall principles of data organization in Sedna.
The descriptive schema represented as a tree
Tree (data structure)
In computer science, a tree is a widely-used data structure that emulates a hierarchical tree structure with a set of linked nodes.Mathematically, it is an ordered directed tree, more specifically an arborescence: an acyclic connected graph where each node has zero or more children nodes and at...
of schema nodes is the central component in the data organization.
Each schema node is labeled with an XML node kind (e.g. element, attribute, text, etc.) and has a pointer to data blocks
Block (data storage)
In computing , a block is a sequence of bytes or bits, having a nominal length . Data thus structured are said to be blocked. The process of putting data into blocks is called blocking. Blocking is used to facilitate the handling of the data-stream by the computer program receiving the data...
that store XML nodes
Vertex (graph theory)
In graph theory, a vertex or node is the fundamental unit out of which graphs are formed: an undirected graph consists of a set of vertices and a set of edges , while a directed graph consists of a set of vertices and a set of arcs...
corresponding to the given schema node.
Depending on their node kind, some schema nodes are also labeled with names (e.g., element nodes, attribute nodes).
Data blocks related to a common schema node are linked via pointers into a bidirectional list.
Node descriptors in a list of blocks are partly ordered
Partially ordered set
In mathematics, especially order theory, a partially ordered set formalizes and generalizes the intuitive concept of an ordering, sequencing, or arrangement of the elements of a set. A poset consists of a set together with a binary relation that indicates that, for certain pairs of elements in the...
according to document order .