Denormalization - AbsoluteAstronomy.com

In computing, denormalisation is the process of attempting to optimise the read performance of a database by adding redundant data or by grouping data. In some cases, denormalisation helps cover up the inefficiencies inherent in relational database software. A normalised relational database imposes a heavy access load on physical storage even if it is well tuned for high performance.

A normalised design will often store different but related pieces of information in separate logical tables (called relations). If these relations are stored physically as separate disk files, completing a database query that draws information from several relations (a join operation) can be slow. If many relations are joined, it may be prohibitively slow. There are two strategies for dealing with this. The preferred method is to keep the logical design normalised, but allow the database management system (DBMS) to store additional redundant information on disk to optimise query response. In this case it is the DBMS software's responsibility to ensure that any redundant copies are kept consistent. This method is often implemented in SQL as indexed views (Microsoft SQL Server) or materialised views (Oracle). A view represents information in a format convenient for querying, and the index ensures that queries against the view are optimised.
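As a minimal sketch of the join cost described above, the following uses Python's built-in sqlite3 module with two normalised relations. The table names (customer, purchase), the column names, and the sample rows are hypothetical and chosen only for illustration; SQLite is used for convenience and does not provide the indexed or materialised views mentioned for SQL Server and Oracle.

```python
import sqlite3

# In-memory database; the schema and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE purchase (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        amount      REAL NOT NULL
    );
    INSERT INTO customer (id, name) VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO purchase (customer_id, amount)
        VALUES (1, 10.0), (1, 5.0), (2, 7.5);
""")


def purchases_with_names(connection):
    """Return (customer name, amount) pairs; this needs a join of both relations."""
    return connection.execute("""
        SELECT c.name, p.amount
        FROM purchase AS p
        JOIN customer AS c ON c.id = p.customer_id
        ORDER BY p.id
    """).fetchall()


rows = purchases_with_names(conn)
```

Each customer name is stored exactly once, so every read that needs it alongside a purchase has to go through the join.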

The more usual approach is to denormalise the logical data design. With care this can achieve a similar improvement in query response, but at a cost: it is now the database designer's responsibility to ensure that the denormalised database does not become inconsistent. This is done by creating rules in the database, called constraints, that specify how the redundant copies of information must be kept synchronised. It is the increase in logical complexity of the database design and the added complexity of the additional constraints that make this approach hazardous. Moreover, constraints introduce a trade-off, speeding up reads (SELECT in SQL) while slowing down writes (INSERT, UPDATE, and DELETE). This means a denormalised database under heavy write load may actually offer worse performance than its functionally equivalent normalised counterpart.
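The synchronisation rule and its write cost can be sketched as follows, again with Python's sqlite3 module. Here an SQL trigger stands in for the "constraints" the text refers to; the schema, the trigger name customer_name_sync, and the data are all hypothetical. This is one possible arrangement under those assumptions, not a prescribed design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- Denormalised: customer_name duplicates customer.name so that
    -- reads of purchase no longer need a join.
    CREATE TABLE purchase (
        id            INTEGER PRIMARY KEY,
        customer_id   INTEGER NOT NULL REFERENCES customer(id),
        customer_name TEXT NOT NULL,
        amount        REAL NOT NULL
    );
    -- Rule that keeps the redundant copies in step with their source.
    -- Every rename of a customer now also rewrites that customer's purchases.
    CREATE TRIGGER customer_name_sync
    AFTER UPDATE OF name ON customer
    BEGIN
        UPDATE purchase SET customer_name = NEW.name
        WHERE customer_id = NEW.id;
    END;
    INSERT INTO customer (id, name) VALUES (1, 'Ada');
    INSERT INTO purchase (customer_id, customer_name, amount)
        VALUES (1, 'Ada', 10.0), (1, 'Ada', 5.0);
""")

# A single-row write to customer fans out to every matching purchase row.
conn.execute("UPDATE customer SET name = 'Ada L.' WHERE id = 1")

# The read is now a single-table SELECT with no join.
names = [row[0] for row in
         conn.execute("SELECT customer_name FROM purchase ORDER BY id")]
```

The read becomes cheaper, but one logical update now touches as many rows as the customer has purchases, which is the read/write trade-off described above.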

A denormalised data model is not the same as a data model that has not been normalised. Denormalisation should only take place after a satisfactory level of normalisation has been reached and after any required constraints and/or rules have been created to deal with the inherent anomalies in the design. For example, all the relations should be in third normal form, and any relations with join and multi-valued dependencies should be handled appropriately.

Examples of denormalisation techniques include:

Materialised views, which may implement the following:
- Storing the count of the "many" objects in a one-to-many relationship as an attribute of the "one" relation
- Adding attributes to a relation from another relation with which it will be joined
Star schemas, which are also known as fact-dimension models and have been extended to snowflake schemas
Prebuilt summarisation or OLAP cubes
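
The first technique in the list, storing the count of the "many" objects on the "one" relation, might look like the sketch below. It uses Python's sqlite3 module with ordinary triggers rather than a materialised view, which SQLite does not offer; the author and book tables, the book_count column, and the trigger names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        book_count INTEGER NOT NULL DEFAULT 0  -- redundant, derived from book
    );
    CREATE TABLE book (
        id        INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL REFERENCES author(id),
        title     TEXT NOT NULL
    );
    -- Maintain the stored count on insert and delete. Moving a book to
    -- another author (UPDATE of author_id) is not handled in this sketch.
    CREATE TRIGGER book_added AFTER INSERT ON book
    BEGIN
        UPDATE author SET book_count = book_count + 1
        WHERE id = NEW.author_id;
    END;
    CREATE TRIGGER book_removed AFTER DELETE ON book
    BEGIN
        UPDATE author SET book_count = book_count - 1
        WHERE id = OLD.author_id;
    END;
    INSERT INTO author (id, name) VALUES (1, 'A. Author');
    INSERT INTO book (author_id, title)
        VALUES (1, 'First'), (1, 'Second'), (1, 'Third');
    DELETE FROM book WHERE title = 'Second';
""")

# Denormalised read: one row from author, no aggregation needed.
stored = conn.execute(
    "SELECT book_count FROM author WHERE id = 1").fetchone()[0]

# Normalised equivalent, recomputed from the "many" side for comparison.
recomputed = conn.execute(
    "SELECT COUNT(*) FROM book WHERE author_id = 1").fetchone()[0]
```

Comparing the stored value with the recomputed one is a simple way to check that the redundant attribute has not drifted from the data it summarises.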

Denormalisation techniques are often used to improve the scalability of Web applications.

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.