Fourth normal form
Encyclopedia
Fourth normal form is a normal form used in database normalization
. Introduced by Ronald Fagin
in 1977, 4NF is the next level of normalization after Boyce–Codd normal form (BCNF). Whereas the second
, third
, and Boyce–Codd normal forms are concerned with functional dependencies
, 4NF is concerned with a more general type of dependency known as a multivalued dependency
. A Table
is in 4NF if and only if
, for every one of its non-trivial multivalued dependencies X Y, X is a superkey
—that is, X is either a candidate key
or a superset thereof.
X Y signifies that if we choose any x actually occurring in the table (call this choice xc), and compile a list of all the xcyz combinations that occur in the table, we will find that xc is associated with the same y entries regardless of z.
A trivial multivalued dependency X Y is one where either Y is a subset of X, or X and Y together form the whole set of attributes of the relation.
A functional dependency
is a special case of multivalued dependency. In a functional dependency X → Y, every x determines exactly one y, never more than one.
Each row indicates that a given restaurant can deliver a given variety of pizza to a given area.
The table has no non-key attributes because its only key is {Restaurant, Pizza Variety, Delivery Area}. Therefore it meets all normal forms up to BCNF. If we assume, however, that pizza varieties offered by a restaurant are not affected by delivery area, then it does not meet 4NF. The problem is that the table features two non-trivial multivalued dependencies on the {Restaurant} attribute (which is not a superkey). The dependencies are:
These non-trivial multivalued dependencies on a non-superkey reflect the fact that the varieties of pizza a restaurant offers are independent from the areas to which the restaurant delivers. This state of affairs leads to redundancy
in the table: for example, we are told three times that A1 Pizza offers Stuffed Crust, and if A1 Pizza starts producing Cheese Crust pizzas then we will need to add multiple rows, one for each of A1 Pizza's delivery areas. There is, moreover, nothing to prevent us from doing this incorrectly: we might add Cheese Crust rows for all but one of A1 Pizza's delivery areas, thereby failing to respect the multivalued dependency {Restaurant} {Pizza Variety}.
To eliminate the possibility of these anomalies, we must place the facts about varieties offered into a different table from the facts about delivery areas, yielding two tables that are both in 4NF:
In contrast, if the pizza varieties offered by a restaurant sometimes did legitimately vary from one delivery area to another, the original three-column table would satisfy 4NF.
Ronald Fagin demonstrated that it is always possible to achieve 4NF. Rissanen's theorem is also applicable on multivalued dependencies
.
Database normalization
In the design of a relational database management system , the process of organizing data to minimize redundancy is called normalization. The goal of database normalization is to decompose relations with anomalies in order to produce smaller, well-structured relations...
. Introduced by Ronald Fagin
Ronald Fagin
Ronald Fagin is the Manager of the Foundations of Computer Science group at the IBM Almaden Research Center. He is best known for his pioneering work in database theory, finite model theory, and reasoning about knowledge...
in 1977, 4NF is the next level of normalization after Boyce–Codd normal form (BCNF). Whereas the second
Second normal form
Second normal form is a normal form used in database normalization. 2NF was originally defined by E.F. Codd in 1971.A table that is in first normal form must meet additional criteria if it is to qualify for second normal form...
, third
Third normal form
In computer science, the third normal form is a normal form used in database normalization. 3NF was originally defined by E.F. Codd in 1971. Codd's definition states that a table is in 3NF if and only if both of the following conditions hold:...
, and Boyce–Codd normal forms are concerned with functional dependencies
Functional dependency
A functional dependency is a constraint between two sets of attributes in a relation from a database.Given a relation R, a set of attributes X in R is said to functionally determine another attribute Y, also in R, if, and only if, each X value is associated with precisely one Y value...
, 4NF is concerned with a more general type of dependency known as a multivalued dependency
Multivalued dependency
In database theory, multivalued dependency is a full constraint between two sets of attributes in a relation.In contrast to the functional dependency, the multivalued dependency requires that certain tuples be present in a relation. Therefore, a multivalued dependency is a special case of...
. A Table
Table (database)
In relational databases and flat file databases, a table is a set of data elements that is organized using a model of vertical columns and horizontal rows. A table has a specified number of columns, but can have any number of rows...
is in 4NF if and only if
If and only if
In logic and related fields such as mathematics and philosophy, if and only if is a biconditional logical connective between statements....
, for every one of its non-trivial multivalued dependencies X Y, X is a superkey
Superkey
A superkey is defined in the relational model of database organization as a set of attributes of a relation variable for which it holds that in all relations assigned to that variable, there are no two distinct tuples that have the same values for the attributes in this set...
—that is, X is either a candidate key
Candidate key
In the relational model of databases, a candidate key of a relation is a minimal superkey for that relation; that is, a set of attributes such that# the relation does not have two distinct tuples In the relational model of databases, a candidate key of a relation is a minimal superkey for that...
or a superset thereof.
Multivalued dependencies
If the column headings in a relational database table are divided into three disjoint groupings X, Y, and Z, then, in the context of a particular row, we can refer to the data beneath each group of headings as x, y, and z respectively. A multivalued dependencyMultivalued dependency
In database theory, multivalued dependency is a full constraint between two sets of attributes in a relation.In contrast to the functional dependency, the multivalued dependency requires that certain tuples be present in a relation. Therefore, a multivalued dependency is a special case of...
X Y signifies that if we choose any x actually occurring in the table (call this choice xc), and compile a list of all the xcyz combinations that occur in the table, we will find that xc is associated with the same y entries regardless of z.
A trivial multivalued dependency X Y is one where either Y is a subset of X, or X and Y together form the whole set of attributes of the relation.
A functional dependency
Functional dependency
A functional dependency is a constraint between two sets of attributes in a relation from a database.Given a relation R, a set of attributes X in R is said to functionally determine another attribute Y, also in R, if, and only if, each X value is associated with precisely one Y value...
is a special case of multivalued dependency. In a functional dependency X → Y, every x determines exactly one y, never more than one.
Example
Consider the following example:Restaurant | Pizza Variety | Delivery Area |
---|---|---|
A1 Pizza | Thick Crust | Springfield |
A1 Pizza | Thick Crust | Shelbyville |
A1 Pizza | Thick Crust | Capital City |
A1 Pizza | Stuffed Crust | Springfield |
A1 Pizza | Stuffed Crust | Shelbyville |
A1 Pizza | Stuffed Crust | Capital City |
Elite Pizza | Thin Crust | Capital City |
Elite Pizza | Stuffed Crust | Capital City |
Vincenzo's Pizza | Thick Crust | Springfield |
Vincenzo's Pizza | Thick Crust | Shelbyville |
Vincenzo's Pizza | Thin Crust | Springfield |
Vincenzo's Pizza | Thin Crust | Shelbyville |
Each row indicates that a given restaurant can deliver a given variety of pizza to a given area.
The table has no non-key attributes because its only key is {Restaurant, Pizza Variety, Delivery Area}. Therefore it meets all normal forms up to BCNF. If we assume, however, that pizza varieties offered by a restaurant are not affected by delivery area, then it does not meet 4NF. The problem is that the table features two non-trivial multivalued dependencies on the {Restaurant} attribute (which is not a superkey). The dependencies are:
- {Restaurant} {Pizza Variety}
- {Restaurant} {Delivery Area}
These non-trivial multivalued dependencies on a non-superkey reflect the fact that the varieties of pizza a restaurant offers are independent from the areas to which the restaurant delivers. This state of affairs leads to redundancy
Data redundancy
Data redundancy occurs in database systems which have a field that is repeated in two or more tables. For instance, in case when customer data is duplicated and attached with each product bought then redundancy of data is a known source of inconsistency, since customer might appear with different...
in the table: for example, we are told three times that A1 Pizza offers Stuffed Crust, and if A1 Pizza starts producing Cheese Crust pizzas then we will need to add multiple rows, one for each of A1 Pizza's delivery areas. There is, moreover, nothing to prevent us from doing this incorrectly: we might add Cheese Crust rows for all but one of A1 Pizza's delivery areas, thereby failing to respect the multivalued dependency {Restaurant} {Pizza Variety}.
To eliminate the possibility of these anomalies, we must place the facts about varieties offered into a different table from the facts about delivery areas, yielding two tables that are both in 4NF:
Restaurant | Pizza Variety |
---|---|
A1 Pizza | Thick Crust |
A1 Pizza | Stuffed Crust |
Elite Pizza | Thin Crust |
Elite Pizza | Stuffed Crust |
Vincenzo's Pizza | Thick Crust |
Vincenzo's Pizza | Thin Crust |
Restaurant | Delivery Area |
---|---|
A1 Pizza | Springfield |
A1 Pizza | Shelbyville |
A1 Pizza | Capital City |
Elite Pizza | Capital City |
Vincenzo's Pizza | Springfield |
Vincenzo's Pizza | Shelbyville |
In contrast, if the pizza varieties offered by a restaurant sometimes did legitimately vary from one delivery area to another, the original three-column table would satisfy 4NF.
Ronald Fagin demonstrated that it is always possible to achieve 4NF. Rissanen's theorem is also applicable on multivalued dependencies
Multivalued dependency
In database theory, multivalued dependency is a full constraint between two sets of attributes in a relation.In contrast to the functional dependency, the multivalued dependency requires that certain tuples be present in a relation. Therefore, a multivalued dependency is a special case of...
.
4NF in practice
A 1992 paper by Margaret S. Wu notes that the teaching of database normalization typically stops short of 4NF, perhaps because of a belief that tables violating 4NF (but meeting all lower normal forms) are rarely encountered in business applications. This belief may not be accurate, however. Wu reports that in a study of forty organizational databases, over 20% contained one or more tables that violated 4NF while meeting all lower normal forms.See also
- Attribute-value systemAttribute-value systemAn attribute-value system is a basic knowledge representation framework comprising a table with columns designating "attributes" and rows designating "objects" An attribute-value system is a basic knowledge representation framework comprising a table with columns designating "attributes" (also...
- Third normal formThird normal formIn computer science, the third normal form is a normal form used in database normalization. 3NF was originally defined by E.F. Codd in 1971. Codd's definition states that a table is in 3NF if and only if both of the following conditions hold:...
- Fifth normal formFifth normal formFifth normal form , also known as Project-join normal form is a level of database normalization designed to reduce redundancy in relational databases recording multi-valued facts by isolating semantically related multiple relationships...
- Sixth normal formSixth normal formSixth normal form is a term in relational database theory, used in two different ways.-6NF :A book by Christopher J...
Further reading
- Date, C. J. (1999), An Introduction to Database Systems (8th ed.). Addison-Wesley Longman. ISBN 0-321-19784-4.
- Kent, W. (1983) A Simple Guide to Five Normal Forms in Relational Database Theory, Communications of the ACM, vol. 26, pp. 120–125
- Date, C.J., & Darwen, H., & Pascal, F. Database Debunkings
- Advanced Normalization by ITS, University of Texas.