Third normal form
Encyclopedia
In computer science
, the third normal form (3NF) is a normal form used in database normalization
. 3NF was originally defined by E.F. Codd in 1971. Codd's definition states that a table is in 3NF if and only if
both of the following conditions hold:
A non-prime attribute of R is an attribute that does not belong to any candidate key of R. A transitive dependency
is a functional dependency
in which X → Z (X determines Z) indirectly, by virtue of X → Y and Y → Z (where it is not the case that Y → X).
A 3NF definition that is equivalent to Codd's, but expressed differently, was given by Carlo Zaniolo in 1982. This definition states that a table is in 3NF if and only if, for each of its functional dependencies X → A, at least one of the following conditions holds:
Zaniolo's definition gives a clear sense of the difference between 3NF and the more stringent Boyce–Codd normal form (BCNF). BCNF simply eliminates the third alternative ("A is a prime attribute").
to give true evidence in a court of law, was given by Bill Kent: "[Every] non-key [attribute] must provide a fact about the key, the whole key, and nothing but the key." A common variation supplements this definition with the oath: "so help me Codd
".
Requiring existence of "the key" ensures that the table is in 1NF
; requiring that non-key attributes be dependent on "the whole key" ensures 2NF
; further requiring that non-key attributes be dependent on "nothing but the key" ensures 3NF.
Chris Date
refers to Kent's summary as "an intuitively attractive characterization" of 3NF, and notes that with slight adaptation it may serve as a definition of the slightly-stronger Boyce–Codd normal form: "Each attribute must represent a fact about the key, the whole key, and nothing but the key." The 3NF version of the definition is weaker than Date's BCNF variation, as the former is concerned only with ensuring that non-key attributes are dependent on keys. Prime attributes (which are keys or parts of keys) must not be functionally dependent at all; they each represent a fact about the key in the sense of providing part or all of the key itself. (It should be noted here that this rule applies only to functionally dependent attributes, as applying it to all attributes would implicitly prohibit composite candidate keys, since each part of any such key would violate the "whole key" clause.)
An example of a 2NF table that fails to meet the requirements of 3NF is:
Because each row in the table needs to tell us who won a particular Tournament in a particular Year, the composite key {Tournament, Year} is a minimal set of attributes guaranteed to uniquely identify a row. That is, {Tournament, Year} is a candidate key for the table.
The breach of 3NF occurs because the non-prime attribute Winner Date of Birth is transitively dependent on the candidate key {Tournament, Year} via the non-prime attribute Winner. The fact that Winner Date of Birth is functionally dependent on Winner makes the table vulnerable to logical inconsistencies, as there is nothing to stop the same person from being shown with different dates of birth on different records.
In order to express the same facts without violating 3NF, it is necessary to split the table into two:
Update anomalies cannot occur in these tables, which are both in 3NF.
(i.e. one where X does not contain A) and let A be a non-key attribute. Also let Y be a key of R. Then Y → X. Therefore A is not transitively dependent on Y if and only if X → Y, that is, if and only if X is a superkey.
or 5NF
.
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...
, the third normal form (3NF) is a normal form used in database normalization
Database normalization
In the design of a relational database management system , the process of organizing data to minimize redundancy is called normalization. The goal of database normalization is to decompose relations with anomalies in order to produce smaller, well-structured relations...
. 3NF was originally defined by E.F. Codd in 1971. Codd's definition states that a table is in 3NF if and only if
If and only if
In logic and related fields such as mathematics and philosophy, if and only if is a biconditional logical connective between statements....
both of the following conditions hold:
- The relation R (table) is in second normal formSecond normal formSecond normal form is a normal form used in database normalization. 2NF was originally defined by E.F. Codd in 1971.A table that is in first normal form must meet additional criteria if it is to qualify for second normal form...
(2NF) - Every non-prime attribute of R is non-transitively dependent (i.e. directly dependent) on every super key of R.
A non-prime attribute of R is an attribute that does not belong to any candidate key of R. A transitive dependency
Transitive dependency
In mathematics, a transitive dependency is a functional dependency which holds by virtue of transitivity. A transitive dependency can occur only in a relation that has three or more attributes. Let A, B, and C designate three distinct attributes in the relation...
is a functional dependency
Functional dependency
A functional dependency is a constraint between two sets of attributes in a relation from a database.Given a relation R, a set of attributes X in R is said to functionally determine another attribute Y, also in R, if, and only if, each X value is associated with precisely one Y value...
in which X → Z (X determines Z) indirectly, by virtue of X → Y and Y → Z (where it is not the case that Y → X).
A 3NF definition that is equivalent to Codd's, but expressed differently, was given by Carlo Zaniolo in 1982. This definition states that a table is in 3NF if and only if, for each of its functional dependencies X → A, at least one of the following conditions holds:
- X contains A (that is, X → A is trivial functional dependency), or
- X is a superkeySuperkeyA superkey is defined in the relational model of database organization as a set of attributes of a relation variable for which it holds that in all relations assigned to that variable, there are no two distinct tuples that have the same values for the attributes in this set...
, or - A-X, the set difference between A and X is a prime attribute (i.e., A-X is contained within a candidate keyCandidate keyIn the relational model of databases, a candidate key of a relation is a minimal superkey for that relation; that is, a set of attributes such that# the relation does not have two distinct tuples In the relational model of databases, a candidate key of a relation is a minimal superkey for that...
)
Zaniolo's definition gives a clear sense of the difference between 3NF and the more stringent Boyce–Codd normal form (BCNF). BCNF simply eliminates the third alternative ("A is a prime attribute").
"Nothing but the key"
A memorable statement of Codd's definition of 3NF, paralleling the traditional pledgeSworn testimony
Sworn testimony is evidence given by a witness who has made a commitment to tell the truth. If the witness is later found to have lied whilst bound by the commitment, they can often be charged with the crime of perjury. The types of commitment can include oaths, affirmations and promises which are...
to give true evidence in a court of law, was given by Bill Kent: "[Every] non-key [attribute] must provide a fact about the key, the whole key, and nothing but the key." A common variation supplements this definition with the oath: "so help me Codd
Edgar F. Codd
Edgar Frank "Ted" Codd was an English computer scientist who, while working for IBM, invented the relational model for database management, the theoretical basis for relational databases...
".
Requiring existence of "the key" ensures that the table is in 1NF
First normal form
First normal form is a normal form used in database normalization. A relational database table that adheres to 1NF is one that meets a certain minimum set of criteria...
; requiring that non-key attributes be dependent on "the whole key" ensures 2NF
Second normal form
Second normal form is a normal form used in database normalization. 2NF was originally defined by E.F. Codd in 1971.A table that is in first normal form must meet additional criteria if it is to qualify for second normal form...
; further requiring that non-key attributes be dependent on "nothing but the key" ensures 3NF.
Chris Date
Christopher J. Date
Chris Date is an independent author, lecturer, researcher, and consultant, specializing in relational database theory.-Biography:Chris Date attended High Wycombe Royal Grammar School from 1951 to 1958 and received his BA in Mathematics from Cambridge University in 1962. He entered the computer...
refers to Kent's summary as "an intuitively attractive characterization" of 3NF, and notes that with slight adaptation it may serve as a definition of the slightly-stronger Boyce–Codd normal form: "Each attribute must represent a fact about the key, the whole key, and nothing but the key." The 3NF version of the definition is weaker than Date's BCNF variation, as the former is concerned only with ensuring that non-key attributes are dependent on keys. Prime attributes (which are keys or parts of keys) must not be functionally dependent at all; they each represent a fact about the key in the sense of providing part or all of the key itself. (It should be noted here that this rule applies only to functionally dependent attributes, as applying it to all attributes would implicitly prohibit composite candidate keys, since each part of any such key would violate the "whole key" clause.)
An example of a 2NF table that fails to meet the requirements of 3NF is:
Tournament | Year | Winner | Winner Date of Birth |
---|---|---|---|
Indiana Invitational | 1998 | Al Fredrickson | 21 July 1975 |
Cleveland Open | 1999 | Bob Albertson | 28 September 1968 |
Des Moines Masters | 1999 | Al Fredrickson | 21 July 1975 |
Indiana Invitational | 1999 | Chip Masterson | 14 March 1977 |
Because each row in the table needs to tell us who won a particular Tournament in a particular Year, the composite key {Tournament, Year} is a minimal set of attributes guaranteed to uniquely identify a row. That is, {Tournament, Year} is a candidate key for the table.
The breach of 3NF occurs because the non-prime attribute Winner Date of Birth is transitively dependent on the candidate key {Tournament, Year} via the non-prime attribute Winner. The fact that Winner Date of Birth is functionally dependent on Winner makes the table vulnerable to logical inconsistencies, as there is nothing to stop the same person from being shown with different dates of birth on different records.
In order to express the same facts without violating 3NF, it is necessary to split the table into two:
Tournament | Year | Winner |
---|---|---|
Indiana Invitational | 1998 | Al Fredrickson |
Cleveland Open | 1999 | Bob Albertson |
Des Moines Masters | 1999 | Al Fredrickson |
Indiana Invitational | 1999 | Chip Masterson |
Player | Date of Birth |
---|---|
Chip Masterson | 14 March 1977 |
Al Fredrickson | 21 July 1975 |
Bob Albertson | 28 September 1968 |
Update anomalies cannot occur in these tables, which are both in 3NF.
Derivation of Zaniolo's conditions
The definition of 3NF offered by Carlo Zaniolo in 1982, and given above, is proved in the following way: Let X → A be a nontrivial FDFunctional dependency
A functional dependency is a constraint between two sets of attributes in a relation from a database.Given a relation R, a set of attributes X in R is said to functionally determine another attribute Y, also in R, if, and only if, each X value is associated with precisely one Y value...
(i.e. one where X does not contain A) and let A be a non-key attribute. Also let Y be a key of R. Then Y → X. Therefore A is not transitively dependent on Y if and only if X → Y, that is, if and only if X is a superkey.
Normalization beyond 3NF
Most 3NF tables are free of update, insertion, and deletion anomalies. Certain types of 3NF tables, rarely met with in practice, are affected by such anomalies; these are tables which either fall short of Boyce–Codd normal form (BCNF) or, if they meet BCNF, fall short of the higher normal forms 4NFFourth normal form
Fourth normal form is a normal form used in database normalization. Introduced by Ronald Fagin in 1977, 4NF is the next level of normalization after Boyce–Codd normal form . Whereas the second, third, and Boyce–Codd normal forms are concerned with functional dependencies, 4NF is concerned with a...
or 5NF
Fifth normal form
Fifth normal form , also known as Project-join normal form is a level of database normalization designed to reduce redundancy in relational databases recording multi-valued facts by isolating semantically related multiple relationships...
.
See also
- Attribute-value systemAttribute-value systemAn attribute-value system is a basic knowledge representation framework comprising a table with columns designating "attributes" and rows designating "objects" An attribute-value system is a basic knowledge representation framework comprising a table with columns designating "attributes" (also...
- Boyce–Codd normal form
- First normal formFirst normal formFirst normal form is a normal form used in database normalization. A relational database table that adheres to 1NF is one that meets a certain minimum set of criteria...
- Second normal formSecond normal formSecond normal form is a normal form used in database normalization. 2NF was originally defined by E.F. Codd in 1971.A table that is in first normal form must meet additional criteria if it is to qualify for second normal form...
- Fourth normal formFourth normal formFourth normal form is a normal form used in database normalization. Introduced by Ronald Fagin in 1977, 4NF is the next level of normalization after Boyce–Codd normal form . Whereas the second, third, and Boyce–Codd normal forms are concerned with functional dependencies, 4NF is concerned with a...
- Fifth normal formFifth normal formFifth normal form , also known as Project-join normal form is a level of database normalization designed to reduce redundancy in relational databases recording multi-valued facts by isolating semantically related multiple relationships...
Further reading
- Date, C. J. (1999), An Introduction to Database Systems (8th ed.). Addison-Wesley Longman. ISBN 0-321-19784-4.
- Kent, W. (1983) A Simple Guide to Five Normal Forms in Relational Database Theory, Communications of the ACM, vol. 26, pp. 120–125
External links
- Litt's Tips: Normalization
- Database Normalization Basics by Mike Chapple (About.com)
- An Introduction to Database Normalization by Mike Hillyer.
- A tutorial on the first 3 normal forms by Fred Coulson
- Description of the database normalization basics by Microsoft