Second normal form
Encyclopedia
Second normal form is a normal form used in database normalization
. 2NF was originally defined by E.F. Codd in 1971.
A table that is in first normal form
(1NF) must meet additional criteria if it is to qualify for second normal form. Specifically: a table is in 2NF if and only if
, it is in 1NF and no non prime attribute is dependent on any proper subset
of any candidate key
of the table. A non prime attribute of a table is an attribute that is not a part of any candidate key of the table.
Put simply, a table is in 2NF if and only if, it is in 1NF and every non-prime attribute of the table is either dependent on the whole of a candidate key, or on another non prime attribute.
Note that when a 1NF table has no composite candidate keys (candidate keys consisting of more than one attribute), the table is automatically in 2NF.
Neither {Employee} nor {Skill} is a candidate key for the table. This is because a given Employee might need to appear more than once (he might have multiple Skills), and a given Skill might need to appear more than once (it might be possessed by multiple Employees). Only the composite key {Employee, Skill} qualifies as a candidate key for the table.
The remaining attribute, Current Work Location, is dependent on only part of the candidate key, namely Employee. Therefore the table is not in 2NF. Note the redundancy in the way Current Work Locations are represented: we are told three times that Jones works at 114 Main Street, and twice that Ellis works at 73 Industrial Way. This redundancy makes the table vulnerable to update anomalies: it is, for example, possible to update Jones' work location on his "Typing" and "Shorthand" records and not update his "Whittling" record. The resulting data would imply contradictory answers to the question "What is Jones' current work location?"
A 2NF alternative to this design would represent the same information in two tables: an "Employees" table with candidate key {Employee}, and an "Employees' Skills" table with candidate key {Employee, Skill}:
Neither of these tables can suffer from update anomalies.
Not all 2NF tables are free from update anomalies, however. An example of a 2NF table which suffers from update anomalies is:
Even though Winner and Winner Date of Birth are determined by the whole key {Tournament / Year} and not part of it, particular Winner / Winner Date of Birth combinations are shown redundantly on multiple records. This leads to an update anomaly: if updates are not carried out consistently, a particular winner could be shown as having two different dates of birth.
The underlying problem is the transitive dependency to which the Winner Date of Birth attribute is subject. Winner Date of Birth actually depends on Winner, which in turn depends on the key Tournament / Year.
This problem is addressed by third normal form
(3NF).
Multiple candidate keys occur in the following table:
Even if the designer has specified the primary key as {Model Full Name}, the table is not in 2NF. {Manufacturer, Model} is also a candidate key, and Manufacturer Country is dependent on a proper subset of it: Manufacturer. To make the design conform to 2NF, it is necessary to have two tables:
Database normalization
In the design of a relational database management system , the process of organizing data to minimize redundancy is called normalization. The goal of database normalization is to decompose relations with anomalies in order to produce smaller, well-structured relations...
. 2NF was originally defined by E.F. Codd in 1971.
A table that is in first normal form
First normal form
First normal form is a normal form used in database normalization. A relational database table that adheres to 1NF is one that meets a certain minimum set of criteria...
(1NF) must meet additional criteria if it is to qualify for second normal form. Specifically: a table is in 2NF if and only if
If and only if
In logic and related fields such as mathematics and philosophy, if and only if is a biconditional logical connective between statements....
, it is in 1NF and no non prime attribute is dependent on any proper subset
Subset
In mathematics, especially in set theory, a set A is a subset of a set B if A is "contained" inside B. A and B may coincide. The relationship of one set being a subset of another is called inclusion or sometimes containment...
of any candidate key
Candidate key
In the relational model of databases, a candidate key of a relation is a minimal superkey for that relation; that is, a set of attributes such that# the relation does not have two distinct tuples In the relational model of databases, a candidate key of a relation is a minimal superkey for that...
of the table. A non prime attribute of a table is an attribute that is not a part of any candidate key of the table.
Put simply, a table is in 2NF if and only if, it is in 1NF and every non-prime attribute of the table is either dependent on the whole of a candidate key, or on another non prime attribute.
Note that when a 1NF table has no composite candidate keys (candidate keys consisting of more than one attribute), the table is automatically in 2NF.
Example
Consider a table describing employees' skills:Employee | Skill | Current Work Location |
---|---|---|
Jones | Typing | 114 Main Street |
Jones | Shorthand | 114 Main Street |
Jones | Whittling | 114 Main Street |
Bravo | Light Cleaning | 73 Industrial Way |
Ellis | Alchemy | 73 Industrial Way |
Ellis | Flying | 73 Industrial Way |
Harrison | Light Cleaning | 73 Industrial Way |
Neither {Employee} nor {Skill} is a candidate key for the table. This is because a given Employee might need to appear more than once (he might have multiple Skills), and a given Skill might need to appear more than once (it might be possessed by multiple Employees). Only the composite key {Employee, Skill} qualifies as a candidate key for the table.
The remaining attribute, Current Work Location, is dependent on only part of the candidate key, namely Employee. Therefore the table is not in 2NF. Note the redundancy in the way Current Work Locations are represented: we are told three times that Jones works at 114 Main Street, and twice that Ellis works at 73 Industrial Way. This redundancy makes the table vulnerable to update anomalies: it is, for example, possible to update Jones' work location on his "Typing" and "Shorthand" records and not update his "Whittling" record. The resulting data would imply contradictory answers to the question "What is Jones' current work location?"
A 2NF alternative to this design would represent the same information in two tables: an "Employees" table with candidate key {Employee}, and an "Employees' Skills" table with candidate key {Employee, Skill}:
Employee | Current Work Location |
---|---|
Jones | 114 Main Street |
Bravo | 73 Industrial Way |
Ellis | 73 Industrial Way |
Harrison | 73 Industrial Way |
Employee | Skill |
---|---|
Jones | Typing |
Jones | Shorthand |
Jones | Whittling |
Bravo | Light Cleaning |
Ellis | Alchemy |
Ellis | Flying |
Harrison | Light Cleaning |
Neither of these tables can suffer from update anomalies.
Not all 2NF tables are free from update anomalies, however. An example of a 2NF table which suffers from update anomalies is:
Tournament | Year | Winner | Winner Date of Birth |
---|---|---|---|
Des Moines Masters | 1998 | Chip Masterson | 14 March 1977 |
Indiana Invitational | 1998 | Al Fredrickson | 21 July 1975 |
Cleveland Open | 1999 | Bob Albertson | 28 September 1968 |
Des Moines Masters | 1999 | Al Fredrickson | 21 July 1975 |
Indiana Invitational | 1999 | Chip Masterson | 14 March 1977 |
Even though Winner and Winner Date of Birth are determined by the whole key {Tournament / Year} and not part of it, particular Winner / Winner Date of Birth combinations are shown redundantly on multiple records. This leads to an update anomaly: if updates are not carried out consistently, a particular winner could be shown as having two different dates of birth.
The underlying problem is the transitive dependency to which the Winner Date of Birth attribute is subject. Winner Date of Birth actually depends on Winner, which in turn depends on the key Tournament / Year.
This problem is addressed by third normal form
Third normal form
In computer science, the third normal form is a normal form used in database normalization. 3NF was originally defined by E.F. Codd in 1971. Codd's definition states that a table is in 3NF if and only if both of the following conditions hold:...
(3NF).
2NF and candidate keys
A table for which there are no partial functional dependencies on the primary key is typically, but not always, in 2NF. In addition to the primary key, the table may contain other candidate keys; it is necessary to establish that no non-prime attributes have part-key dependencies on any of these candidate keys.Multiple candidate keys occur in the following table:
Manufacturer | Model | Model Full Name | Manufacturer Country |
---|---|---|---|
Forte | X-Prime | Forte X-Prime | Italy |
Forte | Ultraclean | Forte Ultraclean | Italy |
Dent-o-Fresh | EZbrush | Dent-o-Fresh EZBrush | USA |
Kobayashi | ST-60 | Kobayashi ST-60 | Japan |
Hoch | Toothmaster | Hoch Toothmaster | Germany |
Hoch | X-Prime | Hoch X-Prime | Germany |
Even if the designer has specified the primary key as {Model Full Name}, the table is not in 2NF. {Manufacturer, Model} is also a candidate key, and Manufacturer Country is dependent on a proper subset of it: Manufacturer. To make the design conform to 2NF, it is necessary to have two tables:
Manufacturer | Manufacturer Country |
---|---|
Forte | Italy |
Dent-o-Fresh | USA |
Kobayashi | Japan |
Hoch | Germany |
Manufacturer | Model | Model Full Name |
---|---|---|
Forte | X-Prime | Forte X-Prime |
Forte | Ultraclean | Forte Ultraclean |
Dent-o-Fresh | EZbrush | Dent-o-Fresh EZBrush |
Kobayashi | ST-60 | Kobayashi ST-60 |
Hoch | Toothmaster | Hoch Toothmaster |
Hoch | X-Prime | Hoch X-Prime |
Further reading
- Litt's Tips: Normalization
- Date, C. J., & Lorentzos, N., & Darwen, H. (2002). Temporal Data & the Relational Model (1st ed.). Morgan Kaufmann. ISBN 1-55860-855-9.
- Kent, W. (1983) A Simple Guide to Five Normal Forms in Relational Database Theory, Communications of the ACM, vol. 26, pp. 120–125
- Date, C.J., & Darwen, H., & Pascal, F. Database Debunkings
External links
- Database Normalization Basics by Mike Chapple (About.com)
- An Introduction to Database Normalization by Mike Hillyer.
- A tutorial on the first 3 normal forms by Fred Coulson
- Description of the database normalization basics by Microsoft