Superkey
Encyclopedia
A superkey is defined in the relational model
of database
organization as a set of attributes of a relation variable for which it holds that in all relations assigned to that variable, there are no two distinct tuple
s (rows) that have the same values for the attributes in this set. Equivalently a superkey can also be defined as a set of attributes of a relvar
upon which all attributes of the relvar are functionally dependent
.
Note that the set of all attributes is a trivial superkey.
Also note that if attribute set K is a superkey of relvar R, then at all times it is the case that the projection of R over K has the same cardinality
as R itself.
Informally, a superkey is a set of attributes within a table whose values can be used to uniquely identify a tuple. A candidate key
is a minimal set of attributes necessary to identify a tuple, this is also called a minimal superkey. For example, given an employee schema, consisting of the attributes employeeID, name, job, and departmentID, we could use the employeeID in combination with any or all other attributes of this table to uniquely identify a tuple in the table. Examples of superkeys in this schema would be {employeeID, Name}, {employeeID, Name, job}, and {employeeID, Name, job, departmentID} which is the trivial superkey.
In a real database we do not need values for all of those attributes to identify a tuple. We only need, per our example, the set {employeeID}. This is a minimal superkey – that is, a minimal set of attributes that can be used to identify a single tuple. So, employeeID is a candidate key
.
First, list out all the (non-empty) sets of attributes:
Second, eliminate all the sets which do not meet superkey's requirement. For example, {Monarch Name, Royal House} cannot be a superkey because for the same attribute values (Edward, Plantagenet), there are two distinct tuples:
Finally, after elimination, the remaining sets of attributes are the only possible superkeys in this example:
In real situations, however, superkeys are normally not determined by this method, which is very tedious and time-consuming, but by analyzing functional dependencies (FD).
Relational model
The relational model for database management is a database model based on first-order predicate logic, first formulated and proposed in 1969 by Edgar F...
of database
Database
A database is an organized collection of data for one or more purposes, usually in digital form. The data are typically organized to model relevant aspects of reality , in a way that supports processes requiring this information...
organization as a set of attributes of a relation variable for which it holds that in all relations assigned to that variable, there are no two distinct tuple
Tuple
In mathematics and computer science, a tuple is an ordered list of elements. In set theory, an n-tuple is a sequence of n elements, where n is a positive integer. There is also one 0-tuple, an empty sequence. An n-tuple is defined inductively using the construction of an ordered pair...
s (rows) that have the same values for the attributes in this set. Equivalently a superkey can also be defined as a set of attributes of a relvar
Relvar
In relational databases, a relvar is a term coined by C. J. Date as an abbreviation for the concept of relation variable, which is the actual term used by the inventor of the relational model, E. F. Codd, regarding the same concept...
upon which all attributes of the relvar are functionally dependent
Functional dependency
A functional dependency is a constraint between two sets of attributes in a relation from a database.Given a relation R, a set of attributes X in R is said to functionally determine another attribute Y, also in R, if, and only if, each X value is associated with precisely one Y value...
.
Note that the set of all attributes is a trivial superkey.
Also note that if attribute set K is a superkey of relvar R, then at all times it is the case that the projection of R over K has the same cardinality
Cardinality (data modeling)
In data modeling, the cardinality of one data table with respect to another data table is a critical aspect of database design. Relationships between data tables define cardinality when explaining how each table links to another....
as R itself.
Informally, a superkey is a set of attributes within a table whose values can be used to uniquely identify a tuple. A candidate key
Candidate key
In the relational model of databases, a candidate key of a relation is a minimal superkey for that relation; that is, a set of attributes such that# the relation does not have two distinct tuples In the relational model of databases, a candidate key of a relation is a minimal superkey for that...
is a minimal set of attributes necessary to identify a tuple, this is also called a minimal superkey. For example, given an employee schema, consisting of the attributes employeeID, name, job, and departmentID, we could use the employeeID in combination with any or all other attributes of this table to uniquely identify a tuple in the table. Examples of superkeys in this schema would be {employeeID, Name}, {employeeID, Name, job}, and {employeeID, Name, job, departmentID} which is the trivial superkey.
In a real database we do not need values for all of those attributes to identify a tuple. We only need, per our example, the set {employeeID}. This is a minimal superkey – that is, a minimal set of attributes that can be used to identify a single tuple. So, employeeID is a candidate key
Candidate key
In the relational model of databases, a candidate key of a relation is a minimal superkey for that relation; that is, a set of attributes such that# the relation does not have two distinct tuples In the relational model of databases, a candidate key of a relation is a minimal superkey for that...
.
Example
Monarch Name | Monarch Number | Royal House |
---|---|---|
Edward | II | Plantagenet |
Edward | III | Plantagenet |
Richard | III | Plantagenet |
Henry | IV | Lancaster |
First, list out all the (non-empty) sets of attributes:
- • {Monarch Name}
- • {Monarch Number}
- • {Royal House}
- • {Monarch Name, Monarch Number}
- • {Monarch Name, Royal House}
- • {Monarch Number, Royal House}
- • {Monarch Name, Monarch Number, Royal House}
Second, eliminate all the sets which do not meet superkey's requirement. For example, {Monarch Name, Royal House} cannot be a superkey because for the same attribute values (Edward, Plantagenet), there are two distinct tuples:
- (Edward, II, Plantagenet)
- (Edward, III, Plantagenet)
Finally, after elimination, the remaining sets of attributes are the only possible superkeys in this example:
- {Monarch Name, Monarch Number} (Candidate Key)
- {Monarch Name, Monarch Number, Royal House}
In real situations, however, superkeys are normally not determined by this method, which is very tedious and time-consuming, but by analyzing functional dependencies (FD).
External links
- Relation Database terms of reference, Keys: An overview of the different types of keys in an RDBMS