Fact table
Encyclopedia
In data warehousing, a fact table consists of the measurements, metrics or facts of a business process
. It is often located at the centre of a star schema
or a snowflake schema
, surrounded by dimension table
s.
Fact tables provide the (usually) additive values that act as independent variables by which dimensional attributes are analyzed. Fact tables are often defined by their grain. The grain of a fact table represents the most atomic level by which the facts may be defined. The grain of a SALES fact table might be stated as "Sales volume by Day by Product by Store". Each record in this fact table is therefore uniquely defined by a day, product and store. Other dimensions might be members of this fact table (such as location/region) but these add nothing to the uniqueness of the fact records. These "affiliate dimensions" allow for additional slices of the independent facts but generally provide insights at a higher level of aggregation (a region contains many stores).
is SALES, then the corresponding fact table will typically contain columns representing both raw facts and aggregations
in rows such as:
"average daily sales" is a measurement which is stored in the fact table. The fact table also contains foreign key
s from the dimension table
s, where time series
(e.g. dates) and other dimensions
(e.g. store location, salesperson, product) are stored.
All foreign key
s between fact and dimension tables should be surrogate key
s, not reused keys from operational data.
The centralized table in a star schema is called a fact table. A fact table typically has two types of columns: those that contain facts and those that are foreign keys to dimension tables. The primary key of a fact table is usually a composite key that is made up of all of its foreign keys. Fact tables contain the content of the data warehouse and store different types of measures like additive, non additive, and semi additive measures.
A fact table might contain either detail level facts or facts that have been aggregated (fact tables that contain aggregated facts are often instead called summary tables).
Special care must be taken when handling ratios and percentage. One good design rule is to never store percentages or ratios in fact tables but only calculate these in the data access tool. Thus only store the numerator and denominator in the fact table, which then can be aggregated and the aggregated stored values can then be used for calculating the ratio or percentage in the data access tool.
In the real world, it is possible to have a fact table that contains no measures or facts. These tables are called "factless fact tables", or "junction tables
".
The "Factless fact tables" can for example be used for modeling many-to-many relationships or capture events.
An alternative approach is the four step design process described in Kimball.
Business process
A business process or business method is a collection of related, structured activities or tasks that produce a specific service or product for a particular customer or customers...
. It is often located at the centre of a star schema
Star schema
In computing, the star schema is the simplest style of data warehouse schema. The star schema consists of one or more fact tables referencing any number of dimension tables...
or a snowflake schema
Snowflake schema
In computing, a snowflake schema is a logical arrangement of tables in a multidimensional database such that the entity relationship diagram resembles a snowflake in shape. The snowflake schema is represented by centralized fact tables which are connected to multiple dimensions.The snowflake schema...
, surrounded by dimension table
Dimension table
In data warehousing, a dimension table is one of the set of companion tables to a fact table.The fact table contains business facts or measures and foreign keys which refer to candidate keys in the dimension tables....
s.
Fact tables provide the (usually) additive values that act as independent variables by which dimensional attributes are analyzed. Fact tables are often defined by their grain. The grain of a fact table represents the most atomic level by which the facts may be defined. The grain of a SALES fact table might be stated as "Sales volume by Day by Product by Store". Each record in this fact table is therefore uniquely defined by a day, product and store. Other dimensions might be members of this fact table (such as location/region) but these add nothing to the uniqueness of the fact records. These "affiliate dimensions" allow for additional slices of the independent facts but generally provide insights at a higher level of aggregation (a region contains many stores).
Example
If the business processBusiness process
A business process or business method is a collection of related, structured activities or tasks that produce a specific service or product for a particular customer or customers...
is SALES, then the corresponding fact table will typically contain columns representing both raw facts and aggregations
Aggregate (Data Warehouse)
Aggregates are used in dimensional models of the data warehouse to produce dramatic positive effects on the time it takes to query large sets of data. At the simplest form an aggregate is a simple summary table that can be derived by performing a Group by SQL query. A more common use of aggregates...
in rows such as:
- $12,000, being "sales for New York store for 15-Jan-2005"
- $34,000, being "sales for Los Angeles store for 15-Jan-2005"
- $22,000, being "sales for New York store for 16-Jan-2005"
- $50,000, being "sales for Los Angeles store for 16-Jan-2005"
- $21,000, being "average daily sales for Los Angeles Store for Jan-2005"
- $65,000, being "average daily sales for Los Angeles Store for Feb-2005"
- $33,000, being "average daily sales for Los Angeles Store for year 2005"
"average daily sales" is a measurement which is stored in the fact table. The fact table also contains foreign key
Foreign key
In the context of relational databases, a foreign key is a referential constraint between two tables.A foreign key is a field in a relational table that matches a candidate key of another table...
s from the dimension table
Dimension table
In data warehousing, a dimension table is one of the set of companion tables to a fact table.The fact table contains business facts or measures and foreign keys which refer to candidate keys in the dimension tables....
s, where time series
Time series
In statistics, signal processing, econometrics and mathematical finance, a time series is a sequence of data points, measured typically at successive times spaced at uniform time intervals. Examples of time series are the daily closing value of the Dow Jones index or the annual flow volume of the...
(e.g. dates) and other dimensions
Dimension (data warehouse)
In a data warehouse, a dimension is a data element that categorizes each item in a data set into non-overlapping regions. A data warehouse dimension provides the means to "slice and dice" data in a data warehouse. Dimensions provide structured labeling information to otherwise unordered numeric...
(e.g. store location, salesperson, product) are stored.
All foreign key
Foreign key
In the context of relational databases, a foreign key is a referential constraint between two tables.A foreign key is a field in a relational table that matches a candidate key of another table...
s between fact and dimension tables should be surrogate key
Surrogate key
A surrogate key in a database is a unique identifier for either an entity in the modeled world or an object in the database. The surrogate key is not derived from application data.- Definition :There are at least two definitions of a surrogate:...
s, not reused keys from operational data.
The centralized table in a star schema is called a fact table. A fact table typically has two types of columns: those that contain facts and those that are foreign keys to dimension tables. The primary key of a fact table is usually a composite key that is made up of all of its foreign keys. Fact tables contain the content of the data warehouse and store different types of measures like additive, non additive, and semi additive measures.
Measure types
- Additive - Measures that can be added across all dimensions.
- Non Additive - Measures that cannot be added across any dimension.
- Semi Additive - Measures that can be added across some dimensions and not across others.
A fact table might contain either detail level facts or facts that have been aggregated (fact tables that contain aggregated facts are often instead called summary tables).
Special care must be taken when handling ratios and percentage. One good design rule is to never store percentages or ratios in fact tables but only calculate these in the data access tool. Thus only store the numerator and denominator in the fact table, which then can be aggregated and the aggregated stored values can then be used for calculating the ratio or percentage in the data access tool.
In the real world, it is possible to have a fact table that contains no measures or facts. These tables are called "factless fact tables", or "junction tables
Junction table
In database management systems following the relational model, a junction table is a table that contains common fields from two or more tables. It is on the many side of a one-to-many relationship with each of the other tables...
".
The "Factless fact tables" can for example be used for modeling many-to-many relationships or capture events.
Types of fact tables
There are basically three fundamental measurement events, which characterizes all fact tables.- Transactional
- A transactional table is the most basic and fundamental. The grain associated with a transactional fact table is usually specified as "one row per line in a transaction", e.g., every line on a receipt. Typically a transactional fact table holds data of the most detailed level, causing it to have a great number of dimensionsDimension (data warehouse)In a data warehouse, a dimension is a data element that categorizes each item in a data set into non-overlapping regions. A data warehouse dimension provides the means to "slice and dice" data in a data warehouse. Dimensions provide structured labeling information to otherwise unordered numeric...
associated with it.- Periodic snapshots
- The periodic snapshot, as the name implies, takes a "picture of the moment", where the moment could be any defined period of time, e.g. a performance summary of a salesman over the previous month. A periodic snapshot table is dependent on the transactional table, as it needs the detailed data held in the transactional fact table in order to deliver the chosen performance output.
- Accumulating snapshots
- This type of fact table is used to show the activity of a process that has a well-defined beginning and end, e.g., the processing of an order. An order moves through specific steps until it is fully processed. As steps towards fulfilling the order are completed, the associated row in the fact table is updated. An accumulating snapshot table often has multiple date columns, each representing a milestone in the process. Therefore, it's important to have an entry in the associated date dimension that represents an unknown date, as many of the milestone dates are unknown at the time of the creation of the row.
Steps in designing a fact table
- Identify a business process for analysis (like sales).
- Identify measures or facts (sales dollar), by asking questions like 'What number of XX are relevant for the business process?', replacing the XX with various options that make sense within the context of the business.
- Identify dimensions for facts (product dimension, location dimension, time dimension, organization dimension), by asking questions that make sense within the context of the business, like 'Analyse by XX', where XX is replaced with the subject to test.
- List the columns that describe each dimension (region name, branch name, business unit name).
- Determine the lowest level (granularity) of summary in a fact table (e.g. sales dollars).
An alternative approach is the four step design process described in Kimball.