Association rule learning
Encyclopedia
In data mining
, association rule learning is a popular and
well researched method for discovering interesting relations between variables
in large databases. Piatetsky-Shapiro
describes analyzing and presenting
strong rules discovered in databases using different measures of
interestingness. Based on the concept of strong rules, Agrawal
et al.
introduced association rules for discovering regularities between
products in large scale transaction data recorded by point-of-sale (POS)
systems in supermarkets. For example, the rule
found in the sales data of a supermarket would indicate that if a customer buys
onions and potatoes together, he or she is likely to also buy hamburger meat. Such
information can be used as the basis for decisions about marketing activities
such as, e.g., promotional pricing
or product placement
s. In addition to the
above example from market basket analysis association rules are employed
today in many application areas including Web usage mining, intrusion detection
and bioinformatics
.
Following the original definition by Agrawal et al. the problem of association rule mining is
defined as:
Let be a set of
binary attributes called items. Let be a set of transactions called the database. Each
transaction in has a unique transaction ID and contains a
subset of the items in . A rule is defined as an implication
of the form where
and . The sets of items (for short
itemsets) and are called antecedent
(left-hand-side or LHS) and consequent (right-hand-side or RHS) of the
rule respectively.
To illustrate the concepts, we use a small example from the supermarket
domain. The set of items is
and a small database containing the items (1 codes presence and 0 absence
of an item in a transaction) is shown in
the table to the right. An example rule for the supermarket
could be
meaning that if butter and bread is bought, customers also buy milk.
Note: this example is extremely small. In practical applications, a rule needs a support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
constraints on various measures of significance and interest can be
used. The best-known constraints are minimum thresholds on support and
confidence.
and a user-specified minimum confidence at the same time.
Association rule generation is usually split up into two separate steps:
While the second step is straight forward, the first step needs more attention.
Finding all frequent itemsets in a database is difficult since it involves
searching all possible itemsets (item combinations). The set of possible
itemsets is the power set over and has size
(excluding the empty set which is not a valid itemset).
Although the size of the powerset grows exponentially in the
number of items in , efficient search is possible
using the downward-closure property of support (also called anti-monotonicity) which guarantees
that for a frequent itemset, all its subsets are also frequent and thus for an
infrequent itemset, all its supersets must also be infrequent. Exploiting this
property, efficient algorithms (e.g., Apriori
and Eclat) can find all
frequent itemsets.
et al.
rules were proposed. Some popular measures are:
A definition of these measures can be found here. Several more measures are presented and compared by Tan et al. Looking for techniques that can model what the user has known (and using this models as interestingness measures) is currently an active research trend under the name of "Subjective Interestingness"
Some well known algorithms are Apriori, Eclat and FP-Growth, but they only do half the job, since they are algorithms for mining frequent itemsets. Another step needs to be done after to generate rules from frequent itemsets found in a database.
is a general method for exploratory data analysis that has theoretical foundations in observational calculi.
The ASSOC procedure is a GUHA method which mines for generalized association rules using fast bitstrings operations. The association rules mined by this method are more general than those output by apriori, for example "items" can be connected both with conjunction and disjunctions and the relation between antecedent and consequent of the rule is not restricted to setting minimum support and confidence as in apriori: an arbitrary combination of supported interest measures can be used.
Weighted class learning is another form of associative learning in which weight may be assigned to classes to give focus to a particular issue of concern for the consumer of the data mining results.
K-optimal pattern discovery
provides an alternative to the standard approach to association rule learning that requires that each pattern appear frequently in the data.
Mining frequent sequences uses support to find sequences in temporal data.
Generalized Association Rules hierarchical taxonomy (concept hierarchy)
Quantitiative Association Rules categorical and quantitative data
Interval Data Association Rules e.g. partition the age into 5-year-increment ranged
Maximal Association Rules
Sequential Association Rules temporal data e.g. first buy computer, then CD Roms, then a webcam.
Data mining
Data mining , a relatively young and interdisciplinary field of computer science is the process of discovering new patterns from large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics and database systems...
, association rule learning is a popular and
well researched method for discovering interesting relations between variables
in large databases. Piatetsky-Shapiro
describes analyzing and presenting
strong rules discovered in databases using different measures of
interestingness. Based on the concept of strong rules, Agrawal
Rakesh Agrawal
Dr. Rakesh Agrawal is the Winthrop E. Stone Distinguished Professor of Chemical Engineering at Purdue University, West Lafayette, Indiana. Previously he was employed for more than two decades with Air Products and Chemicals, Inc., where he was elected to the highest technical position in the...
et al.
introduced association rules for discovering regularities between
products in large scale transaction data recorded by point-of-sale (POS)
systems in supermarkets. For example, the rule
found in the sales data of a supermarket would indicate that if a customer buys
onions and potatoes together, he or she is likely to also buy hamburger meat. Such
information can be used as the basis for decisions about marketing activities
such as, e.g., promotional pricing
Pricing
Pricing is the process of determining what a company will receive in exchange for its products. Pricing factors are manufacturing cost, market place, competition, market condition, and quality of product. Pricing is also a key variable in microeconomic price allocation theory. Pricing is a...
or product placement
Product placement
Product placement, or embedded marketing, is a form of advertisement, where branded goods or services are placed in a context usually devoid of ads, such as movies, music videos, the story line of television shows, or news programs. The product placement is often not disclosed at the time that the...
s. In addition to the
above example from market basket analysis association rules are employed
today in many application areas including Web usage mining, intrusion detection
Intrusion detection
In Information Security, intrusion detection is the act of detecting actions that attempt to compromise the confidentiality, integrity or availability of a resource. When Intrusion detection takes a preventive measure without direct human intervention, then it becomes an Intrusion-prevention...
and bioinformatics
Bioinformatics
Bioinformatics is the application of computer science and information technology to the field of biology and medicine. Bioinformatics deals with algorithms, databases and information systems, web technologies, artificial intelligence and soft computing, information and computation theory, software...
.
Definition
transaction ID | milk | bread | butter | beer |
---|---|---|---|---|
1 | 1 | 1 | 0 | 0 |
2 | 0 | 0 | 1 | 0 |
3 | 0 | 0 | 0 | 1 |
4 | 1 | 1 | 1 | 0 |
5 | 0 | 1 | 0 | 0 |
Following the original definition by Agrawal et al. the problem of association rule mining is
defined as:
Let be a set of
binary attributes called items. Let be a set of transactions called the database. Each
transaction in has a unique transaction ID and contains a
subset of the items in . A rule is defined as an implication
of the form where
and . The sets of items (for short
itemsets) and are called antecedent
(left-hand-side or LHS) and consequent (right-hand-side or RHS) of the
rule respectively.
To illustrate the concepts, we use a small example from the supermarket
domain. The set of items is
and a small database containing the items (1 codes presence and 0 absence
of an item in a transaction) is shown in
the table to the right. An example rule for the supermarket
could be
meaning that if butter and bread is bought, customers also buy milk.
Note: this example is extremely small. In practical applications, a rule needs a support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
Useful Concepts
To select interesting rules from the set of all possible rules,constraints on various measures of significance and interest can be
used. The best-known constraints are minimum thresholds on support and
confidence.
- The support of an itemset is defined as the proportion of transactions in the data set which contain the itemset. In the example database, the itemset has a support of since it occurs in 20% of all transactions (1 out of 5 transactions).
- The confidence of a rule is defined . For example, the rule has a confidence of in the database, which means that for 50% of the transactions containing milk and bread the rule is correct.
- Confidence can be interpreted as an estimate of the probability , the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.
- The lift of a rule is defined as or the ratio of the observed support to that expected if X and Y were independent. The rule has a lift of .
- The conviction of a rule is defined as . The rule has a conviction of , and can be interpreted as the ratio of the expected frequency that X occurs without Y (that is to say, the frequency that the rule makes an incorrect prediction) if X and Y were independent divided by the observed frequency of incorrect predictions. In this example, the conviction value of 1.2 shows that the rule would be incorrect 20% more often (1.2 times as often) if the association between X and Y was purely random chance.
- The property of succinctness(Characterized by clear, precise expression in few words) of a constraint. A constraint is succinct if we are able to explicitly write down all Item-sets,that satisfy the constraint.
Example : Constraint C = S.Type = {NonFood}
Products that would satisfy this constraint are for ex. {Headphones,Shoes,Toilet paper}
Usage Example: Instead of using Apriori algorithmApriori algorithmIn computer science and data mining, Apriori is a classic algorithm for learning association rules. Apriori is designed to operate on databases containing transactions...
to obtain the Frequent-Item-sets we can instead create all the Item-sets and run support counting only once.
Process
Association rules are usually required to satisfy a user-specified minimum supportand a user-specified minimum confidence at the same time.
Association rule generation is usually split up into two separate steps:
- First, minimum support is applied to find all frequent itemsets in a database.
- Second, these frequent itemsets and the minimum confidence constraint are used to form rules.
While the second step is straight forward, the first step needs more attention.
Finding all frequent itemsets in a database is difficult since it involves
searching all possible itemsets (item combinations). The set of possible
itemsets is the power set over and has size
(excluding the empty set which is not a valid itemset).
Although the size of the powerset grows exponentially in the
number of items in , efficient search is possible
using the downward-closure property of support (also called anti-monotonicity) which guarantees
that for a frequent itemset, all its subsets are also frequent and thus for an
infrequent itemset, all its supersets must also be infrequent. Exploiting this
property, efficient algorithms (e.g., Apriori
and Eclat) can find all
frequent itemsets.
History
The concept of association rules was popularised particularly due to the 1993 article of Agrawal, which has acquired more than 6000 citations according to Google Scholar, as of March 2008, and is thus one of the most cited papers in the Data Mining field. However, it is possible that what is now called "association rules" is similar to what appears in the 1966 paper on GUHA, a general data mining method developed by Petr HájekPetr Hájek
Petr Hájek is a Czech scientist in the area of mathematical logic and a professor of mathematics. He works at the Institute of Computer Science at the Academy of Sciences of the Czech Republic and worked as a lecturer at the Faculty of Mathematics and Physics at the Charles University in Prague...
et al.
Alternative measures of interestingness
Next to confidence also other measures of interestingness forrules were proposed. Some popular measures are:
- All-confidence
- Collective strength
- Conviction
- Leverage
- Lift (originally called interest)
A definition of these measures can be found here. Several more measures are presented and compared by Tan et al. Looking for techniques that can model what the user has known (and using this models as interestingness measures) is currently an active research trend under the name of "Subjective Interestingness"
Statistically sound associations
One limitation of the standard approach to discovering associations is that by searching massive numbers of possible associations to look for collections of items that appear to be associated, there is a large risk of finding many spurious associations. These are collections of items that co-occur with unexpected frequency in the data, but only do so by chance. For example, suppose we are considering a collection of 10,000 items and looking for rules containing two items in the left-hand-side and 1 item in the right-hand-side. There are approximately 1,000,000,000,000 such rules. If we apply a statistical test for independence with a significance level of 0.05 it means there is only a 5% chance of accepting a rule if there is no association. If we assume there are no associations, we should nonetheless expect to find 50,000,000,000 rules. Statistically sound association discovery controls this risk, in most cases reducing the risk of finding any spurious associations to a user-specified significance level.Algorithms
Many algorithms for generating association rules were presented over time.Some well known algorithms are Apriori, Eclat and FP-Growth, but they only do half the job, since they are algorithms for mining frequent itemsets. Another step needs to be done after to generate rules from frequent itemsets found in a database.
Apriori algorithm
Apriori is the best-known algorithm to mine association rules. It uses a breadth-first search strategy to counting the support of itemsets and uses a candidate generation function which exploits the downward closure property of support.FP-growth algorithm
FP-growth (frequent pattern growth) uses an extended prefix-tree (FP-tree) structure to store the database in a compressed form. FP-growth adopts a divide-and-conquer approach to decompose both the mining tasks and the databases. It uses a pattern fragment growth method to avoid the costly process of candidate generation and testing used by Apriori.GUHA procedure ASSOC
GUHAGuha
Guha is an Indian family name and surname found amongst Bengali Hindus. It is also another name for the Hindu god Kartikeya. Guhas belong to the Kayastha community, a class of Kshatriyas that originated from Kannauj, the capital of India during much of the classical period, and immigrated to...
is a general method for exploratory data analysis that has theoretical foundations in observational calculi.
The ASSOC procedure is a GUHA method which mines for generalized association rules using fast bitstrings operations. The association rules mined by this method are more general than those output by apriori, for example "items" can be connected both with conjunction and disjunctions and the relation between antecedent and consequent of the rule is not restricted to setting minimum support and confidence as in apriori: an arbitrary combination of supported interest measures can be used.
OPUS search
OPUS is an efficient algorithm for rule discovery that, in contrast to most alternatives, does not require either monotone or anti-monotone constraints such as minimum support. Initially used to find rules for a fixed consequent it has subsequently been extended to find rules with any item as a consequent. OPUS search is the core technology in the popular Magnum Opus association discovery system.Lore
A famous story about association rule mining is the "beer and diaper" story. A purported survey of behavior of supermarket shoppers discovered that customers (presumably young men) who buy diapers tend also to buy beer. This anecdote became popular as an example of how unexpected association rules might be found from everyday data. There are varying opinions as to how much of the story is true. Daniel Powers saysIn 1992, Thomas Blischok, manager of a retail consulting group at Teradata, and his staff prepared an analysis of 1.2 million market baskets from about 25 Osco Drug stores. Database queries were developed to identify affinities. The analysis "did discover that between 5:00 and 7:00 p.m. that consumers bought beer and diapers". Osco
managers did NOT exploit the beer and diapers relationship by moving the products closer together on the shelves.
Other types of association mining
Contrast set learning is a form of associative learning. Contrast set learners use rules that differ meaningfully in their distribution across subsets.Weighted class learning is another form of associative learning in which weight may be assigned to classes to give focus to a particular issue of concern for the consumer of the data mining results.
K-optimal pattern discovery
K-optimal pattern discovery
K-optimal pattern discovery is a data mining technique that provides an alternative to the frequent pattern discovery approach that underlies most association rule learning techniques....
provides an alternative to the standard approach to association rule learning that requires that each pattern appear frequently in the data.
Mining frequent sequences uses support to find sequences in temporal data.
Generalized Association Rules hierarchical taxonomy (concept hierarchy)
Quantitiative Association Rules categorical and quantitative data
Interval Data Association Rules e.g. partition the age into 5-year-increment ranged
Maximal Association Rules
Sequential Association Rules temporal data e.g. first buy computer, then CD Roms, then a webcam.
Bibliographies
- Annotated Bibliography on Association Rules by M. Hahsler
- Statsoft Electronic Statistics Textbook: Association Rules
Implementations
- KXEN, a commercial Data Mining software
- Silverlight widget for live demonstration of association rule mining using Apriori algorithm
- RapidMiner, a free Java data mining software suite (Community Edition: GNU)
- OrangeOrange (software)Orange is a component-based data mining and machine learning software suite, featuring friendly yet powerful and flexible visual programming front-end for explorative data analysis and visualization, and Python bindings and libraries for scripting...
, a free data mining software suite, module orngAssoc - Ruby implementation (AI4R)
- arules, a package for mining association rules and frequent itemsets with RR (programming language)R is a programming language and software environment for statistical computing and graphics. The R language is widely used among statisticians for developing statistical software, and R is widely used for statistical software development and data analysis....
. - C. Borgelt's implementation of Apriori and Eclat
- Frequent Itemset Mining Implementations Repository (FIMI)
- Frequent pattern mining implementations from Bart Goethals
- Weka, a collection of machine learning algorithms for data mining tasks written in JavaJava (programming language)Java is a programming language originally developed by James Gosling at Sun Microsystems and released in 1995 as a core component of Sun Microsystems' Java platform. The language derives much of its syntax from C and C++ but has a simpler object model and fewer low-level facilities...
. - KNIMEKNIMEKNIME, the Konstanz Information Miner, is a user friendly, coherent open source data analytics, reporting and integration platform. KNIME integrates various components for machine learning and data mining through its modular data pipelining concept...
an open source workflow oriented data preprocessing and analysis platform. - Data Mining Software by Mohammed J. Zaki
- Magnum Opus, a system for statistically sound association discovery.
- LISp Miner, Mines for generalized (GUHA) association rules. Uses bitstrings not apriori algorithm.
- Ferda Dataminer, An extensible visual data mining platform, implements GUHA procedures ASSOC. Features multirelational data mining.
- STATISTICA, commercial statistics software with an Association Rules module.
- SPMF, Java implementations of 39 frequent itemset, sequential pattern, sequential rule, and association rule mining algorithms
- ARtool, GPL Java association rule mining application with GUI, offering implementations of multiple algorithms for discovery of frequent patterns and extraction of association rules (includes Apriori and FPgrowth)