ID3 algorithm
In decision tree learning, ID3 (Iterative Dichotomiser 3) is an algorithm, invented by Ross Quinlan, used to generate a decision tree. ID3 is the precursor to the C4.5 algorithm.
Algorithm
The ID3 algorithm can be summarized as follows:
- Take all unused attributes and compute their entropy with respect to the test samples.
- Choose the attribute for which entropy is minimum (or, equivalently, for which information gain is maximum).
- Make a node containing that attribute.
The algorithm is as follows:

ID3 (Examples, Target_Attribute, Attributes)
- Create a root node for the tree.
- If all examples are positive, return the single-node tree Root, with label = +.
- If all examples are negative, return the single-node tree Root, with label = -.
- If the set of predicting attributes is empty, return the single-node tree Root, with label = the most common value of the target attribute in the examples.
- Otherwise begin:
  - A = the attribute that best classifies the examples.
  - Decision tree attribute for Root = A.
  - For each possible value, v_i, of A:
    - Add a new tree branch below Root, corresponding to the test A = v_i.
    - Let Examples(v_i) be the subset of examples that have the value v_i for A.
    - If Examples(v_i) is empty, then below this new branch add a leaf node with label = the most common target value in the examples.
    - Else below this new branch add the subtree ID3 (Examples(v_i), Target_Attribute, Attributes – {A}).
- End
- Return Root
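The pseudocode above translates almost directly into a short recursive program. The following Python sketch is illustrative rather than canonical: it assumes examples are dictionaries of discrete attribute values, and the helper names (entropy, information_gain, id3) are invented for this example. Attribute selection uses information gain, defined in the next section.

    import math
    from collections import Counter

    def entropy(labels):
        # Shannon entropy (base 2) of a list of class labels.
        total = len(labels)
        return -sum((count / total) * math.log2(count / total)
                    for count in Counter(labels).values())

    def information_gain(examples, attribute, target):
        # Entropy reduction from splitting `examples` on `attribute`.
        total = len(examples)
        remainder = 0.0
        for value in {ex[attribute] for ex in examples}:
            subset = [ex[target] for ex in examples if ex[attribute] == value]
            remainder += len(subset) / total * entropy(subset)
        return entropy([ex[target] for ex in examples]) - remainder

    def id3(examples, target, attributes):
        # Returns a tree of nested dicts {attribute: {value: subtree_or_label}}.
        labels = [ex[target] for ex in examples]
        if len(set(labels)) == 1:          # all examples share one label: leaf
            return labels[0]
        if not attributes:                 # no attributes left: most common label
            return Counter(labels).most_common(1)[0][0]
        # A = the attribute that best classifies the examples (maximum gain).
        best = max(attributes, key=lambda a: information_gain(examples, a, target))
        tree = {best: {}}
        # Branching only on values observed in the examples means the
        # pseudocode's empty-subset case cannot arise here.
        for value in {ex[best] for ex in examples}:
            subset = [ex for ex in examples if ex[best] == value]
            remaining = [a for a in attributes if a != best]
            tree[best][value] = id3(subset, target, remaining)
        return tree

Called as id3(examples, 'play', ['outlook', 'windy']) on a small table of weather observations, the sketch returns a nested dictionary whose keys trace the attribute tests from the root down to the class labels at the leaves.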
The ID3 metrics
The algorithm is based on Occam's razor: it prefers smaller decision trees (simpler theories) over larger ones. However, it does not always produce the smallest tree, and is therefore a heuristic. Occam's razor is formalized using the concept of information entropy:
Entropy
E(S) = -\sum_{j=1}^{n} f_S(j) \log_2 f_S(j)

where:
- E(S) is the information entropy of the set S;
- n is the number of different values of the attribute in S (entropy is computed for one chosen attribute);
- f_S(j) is the frequency (proportion) of the value j in the set S;
- \log_2 is the binary logarithm.
An entropy of 0 identifies a perfectly classified set.
Entropy is used to determine which node to split next in the algorithm: the higher the entropy of a set, the greater the potential to improve its classification by splitting it.
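As a concrete illustration (the two-class set below is hypothetical, not from the article), here is a minimal sketch of the entropy computation, assuming the labels are given as a flat list:

    import math
    from collections import Counter

    def entropy(values):
        # E(S) = -sum_j f_S(j) * log2(f_S(j))
        total = len(values)
        return -sum((count / total) * math.log2(count / total)
                    for count in Counter(values).values())

    print(round(entropy(['+'] * 9 + ['-'] * 5), 3))  # 0.94: a mixed, uncertain set
    print(entropy(['+'] * 14) == 0.0)                # True: a perfectly classified set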
Gain
Gain G(S, A) is computed to estimate the gain produced by a split over an attribute A:

G(S, A) = E(S) - \sum_{i=1}^{m} f_S(A_i) \, E(S_{A_i})

where:
- G(S, A) is the gain of the set S after a split over the attribute A;
- E(S) is the information entropy of the set S;
- m is the number of different values of the attribute A in S;
- f_S(A_i) is the frequency (proportion) of the items possessing A_i as the value for A in S;
- A_i is the i-th possible value of A;
- S_{A_i} is the subset of S containing all items where the value of A is A_i.
Gain quantifies the entropy improvement by splitting over an attribute: higher is better.
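To make the definition concrete, here is a small sketch that computes G(S, A) by weighting each subset's entropy by its relative size. The weather attribute and its values are invented for illustration, and the entropy helper from the previous snippet is repeated so the example runs on its own:

    import math
    from collections import Counter

    def entropy(values):
        total = len(values)
        return -sum((count / total) * math.log2(count / total)
                    for count in Counter(values).values())

    def gain(examples, attribute, target):
        # G(S, A) = E(S) - sum_i f_S(A_i) * E(S_{A_i})
        total = len(examples)
        split_entropy = 0.0
        for value in {ex[attribute] for ex in examples}:            # each A_i
            subset = [ex[target] for ex in examples if ex[attribute] == value]
            split_entropy += len(subset) / total * entropy(subset)  # f_S(A_i) * E(S_{A_i})
        return entropy([ex[target] for ex in examples]) - split_entropy

    # Invented toy data: does 'windy' help predict 'play'?
    data = [
        {'windy': 'no',  'play': 'yes'},
        {'windy': 'no',  'play': 'yes'},
        {'windy': 'no',  'play': 'no'},
        {'windy': 'yes', 'play': 'no'},
    ]
    print(round(gain(data, 'windy', 'play'), 3))  # 0.311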
External links
- Seminars - http://www2.cs.uregina.ca/
- Description and examples - http://www.cise.ufl.edu/
- Description and examples - http://www.cis.temple.edu/
- An implementation of ID3 in Python
- An implementation of ID3 in Ruby
- An implementation of ID3 in Common Lisp
- An implementation of ID3 algorithm in C#
- An implementation of ID3 in Perl
- An implementation of ID3 in Prolog
- An implementation of ID3 in C