Boosting
Boosting is a machine learning meta-algorithm for performing supervised learning. Boosting is based on the question posed by Kearns: can a set of weak learners create a single strong learner? A weak learner is defined to be a classifier which is only slightly correlated with the true classification (it can label examples better than random guessing). In contrast, a strong learner is a classifier that is arbitrarily well-correlated with the true classification.
Schapire's affirmative answer to Kearns' question has had significant ramifications in machine learning and statistics, most notably leading to the development of boosting.
Boosting algorithms
While boosting is not algorithmically constrained, most boosting algorithms consist of iteratively learning weak classifiers with respect to a distribution and adding them to a final strong classifier. When they are added, they are typically weighted in a way related to the weak learners' accuracy. After a weak learner is added, the data are reweighted: examples that are misclassified gain weight and examples that are classified correctly lose weight (some boosting algorithms actually decrease the weight of repeatedly misclassified examples, e.g., boost by majority and BrownBoost). Thus, future weak learners focus more on the examples that previous weak learners misclassified.
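To make this reweighting loop concrete, here is a minimal illustrative sketch in the spirit of AdaBoost, using one-dimensional threshold stumps as the weak learners. The function names (train_stump, boost, predict) and the choice of stump learner are assumptions made for this example, not any particular library's API.

```python
# Illustrative AdaBoost-style sketch; the stump weak learner and all names
# here are assumptions for this example, not a library's actual interface.
import numpy as np

def train_stump(x, y, w):
    """Return the 1-D threshold stump with the smallest weighted error on (x, y)."""
    best = None
    for thresh in np.unique(x):
        for sign in (1, -1):
            pred = np.where(x >= thresh, sign, -sign)
            err = w[pred != y].sum()
            if best is None or err < best[0]:
                best = (err, thresh, sign)
    return best  # (weighted error, threshold, sign)

def boost(x, y, rounds=10):
    """y must be labelled +1/-1. Returns a list of (alpha, threshold, sign) weak learners."""
    n = len(x)
    w = np.full(n, 1.0 / n)                    # start with uniform example weights
    ensemble = []
    for _ in range(rounds):
        err, thresh, sign = train_stump(x, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log below well defined
        alpha = 0.5 * np.log((1 - err) / err)  # the weak learner's vote weight
        pred = np.where(x >= thresh, sign, -sign)
        w *= np.exp(-alpha * y * pred)         # misclassified examples gain weight,
        w /= w.sum()                           # correctly classified ones lose weight
        ensemble.append((alpha, thresh, sign))
    return ensemble

def predict(ensemble, x):
    """The strong classifier: a weighted vote of all weak learners."""
    score = sum(a * np.where(x >= t, s, -s) for a, t, s in ensemble)
    return np.sign(score)

if __name__ == "__main__":
    x = np.array([0.1, 0.3, 0.5, 0.7, 0.9, 1.1])
    y = np.array([-1, -1, 1, -1, 1, 1])
    print(predict(boost(x, y, rounds=5), x))
```

Real implementations (such as those listed under Implementations below) extend this pattern with multi-dimensional weak learners, stopping criteria, and numerical safeguards.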
There are many boosting algorithms. The original ones, proposed by Robert Schapire (a recursive majority gate formulation) and Yoav Freund (boost by majority), were not adaptive and could not take full advantage of the weak learners. Only algorithms that are provable boosting algorithms in the probably approximately correct learning formulation are called boosting algorithms. Other algorithms that are similar in spirit to boosting algorithms are sometimes called "leveraging algorithms", although they are also sometimes incorrectly called boosting algorithms.
Examples of boosting algorithms
The main variation between many boosting algorithms is their method of weighting training data points and hypotheses. AdaBoost is very popular and perhaps the most significant historically, as it was the first algorithm that could adapt to the weak learners. However, there are many more recent algorithms such as LPBoost, TotalBoost, BrownBoost, MadaBoost, LogitBoost, and others. Many boosting algorithms fit into the AnyBoost framework, which shows that boosting performs gradient descent in function space using a convex cost function. In 2008, Phillip Long (at Google) and Rocco A. Servedio (Columbia University) published a paper at the 25th International Conference on Machine Learning suggesting that these algorithms are provably flawed in that "convex potential boosters cannot withstand random classification noise," thus making the applicability of such algorithms to real-world, noisy data sets questionable.
See also
- LPBoost
- Logistic regression
- Principle of maximum entropy
- Neural networks
- Support vector machines
- Gradient boosting
- Margin classifiers
- Machine learning
- Boosting methods for object categorization
Implementations
- Orange, a free data mining software suite, module orngEnsemble
- Weka, a machine learning set of tools that offers varied implementations of boosting algorithms such as AdaBoost and LogitBoost
- R package GBM (Generalized Boosted Regression Models), which implements extensions to Freund and Schapire's AdaBoost algorithm and Friedman's gradient boosting machine
References
- Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997. http://www.cse.ucsd.edu/~yfreund/papers/adaboost.pdf
- Robert E. Schapire and Yoram Singer. Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37(3):297–336, 1999. http://citeseer.ist.psu.edu/schapire99improved.html