Regularization (machine learning)
In statistics and machine learning, regularization is any method of preventing overfitting of data by a model. It is used for solving ill-conditioned parameter-estimation problems. Typical examples of regularization in statistical machine learning include ridge regression (also known as Tikhonov regularization), the lasso, and the L2 norm in support vector machines.
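As a concrete illustration, here is a minimal Python sketch of ridge regression; the helper name `ridge_fit`, the synthetic data, and the penalty weight `lam` are illustrative assumptions, not part of any standard API:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Tikhonov-regularized least squares: argmin_b ||y - Xb||^2 + lam * ||b||^2.

    Closed form: b = (X^T X + lam * I)^{-1} X^T y. The lam * I term makes the
    normal equations well-conditioned even when X^T X is nearly singular.
    """
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

# Illustrative use: two nearly collinear predictors make X^T X ill-conditioned.
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
X = np.column_stack([x1, x1 + 1e-6 * rng.normal(size=100)])
y = X @ np.array([1.0, 2.0]) + 0.1 * rng.normal(size=100)

print(ridge_fit(X, y, lam=0.0))  # unregularized: large, unstable coefficients
print(ridge_fit(X, y, lam=1.0))  # regularized: stable, shrunken coefficients
```

The same pattern covers the lasso and the L2 penalty in support vector machines; only the form of the penalty term changes.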
Regularization methods are also used for model selection, where they work by implicitly or explicitly penalizing models based on the number of their parameters. For example, Bayesian learning methods make use of a prior probability that (usually) gives lower probability to more complex models. Well-known model selection techniques include the Akaike information criterion (AIC), minimum description length (MDL), and the Bayesian information criterion (BIC). Alternative methods of controlling overfitting include cross-validation.
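To make the penalization idea concrete, the following sketch scores polynomial fits of increasing degree with AIC and BIC, assuming i.i.d. Gaussian errors (so the log-likelihood reduces to a function of the residual sum of squares); the helper `aic_bic` and the synthetic data are illustrative:

```python
import numpy as np

def aic_bic(y, y_hat, k):
    """AIC and BIC for a least-squares fit with k free parameters, assuming
    i.i.d. Gaussian errors: -2 log L = n * log(RSS / n) up to a constant."""
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)
    neg2_loglik = n * np.log(rss / n)
    return neg2_loglik + 2 * k, neg2_loglik + k * np.log(n)

rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 50)
y = 1.0 + 2.0 * x + 0.2 * rng.normal(size=50)  # the true model has degree 1

for degree in range(1, 6):
    coeffs = np.polyfit(x, y, degree)
    y_hat = np.polyval(coeffs, x)
    aic, bic = aic_bic(y, y_hat, k=degree + 1)
    print(f"degree={degree}  AIC={aic:.1f}  BIC={bic:.1f}")
# Higher degrees shrink the fit term only slightly, so the complexity
# penalty dominates and both criteria typically bottom out at degree 1.
```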
Model selection
Regularization can be used to fine-tune model complexity using an augmented error function together with cross-validation. As model complexity increases, the error on the training set keeps decreasing, while the error on the validation set levels off and remains roughly constant. Regularization introduces a second term into the error function that penalizes more complex models, so the penalty grows as model complexity increases and the complexity that minimizes the augmented error can be selected on the validation set.
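A minimal sketch of this tuning procedure, assuming a simple holdout split as the validation scheme (k-fold cross-validation would follow the same pattern) and ridge regression as the regularized model; all names and data here are illustrative:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """argmin_b ||y - Xb||^2 + lam * ||b||^2, in closed form."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 30))
beta_true = np.zeros(30)
beta_true[:3] = [3.0, -2.0, 1.5]  # most of the 30 features are irrelevant
y = X @ beta_true + rng.normal(size=200)

# Holdout split: fit on the first half, validate on the second half.
X_tr, X_val, y_tr, y_val = X[:100], X[100:], y[:100], y[100:]

for lam in [0.0, 0.1, 1.0, 10.0, 100.0]:
    b = ridge_fit(X_tr, y_tr, lam)
    train_mse = np.mean((y_tr - X_tr @ b) ** 2)
    val_mse = np.mean((y_val - X_val @ b) ** 2)
    print(f"lam={lam:>6}  train MSE={train_mse:.3f}  val MSE={val_mse:.3f}")
# Training error rises monotonically with lam, while validation error is
# typically U-shaped; the lam at its minimum balances fit against complexity.
```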
Examples of applications of different methods of regularization to the linear model are:
Model | Fit measure | Entropy measure
---|---|---
AIC/BIC | ‖Y − Xβ‖₂ | ‖β‖₀
Ridge regression | ‖Y − Xβ‖₂ | ‖β‖₂
Lasso | ‖Y − Xβ‖₂ | ‖β‖₁
RLAD | ‖Y − Xβ‖₁ | ‖β‖₁
Dantzig Selector | ‖Xᵀ(Y − Xβ)‖∞ | ‖β‖₁
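Each row of the table corresponds to minimizing an objective of the form fit(β) + λ · entropy(β). As one worked instance, here is a sketch of the lasso row solved by proximal gradient descent (ISTA); the helper name `lasso_ista`, the step-size rule, the iteration count, and the synthetic data are illustrative assumptions:

```python
import numpy as np

def lasso_ista(X, y, lam, n_iter=500):
    """Minimize ||y - Xb||_2^2 + lam * ||b||_1 by proximal gradient (ISTA):
    a gradient step on the smooth fit term, then soft-thresholding, which
    is the proximal operator of the L1 entropy term."""
    b = np.zeros(X.shape[1])
    step = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2)  # 1 / Lipschitz const. of the gradient
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ b - y)  # gradient of ||y - Xb||_2^2
        z = b - step * grad
        b = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return b

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 20))
beta_true = np.zeros(20)
beta_true[[0, 5]] = [2.0, -3.0]
y = X @ beta_true + 0.1 * rng.normal(size=100)

print(np.round(lasso_ista(X, y, lam=5.0), 2))
# The L1 entropy term drives most coefficients exactly to zero, unlike the
# L2 term of ridge regression, which only shrinks them toward zero.
```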