Generalized linear model
In statistics, the generalized linear model (GLM) is a flexible generalization of ordinary linear regression. The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value.
Generalized linear models were formulated by John Nelder and Robert Wedderburn as a way of unifying various other statistical models, including linear regression, logistic regression and Poisson regression. They proposed an iteratively reweighted least squares method for maximum likelihood estimation of the model parameters. Maximum-likelihood estimation remains popular and is the default method in many statistical computing packages. Other approaches, including Bayesian approaches and least squares fits to variance-stabilized responses, have been developed.
Overview
In a GLM, each outcome of the dependent variables, Y, is assumed to be generated from a particular distribution in the exponential family, a large range of probability distributions that includes the normal, binomial and Poisson distributions, among others. The mean, μ, of the distribution depends on the independent variables, X, through:
$$\operatorname{E}(Y) = \mu = g^{-1}(X\beta)$$
where E(Y) is the expected value of Y; Xβ is the linear predictor, a linear combination of unknown parameters, β; g is the link function.
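The mapping from covariates to the mean can be sketched in a few lines of Python. This is an illustration only: it assumes a log link (so that g⁻¹ is the exponential function), and the design matrix, coefficients and helper names are hypothetical.

```python
import math

def linear_predictor(X, beta):
    """Return eta = X beta for a design matrix given as a list of rows."""
    return [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]

def mean_response(X, beta, inverse_link):
    """Return mu = E(Y) = g^{-1}(X beta), applying g^{-1} element-wise."""
    return [inverse_link(eta_i) for eta_i in linear_predictor(X, beta)]

# Hypothetical example with a log link: g(mu) = ln(mu), so g^{-1} = exp.
X = [[1.0, 0.0],
     [1.0, 1.0],
     [1.0, 2.0]]   # first column is an intercept
beta = [0.5, 0.25]
mu = mean_response(X, beta, math.exp)
print(mu)  # exp(0.5), exp(0.75), exp(1.0)
```

Swapping `math.exp` for another inverse link changes the model family's mean structure without touching the linear predictor.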
In this framework, the variance is typically a function, V, of the mean:
$$\operatorname{Var}(Y) = V(\mu) = V\big(g^{-1}(X\beta)\big)$$
It is convenient if V follows from the exponential family distribution, but it may simply be that the variance is a function of the predicted value.
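For several familiar members of the exponential family the variance function takes a simple form. The sketch below lists them, assuming the common convention Var(Y) = τ·V(μ) with τ a dispersion parameter (equal to one for the Poisson and binomial in their standard form); the dictionary keys and function name are hypothetical.

```python
# Variance functions V(mu) with unit dispersion; Var(Y) = dispersion * V(mu).
VARIANCE_FUNCTIONS = {
    "normal": lambda mu: 1.0,                 # constant variance
    "poisson": lambda mu: mu,                 # variance equals the mean
    "binomial": lambda mu: mu * (1.0 - mu),   # mu is a probability here
    "gamma": lambda mu: mu ** 2,
    "inverse_gaussian": lambda mu: mu ** 3,
}

def variance(family, mu, dispersion=1.0):
    """Return dispersion * V(mu) for the named family."""
    return dispersion * VARIANCE_FUNCTIONS[family](mu)

for family in VARIANCE_FUNCTIONS:
    print(family, variance(family, 0.5))
```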
The unknown parameters, β, are typically estimated with maximum likelihood, maximum quasi-likelihood, or Bayesian techniques.
Model components
The GLM consists of three elements:
1. A probability distribution from the exponential family.
2. A linear predictor η = Xβ.
3. A link function g such that E(Y) = μ = g⁻¹(η).
Probability distribution
The overdispersed exponential family of distributions is a generalization of the exponential family and exponential dispersion model of distributions and includes those probability distributions, parameterized by θ and τ, whose density functions f (or probability mass function, for the case of a discrete distribution) can be expressed in the form
$$f_Y(\mathbf{y} \mid \boldsymbol{\theta}, \tau) = h(\mathbf{y}, \tau) \exp\!\left( \frac{\mathbf{b}(\boldsymbol{\theta})^{\mathsf T} \mathbf{T}(\mathbf{y}) - A(\boldsymbol{\theta})}{d(\tau)} \right)$$
The parameter τ, called the dispersion parameter, is typically known and is usually related to the variance of the distribution. The functions h(y, τ), b(θ), T(y), A(θ), and d(τ) are known. Many, although not all, common distributions are in this family.
For scalar y and θ, this reduces to
$$f_Y(y \mid \theta, \tau) = h(y, \tau) \exp\!\left( \frac{b(\theta)\, T(y) - A(\theta)}{d(\tau)} \right)$$
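As a worked instance of the scalar form, the Bernoulli distribution (a binomial with a single trial) can be decomposed as follows. The particular choice of h, b, T, A and d shown is one valid decomposition, not the only one.

```latex
\begin{aligned}
f_Y(y \mid p) &= p^{y}(1-p)^{1-y}, \qquad y \in \{0, 1\} \\
&= \exp\!\Big( y \ln\frac{p}{1-p} + \ln(1-p) \Big) \\
&= h(y,\tau)\exp\!\left(\frac{b(\theta)\,T(y) - A(\theta)}{d(\tau)}\right),
\quad \theta = p,\;\; b(\theta) = \ln\frac{\theta}{1-\theta},\;\; T(y) = y,\;\; A(\theta) = -\ln(1-\theta),\;\; h(y,\tau) = 1,\;\; d(\tau) = 1.
\end{aligned}
```

Taking the log-odds ln(p/(1 − p)) as the new parameter makes b the identity; rewritten in that parameter, A(θ) = ln(1 + e^θ), which is the canonical form.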
The parameter θ is related to the mean of the distribution. If b(θ) is the identity function, then the distribution is said to be in canonical form (or natural form). Note that any distribution can be converted to canonical form by rewriting θ as θ′ and then applying the transformation θ = b(θ′). It is always possible to convert A(θ) in terms of the new parametrization, even if b(θ′) is not a one-to-one function; see comments in the page on the exponential family. If, in addition, T(y) is the identity and τ is known, then θ is called the canonical parameter (or natural parameter) and is related to the mean through
$$\boldsymbol{\mu} = \operatorname{E}(\mathbf{y}) = \nabla A(\boldsymbol{\theta})$$
For scalar y and θ, this reduces to
$$\mu = \operatorname{E}(y) = A'(\theta)$$
Under this scenario, the variance of the distribution can be shown to be
$$\operatorname{Var}(\mathbf{y}) = \nabla \nabla^{\mathsf T} A(\boldsymbol{\theta})\, d(\tau)$$
For scalar y and θ, this reduces to
$$\operatorname{Var}(y) = A''(\theta)\, d(\tau)$$
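The identities μ = A′(θ) and Var(y) = A″(θ) d(τ) can be checked numerically. The sketch below assumes two standard canonical-form cumulant functions, A(θ) = e^θ for the Poisson (θ = ln λ, h(y, τ) = 1/y!) and A(θ) = ln(1 + e^θ) for the Bernoulli (θ = ln(p/(1 − p))), both with T(y) = y and d(τ) = 1, and compares finite-difference derivatives with the known mean and variance. All helper names are hypothetical.

```python
import math

def derivative(f, x, h=1e-5):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def second_derivative(f, x, h=1e-4):
    """Central finite-difference approximation of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

# Cumulant functions A(theta), canonical form with T(y) = y and d(tau) = 1.
A_poisson = math.exp                                   # theta = ln(lambda)
A_bernoulli = lambda t: math.log1p(math.exp(t))        # theta = ln(p / (1 - p))

def poisson_pmf_canonical(y, theta):
    """h(y) * exp(theta * y - A(theta)) with h(y) = 1 / y!."""
    return math.exp(theta * y - A_poisson(theta)) / math.factorial(y)

lam = 3.0
theta = math.log(lam)
# Both values are close to lambda = 3: Poisson mean and variance coincide.
print(derivative(A_poisson, theta), second_derivative(A_poisson, theta))

p = 0.2
theta = math.log(p / (1.0 - p))
# Close to p = 0.2 and p * (1 - p) = 0.16.
print(derivative(A_bernoulli, theta), second_derivative(A_bernoulli, theta))
```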
Linear predictor
The linear predictor is the quantity which incorporates the information about the independent variables into the model. The symbol η (Greek "eta") is typically used to denote a linear predictor. It is related to the expected value of the data (thus, "predictor") through the link function.
η is expressed as a linear combination (thus, "linear") of unknown parameters β. The coefficients of the linear combination are represented as the matrix of independent variables X. η can thus be expressed as
$$\eta = X\beta$$
The elements of X are either measured by the experimenters or stipulated by them in the modeling design process.
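As an illustration of how X is assembled, the hypothetical sketch below builds a design matrix with an intercept column, one measured covariate, and a 0/1 indicator column for a stipulated two-level factor (treatment coding, one common choice among several), then forms η = Xβ. The data, coefficients and function names are made up.

```python
def design_matrix(x, group, levels):
    """Build rows [1, x_i, dummies...]; levels[0] is the reference level
    and gets no indicator column (treatment coding)."""
    rows = []
    for x_i, g_i in zip(x, group):
        dummies = [1.0 if g_i == level else 0.0 for level in levels[1:]]
        rows.append([1.0, x_i] + dummies)
    return rows

def linear_predictor(X, beta):
    """eta = X beta, one entry per observation."""
    return [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]

# Hypothetical data: a measured covariate and a stipulated treatment factor.
x = [0.0, 1.0, 2.0, 3.0]
group = ["control", "drug", "control", "drug"]
X = design_matrix(x, group, levels=["control", "drug"])
beta = [1.0, 0.5, -2.0]   # intercept, slope, effect of "drug" vs "control"
eta = linear_predictor(X, beta)
print(X)
print(eta)  # [1.0, -0.5, 2.0, 0.5]
```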
Link function
The link function provides the relationship between the linear predictor and the mean of the distribution function. There are many commonly used link functions, and their choice can be somewhat arbitrary. It can be convenient to match the domain of the link function to the range of the distribution function's mean.
When using a distribution function with a canonical parameter θ, the canonical link function is the function that expresses θ in terms of μ, i.e. θ = b(μ). For the most common distributions, the mean μ is one of the parameters in the standard form of the distribution's density function, and then b(μ) is the function as defined above that maps the density function into its canonical form. When using the canonical link function, b(μ) = θ = Xβ, which allows XᵀY to be a sufficient statistic for β.
The following is a table of canonical link functions and their inverses (sometimes referred to as the mean function, as done here) used for several distributions in the exponential family.
Canonical link functions
Distribution | Name | Link function | Mean function
Normal | Identity | Xβ = μ | μ = Xβ
Exponential, Gamma | Inverse | Xβ = μ⁻¹ | μ = (Xβ)⁻¹
Inverse Gaussian | Inverse squared | Xβ = μ⁻² | μ = (Xβ)^(−1/2)
Poisson | Log | Xβ = ln(μ) | μ = exp(Xβ)
Binomial, Multinomial | Logit | Xβ = ln(μ / (1 − μ)) | μ = exp(Xβ) / (1 + exp(Xβ)) = 1 / (1 + exp(−Xβ))
In the cases of the exponential and gamma distributions, the domain of the canonical link function is not the same as the permitted range of the mean. In particular, the linear predictor may be negative, which would give an impossible negative mean. When maximizing the likelihood, precautions must be taken to avoid this. An alternative is to use a noncanonical link function.
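The link and mean-function pairs in the table can be checked numerically. The minimal sketch below uses hypothetical dictionary keys for the families and follows the table's sign convention for the inverse link; it also illustrates the problem just described: the gamma mean function turns a negative linear predictor into a negative mean, whereas a (noncanonical) log link cannot.

```python
import math

# (link g(mu), mean function g^{-1}(eta)) pairs following the table above.
CANONICAL_LINKS = {
    "normal": (lambda mu: mu, lambda eta: eta),
    "gamma": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
    "inverse_gaussian": (lambda mu: mu ** -2, lambda eta: eta ** -0.5),
    "poisson": (math.log, math.exp),
    "binomial": (lambda mu: math.log(mu / (1.0 - mu)),
                 lambda eta: 1.0 / (1.0 + math.exp(-eta))),
}

def round_trip(family, mu):
    """Apply g and then g^{-1}; recovers mu inside the link's domain."""
    link, mean_fn = CANONICAL_LINKS[family]
    return mean_fn(link(mu))

# A negative linear predictor gives an impossible negative gamma mean ...
_, gamma_mean = CANONICAL_LINKS["gamma"]
print(gamma_mean(-0.5))   # -2.0
# ... while the log link always maps eta to a positive mean.
print(math.exp(-0.5))
```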
Maximum likelihood
The maximum likelihood estimates can be found using an iteratively reweighted least squares algorithm, using either a Newton–Raphson method with updates of the form:
$$\beta^{(t+1)} = \beta^{(t)} + \mathcal{J}^{-1}(\beta^{(t)})\, u(\beta^{(t)})$$
where 𝒥(β⁽ᵗ⁾) is the observed information matrix (the negative of the Hessian matrix) and u(β⁽ᵗ⁾) is the score function; or a Fisher's scoring method:
$$\beta^{(t+1)} = \beta^{(t)} + \mathcal{I}^{-1}(\beta^{(t)})\, u(\beta^{(t)})$$
where ℐ(β⁽ᵗ⁾) is the Fisher information matrix. Note that if the canonical link function is used, then the two methods are the same.
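A minimal sketch of Fisher scoring for Poisson regression with the canonical log link, in pure Python with no linear-algebra library; the function names `solve` and `fit_poisson_irls` and the data are hypothetical. Each step solves ℐ(β)Δ = u(β) with score u = Xᵀ(y − μ) and information ℐ = XᵀWX, W = diag(μ), which is algebraically the same as one weighted least squares fit of the working response. Because the log link is canonical for the Poisson, the observed and expected information coincide, so this loop is also Newton–Raphson.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    M = [list(row) + [b_i] for row, b_i in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def fit_poisson_irls(X, y, tol=1e-10, max_iter=100):
    """Poisson regression with log link, fitted by Fisher scoring (IRLS)."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        eta = [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]
        mu = [math.exp(e) for e in eta]
        # Score u(beta) = X^T (y - mu)
        score = [sum(row[j] * (y_i - mu_i) for row, y_i, mu_i in zip(X, y, mu))
                 for j in range(p)]
        # Fisher information I(beta) = X^T W X with W = diag(mu)
        info = [[sum(row[j] * mu_i * row[k] for row, mu_i in zip(X, mu))
                 for k in range(p)] for j in range(p)]
        step = solve(info, score)  # I^{-1} u
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

# Hypothetical data: an intercept plus a 0/1 group indicator.
X = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
y = [1, 3, 4, 8]
beta = fit_poisson_irls(X, y)
print(beta)  # approximately [ln 2, ln 3]
```

With a single group indicator the maximum-likelihood fit reproduces each group's sample mean (2 and 6 here), which makes the result easy to verify by hand.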
Bayesian methods
In general, the posterior distribution cannot be found in closed form and so must be approximated, usually using Laplace approximations or some type of Markov chain Monte Carlo method such as Gibbs sampling.
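To make the Markov chain Monte Carlo idea concrete, here is a minimal sketch for the simplest Poisson GLM (an intercept-only model with log link, so λ = e^θ) under an assumed Normal(0, 10²) prior on θ. Note that it uses random-walk Metropolis, a different MCMC algorithm from the Gibbs sampling named in the text, because it needs no conditional distributions; the data, prior and tuning values are hypothetical.

```python
import math
import random

def log_posterior(theta, y, prior_sd=10.0):
    """Unnormalized log posterior of theta = ln(lambda) for i.i.d. Poisson
    counts y with a Normal(0, prior_sd**2) prior on theta."""
    log_lik = sum(y_i * theta - math.exp(theta) for y_i in y)  # omits -ln(y_i!)
    log_prior = -0.5 * (theta / prior_sd) ** 2
    return log_lik + log_prior

def metropolis(y, n_draws=20000, burn_in=2000, step=0.3, seed=1):
    """Random-walk Metropolis sampler; returns the post-burn-in draws of theta."""
    rng = random.Random(seed)
    theta = 0.0
    current = log_posterior(theta, y)
    draws = []
    for i in range(burn_in + n_draws):
        proposal = theta + rng.gauss(0.0, step)  # symmetric proposal
        candidate = log_posterior(proposal, y)
        # Accept with probability min(1, posterior(proposal) / posterior(theta)).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        if i >= burn_in:
            draws.append(theta)
    return draws

y = [2, 4, 3, 5, 1, 3, 2, 4, 3, 3]   # hypothetical counts, sample mean 3.0
draws = metropolis(y)
posterior_mean_rate = sum(math.exp(t) for t in draws) / len(draws)
print(posterior_mean_rate)  # near 3.0 under this vague prior
```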
General linear models
A possible point of confusion has to do with the distinction between generalized linear models and the general linear model, two broad statistical models. The general linear model may be viewed as a case of the generalized linear model with identity link. As most exact results of interest are obtained only for the general linear model, the general linear model has undergone a somewhat longer historical development. Results for the generalized linear model with non-identity link are asymptotic (tending to work well with large samples).
Linear regression
A simple, very important example of a generalized linear model (also an example of a general linear model) is linear regression. In linear regression, the use of the least-squares estimator is justified by the Gauss–Markov theorem, which does not assume that the distribution is normal.
From the perspective of generalized linear models, however, it is useful to suppose that the distribution function is the normal distribution with constant variance and the link function is the identity, which is the canonical link if the variance is known.
For the normal distribution, the generalized linear model has a closed-form expression for the maximum-likelihood estimates, which is convenient. Most other GLMs lack closed-form estimates.
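For the normal distribution with identity link, the maximum-likelihood estimates coincide with ordinary least squares, β̂ = (XᵀX)⁻¹Xᵀy. With a single covariate this reduces to the familiar slope and intercept formulas, which the hypothetical sketch below computes directly, with no iteration, on made-up data.

```python
def normal_identity_mle(x, y):
    """Closed-form MLE (= least squares) for y = b0 + b1 * x with normal errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx              # slope
    b0 = y_bar - b1 * x_bar     # intercept
    return b0, b1

x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
b0, b1 = normal_identity_mle(x, y)
print(b0, b1)  # 1.0 2.0
```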
Binomial data
When the response data, Y, are binary (taking on only values 0 and 1), the distribution function is generally chosen to be the binomial distribution and the interpretation of μᵢ is then the probability, p, of Yᵢ taking on the value one.
There are several popular link functions for binomial data; the most typical is the canonical logit link:
$$g(p) = \ln\left(\frac{p}{1-p}\right)$$
GLMs with this setup are logistic regression models.
In addition, the inverse of any continuous cumulative distribution function (CDF) can be used for the link since the CDF's range is [0, 1], the range of the binomial mean. The normal CDF Φ is a popular choice and yields the probit model. Its link is
$$g(p) = \Phi^{-1}(p)$$
The complementary log-log function, g(p) = ln(−ln(1 − p)), may also be used. This link function is asymmetric and will often produce different results from the probit and logit link functions.
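The three binomial links can be compared directly by pushing the same linear predictor through each inverse link. The sketch below is a minimal illustration; it assumes Python's `statistics.NormalDist` for the standard normal CDF and its inverse, and the dictionary and its keys are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

# Each entry: (link g(p), inverse link g^{-1}(eta)) for a probability p in (0, 1).
BINOMIAL_LINKS = {
    "logit": (lambda p: math.log(p / (1.0 - p)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    "probit": (STD_NORMAL.inv_cdf, STD_NORMAL.cdf),
    "cloglog": (lambda p: math.log(-math.log(1.0 - p)),
                lambda eta: 1.0 - math.exp(-math.exp(eta))),
}

for name, (link, inverse) in BINOMIAL_LINKS.items():
    # Same linear predictor values, different fitted probabilities.
    print(name, [round(inverse(eta), 3) for eta in (-1.0, 0.0, 1.0)])
```

At η = 0 the logit and probit both give p = 0.5, while the complementary log-log gives 1 − e⁻¹ ≈ 0.632, which reflects its asymmetry.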
The identity link is also sometimes used for binomial data to yield the linear probability model, but a drawback of this model is that the predicted probabilities can be greater than one or less than zero. In implementation it is possible to fix the nonsensical probabilities outside of [0, 1], but interpreting the coefficients can be difficult. The model's primary merit is that near p = 0.5 it is approximately a linear transformation of the probit and logit; econometricians sometimes call this the Harvard model.
The variance function for binomial data is given by:
$$\operatorname{Var}(Y_i) = \tau\, \mu_i (1 - \mu_i)$$
where the dispersion parameter τ is typically fixed at exactly one. When it is not, the resulting quasi-likelihood model is often described as binomial with overdispersion or quasibinomial.
Count data
Another example of generalized linear models includes Poisson regression, which models count data using the Poisson distribution. The link is typically the logarithm, the canonical link.
The variance function is proportional to the mean:
$$\operatorname{Var}(Y_i) = \tau\, \mu_i$$
where the dispersion parameter τ is typically fixed at exactly one. When it is not, the resulting quasi-likelihood model is often described as Poisson with overdispersion or quasipoisson.
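When τ is not fixed at one, a common moment estimate is Pearson's statistic divided by the residual degrees of freedom, τ̂ = [Σᵢ (yᵢ − μ̂ᵢ)² / V(μ̂ᵢ)] / (n − p), where p is the number of estimated coefficients. The sketch below assumes that estimator (the function name is hypothetical) and applies it to made-up counts fitted by an intercept-only Poisson model, whose fitted mean is simply the sample mean. Passing a different variance function adapts it to other families.

```python
def pearson_dispersion(y, mu, variance_fn, n_params):
    """Estimate the dispersion tau as Pearson's X^2 over residual df.

    Values well above 1 suggest overdispersion relative to the nominal family.
    """
    chi2 = sum((y_i - mu_i) ** 2 / variance_fn(mu_i) for y_i, mu_i in zip(y, mu))
    return chi2 / (len(y) - n_params)

# Hypothetical counts; intercept-only Poisson fit, so every mu_i is the sample mean.
y = [0, 1, 9, 2, 0, 12, 1, 7]
mean = sum(y) / len(y)
tau_hat = pearson_dispersion(y, [mean] * len(y), lambda m: m, n_params=1)
print(tau_hat)  # well above 1: these counts are overdispersed for a Poisson
```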
Correlated or clustered data
The standard GLM assumes that the observations are uncorrelated. Extensions have been developed to allow for correlation between observations, as occurs for example in longitudinal studies and clustered designs:
- Generalized estimating equations (GEEs) allow for the correlation between observations without the use of an explicit probability model for the origin of the correlations, so there is no explicit likelihood. They are suitable when the random effects and their variances are not of inherent interest, as they allow for the correlation without explaining its origin. The focus is on estimating the average response over the population ("population-averaged" effects) rather than the regression parameters that would enable prediction of the effect of changing one or more components of X on a given individual. GEEs are usually used in conjunction with Huber–White standard errors.
- Generalized linear mixed models (GLMMs) are an extension to GLMs that includes random effects in the linear predictor, giving an explicit probability model that explains the origin of the correlations. The resulting "subject-specific" parameter estimates are suitable when the focus is on estimating the effect of changing one or more components of X on a given individual. GLMMs are a particular type of multilevel model (mixed model). In general, fitting GLMMs is more computationally complex and intensive than fitting GEEs.
- Hierarchical generalized linear models (HGLMs) are similar to GLMMs apart from two distinctions:
  - The random effects can have any distribution in the exponential family, whereas current GLMMs nearly always have normal random effects;
  - They are not as computationally intensive, as instead of integrating out the random effects they are based on a modified form of likelihood known as the hierarchical likelihood or h-likelihood.
  The theoretical basis and accuracy of the methods used in HGLMs have been the subject of some debate in the statistical literature.
Generalized additive models
Generalized additive models (GAMs) are another extension to GLMs in which the linear predictor η is not restricted to be linear in the covariates X but is the sum of smoothing functions applied to the xᵢ's:
$$\eta = \beta_0 + f_1(x_1) + f_2(x_2) + \cdots$$
The smoothing functions fᵢ are estimated from the data. In general this requires a large number of data points and is computationally intensive.
Multinomial regression
The binomial case may be easily extended to allow for a multinomial distribution as the response (also a generalized linear model for counts, with a constrained total). There are two ways in which this is usually done:
Ordered response
If the response variable is an ordinal measurement, then one may fit a model function of the form:
$$g(\mu_m) = \eta_m = \beta_0 + X_1\beta_1 + \cdots + X_p\beta_p + \gamma_2 + \cdots + \gamma_m = \eta_1 + \gamma_2 + \cdots + \gamma_m$$
where μ_m = P(Y ≤ m), for m > 2. Different links g lead to proportional odds models or ordered probit models.
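For an ordinal model in which g(μ_m) = η_m = η_1 + γ_2 + ⋯ + γ_m with μ_m = P(Y ≤ m), the logit link gives P(Y ≤ m) = 1/(1 + e^(−η_m)), and individual category probabilities follow by differencing consecutive cumulative probabilities. The minimal sketch below assumes that parametrization; the increments γ must be non-negative for the cumulative probabilities to be ordered, and all names and values are hypothetical.

```python
import math

def logistic(eta):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))

def ordered_logit_probs(eta_1, gammas):
    """Category probabilities for an ordinal response with categories 1..M.

    eta_m = eta_1 + gamma_2 + ... + gamma_m is the logit of P(Y <= m) for
    m = 1..M-1; `gammas` lists gamma_2..gamma_{M-1}.
    """
    etas = [eta_1]
    for gamma in gammas:
        if gamma < 0:
            raise ValueError("gamma increments must be non-negative")
        etas.append(etas[-1] + gamma)
    cumulative = [logistic(e) for e in etas] + [1.0]   # P(Y <= M) = 1
    probs = [cumulative[0]]
    probs += [hi - lo for lo, hi in zip(cumulative, cumulative[1:])]
    return probs

# Hypothetical four-category example; eta_1 would come from X beta.
probs = ordered_logit_probs(eta_1=-1.0, gammas=[1.0, 1.5])
print([round(p, 3) for p in probs])
```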
Unordered response
If the response variable is a nominal measurement, or the data do not satisfy the assumptions of an ordered model, one may fit a model of the following form:
$$g(\mu_m) = \eta_m = \beta_{m,0} + X_1\beta_{m,1} + \cdots + X_p\beta_{m,p}$$
where μ_m = P(Y = m | Y ∈ {1, m}), for m > 2. Different links g lead to multinomial logit or multinomial probit models. These are less efficient than the ordered response models, as more parameters are estimated.
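With the logit link, μ_m = P(Y = m | Y ∈ {1, m}) implies P(Y = m)/P(Y = 1) = exp(η_m), so the category probabilities take a softmax form with category 1 as the baseline (its η fixed at 0). The minimal sketch below assumes that setup; the coefficients and names are hypothetical.

```python
import math

def multinomial_logit_probs(etas):
    """Category probabilities with category 1 as the baseline.

    `etas` holds eta_2..eta_M, where eta_m is the logit of
    P(Y = m | Y in {1, m}); equivalently P(Y = m) / P(Y = 1) = exp(eta_m).
    """
    # Subtract the largest value (baseline's 0 included) to avoid overflow.
    shift = max([0.0] + list(etas))
    weights = [math.exp(-shift)] + [math.exp(e - shift) for e in etas]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical three-category example: eta_m = beta_{m,0} + x * beta_{m,1}.
x = 2.0
coefficients = {2: (0.5, 0.25), 3: (-1.0, 0.5)}   # (beta_{m,0}, beta_{m,1})
etas = [b0 + x * b1 for b0, b1 in coefficients.values()]
probs = multinomial_logit_probs(etas)
print([round(p, 3) for p in probs])
```

Each category has its own coefficient vector, which is why this model estimates more parameters than the ordered one.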
Confusion with general linear models
The term "generalized linear model", and especially its abbreviation GLM, can be confused with general linear model. John Nelder has expressed regret about this in a conversation with Stephen Senn:
Senn: I must confess to having some confusion when I was a young statistician between general linear models and generalized linear models. Do you regret the terminology?
Nelder: I think probably I do. I suspect we should have found some more fancy name for it that would have stuck and not been confused with the general linear model, although general and generalized are not quite the same. I can see why it might have been better to have thought of something else.
See also
- Comparison of general and generalized linear models
- Generalized linear array model
- Tweedie distributions
- GLIM (software)