Generalized linear model - AbsoluteAstronomy.com

In statistics, the generalized linear model (GLM) is a flexible generalization of ordinary linear regression. The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value.

Generalized linear models were formulated by John Nelder and Robert Wedderburn as a way of unifying various other statistical models, including linear regression, logistic regression and Poisson regression. They proposed an iteratively reweighted least squares method for maximum likelihood estimation of the model parameters. Maximum-likelihood estimation remains popular and is the default method in many statistical computing packages. Other approaches, including Bayesian approaches and least squares fits to variance-stabilized responses, have been developed.

Overview

In a GLM, each outcome of the dependent variables, Y, is assumed to be generated from a particular distribution in the exponential family, a large range of probability distributions that includes the normal, binomial and Poisson distributions, among others. The mean, μ, of the distribution depends on the independent variables, X, through:

$$\operatorname{E}(Y) = \mu = g^{-1}(X\beta)$$

where E(Y) is the expected value of Y; Xβ is the linear predictor, a linear combination of unknown parameters, β; g is the link function.

In this framework, the variance is typically a function, V, of the mean:

$$\operatorname{Var}(Y) = V(\mu) = V\bigl(g^{-1}(X\beta)\bigr)$$

It is convenient if V follows from the exponential family distribution, but it may simply be that the variance is a function of the predicted value.
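As a rough illustration of the relations E(Y) = μ = g^-1(Xβ) and Var(Y) = V(μ), the sketch below evaluates the mean and variance for a single observation, assuming a log link and the Poisson variance function V(μ) = μ. The function names and the numerical values of x and β are invented for the example.

```python
import math

def linear_predictor(x, beta):
    """Return the linear predictor x . beta for one observation."""
    return sum(xj * bj for xj, bj in zip(x, beta))

def mean_from_predictor(eta, inverse_link=math.exp):
    """Apply the inverse link g^-1 to the linear predictor (log link by default)."""
    return inverse_link(eta)

def poisson_variance(mu):
    """Variance function V(mu) = mu, as implied by the Poisson distribution."""
    return mu

# Hypothetical observation: an intercept term plus one covariate.
x = [1.0, 2.0]
beta = [0.5, 0.25]

eta = linear_predictor(x, beta)   # 0.5 + 2.0 * 0.25 = 1.0
mu = mean_from_predictor(eta)     # exp(1.0)
var = poisson_variance(mu)        # equals mu in the Poisson case
print(eta, mu, var)
```

Swapping in a different `inverse_link` or variance function gives the corresponding quantities for another member of the family.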

The unknown parameters, β, are typically estimated with maximum likelihood, maximum quasi-likelihood, or Bayesian techniques.

Model components
The GLM consists of three elements:

1. A probability distribution from the exponential family.

2. A linear predictor η = Xβ.

3. A link function g such that E(Y) = μ = g^-1(η).

Probability distribution

The overdispersed exponential family of distributions is a generalization of the exponential family and exponential dispersion model of distributions and includes those probability distributions, parameterized by $\boldsymbol\theta$ and $\tau$, whose density functions f (or probability mass function, for the case of a discrete distribution) can be expressed in the form

$$f_Y(\mathbf{y} \mid \boldsymbol\theta, \tau) = h(\mathbf{y}, \tau) \exp\left( \frac{\mathbf{b}(\boldsymbol\theta)^{T} \mathbf{T}(\mathbf{y}) - A(\boldsymbol\theta)}{d(\tau)} \right).$$

$\tau$, called the dispersion parameter, typically is known and is usually related to the variance of the distribution. The functions $h(\mathbf{y}, \tau)$, $\mathbf{b}(\boldsymbol\theta)$, $\mathbf{T}(\mathbf{y})$, $A(\boldsymbol\theta)$, and $d(\tau)$ are known. Many, although not all, common distributions are in this family.

For scalar $Y$ and $\theta$, this reduces to

$$f_Y(y \mid \theta, \tau) = h(y, \tau) \exp\left( \frac{b(\theta)\, T(y) - A(\theta)}{d(\tau)} \right).$$

$\boldsymbol\theta$ is related to the mean of the distribution. If $\mathbf{b}(\boldsymbol\theta)$ is the identity function, then the distribution is said to be in canonical form (or natural form). Note that any distribution can be converted to canonical form by rewriting $\boldsymbol\theta$ as $\boldsymbol\theta'$ and then applying the transformation $\boldsymbol\theta = \mathbf{b}(\boldsymbol\theta')$. It is always possible to convert $A(\boldsymbol\theta)$ in terms of the new parametrization, even if $\mathbf{b}(\boldsymbol\theta')$ is not a one-to-one function; see comments in the page on the exponential family. If, in addition, $\mathbf{T}(\mathbf{y})$ is the identity and $\tau$ is known, then $\boldsymbol\theta$ is called the canonical parameter (or natural parameter) and is related to the mean through

$$\boldsymbol\mu = \operatorname{E}(\mathbf{Y}) = \nabla A(\boldsymbol\theta).$$

For scalar $Y$ and $\theta$, this reduces to

$$\mu = \operatorname{E}(Y) = A'(\theta).$$

Under this scenario, the variance of the distribution can be shown to be

$$\operatorname{Var}(\mathbf{Y}) = \nabla \nabla^{T} A(\boldsymbol\theta)\, d(\tau).$$

For scalar $Y$ and $\theta$, this reduces to

$$\operatorname{Var}(Y) = A''(\theta)\, d(\tau).$$
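As a worked example that is not part of the original text, the Poisson distribution with mean μ can be put into this form. The derivation assumes the usual notation in which the scalar density is written as h(y, τ) exp{[b(θ) T(y) − A(θ)] / d(τ)}; the symbol names follow that convention.

```latex
\begin{aligned}
f_Y(y \mid \mu) &= \frac{\mu^{y} e^{-\mu}}{y!}
                 = \frac{1}{y!} \exp\bigl( y \ln\mu - \mu \bigr), \\
\theta &= \ln\mu, \quad b(\theta) = \theta, \quad T(y) = y, \quad
A(\theta) = e^{\theta}, \quad h(y, \tau) = \frac{1}{y!}, \quad d(\tau) = 1, \\
\operatorname{E}(Y) &= A'(\theta) = e^{\theta} = \mu, \\
\operatorname{Var}(Y) &= A''(\theta)\, d(\tau) = e^{\theta} = \mu .
\end{aligned}
```

Since b is the identity and T(y) = y, θ = ln μ is the canonical parameter, and the mean and variance both equal μ, as expected for a Poisson variable.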

Linear predictor

The linear predictor is the quantity which incorporates the information about the independent variables into the model. The symbol η (Greek "eta") is typically used to denote a linear predictor. It is related to the expected value of the data (thus, "predictor") through the link function.

η is expressed as linear combinations (thus, "linear") of unknown parameters β. The coefficients of the linear combination are represented as the matrix of independent variables X. η can thus be expressed as

$$\eta = X\beta.$$

The elements of X are either measured by the experimenters or stipulated by them in the modeling design process.

Link function

The link function provides the relationship between the linear predictor and the mean of the distribution function. There are many commonly used link functions, and their choice can be somewhat arbitrary. It can be convenient to match the domain of the link function to the range of the distribution function's mean.

When using a distribution function with a canonical parameter θ, the canonical link function is the function that expresses θ in terms of μ, i.e. θ = b(μ). For the most common distributions, the mean μ is one of the parameters in the standard form of the distribution's density function, and then b(μ) is the function as defined above that maps the density function into its canonical form. When using the canonical link function, b(μ) = θ = Xβ, which allows X^T Y to be a sufficient statistic for β.

Following is a table of canonical link functions and their inverses (sometimes referred to as the mean function, as done here) used for several distributions in the exponential family.

Canonical Link Functions
Distribution	Name	Link Function	Mean Function
Normal	Identity	Xβ = μ	μ = Xβ
Exponential	Inverse	Xβ = μ^-1	μ = (Xβ)^-1
Gamma	Inverse	Xβ = μ^-1	μ = (Xβ)^-1
Inverse Gaussian	Inverse squared	Xβ = μ^-2	μ = (Xβ)^(-1/2)
Poisson	Log	Xβ = ln(μ)	μ = exp(Xβ)
Binomial	Logit	Xβ = ln(μ/(1 − μ))	μ = exp(Xβ)/(1 + exp(Xβ)) = 1/(1 + exp(−Xβ))
Multinomial	Logit	Xβ = ln(μ/(1 − μ))	μ = exp(Xβ)/(1 + exp(Xβ)) = 1/(1 + exp(−Xβ))

In the cases of the exponential and gamma distributions, the domain of the canonical link function is not the same as the permitted range of the mean. In particular, the linear predictor may be negative, which would give an impossible negative mean. When maximizing the likelihood, precautions must be taken to avoid this. An alternative is to use a noncanonical link function.
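The pairing of a canonical link with its inverse, and the caveat about the inverse link, can be sketched in a few lines. This is only an illustration: the dictionary `CANONICAL_LINKS` and the helper `round_trip` are hypothetical names, and the inverse link is written as 1/μ.

```python
import math

# Canonical link g(mu) and its inverse (the mean function) for a few families.
# The dictionary layout and names are illustrative, not taken from any library.
CANONICAL_LINKS = {
    "normal":   (lambda mu: mu,                      lambda eta: eta),
    "gamma":    (lambda mu: 1.0 / mu,                lambda eta: 1.0 / eta),
    "poisson":  (math.log,                           math.exp),
    "binomial": (lambda mu: math.log(mu / (1 - mu)), lambda eta: 1 / (1 + math.exp(-eta))),
}

def round_trip(family, mu):
    """Map a mean to the linear predictor scale and back again."""
    link, mean_function = CANONICAL_LINKS[family]
    return mean_function(link(mu))

# Each pair should be mutually inverse on the valid range of the mean.
for family, mu in [("normal", -2.0), ("gamma", 3.0), ("poisson", 4.0), ("binomial", 0.3)]:
    assert abs(round_trip(family, mu) - mu) < 1e-12

# A negative linear predictor under the inverse link gives a negative "mean",
# which is impossible for a gamma or exponential response.
_, gamma_mean = CANONICAL_LINKS["gamma"]
print(gamma_mean(-0.5))  # -2.0
```

A log link for the gamma case (a noncanonical choice) avoids the problem, because exp(η) is positive for every η.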

Maximum likelihood

The maximum likelihood estimates can be found using an iteratively reweighted least squares algorithm using either a Newton–Raphson method with updates of the form:

$$\beta^{(t+1)} = \beta^{(t)} + \mathcal{J}^{-1}(\beta^{(t)})\, u(\beta^{(t)}),$$

where $\mathcal{J}(\beta^{(t)})$ is the observed information matrix (the negative of the Hessian matrix) and $u(\beta^{(t)})$ is the score function; or a Fisher's scoring method:

$$\beta^{(t+1)} = \beta^{(t)} + \mathcal{I}^{-1}(\beta^{(t)})\, u(\beta^{(t)}),$$

where $\mathcal{I}(\beta^{(t)})$ is the Fisher information matrix. Note that if the canonical link function is used, then the two methods are the same.
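The Fisher scoring update can be sketched for one concrete case. The code below is a minimal illustration, assuming a Poisson response, the canonical log link, and a single covariate plus an intercept. The function name `fit_poisson_log_link` and the data are invented for the example, and the 2 × 2 information matrix is inverted explicitly rather than with a linear algebra library.

```python
import math

def fit_poisson_log_link(x, y, iterations=25, tol=1e-10):
    """Fisher scoring for log(mu_i) = b0 + b1 * x_i with Poisson responses.

    With the canonical log link the score is u = X^T (y - mu) and the
    Fisher information is X^T W X with W = diag(mu).
    """
    # Crude starting point: intercept at the log of the sample mean, zero slope.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector u(beta).
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix (2 x 2, symmetric).
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Update step: beta <- beta + I^{-1} u, using the explicit 2 x 2 inverse.
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Made-up count data for illustration only.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1, 2, 4, 7, 12]
b0, b1 = fit_poisson_log_link(x, y)
mu = [math.exp(b0 + b1 * xi) for xi in x]
# At the maximum likelihood estimate the score is (numerically) zero.
score0 = sum(yi - mi for yi, mi in zip(y, mu))
score1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
print(b0, b1, score0, score1)
```

Because the log link is canonical for the Poisson distribution, the observed and expected information coincide here, so the same iteration is also the Newton–Raphson update.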

Bayesian methods

In general, the posterior distribution cannot be found in closed form and so must be approximated, usually using Laplace approximations or some type of Markov chain Monte Carlo method such as Gibbs sampling.

General linear models

A possible point of confusion has to do with the distinction between generalized linear models and the general linear model, two broad statistical models. The general linear model may be viewed as a special case of the generalized linear model with identity link and normally distributed responses. As most exact results of interest are obtained only for the general linear model, the general linear model has undergone a somewhat longer historical development. Results for the generalized linear model with non-identity link are asymptotic (tending to work well with large samples).

Linear regression

A simple, very important example of a generalized linear model (also an example of a general linear model) is linear regression. In linear regression, the use of the least-squares estimator is justified by the Gauss-Markov theorem, which does not assume that the distribution is normal.

From the perspective of generalized linear models, however, it is useful to suppose that the distribution function is the normal distribution with constant variance and the link function is the identity, which is the canonical link if the variance is known.

For the normal distribution, the generalized linear model has a closed-form expression for the maximum-likelihood estimates, which is convenient. Most other GLMs lack closed-form estimates.
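The closed-form estimates for the normal case with identity link can be illustrated for the simplest setting, one covariate plus an intercept. The sketch below is a hypothetical helper, not a reference implementation, and the data are invented so that they lie exactly on a line.

```python
def ols_simple(x, y):
    """Closed-form least-squares (and normal-theory ML) estimates for y = a + b * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Centered cross-product and sum of squares.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Made-up data lying exactly on the line y = 1 + 2x.
intercept, slope = ols_simple([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(intercept, slope)  # 1.0 2.0
```

No iteration is needed here, in contrast to the iteratively reweighted least squares fits used for most other GLMs.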

Binomial data

When the response data, Y, are binary (taking on only values 0 and 1), the distribution function is generally chosen to be the binomial distribution and the interpretation of μ_i is then the probability, p, of Y_i taking on the value one.

There are several popular link functions for binomial responses; the most typical is the canonical logit link:

$$g(p) = \ln\left( \frac{p}{1 - p} \right).$$

GLMs with this setup are logistic regression models.

In addition, the inverse of any continuous cumulative distribution function (CDF) can be used for the link since the CDF's range is [0, 1], the range of the binomial mean. The normal CDF Φ is a popular choice and yields the probit model. Its link is

$$g(p) = \Phi^{-1}(p).$$

The complementary log-log function, log(−log(1 − p)), may also be used. This link function is asymmetric and will often produce different results from the probit and logit link functions.
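The three links for binomial data can be compared numerically. The following is a small sketch using only the Python standard library; the function names are chosen for the example, the probit link is taken as the inverse of the standard normal CDF, and the complementary log-log link as log(−log(1 − p)).

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def logit(p):
    """Canonical link: the log-odds of p."""
    return math.log(p / (1.0 - p))

def probit(p):
    """Inverse of the standard normal CDF."""
    return _STD_NORMAL.inv_cdf(p)

def cloglog(p):
    """Complementary log-log link."""
    return math.log(-math.log(1.0 - p))

# Tabulate the three links at a low, middle and high probability.
for p in (0.1, 0.5, 0.9):
    print(p, logit(p), probit(p), cloglog(p))
```

The logit and probit values at p = 0.1 and p = 0.9 are mirror images of each other around zero, while the complementary log-log values are not, which is the asymmetry mentioned above.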

The identity link is also sometimes used for binomial data to yield the linear probability model, but a drawback of this model is that the predicted probabilities can be greater than one or less than zero. In implementation it is possible to fix the nonsensical probabilities outside of [0, 1], but interpreting the coefficients can be difficult. The model's primary merit is that near p = 0.5 it is approximately a linear transformation of the probit and logit; econometricians sometimes call this the Harvard model.

The variance function for binomial data is given by:

$$\operatorname{Var}(Y_i) = \tau \mu_i (1 - \mu_i),$$

where the dispersion parameter τ is typically fixed at exactly one. When it is not, the resulting quasi-likelihood model is often described as binomial with overdispersion or quasibinomial.

Count data

Another example of generalized linear models is Poisson regression, which models count data using the Poisson distribution. The link is typically the logarithm, the canonical link.

The variance function is proportional to the mean:

$$\operatorname{Var}(Y_i) = \tau \mu_i,$$

where the dispersion parameter τ is typically fixed at exactly one. When it is not, the resulting quasi-likelihood model is often described as Poisson with overdispersion or quasipoisson.
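When the dispersion parameter is not fixed at one, it has to be estimated. The text does not say how; one common choice, assumed here, is the Pearson estimate, which divides the Pearson chi-square statistic by the residual degrees of freedom. The helper `pearson_dispersion` and the numbers below are purely illustrative.

```python
def pearson_dispersion(y, mu, n_params, variance=lambda m: m):
    """Pearson estimate of the dispersion: sum((y - mu)^2 / V(mu)) / (n - p).

    The default variance function V(mu) = mu corresponds to count data;
    pass lambda m: m * (1 - m) for binary data.
    """
    chi2 = sum((yi - mi) ** 2 / variance(mi) for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)

# Made-up counts and fitted means from a hypothetical two-parameter model.
y = [0, 3, 1, 9, 4, 15]
mu = [1.0, 2.0, 3.0, 5.0, 6.0, 10.0]
tau_hat = pearson_dispersion(y, mu, n_params=2)
print(tau_hat)  # values well above 1 suggest overdispersion
```

For these numbers the Pearson statistic is 9.2 on 4 degrees of freedom, giving an estimate of about 2.3.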

Correlated or clustered data

The standard GLM assumes that the observations are uncorrelated. Extensions have been developed to allow for correlation between observations, as occurs for example in longitudinal studies and clustered designs:

Generalized estimating equations (GEEs) allow for the correlation between observations without the use of an explicit probability model for the origin of the correlations, so there is no explicit likelihood. They are suitable when the random effects and their variances are not of inherent interest, as they allow for the correlation without explaining its origin. The focus is on estimating the average response over the population ("population-averaged" effects) rather than the regression parameters that would enable prediction of the effect of changing one or more components of X on a given individual. GEEs are usually used in conjunction with Huber-White standard errors.

Generalized linear mixed models (GLMMs) are an extension to GLMs that includes random effects in the linear predictor, giving an explicit probability model that explains the origin of the correlations. The resulting "subject-specific" parameter estimates are suitable when the focus is on estimating the effect of changing one or more components of X on a given individual. GLMMs are a particular type of multilevel model (mixed model). In general, fitting GLMMs is more computationally complex and intensive than fitting GEEs.

Hierarchical generalized linear models (HGLMs) are similar to GLMMs apart from two distinctions:

1. The random effects can have any distribution in the exponential family, whereas current GLMMs nearly always have normal random effects.

2. They are not as computationally intensive, as instead of integrating out the random effects they are based on a modified form of likelihood known as the hierarchical likelihood or h-likelihood.

The theoretical basis and accuracy of the methods used in HGLMs have been the subject of some debate in the statistical literature.

Generalized additive models

Generalized additive models (GAMs) are another extension to GLMs in which the linear predictor η is not restricted to be linear in the covariates X but is the sum of smoothing functions applied to the x_i:

$$\eta = \beta_0 + f_1(x_1) + f_2(x_2) + \cdots$$

The smoothing functions f_i are estimated from the data. In general this requires a large number of data points and is computationally intensive.

Multinomial regression

The binomial case may be easily extended to allow for a multinomial distribution as the response (also, a Generalized Linear Model for counts, with a constrained total). There are two ways in which this is usually done:

Ordered response

If the response variable is an ordinal measurement, then one may fit a model function of the form:

$$g(\mu_m) = \eta_m = \beta_0 + X_1 \beta_1 + \cdots + X_p \beta_p + \gamma_2 + \cdots + \gamma_m = \eta_1 + \gamma_2 + \cdots + \gamma_m,$$

where $\mu_m = \operatorname{P}(Y \le m)$, for m > 2. Different links g lead to proportional odds models or ordered probit models.
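For the logit link, an ordered model of this kind turns cumulative linear predictors into category probabilities by differencing. The sketch below assumes that the cumulative probability P(Y ≤ m) equals the logistic function of η_m and that each η_m is obtained from η_1 by adding non-negative increments γ; the function name and the numbers are invented for illustration.

```python
import math

def ordered_logit_probabilities(eta_1, gammas):
    """Category probabilities from cumulative logits.

    eta_1 is the linear predictor for P(Y <= 1); each entry of gammas is a
    non-negative increment, so that eta_m = eta_1 + gamma_2 + ... + gamma_m.
    The last category absorbs the remaining probability.
    """
    etas = [eta_1]
    for g in gammas:
        etas.append(etas[-1] + g)
    # Cumulative probabilities P(Y <= m), with P(Y <= last category) = 1.
    cumulative = [1.0 / (1.0 + math.exp(-e)) for e in etas] + [1.0]
    # Differences of successive cumulative probabilities give P(Y = m).
    return [c - prev for c, prev in zip(cumulative, [0.0] + cumulative[:-1])]

# Four ordered categories: one base predictor and two increments.
probs = ordered_logit_probabilities(eta_1=-1.0, gammas=[1.0, 1.5])
print(probs, sum(probs))
```

Keeping the increments non-negative guarantees that the cumulative probabilities are non-decreasing, so every category probability is non-negative.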

Unordered response

If the response variable is a nominal measurement, or the data do not satisfy the assumptions of an ordered model, one may fit a model of the following form:

$$g(\mu_m) = \eta_m = \beta_{m,0} + X_1 \beta_{m,1} + \cdots + X_p \beta_{m,p},$$

where $\mu_m = \operatorname{P}(Y = m \mid Y \in \{1, m\})$, for m > 2. Different links g lead to multinomial logit or multinomial probit models. These are less efficient than the ordered response models, as more parameters are estimated.
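With a logit link, the unordered model compares each category against a baseline category, and the category probabilities follow by normalizing. The sketch below assumes category 1 is the baseline and that each linear predictor is the log-odds of its category against that baseline; the function name and inputs are hypothetical.

```python
import math

def multinomial_logit_probabilities(etas):
    """Category probabilities with category 1 as the baseline.

    etas[k] is the linear predictor for category k + 2, interpreted as the
    log-odds of that category against category 1.
    """
    denominator = 1.0 + sum(math.exp(e) for e in etas)
    baseline = 1.0 / denominator
    return [baseline] + [math.exp(e) / denominator for e in etas]

# Three categories: category 2 is as likely as the baseline, category 3 twice as likely.
probs = multinomial_logit_probabilities([0.0, math.log(2.0)])
print(probs)  # approximately [0.25, 0.25, 0.5]
```

Each non-baseline category has its own coefficient vector, which is why this model estimates more parameters than the ordered one.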

Confusion with general linear models

The term "generalized linear model", and especially its abbreviation GLM, can be confused with general linear model. John Nelder has expressed regret about this in a conversation with Stephen Senn:

Senn: I must confess to having some confusion when I was a young statistician between general linear models and generalized linear models. Do you regret the terminology?

Nelder: I think probably I do. I suspect we should have found some more fancy name for it that would have stuck and not been confused with the general linear model, although general and generalized are not quite the same. I can see why it might have been better to have thought of something else.

External links

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
