Separation (statistics)
In statistics, separation is a phenomenon associated with models for dichotomous or categorical outcomes, including logistic and probit regression. Separation occurs if the predictor (or a linear combination of some subset of the predictors) is associated with only one outcome value when the predictor is greater than some constant. For example, separation occurs if the predictor X is continuous and the outcome y = 1 for all observed x > 2. If the outcome values are perfectly determined by the predictor (e.g., y = 0 whenever x ≤ 2), then the condition "complete separation" is said to obtain. If instead there is some overlap (e.g., y = 0 when x < 2, but y takes observed values of both 0 and 1 when x = 2), then "quasi-complete separation" obtains. A 2 × 2 table with an empty cell is an example of quasi-complete separation.
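A minimal numerical sketch of the continuous-predictor example above (in Python; the cutoff at x = 2 and the particular data values are illustrative assumptions, not from any specific dataset). It evaluates the ordinary logistic log-likelihood along a sequence of ever-steeper coefficients aligned with the separating boundary and shows the log-likelihood climbing toward its supremum of 0, so no finite maximum-likelihood estimate exists.

import numpy as np

# Illustrative data with complete separation: y = 1 exactly when x > 2.
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])
y = (x > 2).astype(int)

def loglik(intercept, slope):
    """Ordinary (unpenalized) logistic log-likelihood, computed stably."""
    eta = intercept + slope * x
    # log P(y_i = 1) = -log(1 + exp(-eta)); log P(y_i = 0) = -log(1 + exp(eta))
    return -np.sum(y * np.logaddexp(0.0, -eta) + (1 - y) * np.logaddexp(0.0, eta))

# A perfectly separating boundary at x = 2.25 (the midpoint of the gap),
# scaled by ever larger factors: the log-likelihood keeps increasing toward 0
# and is never maximized at any finite coefficient value.
for scale in [1, 10, 100, 1000]:
    print(scale, loglik(-2.25 * scale, 1.0 * scale))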
This observed form of the data is important because it causes problems with the estimated regression coefficients. Loosely, a parameter in the model "wants" to be infinite if complete separation is observed. If quasi-complete separation is the case, the likelihood is maximized at a very large but not infinite value for that parameter. Computer programs will often output an arbitrarily large parameter estimate with a very large standard error. Methods to fit these models include exact logistic regression and Firth logistic regression, a bias-reduction method based on a penalized likelihood.
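As a sketch of the penalized-likelihood idea behind Firth's method (not the implementation used by any particular package), the snippet below adds the Jeffreys-prior penalty 0.5 log det(XᵀWX) to the logistic log-likelihood and maximizes it numerically on the separated data from the example above. The data, starting values, and choice of optimizer are illustrative assumptions.

import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

# Same illustrative, completely separated data as above.
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])
y = (x > 2).astype(int)
X = np.column_stack([np.ones_like(x), x])   # design matrix: intercept + slope

def neg_penalized_loglik(beta):
    """Negative Firth-penalized log-likelihood: -(l(beta) + 0.5 log|X'WX|)."""
    eta = X @ beta
    p = expit(eta)
    loglik = -np.sum(y * np.logaddexp(0.0, -eta) + (1 - y) * np.logaddexp(0.0, eta))
    W = p * (1 - p)                                   # Fisher-information weights
    sign, logdet = np.linalg.slogdet(X.T @ (X * W[:, None]))
    return -(loglik + 0.5 * logdet)                   # penalty keeps the optimum finite

# Under complete separation the unpenalized likelihood has no finite maximizer,
# but the penalized objective does; a derivative-free optimizer suffices here.
fit = minimize(neg_penalized_loglik, x0=np.zeros(2), method="Nelder-Mead")
print("Firth-type estimates (intercept, slope):", fit.x)

In practice a dedicated, tested implementation of Firth or exact logistic regression would be preferred over a hand-rolled objective like this one.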