DFFITS
Encyclopedia
DFFITS is a diagnostic meant to show how influential a point is in a statistical regression. It was proposed in 1980. It is defined as the change ("DFFIT"), in the predicted value for a point, obtained when that point is left out of the regression, "Studentized" by dividing by the estimated standard deviation of the fit at that point:
where and are the prediction for point i with and without point i included in the regression,
is the standard error estimated without the point in question, and is the leverage
for the point.
DFFITS is very similar to the externally Studentized residual
, and is in fact equal to the latter times .
Since when the errors are Gaussian
the externally Studentized residual is distributed as Student's t (with a number of degrees of freedom
equal to the number of residual degrees of freedom minus one), DFFITS for a particular point will be distributed according to this same Student's t distribution multiplied by the leverage factor
for that particular point. Thus, for low leverage points, DFFITS is expected to be small, whereas as the leverage goes to 1 the distribution of the DFFITS value widens infinitely.
For a perfectly balanced experimental design (such as a factorial design or balanced partial factorial design), the leverage for each point is p/n, the number of parameters divided by the number of points. This means that the DFFITS values will be distributed (in the Gaussian case) as times a t variate. Therefore, the authors suggest investigating those points with DFFITS greater than .
A similar measure of influence is Cook's distance
.
where and are the prediction for point i with and without point i included in the regression,
is the standard error estimated without the point in question, and is the leverage
Leverage (statistics)
In statistics, leverage is a term used in connection with regression analysis and, in particular, in analyses aimed at identifying those observations that are far away from corresponding average predictor values...
for the point.
DFFITS is very similar to the externally Studentized residual
Studentized residual
In statistics, a studentized residual is the quotient resulting from the division of a residual by an estimate of its standard deviation. Typically the standard deviations of residuals in a sample vary greatly from one data point to another even when the errors all have the same standard...
, and is in fact equal to the latter times .
Since when the errors are Gaussian
GAUSSIAN
Gaussian is a computational chemistry software program initially released in 1970 by John Pople and his research group at Carnegie-Mellon University as Gaussian 70. It has been continuously updated since then...
the externally Studentized residual is distributed as Student's t (with a number of degrees of freedom
Degrees of freedom (statistics)
In statistics, the number of degrees of freedom is the number of values in the final calculation of a statistic that are free to vary.Estimates of statistical parameters can be based upon different amounts of information or data. The number of independent pieces of information that go into the...
equal to the number of residual degrees of freedom minus one), DFFITS for a particular point will be distributed according to this same Student's t distribution multiplied by the leverage factor
Leverage (statistics)
In statistics, leverage is a term used in connection with regression analysis and, in particular, in analyses aimed at identifying those observations that are far away from corresponding average predictor values...
for that particular point. Thus, for low leverage points, DFFITS is expected to be small, whereas as the leverage goes to 1 the distribution of the DFFITS value widens infinitely.
For a perfectly balanced experimental design (such as a factorial design or balanced partial factorial design), the leverage for each point is p/n, the number of parameters divided by the number of points. This means that the DFFITS values will be distributed (in the Gaussian case) as times a t variate. Therefore, the authors suggest investigating those points with DFFITS greater than .
A similar measure of influence is Cook's distance
Cook's distance
In statistics, Cook's distance is a commonly used estimate of the influence of a data point when doing least squares regression analysis. In a practical ordinary least squares analysis, Cook's distance can be used in several ways: to indicate data points that are particularly worth checking for...
.