Cochran's C test
In statistics, Cochran's C test, named after William G. Cochran, is a one-sided upper limit variance outlier test. The C test is used to decide whether a single estimate of a variance (or a standard deviation) is significantly larger than a group of variances (or standard deviations) with which the single estimate is supposed to be comparable. The C test is discussed in many textbooks and has been recommended by IUPAC and ISO. Cochran's C test should not be confused with Cochran's Q test, which applies to the analysis of two-way randomized block designs.
The C test assumes a balanced design, i.e. the full data set should consist of individual data series that all have the same size. The C test further assumes that each individual data series is normally distributed. Although it is primarily an outlier test, the C test is also used as a simple alternative to regular homoscedasticity tests such as Bartlett's test, Levene's test and the Brown–Forsythe test for checking a statistical data set for homogeneity of variances. Hartley's Fmax test offers an even simpler check of homoscedasticity, but it has the disadvantage of accounting only for the minimum and the maximum of the variance range, whereas the C test accounts for all variances within the range.
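To make the last point concrete, here is a small Python sketch with made-up variance values (purely illustrative, not data from any study). The two sets share the same minimum and maximum, so Hartley's Fmax cannot tell them apart, while Cochran's C for the largest variance changes with the interior values. The helper names `hartley_fmax` and `cochran_c_max` are hypothetical.

```python
def hartley_fmax(variances):
    """Hartley's Fmax: the largest variance divided by the smallest variance."""
    return max(variances) / min(variances)


def cochran_c_max(variances):
    """Cochran's C for the largest variance: that variance divided by the sum of all."""
    return max(variances) / sum(variances)


# Same minimum (1.0) and maximum (4.0), different interior variances (illustrative).
a = [1.0, 1.0, 1.0, 4.0]
b = [1.0, 4.0, 4.0, 4.0]

print(hartley_fmax(a), hartley_fmax(b))    # 4.0 4.0
print(cochran_c_max(a), cochran_c_max(b))  # about 0.571 and 0.308
```

Only the C statistic reflects that, in the second set, the largest variance is shared by most of the series and is therefore less exceptional.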
Description
The C test detects one exceptionally large variance value at a time. The corresponding data series is then omitted from the full data set. According to ISO standard 5725, the C test may be iterated until no further exceptionally large variance values are detected, but such practice may lead to excessive rejections if the underlying data series are not normally distributed.
The C test evaluates the ratio:
$$C_j = \frac{S_j^2}{\sum_{i=1}^{N} S_i^2}$$
where:
- Cj = Cochran's C statistic for data series j
- Sj = standard deviation of data series j
- N = number of data series that remain in the data set; N is decreased in steps of 1 upon each iteration of the C test
- Si = standard deviation of data series i (1 ≤ i ≤ N)
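The ratio and the iteration described above can be sketched in Python as follows, assuming each data series is given as a list of numbers. The function names `cochran_c` and `iterate_c_test` are hypothetical. The critical value is deliberately left to a caller-supplied function `critical(N, n)` (for example one based on published tables or on the critical value formula for CUL) rather than computed here.

```python
from statistics import variance
from typing import Callable, List, Sequence, Tuple


def cochran_c(series: Sequence[Sequence[float]]) -> List[float]:
    """Return C_j = S_j^2 / (S_1^2 + ... + S_N^2) for every data series j."""
    if len({len(s) for s in series}) != 1:
        raise ValueError("the C test assumes a balanced design (equal series sizes)")
    # Sample variances (n - 1 denominator); in a balanced design the choice of
    # denominator cancels in the ratio.
    variances = [variance(s) for s in series]
    total = sum(variances)
    if total == 0:
        raise ValueError("all data series have zero variance")
    return [v / total for v in variances]


def iterate_c_test(
    series: Sequence[Sequence[float]],
    critical: Callable[[int, int], float],
) -> Tuple[List[int], List[int]]:
    """Drop the series with the largest C for as long as it exceeds critical(N, n).

    Returns (rejected, retained) as indices into `series`.
    """
    remaining = list(range(len(series)))
    rejected: List[int] = []
    while len(remaining) >= 2:
        c = cochran_c([series[i] for i in remaining])
        j = max(range(len(c)), key=c.__getitem__)  # series with the largest variance
        n = len(series[remaining[j]])
        if c[j] <= critical(len(remaining), n):
            break  # H0 is not rejected for the largest remaining variance
        rejected.append(remaining.pop(j))  # N decreases by 1 on each iteration
    return rejected, remaining


data = [[1, 2, 3], [2, 4, 6], [1, 1, 4]]
print(cochran_c(data))  # [0.125, 0.5, 0.375]
```

Passing the critical value in as a function keeps the sketch neutral about whether CUL comes from tables or from a formula. Because only the largest C is compared at each step, one series is removed per iteration, matching the one-at-a-time procedure.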
The C test tests the null hypothesis (H0) against the alternative hypothesis (Ha):
- H0: All variances are equal.
- Ha: At least one variance value is significantly larger than the other variance values.
Critical values
The variance value of data series j is considered an outlier at significance level α if Cj exceeds the upper limit critical value CUL. CUL depends on the desired significance level α, the number of considered data series N, and the number of data points n per data series. Selections of CUL values have been tabulated at significance levels α = 0.01, α = 0.025 and α = 0.05. CUL can also be calculated from:
$$C_{UL}(\alpha, n, N) = \left[1 + \frac{N-1}{F_c\left(\alpha/N,\ n-1,\ (N-1)(n-1)\right)}\right]^{-1}$$
where:
- CUL = upper limit critical value for one-sided test on a balanced design
- α = significance level
- n = number of data points per data series
- Fc = critical value of Fisher's F ratio at significance level α/N with n − 1 and (N − 1)(n − 1) degrees of freedom; Fc can be obtained from tables or using the FINV function in Excel
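As an alternative to tables or Excel, CUL can be computed directly. The following is a minimal, standard-library-only Python sketch under stated assumptions. It evaluates the upper tail of the F distribution through the regularized incomplete beta function, using a textbook modified-Lentz continued fraction (one choice among several). It then inverts that tail by bisection, which plays the role of Excel's FINV. All function names are hypothetical. With SciPy available, `scipy.stats.f.isf(alpha / N, n - 1, (N - 1) * (n - 1))` would serve the same purpose as `f_critical`.

```python
import math


def _betacf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction at step m.
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_upper_tail(f: float, d1: float, d2: float) -> float:
    """P(F > f) for an F distribution with (d1, d2) degrees of freedom."""
    return reg_inc_beta(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


def f_critical(p: float, d1: float, d2: float) -> float:
    """Upper-tail critical value: the f with P(F > f) = p (the role of Excel's FINV)."""
    lo, hi = 0.0, 1.0
    while f_upper_tail(hi, d1, d2) > p:  # grow the bracket until it contains the root
        hi *= 2.0
    for _ in range(200):  # bisection; the tail probability decreases as f grows
        mid = 0.5 * (lo + hi)
        if f_upper_tail(mid, d1, d2) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def cochran_c_critical(alpha: float, N: int, n: int) -> float:
    """C_UL = [1 + (N - 1) / F_c(alpha / N, n - 1, (N - 1)(n - 1))]^-1."""
    fc = f_critical(alpha / N, n - 1, (N - 1) * (n - 1))
    return 1.0 / (1.0 + (N - 1) / fc)
```

For instance, `cochran_c_critical(0.05, 5, 4)` returns the critical value at α = 0.05 for five series of four points each. It is worth checking a few such results against published tables before relying on the sketch.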
Generalization
The C test can be generalized to include unbalanced designs, one-sided lower limit tests and two-sided tests at any significance level α, for any number of data series N, and for any number of individual data points nj in data series j.
See also
- Bartlett's test
- Levene's test
- Brown–Forsythe test
- Hartley's test
- F-test of equality of variances