Kuiper's test
Kuiper's test is used in statistics
to test whether a given distribution, or family of distributions, is contradicted by evidence from a sample of data. It is named after the Dutch mathematician Nicolaas Kuiper.
Kuiper's test is closely related to the better-known Kolmogorov–Smirnov test (or K-S test, as it is often called). As with the K-S test, the discrepancy statistics D+ and D− represent the absolute sizes of the most positive and most negative differences between the two cumulative distribution functions that are being compared. The trick with Kuiper's test is to use the quantity D+ + D− as the test statistic, rather than the larger of the two as in the K-S test. This small change makes Kuiper's test as sensitive in the tails as at the median and also makes it invariant under cyclic transformations of the independent variable. The Anderson–Darling test is another test that provides equal sensitivity at the tails and at the median, but it does not provide the cyclic invariance.
This invariance under cyclic transformations makes Kuiper's test invaluable when testing for cyclic variations by time of year, day of the week, or time of day, and more generally for testing the fit of, and differences between, circular probability distributions.
Definition
The test statistic, V, for Kuiper's test is defined as follows. Let F be the continuous cumulative distribution function specified by the null hypothesis. Denote by x_i (i = 1, ..., n) the sample of data, sorted in increasing order, whose elements are independent realisations of random variables having F as their distribution function. Then define

$$z_i = F(x_i),$$
$$D^+ = \max_{1 \le i \le n} \left[ \frac{i}{n} - z_i \right],$$
$$D^- = \max_{1 \le i \le n} \left[ z_i - \frac{i-1}{n} \right],$$

and finally,

$$V = D^+ + D^-.$$

Tables for the critical points of the test statistic are available, and these include certain cases where the distribution being tested is not fully known, so that parameters of the family of distributions are estimated.
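The statistic can be computed in a few lines. The following is a minimal sketch, assuming NumPy is available and using the standard formulas D+ = max(i/n − z_i) and D− = max(z_i − (i−1)/n) with z_i = F(x_i) taken in increasing order. The function names `kuiper_statistic` and `uniform_cdf` and the three-point sample are illustrative choices, not part of any particular library.

```python
import numpy as np

def kuiper_statistic(sample, cdf):
    """Return (D_plus, D_minus, V) for `sample` against the continuous CDF `cdf`."""
    # Sorting the CDF values is equivalent to sorting the data,
    # because a CDF is non-decreasing.
    z = np.sort(cdf(np.asarray(sample, dtype=float)))
    n = z.size
    i = np.arange(1, n + 1)
    d_plus = np.max(i / n - z)          # largest excess of the empirical CDF over F
    d_minus = np.max(z - (i - 1) / n)   # largest excess of F over the empirical CDF
    return d_plus, d_minus, d_plus + d_minus

def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1] (illustrative null hypothesis)."""
    return np.clip(x, 0.0, 1.0)

# Tiny hypothetical sample, deliberately given out of order.
d_plus, d_minus, v = kuiper_statistic([0.7, 0.1, 0.4], uniform_cdf)
print(d_plus, d_minus, v)
```

For this sample the values are approximately D+ = 0.3, D− = 0.1 and V = 0.4, up to floating-point rounding. The sketch only computes the statistic; judging significance still requires the tabulated critical points.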
Example
We could test the hypothesis that computers fail more during some times of the year than others. To test this, we would collect the dates on which the test set of computers had failed and build an empirical distribution function. The null hypothesis is that the failures are uniformly distributed over the year. Kuiper's statistic does not change if we change the beginning of the year and does not require that we bin failures into months or the like. Another test statistic having this property is the Watson statistic, which is related to the Cramér–von Mises test.
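The claim that moving the beginning of the year leaves Kuiper's statistic unchanged can be checked numerically. The sketch below is illustrative only: it assumes NumPy, expresses failure dates as fractions of a year in [0, 1), uses synthetic data with a seasonal peak, and computes D+ = max(i/n − z_i) and D− = max(z_i − (i−1)/n) against the uniform distribution. The helper name `ks_and_kuiper` is hypothetical.

```python
import numpy as np

def ks_and_kuiper(u):
    """Return (K-S D, Kuiper V) for values u in [0, 1) against Uniform(0, 1)."""
    z = np.sort(np.asarray(u, dtype=float))
    n = z.size
    i = np.arange(1, n + 1)
    d_plus = np.max(i / n - z)
    d_minus = np.max(z - (i - 1) / n)
    return max(d_plus, d_minus), d_plus + d_minus

rng = np.random.default_rng(0)
# Synthetic failure dates as fractions of a year, clustered around 0.1
# (early in the year); values below 0 wrap around to the end of the year.
u = (0.1 + 0.05 * rng.standard_normal(200)) % 1.0

# Move the "start of the year" by adding an offset and wrapping modulo 1.
shifts = (0.0, 0.3, 0.6)
results = [ks_and_kuiper((u + s) % 1.0) for s in shifts]
for s, (d, v) in zip(shifts, results):
    print(f"shift={s:.1f}  K-S D={d:.4f}  Kuiper V={v:.4f}")
```

In this sketch the V column stays the same for every shift, up to floating-point rounding, while the K-S statistic D depends on where the year is taken to begin.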
However, if failures occur mostly on weekends, many uniform-distribution tests such as K-S would miss this, since weekends are spread throughout the year. This inability to distinguish distributions with a comb-like shape from continuous distributions is a key problem with all statistics based on a variant of the K-S test. Kuiper's test, applied to the event times modulo one week, is able to detect such a pattern.
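The weekend scenario can be simulated to see the effect of reducing the event times modulo one week. This is a rough sketch on synthetic data, assuming NumPy; the 80% weekend share, the 52-week window, the sample size and the helper name `kuiper_v` are all assumptions made for illustration.

```python
import numpy as np

def kuiper_v(u):
    """Kuiper's V for values u in [0, 1) against Uniform(0, 1)."""
    z = np.sort(np.asarray(u, dtype=float))
    n = z.size
    i = np.arange(1, n + 1)
    return np.max(i / n - z) + np.max(z - (i - 1) / n)

rng = np.random.default_rng(1)
n, n_weeks = 300, 52

# Synthetic failure times in days since the start of the observation window.
# About 80% of failures are forced into the last two days of a week
# (the "weekend"); the rest fall anywhere in the week.
week = rng.integers(0, n_weeks, size=n)
on_weekend = rng.random(n) < 0.8
day_in_week = np.where(on_weekend,
                       rng.uniform(5.0, 7.0, n),
                       rng.uniform(0.0, 7.0, n))
t = 7.0 * week + day_in_week

# Test uniformity over the whole window, then over the weekly cycle.
v_year = kuiper_v(t / (7.0 * n_weeks))
v_week = kuiper_v((t % 7.0) / 7.0)
print(f"V over the full window: {v_year:.3f}")
print(f"V modulo one week:      {v_week:.3f}")
```

Over the full window the comb-like pattern is nearly invisible and V stays small, whereas after folding the times onto a single week the weekend excess produces a much larger V.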