Lilliefors test - AbsoluteAstronomy.com

In statistics, the Lilliefors test, named after Hubert Lilliefors, professor of statistics at George Washington University, is an adaptation of the Kolmogorov–Smirnov test. It is used to test the null hypothesis that data come from a normally distributed population when the null hypothesis does not specify which normal distribution; that is, it does not specify the expected value and variance of the distribution.

The test

The test proceeds as follows:

1. First estimate the population mean and population variance based on the data.

2. Then find the maximum discrepancy between the empirical distribution function and the cumulative distribution function (CDF) of the normal distribution with the estimated mean and estimated variance. Just as in the Kolmogorov–Smirnov test, this will be the test statistic.

3. Finally, assess whether the maximum discrepancy is large enough to be statistically significant, thus requiring rejection of the null hypothesis. This is where this test becomes more complicated than the Kolmogorov–Smirnov test. Since the hypothesized CDF has been moved closer to the data by estimation based on those data, the maximum discrepancy has been made smaller than it would have been if the null hypothesis had singled out just one normal distribution. Thus the "null distribution" of the test statistic, i.e. its probability distribution assuming the null hypothesis is true, is stochastically smaller than the Kolmogorov–Smirnov distribution. This is the Lilliefors distribution. To date, tables for this distribution have been computed only by Monte Carlo methods.
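The three steps of the test can be sketched in Python using only the standard library. This is a minimal illustration, not a reference implementation: the function names `lilliefors_statistic` and `monte_carlo_p_value` are hypothetical, the variance is assumed to be estimated with the n - 1 denominator, and the p-value is approximated by simulation rather than read from published tables.

```python
import random
from statistics import NormalDist, fmean, stdev


def lilliefors_statistic(data):
    """Maximum gap between the empirical CDF and the fitted normal CDF."""
    xs = sorted(data)
    n = len(xs)
    # Step 1: estimate the mean and standard deviation from the data
    # (assumption: sample standard deviation with the n - 1 denominator).
    fitted = NormalDist(fmean(xs), stdev(xs))
    # Step 2: the empirical CDF jumps from (i - 1)/n to i/n at the i-th
    # ordered point, so the gap is checked on both sides of every jump.
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = fitted.cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def monte_carlo_p_value(data, n_sim=2000, seed=0):
    """Step 3: estimate P(D >= observed D) under normality by simulation."""
    rng = random.Random(seed)
    n = len(data)
    observed = lilliefors_statistic(data)
    # The statistic does not change when the data are shifted or rescaled,
    # so standard normal samples serve for any normal population.
    exceed = 0
    for _ in range(n_sim):
        sim = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if lilliefors_statistic(sim) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (exceed + 1) / (n_sim + 1)


# Usage: one sample drawn from a normal population, one from a skewed one.
rng = random.Random(1)
normal_sample = [rng.gauss(10.0, 2.0) for _ in range(50)]
skewed_sample = [rng.expovariate(1.0) for _ in range(50)]
print(monte_carlo_p_value(normal_sample))
print(monte_carlo_p_value(skewed_sample))
```

Because the mean and variance are re-estimated inside every simulated sample, the simulated statistics follow the Lilliefors distribution rather than the Kolmogorov–Smirnov distribution, which is the point made in step 3.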

There is an extensive literature on normality testing, but as a practical matter many experienced data analysts sidestep formal testing and assess the feasibility of a normal model by using a graphical tool such as a Q-Q plot.
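As an illustration of the graphical approach, the coordinates of a normal Q-Q plot can be computed with the Python standard library alone. This is only a sketch: the function name `normal_qq_points` is hypothetical, and the plotting positions (i - 0.5)/n are one common convention assumed here; others exist.

```python
from statistics import NormalDist


def normal_qq_points(data):
    """Return (theoretical quantile, observed value) pairs for a normal Q-Q plot."""
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    # Assumption: plotting positions (i - 0.5) / n for the i-th ordered value.
    return [(std_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(xs, start=1)]


# Usage: print the pairs; any plotting tool can then draw them as a scatter.
for q, x in normal_qq_points([4.1, 5.3, 4.8, 6.0, 5.1, 4.6, 5.5]):
    print(f"{q:8.3f}  {x:6.2f}")
```

If the data are roughly normal, the points fall close to a straight line; pronounced curvature or stray points in the tails suggest a departure from normality.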

External links

US NIST Handbook of Statistics

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.