Statistical assumptions
Statistical assumptions are general assumptions about statistical populations.
Statistics, like all mathematical disciplines, does not generate valid conclusions from nothing. In order to generate interesting conclusions about real statistical populations, it is usually required to make some background assumptions. These must be made with care, because inappropriate assumptions can generate wildly inaccurate conclusions.
The most commonly applied statistical assumptions are:
- independence of observations from each other: wrongly assuming independence is a common error (see statistical independence)
- independence of observational error from potential confounding effects
- exact or approximate normality of observations: the assumption of normality is often erroneous, because many populations are not normal. However, it is standard practice to treat the mean of a sufficiently large random sample as approximately normal, because of the central limit theorem (see normal distribution)
- linearity of graded responses to quantitative stimuli (see linear regression)
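The remark about normality and the central limit theorem can be illustrated with a small simulation. This is only a sketch under stated assumptions: the exponential population, the sample size of 50, the number of repetitions, and the helper names `skewness` and `simulate` are all chosen for illustration and do not come from the article. Skewness is used as a rough indicator of non-normality, since a normal distribution has skewness 0.

```python
import random
import statistics


def skewness(values):
    """Third standardized moment; close to 0 for a symmetric distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def simulate(sample_size=50, repetitions=5000, seed=0):
    """Compare skewness of raw observations with skewness of sample means.

    Assumption for illustration: the population is exponential with rate 1,
    which is strongly right-skewed and therefore clearly not normal.
    """
    rng = random.Random(seed)
    # Individual observations drawn from the skewed population.
    raw = [rng.expovariate(1.0) for _ in range(repetitions)]
    # Means of independent random samples from the same population.
    means = [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(repetitions)
    ]
    return skewness(raw), skewness(means)


raw_skew, mean_skew = simulate()
print(f"skewness of raw observations: {raw_skew:.2f}")
print(f"skewness of sample means:     {mean_skew:.2f}")
```

In a run like this the raw observations should remain strongly skewed, while the sample means should be much closer to symmetric. The approximation depends on the sample size and on the observations being independent, so it is not a licence to assume normality of the observations themselves.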
Types of assumptions
Statistical assumptions can be categorised into a number of different types:
- Non-modelling assumptions. Statistical analyses of data involve making certain types of assumption, whether or not a formal statistical model is used. Such assumptions underlie even descriptive statistics.
  - Population assumptions. A statistical analysis of data is made on the basis that the observations available derive from either a single population or several different populations, each of which is in some way meaningful. Here a "population" is informally a set of other possible observations that might have been made. The assumption here is a simple one: that the observations obtained are representative of the problem, topic or class of objects being studied.
  - Sampling assumptions. These relate to the way in which observations have been gathered and may often involve an assumption of random selection of some type.
- Modelling assumptions. These may be divided into three types:
  - Distributional assumptions. Where a statistical model involves terms relating to random errors, assumptions may be made about the probability distribution of these errors. In some cases, the distributional assumption relates to the observations themselves.
  - Structural assumptions. Statistical relationships between variables are often modelled by equating one variable to a function of another (or several others), plus a random error. Models often involve making a structural assumption about the form of the functional relationship, for example as in linear regression. This can be generalised to models involving relationships between underlying unobserved latent variables.
  - Cross-variation assumptions. These assumptions involve the joint probability distributions of either the observations themselves or the random errors in a model. Simple models may include the assumption that observations or errors are statistically independent.
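A structural assumption can be checked informally by looking at what a fitted model leaves unexplained. The sketch below is illustrative only: the quadratic data, the noise level, the seed, and the helper names `fit_line` and `residuals` are assumptions made for the example, not part of the article. It fits a straight line by ordinary least squares to data whose true relationship is curved, then prints the residuals.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(xs, ys, slope, intercept):
    """Observed value minus the value predicted by the fitted line."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Hypothetical data: the true relationship is quadratic, plus a random error.
rng = random.Random(1)
xs = list(range(11))
ys = [x ** 2 + rng.gauss(0.0, 1.0) for x in xs]

slope, intercept = fit_line(xs, ys)
res = residuals(xs, ys, slope, intercept)

for x, r in zip(xs, res):
    print(f"x={x:2d}  residual={r:+.1f}")
```

If the linear structural assumption held, the residuals would scatter around zero with no visible pattern. Here they should be positive at both ends of the range and negative in the middle, which suggests that the assumed functional form, rather than the random error, accounts for most of the misfit.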