A priori (statistics)
In statistics, a priori knowledge is prior knowledge about a population, rather than knowledge estimated from recent observation. It is common in Bayesian inference to make inferences conditional upon this knowledge, and the integration of a priori knowledge is the central difference between the Bayesian and frequentist approaches to statistics. We need not be 100% certain about something before it can be considered a priori knowledge, but conducting estimation conditional upon assumptions for which there is little evidence should be avoided. A priori knowledge often consists of knowledge of the domain of a parameter (for example, that it is positive), which can be incorporated to improve an estimate. Within this domain the distribution is usually assumed to be uniform in order to take advantage of certain theoretical results (most importantly the central limit theorem).
Basic example
Suppose we pick (without replacement) two red beads and three black beads from a bag containing only black and red beads; what is the probability that the next bead we pick will be red? Without a priori knowledge of the bag's contents, we cannot answer the question. But if we know a priori that the bag contained only two red beads, we can be certain that the probability of picking another red bead is zero. Correspondingly, we can be 100% certain the next bead will be black, provided we also know a priori that the bag contained more than three black beads to begin with.
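The bead example can be checked numerically. The short script below is an illustrative sketch, not part of the original article; the bag's initial contents (2 red, 5 black) are a hypothetical choice consistent with the a priori knowledge described above.

```python
# A priori knowledge of the bag's contents determines the probability of
# the next draw. Initial contents are an assumption for illustration:
RED, BLACK = 2, 5               # a priori: 2 red beads, more than 3 black

drawn_red, drawn_black = 2, 3   # beads already removed (without replacement)

remaining_red = RED - drawn_red
remaining_black = BLACK - drawn_black
remaining_total = remaining_red + remaining_black

p_next_red = remaining_red / remaining_total
p_next_black = remaining_black / remaining_total

print(p_next_red)    # 0.0 -- no red beads remain
print(p_next_black)  # 1.0 -- the next bead must be black
```

Without the a priori values of `RED` and `BLACK`, neither probability could be computed at all, which is the point of the example.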
A common statistical practice is to fit a priori data to a model, or conversely, to modify the model to accommodate a priori data.
Further theoretical example
Suppose that we are trying to estimate the coefficients of an autoregressive (AR) stochastic process based on recorded data, and we know beforehand that the process is stationary. Any AR(2) process is of the form:

X_t = φ₁X_{t−1} + φ₂X_{t−2} + ε_t,

where ε_t is white noise.
Under the classical frequentist approach we would proceed with maximum likelihood estimation (MLE), but instead we can integrate our knowledge into the likelihood function and maximize our likelihood conditional upon the fact that the process is stationary. We can assign prior distributions to the AR coefficients that are uniform across a limited domain in line with the constraints upon stationary process coefficients. For an AR(2) process, the constraints are:

φ₁ + φ₂ < 1,  φ₂ − φ₁ < 1,  |φ₂| < 1.
Adding this information will change the likelihood function, and when we now use MLE to estimate the coefficients, we will in general obtain a better estimate. This is true in particular when we suspect that the coefficients are near the boundary of the stationary domain. Note that the distribution on the domain is uniform, so we have not made any assumptions about what the coefficients actually are, only about their domain.
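The procedure above can be sketched as follows. This is an illustrative script, not part of the original article: the data are simulated, the true coefficient values (0.5, 0.3) are hypothetical, and for Gaussian noise maximizing the conditional likelihood of an AR(2) process is equivalent to minimizing the sum of squared residuals, so the stationarity constraints can be handed directly to a constrained optimizer.

```python
# Constrained conditional MLE for an AR(2) process: the stationarity
# constraints restrict the coefficient domain, mirroring a prior that is
# uniform on that domain and zero outside it.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Simulate a stationary AR(2) series with known (hypothetical) coefficients.
phi1_true, phi2_true = 0.5, 0.3
n = 500
x = np.zeros(n)
for t in range(2, n):
    x[t] = phi1_true * x[t - 1] + phi2_true * x[t - 2] + rng.normal()

def neg_log_lik(phi):
    """Conditional Gaussian negative log-likelihood, up to a constant."""
    phi1, phi2 = phi
    resid = x[2:] - phi1 * x[1:-1] - phi2 * x[:-2]
    return 0.5 * np.sum(resid ** 2)

# Stationarity triangle: phi1 + phi2 < 1, phi2 - phi1 < 1, |phi2| < 1.
constraints = [
    {"type": "ineq", "fun": lambda p: 1 - p[0] - p[1]},
    {"type": "ineq", "fun": lambda p: 1 + p[0] - p[1]},
    {"type": "ineq", "fun": lambda p: 1 - p[1]},
    {"type": "ineq", "fun": lambda p: 1 + p[1]},
]

res = minimize(neg_log_lik, x0=[0.0, 0.0], constraints=constraints)
phi1_hat, phi2_hat = res.x
print(phi1_hat, phi2_hat)  # estimates near the true values
```

The benefit of the constraints is largest when the true coefficients lie near the boundary of the stationarity triangle, where unconstrained MLE can wander outside the admissible region.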
See also
- Bayes' theorem
- A priori probability