Empirical probability
Empirical probability, also known as relative frequency or experimental probability, is the ratio of the number of "favorable" outcomes to the total number of trials, not in a sample space but in an actual sequence of experiments. In a more general sense, empirical probability estimates probabilities from experience and observation.
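As a minimal sketch, the empirical probability can be computed by counting favorable outcomes in a recorded sequence of trials. The die rolls below are invented for illustration, and the function name `empirical_probability` is hypothetical. For contrast, the value 1/6 that follows from the sample space of a fair die is shown alongside.

```python
from fractions import Fraction

def empirical_probability(outcomes, is_favorable):
    """Relative frequency: favorable outcomes divided by total trials in an observed sequence."""
    if not outcomes:
        raise ValueError("need at least one trial")
    favorable = sum(1 for o in outcomes if is_favorable(o))
    return Fraction(favorable, len(outcomes))

# Hypothetical record of 12 die rolls (illustrative data, not real observations).
rolls = [3, 6, 1, 6, 2, 5, 6, 4, 1, 3, 2, 6]

# Empirical probability of a six, taken from the actual sequence of experiments.
p_emp = empirical_probability(rolls, lambda r: r == 6)

# Probability derived from the sample space {1, ..., 6} of a fair die, for comparison.
p_theory = Fraction(1, 6)

print(p_emp, p_theory)  # 1/3 1/6
```

The two values differ because the empirical figure reflects only the twelve rolls actually observed.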
In statistical terms, the empirical probability is an estimate of a probability. If modelling with a binomial distribution is appropriate, it is the maximum likelihood estimate. It is also the Bayesian estimate for the same case, provided certain assumptions are made about the prior distribution of the probability.
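The binomial claim can be checked numerically. The sketch below uses k successes in n trials (counts chosen only for illustration), evaluates the binomial log-likelihood on a grid, and confirms that it peaks at k/n. For the Bayesian case it assumes a Beta(a, b) prior, whose posterior is Beta(k + a, n - k + b). With the improper Haldane choice a = b = 0, the posterior mean equals k/n. With a uniform Beta(1, 1) prior, the posterior mode equals k/n. These priors are examples of the kind of assumptions meant, not the only possible ones.

```python
import math

def binomial_log_likelihood(p, k, n):
    """Log-likelihood of p for k successes in n Bernoulli trials (binomial coefficient dropped).
    This sketch assumes 0 < k < n, so the boundaries p = 0 and p = 1 are simply treated as -inf."""
    if p <= 0.0 or p >= 1.0:
        return float("-inf")
    return k * math.log(p) + (n - k) * math.log(1.0 - p)

def posterior_mean(k, n, a, b):
    """Mean of the Beta(k + a, n - k + b) posterior under a Beta(a, b) prior."""
    return (k + a) / (n + a + b)

def posterior_mode(k, n, a, b):
    """Mode of the Beta(k + a, n - k + b) posterior (valid when both parameters exceed 1)."""
    return (k + a - 1) / (n + a + b - 2)

k, n = 7, 20  # illustrative counts
empirical = k / n

# Grid search over p in (0, 1): the maximizer should coincide with k/n.
grid = [i / 1000 for i in range(1, 1000)]
mle = max(grid, key=lambda p: binomial_log_likelihood(p, k, n))

print(empirical, mle)              # 0.35 0.35
print(posterior_mean(k, n, 0, 0))  # Haldane prior: 0.35
print(posterior_mode(k, n, 1, 1))  # uniform prior: 0.35
```

Under the uniform prior the posterior mean, (k + 1)/(n + 2), does not equal k/n, which is why the choice of prior and of summary (mean or mode) matters.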
Advantages
An advantage of estimating probabilities using empirical probabilities is that this procedure is relatively free of assumptions. For example, consider estimating the probability among a population of men that they satisfy two conditions:
- that they are over 6 feet in height; and
- that they prefer strawberry jam to raspberry jam.
A direct estimate could be found by counting the number of men who satisfy both conditions to give the empirical probability of the combined condition. An alternative estimate could be found by multiplying the proportion of men who are over 6 feet in height with the proportion of men who prefer strawberry jam to raspberry jam, but this estimate relies on the assumption that the two conditions are statistically independent.
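A minimal sketch of the two estimates on a small made-up survey follows. Each record is a hypothetical (height in feet, preferred jam) pair, not real data. The direct estimate counts the men who meet both conditions. The alternative multiplies the two marginal proportions and is only justified if the conditions are independent.

```python
from fractions import Fraction

# Hypothetical survey records: (height in feet, preferred jam). Illustrative only.
survey = [
    (6.2, "strawberry"), (5.9, "raspberry"), (6.1, "strawberry"),
    (5.7, "strawberry"), (6.3, "raspberry"), (5.8, "raspberry"),
    (6.4, "strawberry"), (5.6, "raspberry"),
]

n = len(survey)
tall = [h > 6.0 for h, _ in survey]
likes_strawberry = [jam == "strawberry" for _, jam in survey]

# Direct estimate: empirical probability of the combined condition.
p_joint = Fraction(sum(t and s for t, s in zip(tall, likes_strawberry)), n)

# Alternative estimate: product of the marginal proportions, valid only under independence.
p_tall = Fraction(sum(tall), n)
p_straw = Fraction(sum(likes_strawberry), n)
p_product = p_tall * p_straw

print(p_joint, p_product)  # 3/8 1/4
```

In this invented sample the two conditions are not independent, so the product of marginals (1/4) disagrees with the direct count (3/8).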
Disadvantages
A disadvantage in using empirical probabilities arises in estimating probabilities which are either very close to zero or very close to one. In these cases very large sample sizes would be needed to estimate such probabilities to a good standard of relative accuracy. Here statistical models can help, depending on the context, and in general one can hope that such models would provide improvements in accuracy compared to empirical probabilities, provided that the assumptions involved actually do hold.
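To make the sample-size point concrete, take n independent trials with true probability p. The empirical estimate has standard error sqrt(p(1 - p)/n), so its relative standard error is sqrt((1 - p)/(n p)). Measuring relative accuracy this way, the number of trials needed for a target relative error r is (1 - p)/(p r^2). The helper names below are hypothetical. For p close to one, the same reasoning applies to the complementary probability 1 - p.

```python
def trials_needed(p, rel_error):
    """Number of trials n at which the relative standard error
    sqrt((1 - p) / (n * p)) of the empirical probability equals rel_error."""
    if not 0.0 < p < 1.0 or rel_error <= 0.0:
        raise ValueError("need 0 < p < 1 and rel_error > 0")
    return (1.0 - p) / (p * rel_error ** 2)

def prob_no_occurrences(p, n):
    """Chance that n independent trials contain no occurrence at all,
    in which case the empirical probability is exactly 0."""
    return (1.0 - p) ** n

for p in (0.5, 0.1, 0.01, 0.001):
    # Target: a standard error equal to 10% of p itself.
    print(f"p = {p}: about {trials_needed(p, 0.10):,.0f} trials")

print(f"P(no occurrences | p = 0.001, n = 1000) = {prob_no_occurrences(0.001, 1000):.3f}")
```

Each hundredfold drop in p raises the required number of trials roughly a hundredfold. With p = 0.001 and 1000 trials there is still about a 37% chance of observing no occurrence at all, which gives an empirical estimate of exactly zero.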
For example, consider estimating the probability that the lowest of the daily-maximum temperatures at a site in February in any one year is less than zero degrees Celsius. A record of such temperatures in past years could be used to estimate this probability. A model-based alternative would be to select a family of probability distributions and fit it to the dataset containing past years' values. The fitted distribution would provide an alternative estimate of the desired probability. This alternative method can provide an estimate of the probability even if all values in the record are greater than zero.
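Below is a minimal sketch of the model-based alternative. For illustration only, it assumes a normal distribution fitted by sample mean and standard deviation, using the standard library's `statistics.NormalDist`. For annual minima, an extreme-value family may well be more appropriate. The temperature record is invented. Because every recorded value is above zero, the empirical estimate is exactly 0, while the fitted distribution still assigns a small positive probability to a value below zero.

```python
from statistics import NormalDist

# Hypothetical record: lowest daily-maximum temperature (deg C) in February, one value per year.
record = [2.1, 3.4, 1.2, 4.0, 0.8, 2.7, 3.1, 1.9, 0.5, 2.4]

# Empirical probability: fraction of past years with a value below zero.
p_empirical = sum(t < 0.0 for t in record) / len(record)

# Model-based alternative (assumed normal family, fitted by sample mean and standard deviation).
fitted = NormalDist.from_samples(record)
p_model = fitted.cdf(0.0)

print(p_empirical)  # 0.0
print(round(p_model, 3))
```

The model-based figure is only as good as the assumed family. A poor choice of distribution can make it less accurate than the empirical value.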
Mixed nomenclature
The phrase a-posteriori probability is also used as an alternative to empirical probability or relative frequency. The use of the phrase "a-posteriori" is reminiscent of terms in Bayesian statistics, but is not directly related to Bayesian inference, where a-posteriori probability is occasionally used to refer to posterior probability, which is different even though it has a confusingly similar name.
See also
- Empirical distribution function
- Empirical measure
- Frequency probability
- Realization
- Sample
- A priori probability, in relation to "a posteriori probability"