Bayesian inference
In statistics, Bayesian inference is a method of statistical inference. It is often used in science and engineering to determine model parameters, make predictions about unknown variables, and to perform model selection. Bayesian inference may be contrasted with frequentist inference, which uses the sampling distribution of a statistic.
In the Bayesian interpretation of probability, probability measures confidence that something is true, and may be termed confidence, uncertainty or belief. In practical usage, Bayesian inference is often viewed as an iterative process in which the confidence distribution on the value of a variable is updated as evidence for the value is observed. In each iteration, the initial distribution is called the prior and the modified distribution the posterior.
In more detail, suppose there is a real process generating independent events with an unknown probability distribution. It is assumed that the distribution corresponds to some model, parametrised by a variable $\theta$. The state of belief concerning this process is the set of possible models (one for each value of $\theta$) and the corresponding confidences. The confidences are subjective, but always sum to 1. When events are freshly observed, they may be compared to those predicted by each model and the confidences updated. This is achieved mathematically using Bayes' theorem. Typically, as iterations occur, the confidence in one model tends to 1 while that of the rest tends to 0.
In Bayesian model selection, the uncertainty of different models is compared as inference steps occur. For further details of the use of Bayesian inference in model selection, see Bayesian model selection.
General view
Suppose a process is generating independent and identically distributed events $E_n$, $n = 1, 2, 3, \ldots$, but the probability distribution is unknown. Let the event space $\Omega$ represent the current state of belief for this process. Each model is represented by an event $M_m$. The conditional probabilities $P(E_n \mid M_m)$ are specified to define the models. $P(M_m)$ is the confidence in model $M_m$. Before the first inference step, $\{P(M_m)\}$ is a set of arbitrary initial prior probabilities, which must sum to 1.

Suppose that the process is observed to generate the event $E \in \{E_n\}$. For each model $M \in \{M_m\}$, the prior $P(M)$ is updated to the posterior $P(M \mid E)$. From Bayes' theorem:

$$P(M \mid E) = \frac{P(E \mid M)}{\sum_m P(E \mid M_m)\, P(M_m)} \cdot P(M)$$

Upon observation of further evidence, this procedure may be repeated.
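As a concrete illustration, here is a minimal sketch of this update loop in Python, assuming a toy setting of three candidate coin-bias models; the bias values, flip sequence, and variable names are invented for illustration:

```python
import numpy as np

# Three candidate models of a biased coin: P(heads | M_m), toy values
p_heads = np.array([0.3, 0.5, 0.7])

# Arbitrary initial prior confidences P(M_m); they must sum to 1
posterior = np.array([1/3, 1/3, 1/3])

def update(prior, heads):
    """One inference step: P(M | E) = P(E | M) P(M) / sum_m P(E | M_m) P(M_m)."""
    likelihood = p_heads if heads else 1 - p_heads
    joint = likelihood * prior
    return joint / joint.sum()

for flip in [True, True, False, True, True]:   # the observed events E
    posterior = update(posterior, flip)

print(posterior)   # confidence concentrates on the model closest to the data
```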
Parametric view
Let $\mathbf{X} = (x_1, \ldots, x_n)$ be a set of independent identically distributed observations, where each $x_i$ is distributed according to $p(x \mid \boldsymbol\theta)$. Here $\boldsymbol\theta$ is an unknown vector of parameters, about which predictions are to be inferred from the observations. Initially, confidence in $\boldsymbol\theta$ is distributed according to some prior distribution $p(\boldsymbol\theta \mid \boldsymbol\alpha)$, where $\boldsymbol\alpha$ is a vector of hyperparameters (parameters of the prior distribution, as distinct from parameters of the model for the underlying system).

From the conditional independence of the observations, the joint probability density of $\mathbf{X}$ given $\boldsymbol\theta$ is

$$p(\mathbf{X} \mid \boldsymbol\theta) = \prod_{i=1}^{n} p(x_i \mid \boldsymbol\theta).$$

As the observations are conditionally independent of $\boldsymbol\alpha$ given $\boldsymbol\theta$, we have $p(\mathbf{X} \mid \boldsymbol\theta, \boldsymbol\alpha) = p(\mathbf{X} \mid \boldsymbol\theta)$. Bayes' theorem is then applied to determine the posterior distribution $p(\boldsymbol\theta \mid \mathbf{X}, \boldsymbol\alpha)$:

$$p(\boldsymbol\theta \mid \mathbf{X}, \boldsymbol\alpha) = \frac{p(\mathbf{X} \mid \boldsymbol\theta)\, p(\boldsymbol\theta \mid \boldsymbol\alpha)}{\int p(\mathbf{X} \mid \boldsymbol\theta)\, p(\boldsymbol\theta \mid \boldsymbol\alpha)\, d\boldsymbol\theta}.$$
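When this denominator integral has no closed form, it can be approximated numerically on a grid. A minimal sketch, assuming a Gaussian likelihood with known unit variance and a Gaussian prior (all data values and names are illustrative):

```python
import numpy as np

x = np.array([1.2, 0.7, 1.9, 1.4])          # illustrative observations, modelled as N(theta, 1)
theta = np.linspace(-3.0, 5.0, 1001)         # grid over the parameter space
prior = np.exp(-theta**2 / (2 * 2.0**2))     # N(0, 2^2) prior density, unnormalised

# Joint density p(X | theta) = prod_i p(x_i | theta), from conditional independence
log_lik = -0.5 * ((x[:, None] - theta[None, :]) ** 2).sum(axis=0)

unnorm = np.exp(log_lik) * prior
posterior = unnorm / (unnorm.sum() * (theta[1] - theta[0]))  # Riemann-sum normalisation

print(theta[np.argmax(posterior)])            # posterior mode
```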
Interpretation of the factor $P(E \mid M)/P(E)$
If $P(E \mid M)/P(E) > 1$, confidence in $M$ increases upon observing $E$. That is, if the model were true, the evidence would be more likely than is predicted by the current state of belief. The reverse applies for a decrease in confidence. If the confidence does not change, then $P(E \mid M)/P(E) = 1$. That is, the evidence is independent of the model: if the model were true, the evidence would be exactly as likely as predicted by the current state of belief.
Cromwell's rule
If $P(M) = 0$, then $P(M \mid E) = 0$. Similarly, if $P(M) = 1$, then $P(M \mid E) = 1$. The former can be proved by inspection of Bayes' theorem. The latter can be proved by considering that $P(\neg M) = 1 - P(M) = 0$; by the former result, $P(\neg M \mid E) = 0$, and therefore $P(M \mid E) = 1 - P(\neg M \mid E) = 1$.
Cromwell's rule can be interpreted to mean that hard convictions are insensitive to counter-evidence.
Asymptotic behaviour of posterior
Consider the behaviour of a belief distribution as it is updated a large number of times with independent and identically distributed trials. For sufficiently nice prior probabilities, the Bernstein–von Mises theorem gives that, in the limit of infinite trials, the posterior converges to a Gaussian distribution independent of the initial prior, under some conditions first outlined and rigorously proven by Joseph Leo Doob in 1948, namely if the random variable in consideration has a finite probability space. More general results were obtained later by the statistician David A. Freedman, who established in two seminal research papers in 1963 and 1965 when and under what circumstances the asymptotic behaviour of the posterior is guaranteed. His 1963 paper treats, like Doob (1949), the finite case and comes to a satisfactory conclusion. However, if the random variable has an infinite but countable probability space (i.e. corresponding to a die with infinitely many faces), the 1965 paper demonstrates that for a dense subset of priors the Bernstein–von Mises theorem is not applicable. In this case there is almost surely no asymptotic convergence. Similar results were obtained in 1964 by Lorraine Schwarz. In the 1980s and 1990s, Freedman and Persi Diaconis continued to work on the case of infinite countable probability spaces.
We conclude that, in practice, there may be insufficient trials to suppress the effects of the initial choice of prior, and especially for large (but finite) systems the convergence may be very slow.
Conjugate priors
For mathematical convenience, the prior distribution is often assumed to come from a family of distributions called a conjugate prior family: if the posterior distributions are in the same family as the prior, the prior and posterior are called conjugate distributions, and the prior is called a conjugate prior for the likelihood. For each family of likelihoods $p(x \mid \boldsymbol\theta)$, there will be an associated conjugate prior family. The usefulness of the conjugate prior is that if the prior distribution is chosen from this family, the posterior distribution of a single observation, or of a set of independent identically distributed observations, will be in the same family, and the integral in the denominator of the above calculation will be tractable.
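For example, the beta family is conjugate to the binomial likelihood, so updating reduces to adding observed counts to the prior's parameters. A minimal sketch, with invented prior parameters and counts:

```python
from scipy.stats import beta

# Beta(a, b) prior on the success probability of a binomial likelihood
a, b = 2.0, 2.0
k, n = 7, 10              # observed successes and trials (illustrative numbers)

# Conjugacy: the posterior is again a beta distribution, Beta(a + k, b + n - k),
# so the normalising integral never has to be computed numerically.
posterior = beta(a + k, b + n - k)
print(posterior.mean())            # posterior mean
print(posterior.interval(0.95))    # central 95% credible interval
```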
Estimates of parameters and predictions
Once the posterior distribution of the parameter is determined, any desired statistic of the distribution can be computed (e.g. the most likely value, or mode; the mean; the variance; the median; etc.). If a point estimate of the parameter is desired, a maximum a posteriori (MAP) estimate can be computed, i.e.:

$$\boldsymbol\theta_{\mathrm{MAP}} = \arg\max_{\boldsymbol\theta}\, p(\boldsymbol\theta \mid \mathbf{X}, \boldsymbol\alpha).$$
This could then be used to make predictions about new observations.
However, the "properly" Bayesian tendency is to work with the entire distribution, and make predictions by marginalizing
Marginal distribution
In probability theory and statistics, the marginal distribution of a subset of a collection of random variables is the probability distribution of the variables contained in the subset. The term marginal variable is used to refer to those variables in the subset of variables being retained...
over the distribution. For example, the predictive density of a new observation can be determined by
Furthermore, when making a point estimate of a parameter, Bayesians generally prefer to use the mean
Mean
In statistics, mean has two related meanings:* the arithmetic mean .* the expected value of a random variable, which is also called the population mean....
rather than the mode
Mode (statistics)
In statistics, the mode is the value that occurs most frequently in a data set or a probability distribution. In some fields, notably education, sample data are often called scores, and the sample mode is known as the modal score....
, i.e.
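Continuing the beta-binomial sketch above (the numbers remain illustrative), both point estimates are available in closed form, and the posterior mean also equals the predictive probability of success on the next trial:

```python
# Point estimates from the Beta(a + k, b + n - k) posterior of the sketch above
a_post, b_post = 9.0, 5.0    # i.e. a + k = 2 + 7 and b + n - k = 2 + 3 (illustrative)

theta_map = (a_post - 1) / (a_post + b_post - 2)   # posterior mode (MAP estimate)
theta_mean = a_post / (a_post + b_post)            # posterior mean

# Predictive probability of success on the next trial: the integral of
# theta * p(theta | X) over theta, which for this model equals the posterior mean
print(theta_map, theta_mean)
```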
Testing a hypothesis
Suppose there are two full bowls of cookies. Bowl #1 has 10 chocolate chip and 30 plain cookies, while bowl #2 has 20 of each. Our friend Fred picks a bowl at random, and then picks a cookie at random. We may assume there is no reason to believe Fred treats one bowl differently from another, likewise for the cookies. The cookie turns out to be a plain one. How probable is it that Fred picked it out of bowl #1?

Intuitively, it seems clear that the answer should be more than a half, since there are more plain cookies in bowl #1. The precise answer is given by Bayes' theorem. Let $H_1$ correspond to bowl #1, and $H_2$ to bowl #2.
It is given that the bowls are identical from Fred's point of view, thus $P(H_1) = P(H_2)$, and the two must add up to 1, so both are equal to 0.5.
The event $E$ is the observation of a plain cookie. From the contents of the bowls, we know that $P(E \mid H_1) = 30/40 = 0.75$ and $P(E \mid H_2) = 20/40 = 0.5$. Bayes' formula then yields

$$P(H_1 \mid E) = \frac{P(E \mid H_1)\, P(H_1)}{P(E \mid H_1)\, P(H_1) + P(E \mid H_2)\, P(H_2)} = \frac{0.75 \times 0.5}{0.75 \times 0.5 + 0.5 \times 0.5} = 0.6$$
Before we observed the cookie, the probability we assigned for Fred having chosen bowl #1 was the prior probability, $P(H_1)$, which was 0.5. After observing the cookie, we must revise the probability to $P(H_1 \mid E)$, which is 0.6.
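The same update is easily checked numerically; a trivial sketch:

```python
priors = [0.5, 0.5]               # P(H1), P(H2): bowls equally likely a priori
likelihoods = [30 / 40, 20 / 40]  # P(plain | H1), P(plain | H2)

joints = [l * p for l, p in zip(likelihoods, priors)]
posteriors = [j / sum(joints) for j in joints]
print(posteriors[0])              # P(H1 | plain) = 0.6
```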
Making a prediction
An archaeologist is working at a site thought to be from the medieval period, between the 11th and the 16th centuries. However, it is uncertain exactly when in this period the site was inhabited. Fragments of pottery are found, some of which are glazed and some of which are decorated. It is expected that if the site were inhabited during the early medieval period, then 1% of the pottery would be glazed and 50% of its area decorated, whereas if it had been inhabited in the late medieval period then 81% would be glazed and 5% of its area decorated. How confident can the archaeologist be in estimating the period of inhabitation as fragments are unearthed?
The confidence in the continuous variable $C$ (century) is to be calculated, with the discrete set of events $\{GD, G\bar{D}, \bar{G}D, \bar{G}\bar{D}\}$ (glazed and decorated, glazed only, decorated only, neither) as evidence. Assuming linear variation of glaze and decoration with time, and that these variables are independent, the probability that a fragment is glazed rises linearly from 0.01 at $c = 11$ to 0.81 at $c = 16$, the probability that it is decorated falls linearly from 0.50 to 0.05 over the same interval, and the probability of each joint event type is the product of these.
Assume a uniform prior of $f_C(c) = 0.2$ over $11 \le c \le 16$, and that trials are independent and identically distributed. When a new fragment of type $e$ is discovered, Bayes' theorem is applied to update the confidence for each $c$:

$$f_C(c \mid E = e) = \frac{P(E = e \mid C = c)}{\int_{11}^{16} P(E = e \mid C = c')\, f_C(c')\, dc'}\, f_C(c)$$
A computer simulation of the changing confidence as 50 fragments are unearthed is shown on the graph. In the simulation, the site was inhabited around 1420, or $c = 15.2$. By calculating the area under the relevant portion of the graph for 50 trials, the archaeologist can say that there is practically no chance the site was inhabited in the 11th and 12th centuries, about 1% chance that it was inhabited during the 13th century, 63% chance during the 14th century and 36% during the 15th century. Note that the Bernstein–von Mises theorem asserts here the asymptotic convergence to the "true" distribution, because the probability space corresponding to the discrete set of events is finite (see the section above on asymptotic behaviour of the posterior).
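A sketch of such a simulation, following the linear-interpolation setup above (the grid resolution, random seed, and variable names are choices made for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)                 # illustrative seed
c = np.linspace(11, 16, 501)                   # century grid
posterior = np.full_like(c, 0.2)               # uniform prior f_C(c) = 0.2

def event_probs(century):
    """P(glazed) and P(decorated) vary linearly with century."""
    g = 0.01 + (0.81 - 0.01) * (century - 11) / 5
    d = 0.50 - (0.50 - 0.05) * (century - 11) / 5
    return g, d

true_g, true_d = event_probs(15.2)             # site inhabited around c = 15.2
for _ in range(50):                            # unearth 50 fragments
    glazed = rng.random() < true_g
    decorated = rng.random() < true_d
    g, d = event_probs(c)
    lik = (g if glazed else 1 - g) * (d if decorated else 1 - d)
    posterior = lik * posterior
    posterior /= posterior.sum() * (c[1] - c[0])   # renormalise after each update

print(c[np.argmax(posterior)])                 # posterior mode, near 15.2
```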
Computer applications
Bayesian inference has applications in artificial intelligence and expert systems. Bayesian inference techniques have been a fundamental part of computerized pattern recognition techniques since the late 1950s. There is also an ever-growing connection between Bayesian methods and simulation-based Monte Carlo techniques, since complex models cannot be processed in closed form by a Bayesian analysis, while a graphical model structure may allow for efficient simulation algorithms like Gibbs sampling and other Metropolis–Hastings algorithm schemes. Recently Bayesian inference has gained popularity amongst the phylogenetics community for these reasons; a number of applications allow many demographic and evolutionary parameters to be estimated simultaneously. In the areas of population genetics and dynamical systems theory, approximate Bayesian computation (ABC) is also becoming increasingly popular.
As applied to statistical classification, Bayesian inference has been used in recent years to develop algorithms for identifying e-mail spam. Applications which make use of Bayesian inference for spam filtering include DSPAM, Bogofilter, SpamAssassin, SpamBayes, and Mozilla. Spam classification is treated in more detail in the article on the naive Bayes classifier.
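To illustrate the general idea (not the specific design of any of the filters named above), here is a toy naive Bayes classifier; the token counts, corpus sizes, and smoothing scheme are all invented:

```python
import math

# Token counts from a tiny, invented training corpus
spam_counts = {"free": 20, "winner": 15, "meeting": 1}
ham_counts  = {"free": 2,  "winner": 1,  "meeting": 30}
n_spam, n_ham = 40, 60          # number of training messages per class

def posterior_spam(words):
    """Naive Bayes: log P(class) + sum of log P(word | class), add-one smoothing."""
    scores = {}
    for label, counts, n in (("spam", spam_counts, n_spam), ("ham", ham_counts, n_ham)):
        total = sum(counts.values())
        score = math.log(n / (n_spam + n_ham))                    # log prior
        for w in words:
            score += math.log((counts.get(w, 0) + 1) / (total + len(counts)))
        scores[label] = score
    m = max(scores.values())                                      # normalise in log space
    z = sum(math.exp(s - m) for s in scores.values())
    return math.exp(scores["spam"] - m) / z

print(posterior_spam(["free", "winner"]))   # near 1: classified as spam
print(posterior_spam(["meeting"]))          # near 0: classified as ham
```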
In the courtroom
Bayesian inference can be used by jurors to coherently accumulate the evidence for and against a defendant, and to see whether, in totality, it meets their personal threshold for 'beyond a reasonable doubt'. The benefit of a Bayesian approach is that it gives the juror an unbiased, rational mechanism for combining evidence. Bayes' theorem is applied successively to all evidence presented, with the posterior from one stage becoming the prior for the next. A prior probability of guilt is still required. It has been suggested that this could reasonably be the probability that a random person taken from the qualifying population is guilty. Thus, for a crime known to have been committed by an adult male living in a town containing 50,000 adult males, the appropriate initial prior might be 1/50,000.
It may be appropriate to explain Bayes' theorem to jurors in odds form, as betting odds are more widely understood than probabilities. Alternatively, a logarithmic approach, replacing multiplication with addition, might be easier for a jury to handle.
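To illustrate the odds form: each item of evidence multiplies the current odds by its likelihood ratio (Bayes factor), so in logarithms the contributions simply add. A sketch with invented likelihood ratios:

```python
import math

prior_prob = 1 / 50000                    # one guilty person in the qualifying population
prior_odds = prior_prob / (1 - prior_prob)

# Invented likelihood ratios (Bayes factors) for three items of evidence:
# how much more probable each item is under guilt than under innocence
likelihood_ratios = [350.0, 1200.0, 0.8]

log_odds = math.log(prior_odds) + sum(math.log(lr) for lr in likelihood_ratios)
posterior_odds = math.exp(log_odds)
print(posterior_odds / (1 + posterior_odds))   # posterior probability of guilt
```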
The use of Bayes' theorem by jurors is controversial. In the United Kingdom, a defence expert witness explained Bayes' theorem to the jury in R v Adams. The jury convicted, but the case went to appeal on the basis that no means of accumulating evidence had been provided for jurors who did not wish to use Bayes' theorem. The Court of Appeal upheld the conviction, but it also gave the opinion that "To introduce Bayes' Theorem, or any similar method, into a criminal trial plunges the jury into inappropriate and unnecessary realms of theory and complexity, deflecting them from their proper task."
Gardner-Medwin argues that the criterion on which a verdict in a criminal trial should be based is not the probability of guilt, but rather the probability of the evidence, given that the defendant is innocent (akin to a frequentist p-value). He argues that if the posterior probability of guilt is to be computed by Bayes' theorem, the prior probability of guilt must be known. This will depend on the incidence of the crime, which is an unusual piece of evidence to consider in a criminal trial. Consider the following three propositions:
- A: The known facts and testimony could have arisen if the defendant is guilty.
- B: The known facts and testimony could have arisen if the defendant is innocent.
- C: The defendant is guilty.
Gardner-Medwin argues that the jury should believe both A and not-B in order to convict. A and not-B implies the truth of C, but the reverse is not true. It is possible that B and C are both true, but in this case he argues that a jury should acquit, even though they know that they will be letting some guilty people go free. See also Lindley's paradox.
Other
- The scientific method is sometimes interpreted as an application of Bayesian inference. In this view, Bayes' rule guides (or should guide) the updating of probabilities about hypotheses conditional on new observations or experiments.
- In March 2011, English Heritage reported the successful outcome of a research project by archaeologists at Cardiff University, which demonstrated the possibility of using Bayesian inference to more accurately date prehistoric remains.
- Bayesian search theory is used to search for lost objects.
- Bayesian inference in phylogeny
- Bayesian tool for methylation analysis
Relation to decision theory
A decision-theoretic justification of the use of Bayesian inference was given by Abraham Wald, who proved that every Bayesian procedure is admissible. Conversely, every admissible statistical procedure is either a Bayesian procedure or a limit of Bayesian procedures.
Wald's result also established the Bayesian approach as a fundamental technique in such areas of frequentist inference as point estimation, hypothesis testing, and the computation of confidence intervals, since Wald characterized admissible procedures as Bayesian procedures (and limits of Bayesian procedures). For example:
- "Under some conditions, all admissible procedures are either Bayes procedures or limits of Bayes procedures (in various senses). These remarkable results, at least in their original form, are due essentially to Wald. They are useful because the property of being Bayes is easier to analyze than admissibility."
- "In decision theory, a quite general method for proving admissibility consists in exhibiting a procedure as a unique Bayes solution."
- "In the first chapters of this work, prior distributions with finite support and the corresponding Bayes procedures were used to establish some of the main theorems relating to the comparison of experiments. Bayes procedures with respect to more general prior distributions have played a very important in the development of statistics, including its asymptotic theory." "There are many problems where a glance at posterior distributions, for suitable priors, yields immediately interesting information. Also, this technique can hardly be avoided in sequential analysis."
- "A useful fact is that any Bayes decision rule obtained by taking a proper prior over the whole parameter space must be admissible"
- "An important area of investigation in the development of admissibility ideas has been that of conventional sampling-theory procedures, and many interesting results have been obtained."
Distribution of a parameter of the hypergeometric distribution
Consider a sample of $n$ marbles drawn from an urn containing $N$ marbles.
If the number of white marbles in the urn is known to be equal to $K$, then the probability that the number of white marbles in the sample is equal to $k$ is given by the hypergeometric distribution:

$$P(k \mid K) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}.$$

The mean number of white marbles in the sample is

$$\operatorname{E}[k] = n\,\frac{K}{N}$$

and the standard deviation is

$$\sigma_k = \sqrt{n\,\frac{K}{N}\left(1 - \frac{K}{N}\right)\frac{N-n}{N-1}}.$$
An interesting situation arises when the number of white marbles in the sample is known, but the number of white marbles in the urn is unknown.
If the number of white marbles in the sample is equal to $k$, then the degree of confidence that the number of white marbles in the urn is equal to $K$ is

$$P(K \mid k) = \frac{P(k \mid K)\, P(K)}{P(k)},$$

where $P(K)$ is the probability, assigned before observing the sample, that the number of white marbles in the urn is equal to $K$, and $P(k)$ is the probability that the number of white marbles in the sample is equal to $k$, computed without knowing the number of white marbles in the urn.
Assume now that all the possibilities are considered equally likely in advance,

$$P(K) = \frac{1}{N+1} \quad \text{for } K = 0, 1, \ldots, N.$$

Then the degree of confidence that the number of white marbles in the urn is equal to $K$ is

$$P(K \mid k) = \frac{P(k \mid K)}{\sum_{K'=0}^{N} P(k \mid K')}.$$

The mean number of white marbles in the urn is

$$\operatorname{E}[K \mid k] = \frac{(N+2)(k+1)}{n+2} - 1$$

and the standard deviation is

$$\sigma_{K \mid k} = \frac{N+2}{n+2}\sqrt{\frac{(k+1)(n+1-k)(N-n)}{(n+3)(N+2)}}.$$

These two formulas regarding the number of white marbles in the urn mirror the simpler formulas regarding the number of white marbles in the sample, with the fraction $K/N$ replaced by the posterior estimate $(k+1)/(n+2)$.
The limiting cases for $N \to \infty$ are the binomial distribution and the beta distribution; see below.
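A numerical sketch of this posterior, with invented urn and sample sizes, confirms the closed-form mean:

```python
import numpy as np
from scipy.stats import hypergeom

N, n, k = 100, 20, 7          # urn size, sample size, white marbles seen (illustrative)
K = np.arange(N + 1)          # candidate numbers of white marbles in the urn

likelihood = hypergeom.pmf(k, N, K, n)   # P(k | K) for every candidate K
prior = np.full(N + 1, 1.0 / (N + 1))    # uniform prior P(K) = 1/(N + 1)

posterior = likelihood * prior
posterior /= posterior.sum()

print((K * posterior).sum())             # posterior mean of K
print((N + 2) * (k + 1) / (n + 2) - 1)   # closed-form mean, for comparison
```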
Posterior distribution of the binomial parameter
The problem considered by Bayes in Proposition 9 of his essay is the posterior distribution for the parameter $p$ of the binomial distribution.
Consider $n$ Bernoulli trials.
If the success probability is equal to $p$, then the conditional probability of observing $k$ successes is the (discrete) binomial distribution function:

$$P(k \mid p) = \binom{n}{k} p^k (1-p)^{n-k}.$$

The mean value of $k$ is $np$, and the standard deviation is $\sqrt{np(1-p)}$. The mean value of $k/n$ is $p$, and the standard deviation is $\sqrt{p(1-p)/n}$.
In the more realistic situation when $p$ is unknown and $k$ is known, $P(k \mid p)$ is a likelihood function of $p$. The posterior probability distribution function of $p$, after observing $k$, is

$$f(p \mid k) = \frac{P(k \mid p)\, f(p)}{\int_0^1 P(k \mid p')\, f(p')\, dp'},$$

where $f(p)$ is a prior probability distribution function expressing what was known about $p$ before observing $k$.
Assume now that the prior distribution is the continuous uniform distribution, $f(p) = 1$ for $0 \le p \le 1$.
Then the posterior distribution is the beta distribution $\mathrm{Beta}(k+1,\, n-k+1)$. The mean value of $p$ is $(k+1)/(n+2)$, rather than $k/n$, and the standard deviation is $\sqrt{\frac{(k+1)(n-k+1)}{(n+2)^2(n+3)}}$, rather than $\sqrt{p(1-p)/n}$.
If the prior distribution is $\mathrm{Beta}(\alpha, \beta)$, then the posterior distribution is $\mathrm{Beta}(\alpha+k,\, \beta+n-k)$. So the beta distribution is a conjugate prior for the binomial likelihood.
What is "Bayesian" about Proposition 9 is that Bayes presented it as a probability for the parameter . That is, not only can one compute probabilities for experimental outcomes, but also for the parameter which governs them, and the same algebra is used to make inferences of either kind. Interestingly, Bayes actually states his question in a way that might make the idea of assigning a probability distribution to a parameter palatable to a frequentist. He supposes that a billiard ball is thrown at random onto a billiard table, and that the probabilities p and q are the probabilities that subsequent billiard balls will fall above or below the first ball. By making the binomial parameter depend on a random event, he cleverly escapes a philosophical quagmire that was an issue he most likely was not even aware of.
History
The term Bayesian refers to Thomas Bayes (1702–1761), who proved a special case of what is now called Bayes' theorem. However, it was Pierre-Simon Laplace (1749–1827) who introduced a general version of the theorem and used it to approach problems in celestial mechanics, medical statistics, reliability, and jurisprudence. Early Bayesian inference, which used uniform priors following Laplace's principle of insufficient reason, was called "inverse probability" (because it infers backwards from observations to parameters, or from effects to causes). After the 1920s, "inverse probability" was largely supplanted by a collection of methods that came to be called frequentist statistics.
In the 20th century, the ideas of Laplace were further developed in two different directions, giving rise to objective and subjective currents in Bayesian practice. In the objectivist stream, the statistical analysis depends on only the model assumed and the data analysed. No subjective decisions need to be involved. In contrast, "subjectivist" statisticians deny the possibility of fully objective analysis for the general case.
In the 1980s, there was a dramatic growth in research and applications of Bayesian methods, mostly attributed to the discovery of Markov chain Monte Carlo methods, which removed many of the computational problems, and to an increasing interest in nonstandard, complex applications. Despite the growth of Bayesian research, most undergraduate teaching is still based on frequentist statistics. Nonetheless, Bayesian methods are widely accepted and used, for example in the field of machine learning.
See also
- Approximate Bayesian computation
- Bayesian inference in phylogeny
- Bayesian model comparison
- Bayesian brain
- Bayesian estimation
- Bayesian filtering
- Bayesian network
- Bayesian probability
- Bayesian tool for methylation analysis
- Bayes factor
- Cromwell's rule
- Exchangeable random variables
- Gaussian process regression
- German tank problem
- Hierarchical Bayes model
- Influence diagram
- Information theory
- Important publications in Bayesian statistics
- Minimum message length
- Minimum description length
- Maximum entropy thermodynamics
- Naive Bayes classifier
- Occam's razor
- Predictive inference
- Prosecutor's fallacy
- Raven paradox
- Robust Bayes analysis
- The Wisdom of Crowds
Elementary
The following books are listed in ascending order of probabilistic sophistication:
- Kruschke, John K. "Doing Bayesian Data Analysis: A Tutorial with R and BUGS". Academic Press/Elsevier. ISBN 9780123814852
- Bolstad, William M. (2007) Introduction to Bayesian Statistics: Second Edition, John Wiley ISBN 0-471-27020-2
- Winkler, Robert L, Introduction to Bayesian Inference and Decision, 2nd Edition (2003) ISBN 0-9647938-4-9
- Lee, Peter M. Bayesian Statistics: An Introduction. Second Edition. (1997). ISBN 0-340-67785-6.
- Pole, Andy, West, Mike and Harrison, P. Jeff. Applied Bayesian Forecasting and Time Series Analysis, Chapman-Hall/Taylor Francis, 1994
Intermediate or Advanced
- Bolstad, William M. (2010) Understanding Computational Bayesian Statistics, John Wiley ISBN 0-470-04609-8
- Bretthorst, G. Larry, 1988, Bayesian Spectrum Analysis and Parameter Estimation in Lecture Notes in Statistics, 48, Springer-Verlag, New York, New York
- DeGroot, Morris H., Optimal Statistical Decisions. Wiley Classics Library. 2004. (Originally published (1970) by McGraw-Hill.) ISBN 0-471-68029-X.
- Jaynes, E.T. (1998) Probability Theory: The Logic of Science. (On-line)
- O'Hagan, A. and Forster, J. (2003) Kendall's Advanced Theory of Statistics, Volume 2B: Bayesian Inference. Arnold, New York. ISBN 0-340-52922-9.
- Glenn Shafer and Judea Pearl, eds. (1988) Probabilistic Reasoning in Intelligent Systems, San Mateo, CA: Morgan Kaufmann.
- West, Mike, and Harrison, P. Jeff, Bayesian Forecasting and Dynamic Models, Springer-Verlag, 1997 (2nd ed.)
External links
- Bayesian Statistics summary from Scholarpedia.
- A nice on-line introductory tutorial to Bayesian probability from Queen Mary University of London
- An Intuitive Explanation of Bayesian Reasoning "Bayes' Theorem for the curious and bewildered; an excruciatingly gentle introduction" by Eliezer Yudkowsky
- Paul Graham. "A Plan for Spam" (exposition of a popular approach for spam classification)
- Commentary on Regina versus Adams
- Mathematical notes on Bayesian statistics and Markov chain Monte Carlo
- Bayesian Rating/Ranking How to implement Bayes' Theorem for online rating and ranking systems
- Bayesian reading list, categorized and annotated. Designed for cognitive science; maintained by Tom Griffiths.
- Stanford Encyclopedia of Philosophy: Inductive Logic a comprehensive Bayesian treatment of Inductive Logic and Confirmation Theory
- Bayesian Confirmation Theory An extensive presentation of Bayesian Confirmation Theory
- What is Bayesian Learning?