Bayesian probability
Bayesian probability is one of the different interpretations of the concept of probability and belongs to the category of evidential probabilities. The Bayesian interpretation of probability can be seen as an extension of logic that enables reasoning with propositions whose truth or falsity is uncertain. To evaluate the probability of a hypothesis, the Bayesian probabilist specifies some prior probability, which is then updated in the light of new, relevant data.
The Bayesian interpretation provides a standard set of procedures and formulae to perform this calculation. Bayesian probability interprets the concept of probability as "a degree of plausibility of a proposition (belief in a proposition) based on the given state of knowledge," in contrast to interpreting it as a frequency or a "propensity" of some phenomenon.
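The central formula behind this updating is Bayes' theorem. As a brief illustration, write H for a hypothesis and D for the observed data (these symbols are notation chosen here, not taken from the text above); the competing hypotheses H_i are assumed to be mutually exclusive and exhaustive:

$$
P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D)}, \qquad P(D) = \sum_i P(D \mid H_i)\, P(H_i)
$$

Here P(H) is the prior probability, P(D | H) is the likelihood of the data under the hypothesis, and P(H | D) is the updated (posterior) probability.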
The term "Bayesian" refers to the 18th-century mathematician and theologian Thomas Bayes (1702–1761), who provided the first mathematical treatment of a non-trivial problem of Bayesian inference. Nevertheless, it was the French mathematician Pierre-Simon Laplace (1749–1827) who pioneered and popularised what is now called Bayesian probability.
Broadly speaking, there are two views on Bayesian probability that interpret the probability concept in different ways. According to the objectivist view, the rules of Bayesian statistics can be justified by requirements of rationality and consistency and interpreted as an extension of logic. According to the subjectivist view, probability measures a "personal belief". Many modern machine learning methods are based on objectivist Bayesian principles. In the Bayesian view, a probability is assigned to a hypothesis, whereas under the frequentist view, a hypothesis is typically tested without being assigned a probability.
Bayesian methodology
In general, Bayesian methods are characterized by the following concepts and procedures:
- The use of hierarchical models and marginalization over the values of nuisance parameters. In most cases, the computation is intractable, but good approximations can be obtained using Markov chain Monte Carlo methods.
- The sequential use of Bayes' formula: when more data become available after calculating a posterior distribution, the posterior becomes the next prior.
- In frequentist statistics, a hypothesis is a proposition (which must be either true or false), so that the (frequentist) probability of a frequentist hypothesis is either one or zero. In Bayesian statistics, a probability can be assigned to a hypothesis.
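The sequential use of Bayes' formula described in the list above can be sketched in a few lines of Python. This is a minimal illustration, not a general implementation: it assumes a Beta prior on the success probability of Bernoulli trials, so that each update only changes two counts. The function names `update` and `mean` and the data values are made up for the example.

```python
def update(prior, outcome):
    """Return Beta posterior parameters after one Bernoulli trial.

    prior   -- (alpha, beta) parameters of a Beta distribution
    outcome -- 1 for a success, 0 for a failure
    """
    alpha, beta = prior
    return (alpha + outcome, beta + (1 - outcome))


def mean(params):
    """Mean of a Beta(alpha, beta) distribution."""
    alpha, beta = params
    return alpha / (alpha + beta)


data = [1, 0, 1, 1, 0, 1, 1, 1]   # made-up trial outcomes

belief = (1, 1)                    # Beta(1, 1), i.e. a uniform prior
for outcome in data:
    # The posterior from the previous step is used as the next prior.
    belief = update(belief, outcome)

# Processing all the data at once gives the same posterior.
successes = sum(data)
batch = (1 + successes, 1 + len(data) - successes)

print(belief, batch, mean(belief))
```

Under these assumptions, updating one observation at a time and updating on the whole data set in a single step lead to the same posterior, here Beta(7, 3) with mean 0.7.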
Objective and subjective Bayesian probabilities
Broadly speaking, there are two views on Bayesian probability that interpret the 'probability' concept in different ways. For objectivists, probability objectively measures the plausibility of propositions, i.e. the probability of a proposition corresponds to a reasonable belief everyone (even a "robot") sharing the same knowledge should share in accordance with the rules of Bayesian statistics, which can be justified by requirements of rationality and consistency. Requirements of rationality and consistency are also important for subjectivists, for whom the probability corresponds to a 'personal belief'. For subjectivists, however, rationality and consistency constrain the probabilities a subject may have, but allow for substantial variation within those constraints. The objective and subjective variants of Bayesian probability differ mainly in their interpretation and construction of the prior probability.
History
The term Bayesian refers to Thomas Bayes (1702–1761), who proved a special case of what is now called Bayes' theorem in a paper titled "An Essay towards solving a Problem in the Doctrine of Chances". In that special case, the prior and posterior distributions were Beta distributions and the data came from Bernoulli trials. It was Pierre-Simon Laplace (1749–1827) who introduced a general version of the theorem and used it to approach problems in celestial mechanics, medical statistics, reliability, and jurisprudence. Early Bayesian inference, which used uniform priors following Laplace's principle of insufficient reason, was called "inverse probability" (because it infers backwards from observations to parameters, or from effects to causes). After the 1920s, "inverse probability" was largely supplanted by a collection of methods that came to be called frequentist statistics.
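The special case mentioned above can be written out explicitly. As a sketch in modern notation (the symbols are chosen here for illustration): let θ be the unknown success probability, give it a uniform prior on [0, 1], which is the Beta(1, 1) distribution, and suppose s successes are observed in n Bernoulli trials. Then

$$
p(\theta \mid s, n) \;\propto\; \theta^{s} (1-\theta)^{n-s}, \qquad \theta \mid s, n \;\sim\; \mathrm{Beta}(s+1,\; n-s+1), \qquad \mathbb{E}[\theta \mid s, n] = \frac{s+1}{n+2}
$$

The posterior mean (s+1)/(n+2) is the expression commonly known as Laplace's rule of succession.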
In the 20th century, the ideas of Laplace were further developed in two different directions, giving rise to objective and subjective currents in Bayesian practice. In the objectivist stream, the statistical analysis depends on only the model assumed and the data analysed. No subjective decisions need to be involved. In contrast, "subjectivist" statisticians deny the possibility of fully objective analysis for the general case.
In the 1980s, there was a dramatic growth in research and applications of Bayesian methods, mostly attributed to the discovery of Markov chain Monte Carlo methods, which removed many of the computational problems, and an increasing interest in nonstandard, complex applications. Despite the growth of Bayesian research, most undergraduate teaching is still based on frequentist statistics. Nonetheless, Bayesian methods are widely accepted and used, such as in the fields of machine learning and talent analytics.
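To give a rough idea of how a Markov chain Monte Carlo method works, the following is a minimal sketch of a random-walk Metropolis sampler, one simple member of that family. It assumes a uniform prior on a success probability and Bernoulli data, so the exact posterior is known and the sampler can be checked against it. The function names, step size, sample count and data are illustrative choices, not part of any standard library or of the text above.

```python
import math
import random


def log_unnormalized_posterior(theta, successes, failures):
    """Log of theta^successes * (1 - theta)^failures (uniform prior on [0, 1])."""
    if not 0.0 < theta < 1.0:
        return -math.inf           # zero posterior density outside (0, 1)
    return successes * math.log(theta) + failures * math.log(1.0 - theta)


def metropolis(successes, failures, n_samples=20000, step=0.2, seed=0):
    """Draw correlated samples whose long-run distribution is the posterior."""
    rng = random.Random(seed)
    theta = 0.5                    # arbitrary starting point inside (0, 1)
    current = log_unnormalized_posterior(theta, successes, failures)
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)   # symmetric random-walk proposal
        candidate = log_unnormalized_posterior(proposal, successes, failures)
        # Accept with probability min(1, posterior ratio); otherwise stay put.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        samples.append(theta)
    return samples


samples = metropolis(successes=6, failures=2)
kept = samples[2000:]              # discard an initial burn-in period
estimate = sum(kept) / len(kept)
print(estimate)
```

For this toy model the exact posterior is Beta(7, 3), whose mean is 0.7, so the printed estimate should land close to that value. The sampler only ever uses the posterior up to a normalizing constant, which is why such methods help when the normalizing integral is intractable.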
Justification of Bayesian probabilities
The use of Bayesian probabilities as the basis of Bayesian inference has been supported by several arguments, such as the Cox axioms, the Dutch book argument, arguments based on decision theory and de Finetti's theorem.
Axiomatic approach
Richard T. Cox showed that Bayesian updating follows from several axioms, including two functional equations and a controversial hypothesis of differentiability. It is known that Cox's 1961 development (mainly copied by Jaynes) is non-rigorous, and in fact a counterexample has been found by Halpern. The assumption of differentiability or even continuity is questionable since the Boolean algebra of statements may only be finite. Other axiomatizations have been suggested by various authors to make the theory more rigorous.
Dutch book approach
The Dutch book argument was proposed by de Finetti, and is based on betting. A Dutch book is made when a clever gambler places a set of bets that guarantee a profit, no matter what the outcome of the bets. If a bookmaker follows the rules of the Bayesian calculus in the construction of his odds, a Dutch book cannot be made.
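A small numerical sketch may make the idea concrete. It covers only one requirement of the probability calculus, namely that the prices of bets on mutually exclusive, exhaustive outcomes add up to one. The setup is an assumption made for illustration: the bookmaker quotes a price for a ticket that pays 1 unit if a given outcome occurs and is willing to buy or sell tickets at that price. The function name `gambler_payoffs` is hypothetical.

```python
from fractions import Fraction


def gambler_payoffs(prices, stakes):
    """Net gain to the gambler for each possible outcome.

    prices[i] -- bookmaker's price for a ticket paying 1 if outcome i occurs
    stakes[i] -- tickets the gambler buys on outcome i (negative = sells)
    """
    cost = sum(p * s for p, s in zip(prices, stakes))
    # If outcome i occurs, the gambler collects stakes[i] and has paid `cost`.
    return [s - cost for s in stakes]


# Incoherent prices: they sum to 4/5, so buying one ticket on each
# outcome costs 4/5 and always pays 1.
incoherent = [Fraction(2, 5), Fraction(2, 5)]
print(gambler_payoffs(incoherent, [1, 1]))

# Coherent prices: they sum to 1, and no choice of stakes wins in every case.
coherent = [Fraction(3, 5), Fraction(2, 5)]
print(gambler_payoffs(coherent, [1, 1]))
print(gambler_payoffs(coherent, [3, -2]))
```

With the incoherent prices the gambler gains 1/5 whichever outcome occurs. With the coherent prices, the price-weighted average of the gambler's payoffs is always zero, so at least one outcome leaves the gambler with no gain.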
However, Ian Hacking noted that traditional Dutch book arguments did not specify Bayesian updating: they left open the possibility that non-Bayesian updating rules could avoid Dutch books. For example, Hacking writes "And neither the Dutch book argument, nor any other in the personalist arsenal of proofs of the probability axioms, entails the dynamic assumption. Not one entails Bayesianism. So the personalist requires the dynamic assumption to be Bayesian. It is true that in consistency a personalist could abandon the Bayesian model of learning from experience. Salt could lose its savour."
In fact, there are non-Bayesian updating rules that also avoid Dutch books (as discussed in the literature on "probability kinematics" following the publication of Richard C. Jeffrey's rule). The additional hypotheses sufficient to (uniquely) specify Bayesian updating are substantial, complicated, and unsatisfactory.
Decision theory approach
A decision-theoretic justification of the use of Bayesian inference (and hence of Bayesian probabilities) was given by Abraham Wald, who proved that every admissible statistical procedure is either a Bayesian procedure or a limit of Bayesian procedures. Conversely, every Bayesian procedure is admissible.
Personal probabilities and objective methods for constructing priors
Following the work on expected utility theory of Ramsey and von Neumann, decision-theorists have accounted for rational behavior using a probability distribution for the agent. Johann Pfanzagl completed the Theory of Games and Economic Behavior by providing an axiomatization of subjective probability and utility, a task left uncompleted by von Neumann and Oskar Morgenstern: their original theory supposed that all the agents had the same probability distribution, as a convenience. Pfanzagl's axiomatization was endorsed by Oskar Morgenstern: "Von Neumann and I have anticipated" the question whether probabilities "might, perhaps more typically, be subjective and have stated specifically that in the latter case axioms could be found from which could derive the desired numerical utility together with a number for the probabilities (cf. p. 19 of The Theory of Games and Economic Behavior). We did not carry this out; it was demonstrated by Pfanzagl ... with all the necessary rigor".
Ramsey and Savage noted that the individual agent's probability distribution could be objectively studied in experiments. The role of judgment and disagreement in science has been recognized since Aristotle and even more clearly with Francis Bacon. The objectivity of science lies not in the psychology of individual scientists, but in the process of science and especially in statistical methods, as noted by C. S. Peirce. Recall that the objective methods for falsifying propositions about personal probabilities have been used for a half century, as noted previously. Procedures for testing hypotheses about probabilities (using finite samples) are due to Ramsey (1931) and de Finetti (1931, 1937, 1964, 1970). Both Bruno de Finetti and Frank P. Ramsey acknowledge their debts to pragmatic philosophy, particularly (for Ramsey) to Charles S. Peirce.
The "Ramsey test" for evaluating probability distributions is implementable in theory, and has kept experimental psychologists occupied for a half century.
This work demonstrates that Bayesian-probability propositions can be falsified, and so meet an empirical criterion of Charles S. Peirce, whose work inspired Ramsey. (This falsifiability criterion was popularized by Karl Popper.)
Modern work on the experimental evaluation of personal probabilities uses the randomization, blinding, and Boolean-decision procedures of the Peirce-Jastrow experiment. Since individuals act according to different probability judgements, these agents' probabilities are "personal" (but amenable to objective study).
Personal probabilities are problematic for science and for some applications where decision-makers lack the knowledge or time to specify an informed probability-distribution (on which they are prepared to act). To meet the needs of science and of human limitations, Bayesian statisticians have developed "objective" methods for specifying prior probabilities.
Indeed, some Bayesians have argued the prior state of knowledge defines the (unique) prior probability-distribution for "regular" statistical problems; cf. well-posed problems. Finding the right method for constructing such "objective" priors (for appropriate classes of regular problems) has been the quest of statistical theorists from Laplace to John Maynard Keynes, Harold Jeffreys, and Edwin Thompson Jaynes. These theorists and their successors have suggested several methods for constructing "objective" priors:
- Maximum entropy
- Transformation group analysis
- Reference analysis
Each of these methods contributes useful priors for "regular" one-parameter problems, and each prior can handle some challenging statistical models (with "irregularity" or several parameters). Each of these methods has been useful in Bayesian practice. Indeed, methods for constructing "objective" (alternatively, "default" or "ignorance") priors have been developed by avowed subjective (or "personal") Bayesians like James Berger (Duke University) and José-Miguel Bernardo (Universitat de València), simply because such priors are needed for Bayesian practice, particularly in science. The quest for "the universal method for constructing priors" continues to attract statistical theorists.
Thus, the Bayesian statistician needs either to use informed priors (using relevant expertise or previous data) or to choose among the competing methods for constructing "objective" priors.
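The practical effect of this choice can be illustrated with a small sketch in Python. It assumes a binomial model with a conjugate Beta prior, which is a standard textbook setting and not one discussed in the text above. The data (7 successes in 10 trials) and the "informed" prior Beta(20, 20) are hypothetical values chosen only for illustration; the uniform prior Beta(1, 1) and the Jeffreys prior Beta(1/2, 1/2) are two common "objective" choices for this model.

```python
def beta_posterior(a, b, successes, failures):
    """Conjugate update: a Beta(a, b) prior with binomial data gives a Beta posterior."""
    return a + successes, b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical data, for illustration only: 7 successes in 10 trials.
successes, failures = 7, 3

# Candidate priors on the success probability.
priors = {
    "uniform": (1.0, 1.0),      # Beta(1, 1), the flat prior
    "jeffreys": (0.5, 0.5),     # Beta(1/2, 1/2), Jeffreys prior for the binomial model
    "informed": (20.0, 20.0),   # hypothetical prior encoding a strong belief near 0.5
}

posterior_means = {}
for name, (a, b) in priors.items():
    post_a, post_b = beta_posterior(a, b, successes, failures)
    posterior_means[name] = beta_mean(post_a, post_b)
    print(f"{name:>8}: posterior Beta({post_a}, {post_b}), mean = {posterior_means[name]:.4f}")
```

With so few observations, the two "objective" priors give similar posterior means while the strongly informed prior pulls the estimate toward its own centre; with more data the three posteriors would move closer together.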
See also
- Bertrand's paradox: a paradox in classical probability, solved by E.T. Jaynes in the context of Bayesian probability
- De Finetti's game: a procedure for evaluating someone's subjective probability
- Uncertainty
- An Essay towards solving a Problem in the Doctrine of Chances
External links
- On-line textbook: Information Theory, Inference, and Learning Algorithms, by David MacKay, has many chapters on Bayesian methods, including introductory examples; arguments in favour of Bayesian methods (in the style of Edwin Jaynes); state-of-the-art Monte Carlo methods, message-passing methods, and variational methods; and examples illustrating the intimate connections between Bayesian inference and data compression.
- An Intuitive Explanation of Bayesian Reasoning: a very gentle introduction by Eliezer Yudkowsky
- An on-line introductory tutorial to Bayesian probability from Queen Mary University of London
- James Franklin, The Science of Conjecture: Evidence and Probability Before Pascal: history from a Bayesian point of view.