Cox's theorem
Cox's theorem, named after the physicist Richard Threlkeld Cox, is a derivation of the laws of probability theory from a certain set of postulates. This derivation justifies the so-called "logical" interpretation of probability. As the laws of probability derived by Cox's theorem are applicable to any proposition, logical probability is a type of Bayesian probability. Other forms of Bayesianism, such as the subjective interpretation, are given other justifications.
Cox's assumptions
Cox wanted his system to satisfy the following conditions:
- Divisibility and comparability – The plausibility of a statement is a real number and is dependent on information we have related to the statement.
- Common sense – Plausibilities should vary sensibly with the assessment of plausibilities in the model.
- Consistency – If the plausibility of a statement can be derived in many ways, all the results must be equal.
The postulates as stated here are taken from Arnborg and Sjödin (1999).
"Common sense" includes consistency with Aristotelian logic when statements are completely plausible or implausible.
The postulates as originally stated by Cox were not mathematically rigorous (although better than the informal description above), e.g., as noted by Halpern (1999a, 1999b). However, it appears to be possible to augment them with various mathematical assumptions made either implicitly or explicitly by Cox to produce a valid proof.
Cox's axioms and functional equations are:
- The plausibility of a proposition determines the plausibility of the proposition's negation; one decreases as the other increases. Because "a double negative is an affirmative", this becomes a functional equation f(f(x)) = x, saying that the function f that maps the plausibility of a proposition to the plausibility of the proposition's negation is an involution, i.e., it is its own inverse.
- The plausibility of the conjunction [A & B] of two propositions A, B depends only on the plausibility of B and that of A given that B is true. (From this Cox eventually infers that conjunction of plausibilities is associative, and then that it may as well be ordinary multiplication of real numbers.) Because of the associative nature of the "and" operation in propositional logic, this becomes a functional equation saying that the function g such that w(A & B | C) = g(w(A | C), w(B | A & C)) is an associative binary operation: g(x, g(y, z)) = g(g(x, y), z). All strictly increasing associative binary operations on the real numbers are isomorphic to multiplication of numbers in the interval [0, 1]. This function therefore may be taken to be multiplication.
- Suppose [A & B] is equivalent to [C & D]. If we acquire new information A and then acquire further new information B, and update all probabilities each time, the updated probabilities will be the same as if we had first acquired new information C and then acquired further new information D. In view of the fact that multiplication of probabilities can be taken to be ordinary multiplication of real numbers, this becomes a functional equation x f(f(y)/x) = y f(f(x)/y), where f is as above.
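As a sanity check (not the proof itself), the ordinary probability calculus, with f(x) = 1 − x for negation and multiplication for conjunction, satisfies all three functional equations. A minimal sketch, where f and g are just the symbols used above:

```python
import itertools

# Sanity check: the standard probability calculus satisfies Cox's
# functional equations, taking f(x) = 1 - x for negation and
# g(x, y) = x * y for conjunction.

def f(x):            # negation: plausibility of not-A from that of A
    return 1.0 - x

def g(x, y):         # conjunction: w(A & B | C) = g(w(A|C), w(B|A & C))
    return x * y

xs = [0.1, 0.4, 0.5, 0.9]

# Axiom 1: f is an involution, f(f(x)) = x.
for x in xs:
    assert abs(f(f(x)) - x) < 1e-12

# Axiom 2: g is associative, g(x, g(y, z)) = g(g(x, y), z).
for x, y, z in itertools.product(xs, repeat=3):
    assert abs(g(x, g(y, z)) - g(g(x, y), z)) < 1e-12

# Axiom 3: consistency of conjunction with negation,
# x * f(f(y)/x) = y * f(f(x)/y); with f(x) = 1 - x both sides
# reduce to x + y - 1.
for x, y in itertools.product(xs, repeat=2):
    assert abs(x * f(f(y) / x) - y * f(f(x) / y)) < 1e-12
```

Of course, the content of the theorem is the converse direction: any f and g satisfying these equations can be rescaled into this standard form.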
Cox's theorem implies that any plausibility model that meets the
postulates is equivalent to the subjective probability model, i.e.,
can be converted to the probability model by rescaling.
Implications of Cox's postulates
The laws of probability derivable from these postulates are the following (Jaynes, 2003). Here w(A|B) is the "plausibility" of the proposition A given B, and m is some positive number.
- Certainty is represented by w(A|B) = 1.
- w^m(A|B) + w^m(A^c|B) = 1
- w(A, B|C) = w(A|C) w(B|A, C) = w(B|C) w(A|B, C).
It is important to note that the postulates imply only these general properties. These are equivalent to the usual laws of probability assuming some conventions, namely that the scale of measurement is from zero to one, and the plausibility function, conventionally denoted P or Pr, is equal to w^m. (We could have equivalently chosen to measure probabilities from one to infinity, with infinity representing certain falsehood.) With these conventions, we obtain the laws of probability in a more familiar form:
- Certain truth is represented by Pr(A|B) = 1, and certain falsehood by Pr(A|B) = 0.
- Pr(A|B) + Pr(A^c|B) = 1
- Pr(A, B|C) = Pr(A|C) Pr(B|A, C) = Pr(B|C) Pr(A|B, C).
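To see how the rescaling works, here is a hypothetical plausibility function: take w to be the square root of an ordinary probability, so the generalized laws hold with m = 2 and Pr = w^m recovers the familiar form. The numbers below are purely illustrative.

```python
# Hypothetical illustration: a "plausibility" w equal to the square root
# of an ordinary probability. The generalized laws then hold with m = 2,
# and the rescaling Pr = w**m recovers the familiar laws.

m = 2
P = {"A": 0.3, "B": 0.5}           # assumed probabilities; A, B independent

def w(p):                          # plausibility = p**(1/m)
    return p ** (1.0 / m)

# Generalized negation law: w^m(A|B) + w^m(A^c|B) = 1
assert abs(w(P["A"]) ** m + w(1 - P["A"]) ** m - 1.0) < 1e-12

# Generalized product rule: w(A, B|C) = w(A|C) * w(B|A, C)
p_ab = P["A"] * P["B"]             # independence gives P(B|A) = P(B)
assert abs(w(p_ab) - w(P["A"]) * w(P["B"])) < 1e-12

# Rescaling Pr = w**m gives back the ordinary probability:
assert abs(w(P["A"]) ** m - P["A"]) < 1e-12
```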
Rule 2 is a rule for negation, and rule 3 is a rule for conjunction. Given that any proposition containing conjunction, disjunction, and negation can be equivalently rephrased using conjunction and negation alone (the conjunctive normal form), we can now handle any compound proposition.
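For instance, a disjunction can be handled with rules 2 and 3 alone, via A or B == not(not A and not B). A small illustration, with assumed numeric values and independent propositions for simplicity:

```python
# De Morgan in action: compute the probability of a disjunction using
# only the negation rule (2) and the product rule (3).
# Numbers are assumed, and A, B are taken to be independent so that
# the product rule reduces to Pr(A and B) = Pr(A) * Pr(B).

Pr_A, Pr_B = 0.3, 0.5

Pr_notA = 1 - Pr_A                      # rule 2
Pr_notB = 1 - Pr_B                      # rule 2
Pr_notA_and_notB = Pr_notA * Pr_notB    # rule 3 (with independence)
Pr_A_or_B = 1 - Pr_notA_and_notB        # De Morgan + rule 2

# Agrees with inclusion-exclusion: Pr(A) + Pr(B) - Pr(A and B)
assert abs(Pr_A_or_B - (Pr_A + Pr_B - Pr_A * Pr_B)) < 1e-12
```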
The laws thus derived yield finite additivity of probability, but not countable additivity. The measure-theoretic formulation of Kolmogorov assumes that a probability measure is countably additive. This slightly stronger condition is necessary for the proof of certain theorems.
Interpretation and further discussion
Cox's theorem has come to be used as one of the justifications for the use of Bayesian probability theory. For example, in Jaynes (2003) it is discussed in detail in chapters 1 and 2 and is a cornerstone for the rest of the book. Probability is interpreted as a formal system of logic, the natural extension of Aristotelian logic (in which every statement is either true or false) into the realm of reasoning in the presence of uncertainty.
It has been debated to what degree the theorem excludes alternative models for reasoning about uncertainty. For example, if certain "unintuitive" mathematical assumptions were dropped, then alternatives could be devised, e.g., an example provided by Halpern (1999a). However, Arnborg and Sjödin (1999, 2000a, 2000b) suggest additional "common sense" postulates that would allow the assumptions to be relaxed in some cases while still ruling out the Halpern example.
The original formulation of Cox's theorem is in Cox (1946), which is extended with additional results and more discussion in Cox (1961). Jaynes (2003) cites Abel (1826) for the first known use of the associativity functional equation. Aczél (1966) provides a long proof of the "associativity equation" (pp. 256–267). Jaynes (2003, p. 27) reproduces the shorter proof by Cox, in which differentiability is assumed.