Admissible decision rule
In statistical decision theory, an admissible decision rule is a rule for making a decision such that there is no other rule that is always "better" than it, in the precise sense of "better" defined below.
Generally speaking, in most decision problems the set of admissible rules is large, even infinite, so this is not a sufficient criterion to pin down a single rule, but as will be seen there are some good reasons to favor admissible rules; compare Pareto efficiency.
Definition
Define sets $\Theta$, $\mathcal{X}$ and $\mathcal{A}$, where $\Theta$ are the states of nature, $\mathcal{X}$ the possible observations, and $\mathcal{A}$ the actions that may be taken. An observation $x \in \mathcal{X}$ is distributed as $F(x \mid \theta)$ and therefore provides evidence about the state of nature $\theta \in \Theta$. A decision rule is a function $\delta : \mathcal{X} \rightarrow \mathcal{A}$, where upon observing $x \in \mathcal{X}$, we choose to take action $\delta(x) \in \mathcal{A}$.
Also define a loss function $L(\theta, a)$, which specifies the loss we would incur by taking action $a \in \mathcal{A}$ when the true state of nature is $\theta \in \Theta$. Usually we will take this action after observing data $x \in \mathcal{X}$, so that the loss will be $L(\theta, \delta(x))$. (It is possible, though unconventional, to recast the following definitions in terms of a utility function, which is the negative of the loss.)
Define the risk function $R(\theta, \delta)$ as the expectation

$$R(\theta, \delta) = \operatorname{E}_{F(x \mid \theta)}\big[L(\theta, \delta(x))\big].$$
Whether a decision rule $\delta$ has low risk depends on the true state of nature $\theta$. A decision rule $\delta^*$ dominates a decision rule $\delta$ if and only if $R(\theta, \delta^*) \le R(\theta, \delta)$ for all $\theta$, and the inequality is strict for some $\theta$.
A decision rule $\delta^*$ is admissible (with respect to the loss function) if and only if no other rule dominates it; otherwise it is inadmissible. Thus an admissible decision rule is a maximal element with respect to the above partial order.
An inadmissible rule is not preferred (except for reasons of simplicity or computational efficiency), since by definition there is some other rule that will achieve equal or lower risk for all $\theta$. But just because a rule $\delta$ is admissible does not mean it is a good rule to use. Being admissible means there is no other single rule that is always better, but other admissible rules might achieve lower risk for most $\theta$ that occur in practice. (The Bayes risk discussed below is a way of explicitly considering which $\theta$ occur in practice.)
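The definitions above can be checked mechanically on a small finite problem. The following sketch uses assumed toy numbers (two states, a binary observation, 0-1 loss; none of these come from the article): it enumerates every deterministic rule, computes each rule's risk, and filters out the dominated ones.

```python
# Toy decision problem (assumed numbers): enumerate rules delta: X -> A,
# compute risk R(theta, delta) = E_{F(x|theta)}[L(theta, delta(x))],
# and keep only the admissible (undominated) rules.
import itertools

states = [0, 1]                      # Theta
observations = [0, 1]                # X
actions = [0, 1]                     # A: guess which state holds

# F(x | theta): probability of each observation under each state
F = {0: {0: 0.8, 1: 0.2},            # under theta=0, x=0 is likely
     1: {0: 0.3, 1: 0.7}}            # under theta=1, x=1 is likely

def loss(theta, a):                  # 0-1 loss: pay 1 for a wrong guess
    return 0.0 if theta == a else 1.0

def risk(theta, rule):               # expectation of loss over F(x | theta)
    return sum(F[theta][x] * loss(theta, rule[x]) for x in observations)

# Every deterministic rule, encoded as a tuple (delta(0), delta(1))
rules = list(itertools.product(actions, repeat=len(observations)))

def dominates(r1, r2):               # never worse, strictly better somewhere
    return (all(risk(t, r1) <= risk(t, r2) for t in states)
            and any(risk(t, r1) < risk(t, r2) for t in states))

admissible = [r for r in rules
              if not any(dominates(other, r) for other in rules if other != r)]
print(admissible)
```

With these numbers the "contrarian" rule that guesses against the evidence is dominated and drops out, while the remaining rules, including the two constant rules, are all admissible; this also illustrates the point above that admissibility alone does not single out a good rule.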
Bayes rules
Let $\pi(\theta)$ be a probability distribution on the states of nature. From a Bayesian point of view, we would regard it as a prior distribution. That is, it is our believed probability distribution on the states of nature, prior to observing data. For a frequentist, it is merely a function on $\Theta$ with no such special interpretation. The Bayes risk of the decision rule $\delta$ with respect to $\pi(\theta)$ is the expectation

$$r(\pi, \delta) = \operatorname{E}_{\pi(\theta)}\big[R(\theta, \delta)\big].$$
A decision rule $\delta$ that minimizes $r(\pi, \delta)$ is called a Bayes rule with respect to $\pi(\theta)$. There may be more than one such Bayes rule. If the Bayes risk is infinite for all $\delta$, then no Bayes rule is defined.
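A Bayes rule can likewise be found by brute force on a finite problem. The following self-contained sketch (toy numbers assumed, including a uniform prior) computes the Bayes risk of each deterministic rule and picks a minimizer.

```python
# Toy problem (assumed numbers): compute the Bayes risk
# r(pi, delta) = E_pi[R(theta, delta)] of every deterministic rule
# and select a Bayes rule, i.e. a minimizer of the Bayes risk.
import itertools

F = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}   # F(x | theta)
pi = {0: 0.5, 1: 0.5}                            # prior on Theta (assumed)

def loss(theta, a):                              # 0-1 loss
    return 0.0 if theta == a else 1.0

def risk(theta, rule):                           # R(theta, delta)
    return sum(p * loss(theta, rule[x]) for x, p in F[theta].items())

def bayes_risk(rule):                            # r(pi, delta)
    return sum(pi[t] * risk(t, rule) for t in pi)

rules = list(itertools.product([0, 1], repeat=2))  # rule[x] = action
bayes_rule = min(rules, key=bayes_risk)
print(bayes_rule, bayes_risk(bayes_rule))
```

Under this prior the rule that follows the evidence achieves the smallest Bayes risk, averaging the frequentist risk function over the states of nature exactly as the definition prescribes.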
Generalized Bayes rules
In the Bayesian approach to decision theory, the observed $x$ is considered fixed. Whereas the frequentist approach (i.e., risk) averages over possible samples $x \in \mathcal{X}$, the Bayesian would fix the observed sample $x$ and average over hypotheses $\theta \in \Theta$. Thus, the Bayesian approach is to consider for our observed $x$ the expected loss

$$\rho(\pi, \delta \mid x) = \operatorname{E}_{\pi(\theta \mid x)}\big[L(\theta, \delta(x))\big],$$

where the expectation is over the posterior of $\theta$ given $x$ (obtained from $\pi(\theta)$ and $F(x \mid \theta)$ using Bayes' theorem).
Having made explicit the expected loss for each given $x$ separately, we can define a decision rule $\delta$ by specifying for each $x$ an action $\delta(x)$ that minimizes the expected loss. This is known as a generalized Bayes rule with respect to $\pi(\theta)$. There may be more than one generalized Bayes rule, since there may be multiple choices of $\delta(x)$ that achieve the same expected loss.
At first, this may appear rather different from the Bayes rule approach of the previous section, not a generalization. However, notice that the Bayes risk already averages over $\Theta$ in Bayesian fashion, and the Bayes risk may be recovered as the expectation over $\mathcal{X}$ of the expected loss (where $x \sim \theta$ and $\theta \sim \pi$). Roughly speaking, $\delta$ minimizes this expectation of expected loss (i.e., is a Bayes rule) if and only if it minimizes the expected loss for each $x$ separately (i.e., is a generalized Bayes rule).
Then why is the notion of generalized Bayes rule an improvement? It is indeed equivalent to the notion of Bayes rule when a Bayes rule exists and all $x$ have positive probability. However, no Bayes rule exists if the Bayes risk $r(\pi, \delta)$ is infinite (for all $\delta$). In this case it is still useful to define a generalized Bayes rule $\delta$, which at least chooses a minimum-expected-loss action $\delta(x)$ for those $x$ for which a finite-expected-loss action does exist. In addition, a generalized Bayes rule may be desirable because it must choose a minimum-expected-loss action $\delta(x)$ for every $x$, whereas a Bayes rule would be allowed to deviate from this policy on a set $X \subseteq \mathcal{X}$ of measure 0 without affecting the Bayes risk.
More important, it is sometimes convenient to use an improper prior $\pi(\theta)$. In this case, the Bayes risk is not even well-defined, nor is there any well-defined distribution over $x$. However, the posterior $\pi(\theta \mid x)$, and hence the expected loss, may be well-defined for each $x$, so that it is still possible to define a generalized Bayes rule.
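The pointwise construction can be sketched directly: for each observation $x$, form the posterior by Bayes' theorem and choose the action minimizing the posterior expected loss. The numbers below are the same assumed toy values as before, not from the article.

```python
# Toy problem (assumed numbers): a generalized Bayes rule chooses, for each
# observation x separately, an action minimizing the posterior expected loss
# E_{pi(theta|x)}[L(theta, a)], with the posterior given by Bayes' theorem.
F = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}   # F(x | theta)
pi = {0: 0.5, 1: 0.5}                            # prior on Theta (assumed)

def loss(theta, a):                              # 0-1 loss
    return 0.0 if theta == a else 1.0

def posterior(x):                                # pi(theta | x) via Bayes' theorem
    joint = {t: pi[t] * F[t][x] for t in pi}
    z = sum(joint.values())                      # normalizing constant
    return {t: p / z for t, p in joint.items()}

def generalized_bayes_action(x):                 # argmin_a posterior expected loss
    post = posterior(x)
    return min([0, 1], key=lambda a: sum(post[t] * loss(t, a) for t in post))

rule = {x: generalized_bayes_action(x) for x in [0, 1]}
print(rule)
```

Because the prior here is proper and every $x$ has positive probability, the rule obtained pointwise coincides with the Bayes rule found by minimizing the Bayes risk, illustrating the equivalence discussed above.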
Admissibility of (generalized) Bayes rules
According to the complete class theorems, under mild conditions every admissible rule is a (generalized) Bayes rule (with respect to some prior $\pi(\theta)$, possibly an improper one, that favors distributions $\theta$ where that rule achieves low risk). Thus, in frequentist decision theory it is sufficient to consider only (generalized) Bayes rules.
Conversely, while Bayes rules with respect to proper priors are virtually always admissible, generalized Bayes rules corresponding to improper priors need not yield admissible procedures. Stein's example is one such famous situation.
Examples
The James–Stein estimator is a nonlinear estimator which can be shown to dominate, or outperform, the "ordinary" (least squares) technique with respect to a mean-square error loss function. Thus least squares estimation is not necessarily an admissible estimation procedure. Some other standard estimators associated with the normal distribution are also inadmissible: for example, the sample estimate of the variance when the population mean and variance are unknown.

See also
- Decision theory
- Maximal element
- Pareto efficiency
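The James–Stein dominance described in the Examples section can be checked empirically. The simulation below is a minimal sketch under assumed settings (dimension, true mean vector, and trial count are arbitrary choices): it estimates the total mean-square error of the ordinary estimator and of the James–Stein shrinkage estimator for a multivariate normal mean.

```python
# Simulation sketch of James-Stein dominance (assumed settings): for a
# p-dimensional normal mean with unit variance and p >= 3, the James-Stein
# estimator has lower total mean-square error than the observation itself.
import numpy as np

rng = np.random.default_rng(0)
p, n_trials = 10, 20000
theta = np.ones(p)                               # true mean (assumed for illustration)

x = rng.normal(theta, 1.0, size=(n_trials, p))   # one observation per trial

mle = x                                          # "ordinary" (least squares) estimator
shrink = 1 - (p - 2) / np.sum(x**2, axis=1, keepdims=True)
js = shrink * x                                  # James-Stein estimator

mse_mle = np.mean(np.sum((mle - theta)**2, axis=1))
mse_js = np.mean(np.sum((js - theta)**2, axis=1))
print(mse_mle, mse_js)                           # expect mse_js < mse_mle
```

The ordinary estimator's total mean-square error is close to $p$ regardless of the true mean, while the shrinkage estimator does strictly better on average, which is exactly the dominance that makes the ordinary estimator inadmissible in three or more dimensions.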