Reference class problem
In statistics, the reference class problem is the problem of deciding what class to use when calculating the probability applicable to a particular case. For example, if I am taking off in an aircraft and wish to estimate the probability of it crashing, should I use the frequency of crashes of all aircraft? Of this make of aircraft? Of aircraft flown by this company in the last ten years? Any individual case belongs to very many classes, and the frequency of the attribute of interest (such as crashing) differs among them. Which class is the most appropriate to use?

More formally, many arguments in statistics take the form of a statistical syllogism:
  1. X proportion of F are G
  2. I is an F
  3. Therefore, I is a G (with probability X)


F is called the "reference class", G the "attribute class", and I the individual object. How is one to choose an appropriate class F?
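
The dependence of the conclusion on the choice of F can be illustrated with a minimal Python sketch. The flight records, the field names, and the helper `proportion` are all hypothetical, invented only to show that one individual receives a different estimate from each reference class it belongs to.

```python
from fractions import Fraction

# Hypothetical records, invented purely for illustration:
# (make, operator, crashed)
FLIGHTS = [
    ("A", "X", False),
    ("A", "X", False),
    ("A", "Y", True),
    ("A", "Y", False),
    ("B", "X", True),
    ("B", "X", True),
    ("B", "Y", False),
    ("B", "Y", False),
]


def proportion(reference_class, attribute, population):
    """Return the proportion X of F (reference_class) that are G (attribute)."""
    members = [item for item in population if reference_class(item)]
    if not members:
        raise ValueError("empty reference class")
    hits = sum(1 for item in members if attribute(item))
    return Fraction(hits, len(members))


def crashed(flight):
    return flight[2]


# The individual I is a flight of make "A" operated by "X".
# Each entry below is a different reference class F containing I.
estimates = {
    "all aircraft": proportion(lambda f: True, crashed, FLIGHTS),
    "same make": proportion(lambda f: f[0] == "A", crashed, FLIGHTS),
    "same operator": proportion(lambda f: f[1] == "X", crashed, FLIGHTS),
    "same make and operator": proportion(
        lambda f: f[0] == "A" and f[1] == "X", crashed, FLIGHTS
    ),
}

for name, value in estimates.items():
    print(f"{name}: {value}")
```

With this invented data the four classes give four different values of X for the same flight, and nothing in the data itself says which one to prefer.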

In Bayesian statistics, the problem arises as that of defining a Bayesian prior distribution by the method of imaginary reference sets. It follows from the elementary foundations of probability theory that there is no unique way of doing this.

History

John Venn observed in 1876, “It is obvious that every single thing or event has an indefinite number of properties or attributes observable in it, and might therefore be considered as belonging to an indefinite number of different classes of things”, leading to problems with how to assign probabilities to a single case, for example the probability that John Smith, a consumptive Englishman aged fifty, will live to sixty-one.

The name "problem of the reference class" was given by Hans Reichenbach, who wrote, "If we are asked to find the probability holding for an individual future event, we must first incorporate the event into a suitable reference class. An individual thing or event may be incorporated in many reference classes, from which different probabilities will result."

There has been discussion in philosophy, with some consensus that the problem is unsolvable.

Legal applications

Applying Bayesian probability in practice involves assessing a prior probability, which is then combined with a likelihood function and updated through the use of Bayes' theorem. Suppose we wish to assess the probability of guilt of a defendant in a court case in which DNA (or other probabilistic) evidence is available. We first need to assess the prior probability of guilt of the defendant. We could say that the crime occurred in a city of 1,000,000 people, of whom 15% (that is, 150,000 people) meet the requirements of being the same sex, age group and approximate description as the perpetrator. That suggests a prior probability of guilt of 1 in 150,000. We could cast the net wider and say that there is, say, a 25% chance that the perpetrator is from out of town, but still from this country, and construct a different prior estimate. We could say that the perpetrator could come from anywhere in the world, and so on.
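
The arithmetic above, and its effect on the final answer, can be sketched in Python. The city figures come from the text; the likelihood ratio of 1,000,000 for the evidence and the wider population of 10,000,000 are assumptions chosen only for illustration, as are the function names.

```python
from fractions import Fraction


def prior_of_guilt(population, matching_fraction):
    """Uniform prior over everyone in the reference class who fits the description."""
    candidates = population * matching_fraction
    return 1 / candidates


def posterior(prior, likelihood_ratio):
    """Bayes' theorem in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


MATCHING = Fraction(15, 100)      # 15% fit the description (from the text)
LIKELIHOOD_RATIO = 1_000_000      # assumed strength of the evidence, not from the text

# Reference class 1: the city of 1,000,000 people described in the text.
prior_city = prior_of_guilt(1_000_000, MATCHING)
post_city = posterior(prior_city, LIKELIHOOD_RATIO)

# Reference class 2: a hypothetical wider region of 10,000,000 people.
prior_wide = prior_of_guilt(10_000_000, MATCHING)
post_wide = posterior(prior_wide, LIKELIHOOD_RATIO)

print(f"city prior {prior_city}, posterior {float(post_city):.3f}")
print(f"wide prior {prior_wide}, posterior {float(post_wide):.3f}")
```

Under these assumptions the same evidence yields a posterior of roughly 0.87 with the city as reference class and roughly 0.40 with the wider one, so the choice of class can decide the outcome.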

Legal theorists have discussed the reference class problem particularly with reference to the Shonubi case. Charles Shonubi, a Nigerian drug smuggler, was arrested at JFK Airport on December 10, 1991, and convicted of heroin importation. The severity of his sentence depended not only on the amount of drugs carried on that trip, but also on the total amount of drugs he was estimated to have imported on seven previous occasions on which he was not caught. Five separate legal cases debated how that amount should be estimated. In one case, "Shonubi III", the prosecution presented statistical evidence of the amount of drugs found on Nigerian drug smugglers caught at JFK Airport in the period between Shonubi's first and last trips. There has been debate over whether that is the (or a) correct reference class to use, and if so, why.

Other legal applications involve valuation. For example, houses might be valued using the data in a database of sales of "similar" houses. To decide which houses are similar to a given one, one needs to know which features of a house are relevant to price. The number of bathrooms might be relevant, but the ethnicity of the owner might not. It has been argued that such reference class problems can be solved by finding which features are relevant: a feature is relevant to house price if house price covaries with it (it makes some difference to house price), and the ideal reference class for an individual is the set of all instances which share with it all relevant features.
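
The relevance criterion just described can be sketched in Python. The sales records, the feature names, and the tolerance used to decide that a covariance is zero are all hypothetical. In real data almost every feature covaries slightly with price, so a threshold or significance test would be needed; that detail goes beyond what the argument itself states.

```python
from statistics import mean

# Hypothetical sales database, invented for illustration.
HOUSES = [
    {"bathrooms": 1, "owner_group": 0, "price": 100},
    {"bathrooms": 1, "owner_group": 1, "price": 110},
    {"bathrooms": 2, "owner_group": 0, "price": 160},
    {"bathrooms": 2, "owner_group": 1, "price": 150},
]


def covariance(xs, ys):
    """Population covariance of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def relevant_features(houses, features, target="price", tol=1e-9):
    """Keep the features with which the target covaries (|cov| above tol)."""
    prices = [h[target] for h in houses]
    return [
        f for f in features
        if abs(covariance([h[f] for h in houses], prices)) > tol
    ]


def reference_class(houses, individual, features):
    """All instances that share every relevant feature with the individual."""
    return [h for h in houses if all(h[f] == individual[f] for f in features)]


features = relevant_features(HOUSES, ["bathrooms", "owner_group"])
subject = {"bathrooms": 2, "owner_group": 1}   # the house to be valued
similar = reference_class(HOUSES, subject, features)
estimate = mean(h["price"] for h in similar)

print(features, len(similar), estimate)
```

In this invented data set only the number of bathrooms covaries with price, so the reference class for the subject house is the set of two-bathroom houses regardless of owner.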

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 