Truncated distribution
In statistics, a truncated distribution is a conditional distribution that results from restricting the domain of some other probability distribution. Truncated distributions arise in practical statistics in cases where the ability to record, or even to know about, occurrences is limited to values which lie above or below a given threshold or within a specified range. For example, if the dates of birth of children in a school are examined, these would typically be subject to truncation relative to those of all children in the area, given that the school accepts only children in a given age range on a specific date. There would be no information about how many children in the locality had dates of birth before or after the school's cutoff dates if only a direct approach to the school were used to obtain information.
Where sampling is such as to retain knowledge of items that fall outside the required range, without recording the actual values, this is known as censoring, as opposed to the truncation described here.
Definition
The following discussion is in terms of a random variable having a continuous distribution, although the same ideas apply to discrete distributions. Similarly, the discussion assumes that truncation is to a semi-open interval $(a, b]$, but other possibilities can be handled straightforwardly.
Suppose we have a random variable, $X$, that is distributed according to some probability density function, $f(x)$, with cumulative distribution function $F(x)$, both of which have infinite support. Suppose we wish to know the probability density of the random variable after restricting the support to lie between two constants, so that the support is $(a, b]$. That is to say, suppose we wish to know how $X$ is distributed given $a < X \le b$:
$$f(x \mid a < X \le b) = \frac{g(x)}{F(b) - F(a)} = \operatorname{Tr}(x),$$
where $g(x) = f(x)$ for all $a < x \le b$ and $g(x) = 0$ everywhere else. Notice that $\operatorname{Tr}(x)$ has the same support as $g(x)$.
There is, unfortunately, an ambiguity about the term "truncated distribution". It may refer to $g(x)$, where the parts outside the interval have been removed from $f(x)$ but the remainder has not been scaled up, or it may refer to $\operatorname{Tr}(x)$. In general, $g(x)$ is not a probability density function since it does not integrate to one, whereas $\operatorname{Tr}(x)$ is a probability density function. In this article, a truncated distribution refers to $\operatorname{Tr}(x)$.
Notice that $f(x \mid a < X \le b)$ is in fact a density:
$$\int_a^b f(x \mid a < X \le b)\,dx = \frac{1}{F(b) - F(a)} \int_a^b g(x)\,dx = 1.$$
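The rescaling by $F(b) - F(a)$ can be checked numerically. The following is a minimal sketch, assuming a standard normal for the original distribution and the interval $(-1, 2]$; the function names, the interval, and the midpoint integration rule are illustrative choices rather than part of the definition.

```python
import math

def norm_pdf(x):
    """Density f(x) of the standard normal distribution (assumed example)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    """Cumulative distribution function F(x) of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def truncated_pdf(x, a, b, pdf=norm_pdf, cdf=norm_cdf):
    """Tr(x) = g(x) / (F(b) - F(a)), where g(x) = f(x) on (a, b] and 0 elsewhere."""
    if not (a < x <= b):
        return 0.0
    return pdf(x) / (cdf(b) - cdf(a))

def integrate(func, lo, hi, n=10_000):
    """Composite midpoint rule; adequate for a smooth integrand."""
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))

a, b = -1.0, 2.0
# Integral of the unscaled g(x): equals F(b) - F(a), which is less than one.
area_unscaled = integrate(norm_pdf, a, b)
# Integral of the rescaled Tr(x): should be one up to integration error.
area_truncated = integrate(lambda x: truncated_pdf(x, a, b), a, b)
print(area_unscaled, area_truncated)
```

Passing `a = -math.inf` or `b = math.inf` gives the one-sided cases, since `math.erf` accepts infinite arguments.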
Truncated distributions need not have parts removed from both the top and the bottom. A truncated distribution where just the bottom of the distribution has been removed is as follows:
$$f(x \mid X > y) = \frac{g(x)}{1 - F(y)},$$
where $g(x) = f(x)$ for all $x > y$ and $g(x) = 0$ everywhere else, and $F(x)$ is the cumulative distribution function.
A truncated distribution where the top of the distribution has been removed is as follows:
$$f(x \mid X \le y) = \frac{g(x)}{F(y)},$$
where $g(x) = f(x)$ for all $x \le y$ and $g(x) = 0$ everywhere else, and $F(x)$ is the cumulative distribution function.
Expectation of truncated random variable
Suppose we wish to find the expected value of a random variable distributed according to the density $f(x)$, with cumulative distribution function $F(x)$, given that the random variable, $X$, is greater than some known value $y$. The expectation of the truncated random variable is thus:
$$E(X \mid X > y) = \frac{\int_y^\infty x\,g(x)\,dx}{1 - F(y)},$$
where again $g(x) = f(x)$ for all $x > y$ and $g(x) = 0$ everywhere else.
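As an illustration, the expectation of a lower-truncated variable can be evaluated numerically and compared with a known closed form. The sketch below assumes a unit-rate exponential distribution, for which the memoryless property gives $E(X \mid X > y) = y + 1/\lambda$. The function names, the finite cutoff `upper` standing in for the infinite limit, and the midpoint rule are all assumptions made for the example.

```python
import math

def exp_pdf(x, rate=1.0):
    """Density f(x) of the exponential distribution (assumed example)."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

def exp_cdf(x, rate=1.0):
    """Cumulative distribution function F(x) of the exponential distribution."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0

def truncated_mean(y, rate=1.0, upper=60.0, n=200_000):
    """Approximate E(X | X > y) as the integral of x f(x) over (y, upper),
    divided by 1 - F(y). `upper` replaces the infinite limit; the neglected
    tail is negligible for this density."""
    h = (upper - y) / n
    total = 0.0
    for i in range(n):
        x = y + (i + 0.5) * h          # midpoint of the i-th subinterval
        total += x * exp_pdf(x, rate)
    return h * total / (1.0 - exp_cdf(y, rate))

y = 2.0
numeric = truncated_mean(y)
closed_form = y + 1.0                  # y + 1/rate with rate = 1
print(numeric, closed_form)
```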
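Letting $a$ and $b$ be the lower and upper limits, respectively, of the support of $f(x)$ (i.e. the original density), properties of $E(u(X) \mid X > y)$, where $u(X)$ is some continuous function of $X$ with a continuous derivative and where $f(x)$ is assumed continuous, include:
(i) $\lim_{y \to a} E(u(X) \mid X > y) = E(u(X))$
(ii) $\lim_{y \to b} E(u(X) \mid X > y) = u(b)$
(iii) $\frac{\partial}{\partial y} E(u(X) \mid X > y) = \frac{f(y)}{1 - F(y)} \left[ E(u(X) \mid X > y) - u(y) \right]$
(iv) $\lim_{y \to a} \frac{\partial}{\partial y} E(u(X) \mid X > y) = f(a) \left[ E(u(X)) - u(a) \right]$
(v) $\lim_{y \to b} \frac{\partial}{\partial y} E(u(X) \mid X > y) = \frac{1}{2} u'(b)$
Provided that the limits exist, that is: $\lim_{y \to c} u'(y) = u'(c)$, $\lim_{y \to c} u(y) = u(c)$ and $\lim_{y \to c} f(y) = f(c)$, where $c$ represents either $a$ or $b$.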
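As a quick sanity check of the derivative property, namely that $\partial E(u(X) \mid X > y)/\partial y$ equals the hazard rate $f(y)/(1 - F(y))$ times $E(u(X) \mid X > y) - u(y)$, consider an exponential density $f(x) = \lambda e^{-\lambda x}$ on $[0, \infty)$ with $u(x) = x$. This choice of distribution and of $u$ is an illustrative assumption, not part of the general statement.

```latex
\begin{aligned}
E(X \mid X > y) &= y + \frac{1}{\lambda},
\qquad
\frac{\partial}{\partial y} E(X \mid X > y) = 1, \\
\frac{f(y)}{1 - F(y)} \bigl[ E(X \mid X > y) - u(y) \bigr]
&= \frac{\lambda e^{-\lambda y}}{e^{-\lambda y}} \cdot \frac{1}{\lambda} = 1 .
\end{aligned}
```

Both sides agree. Letting $y \to 0$, the lower limit of the support, also gives $E(X \mid X > y) \to 1/\lambda = E(X)$, consistent with the first limit property.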
Examples
The truncated normal distribution is an important example.
The Tobit model employs truncated distributions.
Random truncation
Suppose we have the following setup: a truncation value, $t$, is selected at random from a density, $g(t)$, but this value is not observed. (Here $g$ denotes the density of $t$, not the unscaled truncated density used earlier.) Then a value, $x$, is selected at random from the truncated distribution, $f(x \mid t) = \operatorname{Tr}(x)$. Suppose we observe $x$ and wish to update our belief about the density of $t$ given the observation.
First, by definition:
$$f(x) = \int_x^\infty f(x \mid t)\,g(t)\,dt,$$
and
$$F(a) = \int_{-\infty}^a \left[ \int_x^\infty f(x \mid t)\,g(t)\,dt \right] dx.$$
Notice that $t$ must be greater than $x$, hence when we integrate over $t$, we set a lower bound of $x$. The functions $f(x)$ and $F(x)$ are the unconditional density and unconditional cumulative distribution function, respectively.
By Bayes' rule,
$$g(t \mid x) = \frac{f(x \mid t)\,g(t)}{f(x)},$$
which expands to
$$g(t \mid x) = \frac{f(x \mid t)\,g(t)}{\int_x^\infty f(x \mid t)\,g(t)\,dt}.$$
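The update can be sketched numerically as a posterior density for $t$: the truncated likelihood times the prior, divided by the integral of that product over $t > x$. The example below assumes a unit-rate exponential density truncated above at $t$ for the likelihood and an exponential prior with rate 0.5 for $t$. These distributions, the observed value `x_obs`, the finite cutoff `UPPER`, and all function names are hypothetical choices made only for illustration.

```python
import math

def likelihood(x, t):
    """f(x | t): unit-rate exponential density truncated above at t (assumed)."""
    if not (0.0 < x <= t):
        return 0.0
    return math.exp(-x) / (1.0 - math.exp(-t))

def prior(t, rate=0.5):
    """g(t): assumed exponential prior on the unobserved truncation value."""
    return rate * math.exp(-rate * t) if t > 0 else 0.0

def integrate(func, lo, hi, n=50_000):
    """Composite midpoint rule; adequate for a smooth integrand."""
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))

x_obs = 1.0     # the observed value x
UPPER = 80.0    # finite stand-in for the infinite upper limit over t

# Unconditional density f(x): integral of f(x | t) g(t) over t from x upward.
evidence = integrate(lambda t: likelihood(x_obs, t) * prior(t), x_obs, UPPER)

def posterior(t):
    """g(t | x) = f(x | t) g(t) / f(x); zero whenever t < x."""
    return likelihood(x_obs, t) * prior(t) / evidence

print(posterior(0.5), posterior(1.5), posterior(3.0))
```

Values of $t$ below the observation receive zero posterior density, which reflects the lower bound of $x$ in the integral over $t$.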