Truncation (statistics)
In statistics, truncation results in values that are limited above or below, resulting in a truncated sample. Truncation is similar to but distinct from the concept of statistical censoring. A truncated sample can be thought of as being equivalent to an underlying sample with all values outside the bounds entirely omitted, with not even a count of those omitted being kept. If the sample had instead been censored, a record would be kept of the values that were censored, noting whether the lower or upper bound had been passed and the value of that bound.
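The distinction can be made concrete with a small Python sketch. The helper names `truncate` and `censor`, the bounds, and the sample values are illustrative assumptions rather than part of any standard library, and the interval is taken to be (lower, upper].

```python
def truncate(values, lower, upper):
    """Keep only values in (lower, upper]; the rest vanish without a trace."""
    return [v for v in values if lower < v <= upper]


def censor(values, lower, upper):
    """Keep one record per value. A value outside (lower, upper] is replaced
    by a note of which bound was passed and the value of that bound."""
    records = []
    for v in values:
        if v <= lower:
            records.append(("below", lower))
        elif v > upper:
            records.append(("above", upper))
        else:
            records.append(("observed", v))
    return records


sample = [0.5, 2.0, 3.5, 7.0, 12.0]      # hypothetical underlying sample
truncated = truncate(sample, 1.0, 10.0)  # [2.0, 3.5, 7.0]
censored = censor(sample, 1.0, 10.0)     # five records, two of them censored
```

The truncated list gives no hint that two values were dropped, whereas the censored list still has five entries and so preserves the count.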
Applications
Usually the values that insurance adjusters receive are left-truncated, right-censored, or both. For example, if policyholders are subject to a policy limit u, then any loss amount above u is reported to the insurance company as exactly u, because u is the most the insurer pays. The insurer knows that the actual loss is greater than u but does not know its value. Left truncation, on the other hand, occurs when policyholders are subject to a deductible: with a deductible d, any loss amount less than d will not even be reported to the insurance company. If a policy has both a limit u and a deductible d, any loss amount greater than u is reported as a loss of u − d, because that is the amount the insurer has to pay.
Insurance loss data are therefore left-truncated, because the insurer cannot know whether there were losses below the deductible d, for which no claim is made. They are also right-censored whenever the loss exceeds u: because u is the most the insurer will pay, it knows only that the loss is greater than u, not its exact amount.
Probability distributions
Truncation can be applied to any probability distribution and will lead to a new distribution, not usually one within the same family. Thus, if a random variable X has F(x) as its distribution function, the new random variable Y defined as having the distribution of X truncated to the semi-open interval (a, b] has the distribution function
$$F_Y(y) = \frac{F(y) - F(a)}{F(b) - F(a)}$$
for y in the interval (a, b], with F_Y(y) equal to 0 below the interval and 1 above it. If truncation were to the closed interval [a, b], the distribution function would be
$$F_Y(y) = \frac{F(y) - F(a^-)}{F(b) - F(a^-)}$$
for y in the interval [a, b], and 0 or 1 otherwise, where F(a^-) denotes the limit of F(x) as x approaches a from below.
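As a minimal sketch of the semi-open case, the function below builds the distribution function of Y from any distribution function F passed in as a callable. The name `truncated_cdf` is hypothetical, and the standard normal example is only an assumed illustration. For a continuous distribution such as the normal, the closed-interval version gives the same values.

```python
from statistics import NormalDist


def truncated_cdf(F, a, b):
    """Return the CDF of X truncated to (a, b], given X's CDF F.

    Assumes the interval has positive probability, i.e. F(b) > F(a).
    """
    Fa, Fb = F(a), F(b)
    if Fb <= Fa:
        raise ValueError("interval (a, b] must have positive probability")

    def F_Y(y):
        if y <= a:
            return 0.0          # below the interval
        if y >= b:
            return 1.0          # at or above the upper bound
        return (F(y) - Fa) / (Fb - Fa)

    return F_Y


# Standard normal truncated to (-1, 1]; by symmetry the median stays at 0.
F_Y = truncated_cdf(NormalDist().cdf, -1.0, 1.0)
median_value = F_Y(0.0)   # approximately 0.5
```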
Data analysis
The analysis of data where observations are treated as being from truncated versions of standard distributions can be undertaken using maximum likelihood, where the likelihood is derived from the distribution or density of the truncated distribution. This involves taking account of the normalising factor in the modified density function, which is 1/(F(b) − F(a)) for truncation to (a, b] and which depends on the parameters of the original distribution.
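One case in which the truncated likelihood can be maximised by hand is an exponential distribution left-truncated at a known point d; it is used here only as an assumed illustration, and the function names are hypothetical. The truncated density is rate·exp(−rate·y) / exp(−rate·d) for y > d, so the log-likelihood is n·log(rate) − rate·Σ(yᵢ − d), which is maximised at rate = n / Σ(yᵢ − d). The sketch compares this with an estimate that ignores the normalising factor.

```python
import random


def simulate_left_truncated_exponential(rate, d, n_draws, seed=0):
    """Draw from Exp(rate) and silently drop every value not exceeding d."""
    rng = random.Random(seed)
    draws = (rng.expovariate(rate) for _ in range(n_draws))
    return [x for x in draws if x > d]


def mle_rate_left_truncated(sample, d):
    """Maximiser of n*log(rate) - rate*sum(y_i - d), the truncated log-likelihood."""
    return len(sample) / sum(y - d for y in sample)


def naive_rate(sample):
    """Exponential MLE that ignores the truncation (biased here)."""
    return len(sample) / sum(sample)


true_rate, d = 0.5, 1.0   # assumed values for the illustration
sample = simulate_left_truncated_exponential(true_rate, d, n_draws=50_000)
rate_hat = mle_rate_left_truncated(sample, d)   # close to the true rate
rate_naive = naive_rate(sample)                 # noticeably too small
```

With these settings the truncated sample has mean near d + 1/rate = 3, so the naive estimate settles near 1/3 rather than 0.5.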
In practice, if the fraction truncated is very small, the effect of truncation might be ignored when analysing data. For example, it is common to use a normal distribution to model data whose values can only be positive but for which the typical range of values is well away from zero. In such cases a truncated or censored version of the normal distribution may formally be preferable (although there would also be other alternatives), but the more complicated analysis would change the results very little. However, software is readily available for maximum likelihood estimation of even moderately complicated models, such as regression models, for truncated data.
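How small "very small" is can be checked numerically. For a normal distribution with mean mu and standard deviation sigma truncated to positive values, the mean of the truncated distribution exceeds mu by sigma·φ(α) / (1 − Φ(α)), with α = −mu/sigma, where φ and Φ are the standard normal density and distribution function. The sketch below evaluates the truncated fraction and this shift; the function name and parameter values are assumptions chosen for illustration.

```python
from statistics import NormalDist


def truncation_effect_at_zero(mu, sigma):
    """For X ~ N(mu, sigma^2) truncated to X > 0, return
    (fraction of probability truncated, shift in the mean due to truncation)."""
    std = NormalDist()                 # standard normal
    alpha = (0.0 - mu) / sigma         # standardised truncation point
    fraction_truncated = std.cdf(alpha)
    mean_shift = sigma * std.pdf(alpha) / (1.0 - fraction_truncated)
    return fraction_truncated, mean_shift


# Typical values well away from zero: truncation is negligible.
far = truncation_effect_at_zero(mu=10.0, sigma=2.0)
# Typical values close to zero: truncation matters.
near = truncation_effect_at_zero(mu=1.0, sigma=2.0)
```

In the first case both the truncated fraction and the shift in the mean are below one part in 100,000, while in the second roughly 30% of the probability is removed and the mean moves by about one unit.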