Statistical distance
In statistics, probability theory, and information theory, a statistical distance quantifies the distance between two statistical objects, which can be two samples, two random variables, or two probability distributions, for example.
Metrics
A metric on a set X is a function (called the distance function or simply distance)
d : X × X → R
(where R is the set of real numbers). For all x, y, z in X, this function is required to satisfy the following conditions:
- d(x, y) ≥ 0 (non-negativity)
- d(x, y) = 0 if and only if x = y (identity of indiscernibles; note that conditions 1 and 2 together produce positive definiteness)
- d(x, y) = d(y, x) (symmetry)
- d(x, z) ≤ d(x, y) + d(y, z) (subadditivity / triangle inequality).
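As a concrete illustration (not part of the original article), the following Python sketch checks these four conditions numerically for one statistical distance that is a proper metric, the total variation distance between discrete probability distributions; the specific probability vectors are arbitrary assumptions chosen for the example.

```python
# Illustrative sketch: the total variation distance is a metric on the set of
# discrete probability distributions. The vectors p, q, r are assumed examples.
import numpy as np

def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * np.abs(np.asarray(p, dtype=float) - np.asarray(q, dtype=float)).sum()

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
r = np.array([0.1, 0.2, 0.7])

assert total_variation(p, q) >= 0                                              # non-negativity
assert total_variation(p, p) == 0                                              # identity of indiscernibles
assert total_variation(p, q) == total_variation(q, p)                          # symmetry
assert total_variation(p, r) <= total_variation(p, q) + total_variation(q, r)  # triangle inequality
```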
Distances: Generalized metrics
Many statistical distances are not metrics, because they lack one or more properties of proper metrics. For example, pseudometrics can violate the "positive definiteness" (alternatively, "identity of indiscernibles") property; quasimetrics can violate the symmetry property; and semimetrics can violate the triangle inequality. Some statistical distances are referred to as divergences.
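For example, the Kullback–Leibler divergence fails the symmetry requirement (and, in general, the triangle inequality), which is why it is called a divergence rather than a metric. The short Python sketch below (an illustrative assumption, not taken from the article) makes the asymmetry explicit for two small discrete distributions.

```python
# Illustrative sketch: the Kullback–Leibler divergence is not symmetric,
# so it is a divergence rather than a metric. p and q are assumed examples.
import numpy as np

def kl_divergence(p, q):
    """D(p || q) for discrete distributions, assuming q > 0 wherever p > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = np.array([0.9, 0.1])
q = np.array([0.5, 0.5])

print(kl_divergence(p, q))  # ~0.368
print(kl_divergence(q, p))  # ~0.511, so D(p || q) != D(q || p)
```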
Examples
Some important statistical distances include the following; a short computational sketch of two of them follows the list:
- f-divergence: includes
  - Kullback–Leibler divergence
  - Hellinger distance
  - Total variation distance
- Kullback–Leibler divergence
- Rényi's divergence
- Jensen–Shannon divergence
- Lévy–Prokhorov metric
- Bhattacharyya distance
- Wasserstein metric: also known as the Kantorovich metric, or earth mover's distance
- Energy distance
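As a brief computational sketch (an assumed example, not from the article), the Hellinger and Bhattacharyya distances from the list above can be computed for a pair of discrete distributions as follows; the probability vectors are chosen arbitrarily.

```python
# Illustrative sketch: Hellinger and Bhattacharyya distances between two
# discrete distributions. The probability vectors p and q are assumed examples.
import numpy as np

def hellinger(p, q):
    """Hellinger distance: (1/sqrt(2)) * Euclidean norm of sqrt(p) - sqrt(q)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

def bhattacharyya(p, q):
    """Bhattacharyya distance: -ln of the Bhattacharyya coefficient sum(sqrt(p*q))."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(-np.log(np.sum(np.sqrt(p * q))))

p = np.array([0.36, 0.48, 0.16])
q = np.array([0.30, 0.50, 0.20])

print(hellinger(p, q))      # small value: the two distributions are similar
print(bhattacharyya(p, q))  # related to Hellinger via H(p, q)**2 = 1 - exp(-D_B(p, q))
```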
Other approaches
- Signal-to-noise ratio distance
- Mahalanobis distance (see the sketch following this list)
- Distance correlation is a measure of dependence between two random variables; it is zero if and only if the random variables are independent.
- The continuous ranked probability score measures how well forecasts that are expressed as probability distributions match observed outcomes. Both the location and spread of the forecast distribution are taken into account in judging how close the distribution is to the observed value: see probabilistic forecasting.
- Lukaszyk–Karmowski metric is a function defining a distance between two random variables or two random vectors. It does not satisfy the identity of indiscernibles condition of the metric and is zero if and only if both its arguments are certain events described by Dirac delta density probability distribution functions.
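As an illustration of one entry in this list, the Mahalanobis distance of a point x from a sample with mean μ and covariance matrix S is √((x − μ)ᵀ S⁻¹ (x − μ)). The Python sketch below uses synthetic, assumed data to show the computation.

```python
# Illustrative sketch: Mahalanobis distance of a point from a sample,
# using the sample mean and the inverse sample covariance. Data are synthetic.
import numpy as np

rng = np.random.default_rng(0)
sample = rng.multivariate_normal(mean=[0.0, 0.0],
                                 cov=[[2.0, 0.5], [0.5, 1.0]],
                                 size=500)

mu = sample.mean(axis=0)                                # sample mean
cov_inv = np.linalg.inv(np.cov(sample, rowvar=False))   # inverse sample covariance

def mahalanobis(x, mu, cov_inv):
    """Mahalanobis distance sqrt((x - mu)^T cov_inv (x - mu))."""
    d = np.asarray(x, dtype=float) - mu
    return float(np.sqrt(d @ cov_inv @ d))

print(mahalanobis([1.0, 1.0], mu, cov_inv))  # distance of the point (1, 1) from the sample
```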