Hodges-Lehmann estimator
In statistics, the Hodges–Lehmann estimator is a robust and nonparametric estimator of a population's location parameter, the "pseudo-median", which is closely related to the population median. The Hodges–Lehmann estimator is used not only for the pseudo-median of a single population but also for the differences between members of two populations. The Hodges–Lehmann estimator was proposed in 1963 independently by Pranab Kumar Sen and by Joseph Hodges and Erich Lehmann, and so it is also called the "Hodges–Lehmann–Sen estimator".
Computations
The Hodges–Lehmann estimator estimates the difference between the values in two sets of data. If the two sets of data contain m and n data points respectively, then their Cartesian product contains m × n pairs of points (one from each set); each such pair defines one difference of values. The Hodges–Lehmann estimator for the difference is defined as the median of the m × n differences.
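The two-sample computation described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `hl_shift` is hypothetical, and the sign convention (second sample minus first) is an assumption, since the text does not fix a direction for the difference.

```python
from itertools import product
from statistics import median


def hl_shift(x, y):
    """Two-sample Hodges-Lehmann estimate: the median of all m * n
    pairwise differences y_j - x_i (direction is an assumed convention)."""
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    # product(x, y) enumerates the Cartesian product: one point from each set.
    return median(yj - xi for xi, yj in product(x, y))


x = [1.0, 2.0, 3.0]
y = [4.0, 6.0, 100.0]  # contains one extreme value
print(hl_shift(x, y))  # 4.0
```

Here the nine differences are 1, 2, 3, 3, 4, 5, 97, 98, 99, whose median is 4, whereas the difference of the sample means is about 34.7. Enumerating every pair costs O(mn) time and memory, which is fine for small samples but may call for a more careful algorithm on large ones.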
A second estimator that also goes by the name "Hodges–Lehmann" gives a location estimate for a single dataset. If the dataset contains n data points, there are n(n + 1)/2 unordered pairs of points, counting the pair formed by each point with itself. For each such pair the average is computed; the median of these n(n + 1)/2 averages is defined to be the Hodges–Lehmann estimator of location. The pairwise averages are called the "Walsh averages".
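The one-sample computation can be sketched the same way. The function names `walsh_averages` and `hl_location` are hypothetical, chosen for this illustration; the code assumes the convention stated above, in which each point is also paired with itself.

```python
from itertools import combinations_with_replacement
from statistics import median


def walsh_averages(data):
    """Return the n(n + 1)/2 pairwise averages (x_i + x_j) / 2 with i <= j."""
    # combinations_with_replacement pairs items by position, so each item is
    # paired with itself exactly once and with every later item exactly once.
    return [(a + b) / 2 for a, b in combinations_with_replacement(data, 2)]


def hl_location(data):
    """One-sample Hodges-Lehmann estimate: the median of the Walsh averages."""
    if not data:
        raise ValueError("data must be non-empty")
    return median(walsh_averages(data))


sample = [1.0, 2.0, 3.0, 4.0, 100.0]  # one extreme value
print(len(walsh_averages(sample)))    # 15
print(hl_location(sample))            # 3.0
```

For this sample the mean is 22, while the estimate stays at 3, which illustrates the robustness mentioned in the introduction.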
Estimating the population median
The Hodges–Lehmann statistic estimates the population's "pseudo-median", a location parameter that is closely related to the median. The two coincide for symmetric distributions, and in general the difference between the median and pseudo-median is relatively small, so this distinction is neglected in elementary discussions. Like the spatial median, the pseudo-median is well defined for all distributions of random variables having dimension two or greater; for one-dimensional distributions, a pseudo-median exists but need not be unique. Like the median, the pseudo-median is defined even for heavy-tailed distributions that lack any (finite) mean.
The one-sample Hodges–Lehmann statistic need not estimate any population mean, which for many distributions does not exist. Likewise, the two-sample Hodges–Lehmann estimator need not estimate the difference of two means or the difference of two (pseudo-)medians; rather, it estimates the median of the differences between pairs of random variables drawn one from each population.
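The two population quantities can be written compactly in symbols. The notation below (F, G, X, Y, θ, Δ) is introduced here for illustration only and does not come from the text above; the direction Y − X is likewise an assumed convention.

```latex
\begin{align}
  \theta &= \operatorname{median}\!\left(\frac{X_1 + X_2}{2}\right),
    & X_1, X_2 &\ \text{independent with distribution } F, \\
  \Delta &= \operatorname{median}\,(Y - X),
    & X \sim F,\ Y \sim G &\ \text{independent}.
\end{align}
```

Under these definitions, if F is symmetric about a point μ, then μ is both a median and a pseudo-median θ. If G is a pure shift of F by some amount Δ₀, then Y − X is symmetric about Δ₀, so Δ₀ is a median of the differences.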
In general statistics
The Hodges–Lehmann univariate statistics have several generalizations in multivariate statistics:
- Multivariate ranks and signs
- Spatial sign tests and spatial medians
- Spatial signed-rank tests
- Comparisons of tests and estimates
- Several-sample location problems