Mode (statistics)
In statistics, the mode is the value that occurs most frequently in a data set or a probability distribution. In some fields, notably education, sample data are often called scores, and the sample mode is known as the modal score.
Like the statistical mean and the median, the mode is a way of capturing important information about a random variable or a population in a single quantity. The mode is in general different from the mean and median, and may be very different for strongly skewed distributions.
The mode is not necessarily unique, since the same maximum frequency may be attained at different values. The most ambiguous case occurs in uniform distributions, wherein all values are equally likely.
Mode of a probability distribution
The mode of a discrete probability distribution is the value x at which its probability mass function takes its maximum value. In other words, it is the value that is most likely to be sampled.
The mode of a continuous probability distribution is the value x at which its probability density function attains its maximum value, so, informally speaking, the mode is at the peak.
As noted above, the mode is not necessarily unique, since the probability mass function or probability density function may achieve its maximum value at several points x₁, x₂, etc.
The above definition tells us that only global maxima are modes. Slightly confusingly, when a probability density function has multiple local maxima it is common to refer to all of the local maxima as modes of the distribution. Such a continuous distribution is called multimodal (as opposed to unimodal).
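The distinction between global and local maxima can be illustrated with a short Python sketch. The function names and the example distributions below are hypothetical, chosen only for illustration; the density is assumed to be given as values on an evenly spaced grid.

```python
def distribution_modes(pmf):
    """Return every value at which the mass function attains its global maximum.

    `pmf` is assumed to be a dict mapping each value to its probability.
    """
    peak = max(pmf.values())
    return sorted(x for x, p in pmf.items() if p == peak)


def local_maxima(density_values):
    """Return indices of interior grid points that are higher than both neighbours."""
    return [
        i
        for i in range(1, len(density_values) - 1)
        if density_values[i - 1] < density_values[i] > density_values[i + 1]
    ]


# A hypothetical discrete distribution with a single mode.
print(distribution_modes({0: 0.2, 1: 0.5, 2: 0.3}))

# A fair die: the uniform case, where every value is a mode.
print(distribution_modes({face: 1 / 6 for face in range(1, 7)}))

# A hypothetical density sampled on a grid: two local maxima (indices 1 and 4),
# of which only index 4 is the global maximum.
print(local_maxima([0.1, 0.4, 0.2, 0.3, 0.6, 0.1]))
```

Under the strict definition only the highest peak counts as the mode, while in the looser usage both local maxima would be called modes.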
In symmetric unimodal distributions, such as the normal (or Gaussian) distribution (the distribution whose density function, when graphed, gives the famous "bell curve"), the mean (if defined), median and mode all coincide. For samples, if it is known that they are drawn from a symmetric distribution, the sample mean can be used as an estimate of the population mode.
Mode of a sample
The mode of a data sample is the element that occurs most often in the collection. For example, the mode of the sample [1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17] is 6. Given the list of data [1, 1, 2, 4, 4], the mode is not unique: the data set may be said to be bimodal, while a set with more than two modes may be described as multimodal.
For a sample from a continuous distribution, such as [0.935..., 1.211..., 2.430..., 3.668..., 3.874...], the concept is unusable in its raw form, since each value will occur precisely once. The usual practice is to discretize the data by assigning frequency values to intervals of equal width, as for making a histogram, effectively replacing the values by the midpoints of the intervals they are assigned to. The mode is then the value where the histogram reaches its peak. For small or middle-sized samples the outcome of this procedure is sensitive to the choice of interval width, and degrades if the width is chosen too narrow or too wide; typically one should have a sizable fraction of the data concentrated in a relatively small number of intervals (5 to 10), while the fraction of the data falling outside these intervals is also sizable. An alternative approach is kernel density estimation, which essentially blurs point samples to produce a continuous estimate of the probability density function, which in turn can provide an estimate of the mode.
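Both cases can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, the bins are assumed to start at the smallest observation, and the interval width is left to the caller, since the result depends on it.

```python
import math
from collections import Counter


def sample_modes(data):
    """Return all values that occur with the highest frequency (one or more modes)."""
    counts = Counter(data)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)


def histogram_modes(data, bin_width):
    """Estimate the mode(s) of continuous data as midpoints of the fullest bins.

    Assumption: bins of equal width start at the smallest observation.
    """
    lowest = min(data)
    bins = Counter(math.floor((x - lowest) / bin_width) for x in data)
    top = max(bins.values())
    return [lowest + (k + 0.5) * bin_width for k in sorted(bins) if bins[k] == top]


print(sample_modes([1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17]))  # unimodal sample
print(sample_modes([1, 1, 2, 4, 4]))                       # bimodal sample

# Hypothetical continuous readings, binned with width 1.0.
print(histogram_modes([0.0, 0.25, 1.0, 2.0, 2.25, 2.5], 1.0))
```

Rerunning `histogram_modes` with a different `bin_width` can move the estimate, which is the sensitivity described above.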
The mode of a sample can be computed, for example in MATLAB, by the following algorithm. As a first step, the sample is sorted in ascending order. The algorithm then computes the discrete derivative of the sorted list and finds the indices where this derivative is positive. Next it computes the discrete derivative of this set of indices, locates the maximum of this derivative of indices, and finally evaluates the sorted sample at the point where that maximum occurs, which corresponds to the last member of the stretch of repeated values.
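The steps described above can be sketched in Python as follows. This is an illustrative transcription of the description, not the MATLAB listing itself; the function name `sort_diff_mode` is hypothetical, and ties are resolved in favour of the smallest value.

```python
def sort_diff_mode(sample):
    """Mode via sorting and differencing, following the steps described in the text."""
    xs = sorted(sample)
    if not xs:
        raise ValueError("the mode of an empty sample is undefined")

    # 1-based positions where the sorted list increases, i.e. the last member
    # of each stretch of repeated values (the final element always closes a run).
    run_ends = [
        i + 1
        for i in range(len(xs))
        if i == len(xs) - 1 or xs[i + 1] > xs[i]
    ]

    # Differences between consecutive run ends are the lengths of the runs.
    run_starts = [0] + run_ends[:-1]
    run_lengths = [end - start for start, end in zip(run_starts, run_ends)]

    # Index of the longest run; max() keeps the first one if several tie.
    longest = max(range(len(run_lengths)), key=run_lengths.__getitem__)

    # Evaluate the sorted sample at the last member of that run.
    return xs[run_ends[longest] - 1]


print(sort_diff_mode([12, 6, 1, 7, 6, 3, 17, 6, 12, 7, 6]))
```

Sorting dominates the cost, so the approach runs in O(n log n) time for a sample of size n.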
Comparison of mean, median and mode
Type | Description | Example | Result |
---|---|---|---|
Arithmetic mean | Sum divided by number of values | (1+2+2+3+4+7+9) / 7 | 4 |
Median | Middle value separating the greater and lesser halves of a data set | 1, 2, 2, 3, 4, 7, 9 | 3 |
Mode | Most frequent value in a data set | 1, 2, 2, 3, 4, 7, 9 | 2 |
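The three rows of the table can be reproduced directly from their descriptions. The following Python sketch uses hypothetical helper names and, for lists of even length, takes the average of the two middle values as the median.

```python
from collections import Counter


def arithmetic_mean(values):
    """Sum divided by number of values."""
    return sum(values) / len(values)


def middle_value(values):
    """Median: the middle of the sorted list (average of the two middle values if even)."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def most_frequent(values):
    """Mode: the most frequent value (the first one encountered if several tie)."""
    return Counter(values).most_common(1)[0][0]


data = [1, 2, 2, 3, 4, 7, 9]
print(arithmetic_mean(data), middle_value(data), most_frequent(data))
```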
When do these measures make sense?
Unlike mean and median, the concept of mode also makes sense for "nominal data" (i.e., not consisting of numerical values). For example, taking a sample of Korean family names, one might find that "Kim" occurs more often than any other name. Then "Kim" would be the mode of the sample. In any voting system where a plurality determines victory, a single modal value determines the victor, while a multi-modal outcome would require some tie-breaking procedure to take place.
Unlike the median, the concept of mean makes sense for any random variable assuming values from a vector space, including the real numbers (a one-dimensional vector space) and the integers (which can be considered embedded in the reals). For example, a distribution of points in the plane will typically have a mean and a mode, but the concept of median does not apply. The median makes sense when there is a linear order on the possible values. Generalizations of the concept of median to higher-dimensional spaces are the geometric median and the centerpoint.
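Returning to the nominal case in this section, the mode needs only equality between values, so it can be computed for names or ballots where no mean or median exists. The Python sketch below uses a hypothetical helper and made-up samples.

```python
from collections import Counter


def plurality_winners(ballots):
    """Return the modal value(s) of nominal data; more than one entry means a tie."""
    tally = Counter(ballots)
    top = max(tally.values())
    return sorted(name for name, votes in tally.items() if votes == top)


# Made-up sample of family names: a single mode.
print(plurality_winners(["Kim", "Lee", "Kim", "Park", "Lee", "Kim"]))

# Made-up ballots with a multi-modal outcome, which would need a tie-break.
print(plurality_winners(["A", "B", "A", "B", "C"]))
```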
Uniqueness and definedness
For the remainder, the assumption is that we have (a sample of) a real-valued random variable.
For some probability distributions, the expected value may be infinite or undefined, but if defined, it is unique. The mean of a (finite) sample is always defined. The median is the value such that the fractions not exceeding it and not falling below it are both at least 1/2. It is not necessarily unique, but never infinite or totally undefined. For a data sample it is the "halfway" value when the list of values is ordered in increasing value, where usually for a list of even length the numerical average is taken of the two values closest to "halfway". Finally, as said before, the mode is not necessarily unique. Certain pathological distributions (for example, the Cantor distribution) have no defined mode at all. For a finite data sample, the mode is one (or more) of the values in the sample.
Properties
Assuming definedness, and for simplicity uniqueness, the following are some of the most interesting properties.
- All three measures have the following property: if the random variable (or each value from the sample) is subjected to the linear or affine transformation which replaces X by aX+b, so are the mean, median and mode.
- However, if there is an arbitrary monotonic transformation, only the median follows; for example, if X is replaced by exp(X), the median changes from m to exp(m), but the mean and mode do not transform in this way.
- Except for extremely small samples, the mode is insensitive to "outliers" (such as occasional, rare, false experimental readings). The median is also very robust in the presence of outliers, while the mean is rather sensitive.
- In continuous unimodal distributions the median lies, as a rule of thumb, between the mean and the mode, about one third of the way going from mean to mode. In a formula, median ≈ (2 × mean + mode)/3. This rule, due to Karl Pearson, often applies to slightly non-symmetric distributions that resemble a normal distribution, but it is not always true and in general the three statistics can appear in any order.
- For unimodal distributions, the mode is within √3 standard deviations of the mean, and the root mean square deviation about the mode is between the standard deviation and twice the standard deviation.
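Some of these properties can be checked numerically on a small sample. The Python sketch below uses the standard `statistics` module; the sample, the constants a = 3 and b = 5, and the outlier value 1000 are arbitrary choices for illustration. Only the mean is tested under exp(X), because the sample mode of a finite sample trivially follows any one-to-one relabelling of the values; the statement about the mode concerns continuous densities.

```python
import math
from statistics import mean, median, mode


def summary(data):
    """Return (mean, median, mode) of a sample."""
    return mean(data), median(data), mode(data)


sample = [1, 2, 2, 3, 4, 7, 9]
a, b = 3, 5  # arbitrary constants for the affine map x -> a*x + b

# Affine transformation: all three statistics are transformed the same way.
transformed = [a * x + b for x in sample]
print(summary(sample), summary(transformed))

# Monotonic transformation exp(x): the median follows, the mean does not.
exp_sample = [math.exp(x) for x in sample]
print(median(exp_sample) == math.exp(median(sample)))
print(mean(exp_sample), math.exp(mean(sample)))

# One false reading: the mean moves a lot, the median slightly, the mode not at all.
with_outlier = sample + [1000]
print(summary(with_outlier))
```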
Example for a skewed distribution
An example of a skewed distribution is personal wealth: few people are very rich, but among those some are extremely rich. However, many are rather poor.
A well-known class of distributions that can be arbitrarily skewed is given by the log-normal distribution. It is obtained by transforming a random variable X having a normal distribution into the random variable Y = e^X. Then the logarithm of the random variable Y is normally distributed, hence the name.
Taking the mean μ of X to be 0, the median of Y will be 1, independent of the standard deviation σ of X. This is so because X has a symmetric distribution, so its median is also 0. The transformation from X to Y is monotonic, and so we find the median e^0 = 1 for Y.
When X has standard deviation σ = 0.25, the distribution of Y is weakly skewed. Using formulas for the log-normal distribution, we find:
mean = e^(μ + σ²/2) = e^(0.03125) ≈ 1.032
mode = e^(μ − σ²) = e^(−0.0625) ≈ 0.939
median = e^μ = e^0 = 1
Indeed, the median is about one third on the way from mean to mode.
When X has a larger standard deviation, σ = 1, the distribution of Y is strongly skewed. Now
mean = e^(μ + σ²/2) = e^(0.5) ≈ 1.649
mode = e^(μ − σ²) = e^(−1) ≈ 0.368
median = e^μ = e^0 = 1
Here, Pearson's rule of thumb fails.
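The two cases can be recomputed with the standard closed-form expressions for a log-normal variable Y = e^X with X normal(μ, σ): mean e^(μ + σ²/2), median e^μ and mode e^(μ − σ²). The Python sketch below is illustrative; the function name is hypothetical, and the last returned value is Pearson's approximation (2 × mean + mode)/3 for comparison with the true median.

```python
import math


def lognormal_summary(mu, sigma):
    """Return (mean, median, mode, pearson_estimate) for Y = exp(X), X ~ N(mu, sigma^2)."""
    mean = math.exp(mu + sigma ** 2 / 2)
    median = math.exp(mu)
    mode = math.exp(mu - sigma ** 2)
    pearson_estimate = (2 * mean + mode) / 3  # rule-of-thumb value for the median
    return mean, median, mode, pearson_estimate


for sigma in (0.25, 1.0):
    mean, median, mode, estimate = lognormal_summary(0.0, sigma)
    print(f"sigma={sigma}: mean={mean:.3f} median={median:.3f} "
          f"mode={mode:.3f} Pearson estimate of median={estimate:.3f}")
```

For σ = 0.25 the estimate lands within about 0.001 of the true median 1, while for σ = 1 it is off by more than 0.2.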
See also
- unimodal function
- summary statistics
- descriptive statistics
- central tendency
- mean
- median
- arg max
- moment (mathematics)