Statistic
A statistic is a single measure of some attribute of a sample (e.g. its arithmetic mean value). It is calculated by applying a function (statistical algorithm) to the values of the items comprising the sample, which are known together as a set of data.
More formally, statistical theory defines a statistic as a function of a sample where the function itself is independent of the sample's distribution; that is, the function can be stated before realisation of the data. The term statistic is used both for the function and for the value of the function on a given sample.
A statistic is distinct from a statistical parameter, which is generally not computable because the population is often much too large to examine and measure all its items. A statistic, when used to estimate a population parameter, is called an estimator. For instance, the sample mean is a statistic which estimates the population mean, which is a parameter.
Examples
In calculating the arithmetic mean of a sample, for example, the algorithm works by summing all the data values observed in the sample and then dividing this sum by the number of data items. This single measure, the mean of the sample, is called a statistic, and its value is frequently used as an estimate of the mean value of all items comprising the population from which the sample is drawn. The population mean is also a single measure; however, it is not called a statistic but a population parameter.
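The sum-then-divide procedure described above can be sketched in a few lines of Python. This is only a minimal illustration: the function name `sample_mean` and the height values are made up for the example and are not taken from any particular library or data set.

```python
def sample_mean(sample):
    """Return the arithmetic mean of a non-empty collection of numbers."""
    values = list(sample)
    if not values:
        raise ValueError("sample must contain at least one value")
    # Sum all observed data values, then divide by the number of data items.
    return sum(values) / len(values)


# Hypothetical sample of four measured heights, in centimetres.
heights_cm = [172.0, 168.5, 181.0, 175.5]
mean_height = sample_mean(heights_cm)
print(mean_height)
```

The function is fixed before any data are seen, and the number it returns for a given sample is the value of the statistic on that sample.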
Other examples of statistics include
- Sample mean discussed in the example above and sample median
- Sample variance and sample standard deviation
- Sample quantiles besides the median, e.g., quartiles and percentiles
- t statistics, chi-squared statistics, F statistics
- Order statistics, including sample maximum and minimum
- Sample moments and functions thereof, including kurtosis and skewness
- Various functionals of the empirical distribution function
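Several of the statistics listed above can be computed with Python's standard `statistics` module. The following is a rough sketch on a made-up sample; the helper `order_statistic` is a hypothetical name introduced here for illustration, and the quartiles use the module's default interpolation method, which is one of several conventions in use.

```python
import statistics


def order_statistic(sample, k):
    """Return the k-th smallest value of the sample (k counted from 1)."""
    ordered = sorted(sample)
    if not 1 <= k <= len(ordered):
        raise ValueError("k must be between 1 and the sample size")
    return ordered[k - 1]


# Hypothetical sample of seven observations.
sample = [2, 4, 4, 5, 7, 9, 11]

summary = {
    "mean": statistics.mean(sample),
    "median": statistics.median(sample),
    # Sample variance and standard deviation (n - 1 in the denominator).
    "variance": statistics.variance(sample),
    "stdev": statistics.stdev(sample),
    # Three cut points dividing the ordered data into four groups.
    "quartiles": statistics.quantiles(sample, n=4),
    # The first and last order statistics.
    "minimum": order_statistic(sample, 1),
    "maximum": order_statistic(sample, len(sample)),
}

for name, value in summary.items():
    print(name, value)
```

Each entry is a function of the sample values alone, which is what makes it a statistic.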
Observability
A statistic is an observable random variable, which differentiates it from a parameter that is a generally unobservable quantity describing a property of a statistical population. A parameter can only be computed exactly if the entire population can be observed without error; for instance, in a perfect census or for a population of standardized test takers.
Statisticians often contemplate a parameterized family of probability distributions, any member of which could be the distribution of some measurable aspect of each member of a population, from which a sample is drawn randomly. For example, the parameter may be the average height of 25-year-old men in North America. The heights of the members of a sample of 100 such men are measured; the average of those 100 numbers is a statistic. The average of the heights of all members of the population is not a statistic unless that has somehow also been ascertained (such as by measuring every member of the population). The average height of all (in the sense of genetically possible) 25-year-old North American men is a parameter and not a statistic.
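The height example can be imitated with a small simulation. Everything numeric here is an assumption made purely for illustration: the population is synthetic, drawn from a normal distribution with an arbitrarily chosen mean of 178 cm and standard deviation of 7 cm, and its size of 50,000 is equally arbitrary. Because the simulated population is fully known, its mean can be computed exactly, which is rarely possible in practice.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic "population" of heights in centimetres (assumed values).
population = [rng.gauss(178.0, 7.0) for _ in range(50_000)]

# Average over every member: computable here only because the whole
# simulated population is observed without error.
population_mean = statistics.fmean(population)

# A random sample of 100 members, drawn without replacement.
sample = rng.sample(population, 100)

# The statistic: the average of the 100 sampled heights.
sample_mean = statistics.fmean(sample)

estimation_error = sample_mean - population_mean
print(population_mean, sample_mean, estimation_error)
```

Drawing a different sample would give a different value of the statistic, while the population average stays fixed.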
Statistical properties
Important potential properties of statistics include completeness, consistency, sufficiency, unbiasedness, minimum mean square error, low variance, robustness, and computational convenience.
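Three of these properties have short standard definitions when the statistic is used as an estimator. The notation below is assumed for illustration and does not come from the text above: T_n denotes a statistic computed from a sample of size n, and θ the parameter it estimates.

```latex
\begin{align*}
  % Unbiasedness: the bias is zero for every value of the parameter.
  \operatorname{Bias}_\theta(T_n) &= \operatorname{E}_\theta[T_n] - \theta = 0 \\
  % Mean square error splits into variance plus squared bias.
  \operatorname{MSE}_\theta(T_n) &= \operatorname{E}_\theta\!\left[(T_n - \theta)^2\right]
      = \operatorname{Var}_\theta(T_n) + \operatorname{Bias}_\theta(T_n)^2 \\
  % Consistency: convergence in probability as the sample size grows.
  \lim_{n \to \infty} \Pr\nolimits_\theta\!\left(|T_n - \theta| > \varepsilon\right) &= 0
      \quad \text{for every } \varepsilon > 0
\end{align*}
```

The decomposition in the second line shows why low variance alone is not enough: an estimator with small variance can still have a large mean square error if it is badly biased.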
Information of a statistic
Information of a statistic on model parameters can be defined in several ways. The most common one is the Fisher information, which is defined on the statistical model induced by the statistic. The Kullback information measure can also be used.
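For a single real parameter, the Fisher information carried by a statistic can be written as below. This is a sketch under the usual regularity conditions (differentiability in θ and interchange of differentiation and integration), and the symbols are assumptions introduced here: T is the statistic, g(t; θ) is its density or mass function in the induced model, and I_X(θ) is the Fisher information of the full sample X.

```latex
\begin{align*}
  % Expected squared score of the induced model; under the regularity
  % conditions the score has mean zero, so this equals its variance.
  I_T(\theta)
    &= \operatorname{E}_\theta\!\left[\left(\frac{\partial}{\partial\theta}
       \log g(T;\theta)\right)^{2}\right]
     = \operatorname{Var}_\theta\!\left(\frac{\partial}{\partial\theta}
       \log g(T;\theta)\right) \\
  % A statistic cannot carry more information than the sample it is
  % computed from; equality holds when T is sufficient.
  I_T(\theta) &\le I_X(\theta)
\end{align*}
```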
See also
- Statistics
- Statistical theory
- Descriptive statistics
- Statistical hypothesis testing
- Well-behaved statistic