Glossary of probability and statistics
The following is a glossary of terms. It is not intended to be all-inclusive.
Concerned fields
- Probability theory
- Algebra of random variables (linear algebra)
- Statistics
- Measure theory
- Estimation theory
Glossary
- Atomic event: another name for elementary event.
- Bias can refer either to a sample not being representative of the population, or to the difference between the expected value of an estimator and the true value of the parameter it estimates.
- Binary data is data that can take only two values, usually represented by 0 and 1.
- Conditional distribution: given two jointly distributed random variables X and Y, the conditional probability distribution of Y given X (written $Y \mid X$) is the probability distribution of Y when X is known to be a particular value.
- Conditional probability is the probability of some event A, assuming event B has occurred. It is written $P(A \mid B)$ and read "the probability of A, given B"; when $P(B) > 0$, it equals $P(A \cap B) / P(B)$.
- Completeness
- Correlation, also called correlation coefficient, is a numeric measure of the strength of the linear relationship between two random variables (one can use it to quantify, for example, how shoe size and height are correlated in the population). An example is the Pearson product-moment correlation coefficient, which is found by dividing the covariance of the two variables by the product of their standard deviations. Independent variables have a correlation of 0, but the converse does not hold: a correlation of 0 does not imply independence.
- Count data is data arising from counting that can take only non-negative integer values.
- The covariance between two random variables X and Y, with expected values $E(X) = \mu$ and $E(Y) = \nu$, is defined as the expected value of the random variable $(X - \mu)(Y - \nu)$, and is written $\operatorname{cov}(X, Y)$. It is used for measuring correlation.
- Credence: a subjective estimate of probability.
- A data set is a sample and the associated data points.
- A data point is a typed measurement: it can be a boolean value, a real number, a vector (in which case it is also called a data vector), etc.
- A distribution function is the function that gives the probability distribution of a random variable. It cannot be negative, and its integral over all possible values of the random variable is equal to 1.
- Efficiency
- An elementary event (or atomic event) is an event with only one element. For example, when pulling a card out of a deck, "getting the jack of spades" is an elementary event, while "getting a king or an ace" is not.
- An estimator is a function of the known data that is used to estimate an unknown parameter; an estimate is the result of applying that function to a particular set of data. For example, the sample mean can be used as an estimator of the population mean.
- The expected value (or expectation) of a random variable is the sum of the probability of each possible outcome of the experiment multiplied by its payoff ("value"); for a discrete random variable, $E(X) = \sum_x x \, P(X = x)$. Thus, it represents the average amount one "expects" to win per bet if bets with identical odds are repeated many times. For example, the expected value of a fair six-sided die roll is $(1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5$. The concept is similar to the mean. The expected value of random variable X is typically written $E(X)$ or $\mu$ (mu).
- Experiment
- An event is a subset of the sample space, to which a probability can be assigned. For example, on rolling a die, "getting a five or a six" is an event (with a probability of one third if the die is fair).
- Generating function
- Independence or statistical independence: two events are independent if the occurrence of one does not affect the probability of the other (for example, getting a 1 on one die roll does not affect the probability of getting a 1 on a second roll); formally, events A and B are independent if $P(A \cap B) = P(A) \, P(B)$. Similarly, when we assert that two random variables are independent, we intuitively mean that knowing something about the value of one of them does not yield any information about the value of the other.
- Joint distribution: given two random variables X and Y, the joint distribution of X and Y is the probability distribution of X and Y together.
- Joint probability is the probability of two events occurring together. The joint probability of A and B is written $P(A \cap B)$ or $P(A, B)$.
- Kurtosis is a measure of the "peakedness" of the probability distribution of a real-valued random variable. Higher kurtosis means more of the variance is due to infrequent extreme deviations, as opposed to frequent modestly sized deviations.
- A likelihood function (or just likelihood) is a conditional probability function considered as a function of its second argument with its first argument held fixed. For example, imagine pulling a ball with the number k from a bag of n balls, numbered 1 to n. Then you could describe a likelihood function for the unknown number of balls N as the probability of getting k given that there are n balls: the likelihood will be 1/n for n greater than or equal to k, and 0 for n smaller than k. Unlike a probability distribution, this likelihood function does not sum to 1 over the possible values of n.
- Marginal distribution: given two jointly distributed random variables X and Y, the marginal distribution of X is simply the probability distribution of X ignoring information about Y.
- Marginal probability is the probability of an event, ignoring any information about other events. The marginal probability of A is written $P(A)$. Contrast with conditional probability.
- The mean of a random variable is its expected value. The mean (or sample mean) of a data set is the arithmetic average of its values.
- Moment about the mean
- Mutual independence: a collection of events is mutually independent if, for any subset of the collection, the joint probability of all events in the subset occurring is equal to the product of the probabilities of the individual events. For example, the outcomes of a series of fair coin flips are mutually independent. This is a stronger condition than pairwise independence.
- Pairwise independence: a pairwise independent collection of random variables is a set of random variables any two of which are independent. Any mutually independent collection is pairwise independent, but some pairwise independent collections are not mutually independent.
- Parameter, or "statistical parameter": can be a population parameter, a distribution parameter, or an unobserved parameter (with different shades of meaning). In statistics, this is often a quantity to be estimated.
- Prior probability: in Bayesian inference, this represents prior beliefs or other information that is available before new data or observations are taken into account.
- A population or statistical population is a set of entities about which statistical inferences are to be drawn, often based on random sampling. One can also talk about a population of measurements or values.
- Population parameter: see statistical parameter.
- Posterior probability: the result of a Bayesian analysis that encapsulates the combination of prior beliefs or information with observed data.
- Probability density is used to describe probability in a continuous probability distribution. For example, you cannot say that the probability of a man being exactly six feet tall is 20% (for a continuous variable, the probability of any single exact value is zero), but you can say he has a 20% chance of being between five and six feet tall. Probability density is given by a probability density function. Contrast with probability mass.
- A probability density function gives the probability distribution for a continuous random variable.
- A probability distribution is a function that gives the probability of all elements in a given space: see List of probability distributions.
- Probability interpretations
- A probability measure gives the probability of events in a probability space.
- A probability space is a sample space over which a probability measure has been defined.
- Random function
- A random variable is, roughly speaking, a variable whose value results from a random process; for example, the number shown when a die is rolled. Formally, it is a measurable function from a probability space, typically to the real numbers. The distribution function of a random variable gives the probability of its different values, and from it we can derive the mean and variance of the random variable.
- Discrete random variable
- Continuous random variable
- A random vector (or multivariate random variable) is a vector whose components are random variables on the same probability space.
- The range is the length of the smallest interval which contains all the data; it is calculated by subtracting the smallest observation from the greatest.
- A sample is that part of a population which is actually observed.
- The sample space is the set of possible outcomes of an experiment. For example, the sample space for rolling a six-sided die will be {1, 2, 3, 4, 5, 6}.
- Sampling is a process of selecting observations to obtain knowledge about a population. There are many methods for choosing the sample on which to make the observations.
- A sampling distribution is the probability distribution, under repeated sampling of the population, of a given statistic.
- Skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable. Roughly speaking, a distribution has positive skew (right-skewed) if the higher tail is longer and negative skew (left-skewed) if the lower tail is longer (confusing the two is a common error).
- The standard deviation is the most commonly used measure of statistical dispersion. It is the square root of the variance, and is generally written $\sigma$ (sigma).
- Standardized moment
- A statistic is the result of applying a statistical algorithm to a data set. It can also be described as an observable random variable.
- Statistical inference is inference about a population from a random sample drawn from it or, more generally, about a random process from its observed behavior during a finite period of time.
- Statistical dispersion (also called statistical variability) is a measure of how diverse some data is. It can be expressed by the variance or the standard deviation.
- A statistical parameter is a parameter that indexes a family of probability distributions.
- Sufficiency
- The variance of a random variable is a measure of its statistical dispersion, indicating how far from the expected value its values typically are; it is defined as $\operatorname{var}(X) = E[(X - \mu)^2]$, where $\mu = E(X)$. The variance of random variable X is typically designated as $\operatorname{var}(X)$, $\sigma_X^2$, or simply $\sigma^2$.
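The conditional probability entry can be checked on a small finite example. The sketch below assumes a fair six-sided die, so every outcome has probability 1/6; the helper names `prob` and `cond_prob` are hypothetical and exist only for this illustration. It computes P(A | B) as P(A ∩ B) / P(B) with exact fractions.

```python
from fractions import Fraction

# Assumption: a fair six-sided die, so each outcome has probability 1/6.
SAMPLE_SPACE = frozenset({1, 2, 3, 4, 5, 6})


def prob(event: set[int]) -> Fraction:
    """P(event) under the uniform measure on SAMPLE_SPACE."""
    return Fraction(len(event & SAMPLE_SPACE), len(SAMPLE_SPACE))


def cond_prob(a: set[int], b: set[int]) -> Fraction:
    """P(A | B) = P(A and B) / P(B); undefined when P(B) = 0."""
    p_b = prob(b)
    if p_b == 0:
        raise ValueError("conditioning event must have positive probability")
    return prob(a & b) / p_b


six = {6}
even = {2, 4, 6}
print(prob(six))             # 1/6
print(cond_prob(six, even))  # 1/3: knowing the roll is even raises P(six)
```

Conditioning on "even" shrinks the sample space to three outcomes, which is why the probability of a six doubles.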
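The covariance and correlation entries can be illustrated on small data sets. This is a minimal sketch, assuming the population convention (dividing by n rather than n − 1); the function names are hypothetical. The second example shows why a correlation of 0 does not imply independence: y is completely determined by x, yet their linear correlation is zero.

```python
import math


def mean(xs: list[float]) -> float:
    """Arithmetic mean of a non-empty list."""
    return sum(xs) / len(xs)


def covariance(xs: list[float], ys: list[float]) -> float:
    """Population covariance: the average of (x - mean_x) * (y - mean_y)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty lists of equal length")
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Covariance divided by the product of the two standard deviations."""
    # cov(X, X) is the variance, so sqrt(var_x * var_y) = sd_x * sd_y.
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


xs = [1, 2, 3, 4, 5]
print(pearson(xs, [2 * x for x in xs]))  # 1.0: an exact linear relationship
zs = [-2, -1, 0, 1, 2]
print(pearson(zs, [z * z for z in zs]))  # 0.0, although y = x**2 depends on x
```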
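The expected value and variance entries can be worked through for the die example. The sketch below assumes a fair die and represents a discrete distribution as a dictionary from values to probabilities (a representation chosen for this illustration, not taken from the glossary). It reproduces the value 3.5 given above and computes var(X) = E[(X − μ)²].

```python
from fractions import Fraction


def expected_value(pmf: dict[int, Fraction]) -> Fraction:
    """E(X): sum of each value x multiplied by its probability P(X = x)."""
    if sum(pmf.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(p * x for x, p in pmf.items())


def variance(pmf: dict[int, Fraction]) -> Fraction:
    """var(X) = E[(X - mu)^2], the expected squared deviation from the mean."""
    mu = expected_value(pmf)
    return sum(p * (x - mu) ** 2 for x, p in pmf.items())


# Assumption: a fair six-sided die.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}
print(expected_value(fair_die))  # 7/2, i.e. 3.5
print(variance(fair_die))        # 35/12
```

Exact fractions avoid rounding, so the results can be compared directly with hand calculations.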
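The mutual and pairwise independence entries differ in a way that a classic example makes concrete. This sketch assumes two fair coin flips (coded 1 for heads) and a third event, "the two flips differ". Every pair of the three events satisfies the product rule, but all three together do not, because the three events can never occur at the same time. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

# Assumption: two fair coin flips; each of the 4 outcomes (first, second) has probability 1/4.
OUTCOMES = list(product([0, 1], repeat=2))


def prob(event) -> Fraction:
    """Probability of the outcomes for which the predicate `event` is true."""
    return Fraction(sum(1 for w in OUTCOMES if event(w)), len(OUTCOMES))


def product_rule_holds(events) -> bool:
    """True if P(all events occur) equals the product of their individual probabilities."""
    joint = prob(lambda w: all(e(w) for e in events))
    product_of_probs = Fraction(1)
    for e in events:
        product_of_probs *= prob(e)
    return joint == product_of_probs


events = [
    lambda w: w[0] == 1,     # first flip is heads
    lambda w: w[1] == 1,     # second flip is heads
    lambda w: w[0] != w[1],  # the two flips differ
]

pairwise = all(product_rule_holds(pair) for pair in combinations(events, 2))
mutual = all(
    product_rule_holds(subset)
    for size in range(2, len(events) + 1)
    for subset in combinations(events, size)
)
print(pairwise, mutual)  # True False
```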
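The likelihood function entry uses a numbered-ball example that can be tabulated directly. In this sketch the observed value k = 3 and the range of bag sizes inspected are hypothetical choices. It shows that the likelihood is largest at n = k, that it does not sum to 1 over n, and that for a fixed n the same expression does sum to 1 over k.

```python
from fractions import Fraction


def likelihood(n: int, k: int) -> Fraction:
    """P(drawn ball shows k | bag holds balls numbered 1..n), read as a function of n."""
    return Fraction(1, n) if n >= k else Fraction(0)


k_observed = 3            # hypothetical observation
candidates = range(1, 8)  # hypothetical range of bag sizes to inspect
lik = {n: likelihood(n, k_observed) for n in candidates}

print(max(lik, key=lik.get))  # 3: the likelihood peaks at n = k
print(sum(lik.values()))      # 153/140, not 1: a likelihood is not a distribution over n

# For a fixed n, the same expression is a distribution over k, so it sums to 1.
print(sum(likelihood(5, k) for k in range(1, 6)))  # 1
```

Over all n ≥ k the likelihood values form a tail of the harmonic series, so their sum is not even finite.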
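The prior probability and posterior probability entries describe how beliefs are combined with data. The sketch below applies Bayes' rule to a deliberately simple, hypothetical setup: a coin whose probability of heads is one of three candidate values, each with equal prior weight, after observing 3 heads and 1 tail. A finite set of hypotheses is an assumption made for illustration; real analyses often use continuous priors.

```python
from fractions import Fraction


def posterior(prior: dict[Fraction, Fraction], heads: int, tails: int) -> dict[Fraction, Fraction]:
    """Bayes' rule over a finite set of hypotheses for a coin's probability of heads.

    posterior(theta) is proportional to prior(theta) * theta**heads * (1 - theta)**tails;
    the binomial coefficient is the same for every theta, so it cancels on normalizing.
    """
    unnormalized = {
        theta: p * theta**heads * (1 - theta) ** tails for theta, p in prior.items()
    }
    evidence = sum(unnormalized.values())
    return {theta: w / evidence for theta, w in unnormalized.items()}


# Hypothetical setup: three candidate biases with equal prior weight,
# then 3 heads and 1 tail observed.
thetas = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]
prior = {theta: Fraction(1, 3) for theta in thetas}
post = posterior(prior, heads=3, tails=1)
for theta in thetas:
    print(theta, post[theta])
# 1/4 3/46
# 1/2 8/23
# 3/4 27/46
```

Mostly-heads data shifts weight toward the 3/4 hypothesis, while the posterior still sums to 1.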
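The sampling distribution entry can be approximated by simulation. This sketch repeatedly draws samples of 25 fair die rolls and records each sample mean; the sample size, number of replications and seed are arbitrary, hypothetical choices. For independent draws, the spread of the sample mean is expected to be about σ/√n, where σ is the standard deviation of a single roll.

```python
import math
import random
import statistics


def sample_mean_distribution(sample_size: int, replications: int, seed: int = 0) -> list[float]:
    """Simulate the sampling distribution of the mean of `sample_size` fair die rolls."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [
        statistics.fmean(rng.randint(1, 6) for _ in range(sample_size))
        for _ in range(replications)
    ]


means = sample_mean_distribution(sample_size=25, replications=10_000)
sigma = math.sqrt(35 / 12)  # standard deviation of a single fair die roll
print(statistics.fmean(means))   # close to 3.5
print(statistics.pstdev(means))  # close to sigma / sqrt(25), roughly 0.342
```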
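Several descriptive entries (mean, range, variance, standard deviation, skewness, kurtosis and standardized moment) can be computed together on one data set. This sketch uses a small hypothetical data set and the population convention (dividing by n). Skewness and kurtosis are taken here as the third and fourth standardized moments, which is one common convention among several. The helper `standardized_moment` is hypothetical; `statistics.mean`, `statistics.pvariance` and `statistics.pstdev` are from Python's standard library.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set

mean = statistics.mean(data)           # 5
data_range = max(data) - min(data)     # 9 - 2 = 7
variance = statistics.pvariance(data)  # 4 (population variance, divides by n)
std_dev = statistics.pstdev(data)      # 2.0, the square root of the variance


def standardized_moment(xs: list[float], k: int) -> float:
    """k-th central moment divided by sigma**k (population convention)."""
    mu = statistics.mean(xs)
    sigma = statistics.pstdev(xs)
    return sum((x - mu) ** k for x in xs) / len(xs) / sigma**k


skewness = standardized_moment(data, 3)  # 0.65625 > 0: the upper tail (the 9) is longer
kurtosis = standardized_moment(data, 4)  # 2.78125
print(mean, data_range, variance, std_dev, skewness, kurtosis)
```

Using `statistics.variance` and `statistics.stdev` instead would give the sample versions, which divide by n − 1.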
See also
- Notation in probability and statistics
- Probability axioms
- Glossary of experimental design
- List of statistical topics
- List of probability topics