Statistical inference
In statistics, statistical inference is the process of drawing conclusions from data that are subject to random variation, for example, observational errors or sampling variation. More substantially, the terms statistical inference, statistical induction and inferential statistics are used to describe systems of procedures that can be used to draw conclusions from datasets arising from systems affected by random variation. Initial requirements of such a system of procedures for inference and induction are that the system should produce reasonable answers when applied to well-defined situations and that it should be general enough to be applied across a range of situations.
The outcome of statistical inference may be an answer to the question "what should be done next?", where this might be a decision about making further experiments or surveys, or about drawing a conclusion before implementing some organizational or governmental policy.
Scope
For the most part, statistical inference makes propositions about populations, using data drawn from the population of interest via some form of random sampling. More generally, data about a random process is obtained from its observed behavior during a finite period of time. Given a parameter or hypothesis about which one wishes to make inference, statistical inference most often uses:
- a statistical model of the random process that is supposed to generate the data, and
- a particular realization of the random process; i.e., a set of data.
The conclusion of a statistical inference is a statistical proposition. Some common forms of statistical proposition are:
- an estimate; i.e., a particular value that best approximates some parameter of interest,
- a confidence interval (or set estimate); i.e., an interval constructed from the data in such a way that, under repeated sampling of datasets, such intervals would contain the true parameter value with the probability at the stated confidence level,
- a credible interval; i.e., a set of values containing, for example, 95% of posterior belief,
- rejection of a hypothesis,
- clustering or classification of data points into groups.
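As a minimal illustration of the first two forms, the following sketch (in Python; the sample values are hypothetical) computes a point estimate of a population mean and a normal-approximation 95% confidence interval for it.

```python
import math

# Hypothetical sample of measurements (illustrative values only).
data = [4.9, 5.3, 5.1, 4.7, 5.6, 5.0, 5.2, 4.8, 5.4, 5.1]

n = len(data)
mean = sum(data) / n                                 # point estimate of the population mean
var = sum((x - mean) ** 2 for x in data) / (n - 1)   # unbiased sample variance
se = math.sqrt(var / n)                              # standard error of the mean

# Normal-approximation 95% confidence interval (z = 1.96);
# with a sample this small a t-quantile would usually be preferred.
z = 1.96
ci = (mean - z * se, mean + z * se)

print(f"point estimate: {mean:.3f}")
print(f"95% confidence interval: ({ci[0]:.3f}, {ci[1]:.3f})")
```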
Comparison to descriptive statistics
Statistical inference is generally distinguished from descriptive statistics. In simple terms, descriptive statistics can be thought of as being just a straightforward presentation of facts, in which modeling decisions made by a data analyst have had minimal influence. A complete statistical analysis will nearly always include both descriptive statistics and statistical inference, and will often progress in a series of steps where the emphasis moves gradually from description to inference.
Models/Assumptions
Any statistical inference requires some assumptions. A statistical model is a set of assumptions concerning the generation of the observed data and similar data. Descriptions of statistical models usually emphasize the role of population quantities of interest, about which we wish to draw inference.
Degree of models/assumptions
Statisticians distinguish between three levels of modeling assumptions:
- Fully parametric: The probability distributions describing the data-generation process are assumed to be fully described by a family of probability distributions involving only a finite number of unknown parameters. For example, one may assume that the distribution of population values is truly Normal, with unknown mean and variance, and that datasets are generated by 'simple' random sampling. The family of generalized linear models is a widely used and flexible class of parametric models.
- Non-parametric: The assumptions made about the process generating the data are much less than in parametric statistics and may be minimal. For example, every continuous probability distribution has a median, which may be estimated using the sample median or the Hodges–Lehmann–Sen estimator, which has good properties when the data arise from simple random sampling.
- Semi-parametric: This term typically implies assumptions 'between' fully and non-parametric approaches. For example, one may assume that a population distribution has a finite mean. Furthermore, one may assume that the mean response level in the population depends in a truly linear manner on some covariate (a parametric assumption) but not make any parametric assumption describing the variance around that mean (i.e., about the presence or possible form of any heteroscedasticity). More generally, semi-parametric models can often be separated into 'structural' and 'random variation' components. One component is treated parametrically and the other non-parametrically. The well-known Cox model is a set of semi-parametric assumptions.
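To make the non-parametric example above concrete, the sketch below (Python; the sample values are hypothetical) estimates a population's location both by the sample median and by the Hodges–Lehmann estimator, computed as the median of all pairwise Walsh averages.

```python
from itertools import combinations_with_replacement
from statistics import median

# Hypothetical sample drawn by simple random sampling (illustrative values only).
sample = [2.1, 3.4, 2.9, 8.7, 3.1, 2.6, 3.0, 2.8]

# Non-parametric location estimates: no distributional family is assumed.
sample_median = median(sample)

# Hodges-Lehmann estimator: median of the Walsh averages (x_i + x_j) / 2, i <= j.
walsh_averages = [(x + y) / 2 for x, y in combinations_with_replacement(sample, 2)]
hodges_lehmann = median(walsh_averages)

print(f"sample median:  {sample_median:.3f}")
print(f"Hodges-Lehmann: {hodges_lehmann:.3f}")
```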
Importance of valid models/assumptions
Whatever level of assumption is made, correctly calibrated inference in general requires these assumptions to be correct; i.e., that the data-generating mechanisms really have been correctly specified.
Incorrect assumptions of 'simple' random sampling
Incorrect assumptions of 'simple' random sampling can invalidate statistical inference. More complex semi- and fully parametric assumptions are also cause for concern. For example, incorrectly assuming the Cox model can in some cases lead to faulty conclusions. Incorrect assumptions of Normality in the population also invalidate some forms of regression-based inference. The use of any parametric model is viewed skeptically by most experts in sampling human populations: "most sampling statisticians, when they deal with confidence intervals at all, limit themselves to statements about [estimators] based on very large samples, where the central limit theorem ensures that these [estimators] will have distributions that are nearly normal." In particular, a normal distribution "would be a totally unrealistic and catastrophically unwise assumption to make if we were dealing with any kind of economic population." Here, the central limit theorem states that the distribution of the sample mean "for very large samples" is approximately normally distributed, if the distribution is not heavy-tailed.
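The practical effect of a violated Normality assumption can be checked by simulation. The sketch below is a rough illustration (not drawn from the article): it estimates the actual coverage of a nominal 95% normal-theory confidence interval for the mean when small samples come from a heavy-tailed log-normal population, where coverage typically falls short of the nominal level.

```python
import random
import math

random.seed(0)

def normal_ci_covers(sample, true_mean, z=1.96):
    """Check whether the nominal 95% normal-theory CI for the mean covers true_mean."""
    n = len(sample)
    m = sum(sample) / n
    s2 = sum((x - m) ** 2 for x in sample) / (n - 1)
    half_width = z * math.sqrt(s2 / n)
    return m - half_width <= true_mean <= m + half_width

n, trials = 10, 20_000
# Heavy-tailed population: log-normal with underlying N(0, 1.5^2).
sigma = 1.5
true_mean = math.exp(sigma ** 2 / 2)  # E[exp(Z)] for Z ~ N(0, sigma^2)

covered = sum(
    normal_ci_covers([math.exp(random.gauss(0, sigma)) for _ in range(n)], true_mean)
    for _ in range(trials)
)
print(f"empirical coverage of nominal 95% CI: {covered / trials:.3f}")
```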
Approximate distributions
Given the difficulty in specifying exact distributions of sample statistics, many methods have been developed for approximating these.
With finite samples, approximation results measure how close a limiting distribution approaches the statistic's sample distribution: for example, with 10,000 independent samples the normal distribution approximates (to two digits of accuracy) the distribution of the sample mean for many population distributions, by the Berry–Esseen theorem.
Yet for many practical purposes, the normal approximation provides a good approximation to the sample mean's distribution when there are 10 (or more) independent samples, according to simulation studies and statisticians' experience. Following Kolmogorov's work in the 1950s, advanced statistics uses approximation theory and functional analysis to quantify the error of approximation. In this approach, the metric geometry of probability distributions is studied; this approach quantifies approximation error with, for example, the Kullback–Leibler distance, the Bregman divergence, and the Hellinger distance.
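As an illustration of quantifying approximation error with such divergences, the sketch below (a toy example with arbitrary parameter values, not taken from the article) computes the Kullback–Leibler distance and the Hellinger distance between two univariate normal distributions using their closed-form expressions.

```python
import math

def kl_normal(mu1, s1, mu2, s2):
    """KL divergence KL(N(mu1, s1^2) || N(mu2, s2^2)), closed form."""
    return math.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2) - 0.5

def hellinger_normal(mu1, s1, mu2, s2):
    """Hellinger distance between N(mu1, s1^2) and N(mu2, s2^2), closed form."""
    h2 = 1 - math.sqrt(2 * s1 * s2 / (s1**2 + s2**2)) * math.exp(
        -((mu1 - mu2) ** 2) / (4 * (s1**2 + s2**2))
    )
    return math.sqrt(h2)

# How far is an approximating N(0, 1) from a "true" N(0.3, 1.2^2)?  (toy numbers)
print(f"KL distance:        {kl_normal(0.3, 1.2, 0.0, 1.0):.4f}")
print(f"Hellinger distance: {hellinger_normal(0.3, 1.2, 0.0, 1.0):.4f}")
```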
With infinite samples, limiting results like the central limit theorem describe the sample statistic's limiting distribution, if one exists. Limiting results are not statements about finite samples, and indeed are logically irrelevant to finite samples. However, the asymptotic theory of limiting distributions is often invoked for work in estimation and testing. For example, limiting results are often invoked to justify the generalized method of moments and the use of generalized estimating equations, which are popular in econometrics and biostatistics. The magnitude of the difference between the limiting distribution and the true distribution (formally, the 'error' of the approximation) can be assessed using simulation. The use of limiting results in this way works well in many applications, especially with low-dimensional models with log-concave likelihoods (such as with one-parameter exponential families).
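The simulation check mentioned above can be sketched as follows (Python; the exponential population and sample size are arbitrary choices made for illustration): the empirical distribution of the standardized sample mean is compared with its limiting standard normal distribution via the maximum discrepancy between their cumulative distribution functions, a Kolmogorov–Smirnov-type distance.

```python
import random
import math

random.seed(1)

def std_normal_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

n, reps = 20, 5000
# Exponential(1) population: mean 1, standard deviation 1.
stats = []
for _ in range(reps):
    sample = [random.expovariate(1.0) for _ in range(n)]
    mean = sum(sample) / n
    stats.append(math.sqrt(n) * (mean - 1.0))  # standardized sample mean

# Kolmogorov-Smirnov-type distance between the empirical CDF of the
# simulated statistic and the limiting standard normal CDF.
stats.sort()
ks = max(
    max(abs((i + 1) / reps - std_normal_cdf(x)), abs(i / reps - std_normal_cdf(x)))
    for i, x in enumerate(stats)
)
print(f"max CDF discrepancy with n = {n}: {ks:.3f}")
```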
Randomization-based models
For a given dataset that was produced by a randomization design, the randomization distribution of a statistic (under the null hypothesis) is defined by evaluating the test statistic for all of the plans that could have been generated by the randomization design. In frequentist inference, randomization allows inferences to be based on the randomization distribution rather than a subjective model, and this is important especially in survey sampling and design of experiments. Statistical inference from randomized studies is also more straightforward than many other situations. In Bayesian inference, randomization is also of importance: in survey sampling, use of sampling without replacement ensures the exchangeability of the sample with the population; in randomized experiments, randomization warrants a missing at random assumption for covariate information.
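The randomization distribution described above can be approximated directly. In the sketch below (illustrative data and group labels; in practice one would enumerate all possible assignments or use many more resamples), treatment labels are re-randomized and the observed difference in group means is compared with the resulting randomization distribution.

```python
import random

random.seed(2)

# Hypothetical outcomes from a completely randomized two-group experiment.
treatment = [7.1, 6.8, 7.9, 7.4, 8.0]
control = [6.2, 6.5, 6.9, 6.1, 6.7]

def mean_diff(a, b):
    return sum(a) / len(a) - sum(b) / len(b)

observed = mean_diff(treatment, control)
pooled = treatment + control
n_t = len(treatment)

# Approximate the randomization distribution by re-randomizing labels.
reps = 10_000
count_as_extreme = 0
for _ in range(reps):
    shuffled = random.sample(pooled, len(pooled))
    diff = mean_diff(shuffled[:n_t], shuffled[n_t:])
    if abs(diff) >= abs(observed):
        count_as_extreme += 1

print(f"observed difference in means: {observed:.3f}")
print(f"randomization p-value:        {count_as_extreme / reps:.4f}")
```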
Objective randomization allows properly inductive procedures.
Many statisticians prefer randomization-based analysis of data that was generated by well-defined randomization procedures. (However, it is true that in fields of science with developed theoretical knowledge and experimental control, randomized experiments may increase the costs of experimentation without improving the quality of inferences.)
Similarly, results from randomized experiments are recommended by leading statistical authorities as allowing inferences with greater reliability than do observational studies of the same phenomena.
However, a good observational study may be better than a bad randomized experiment.
The statistical analysis of a randomized experiment may be based on the randomization scheme stated in the experimental protocol and does not need a subjective model.
However, not all hypotheses can be tested by randomized experiments or random samples, which often require a large budget, a lot of expertise and time, and may have ethical problems.
Model-based analysis of randomized experiments
It is standard practice to refer to a statistical model, often a normal linear model, when analyzing data from randomized experiments. However, the randomization scheme guides the choice of a statistical model. It is not possible to choose an appropriate model without knowing the randomization scheme. Seriously misleading results can be obtained by analyzing data from randomized experiments while ignoring the experimental protocol; common mistakes include forgetting the blocking used in an experiment and confusing repeated measurements on the same experimental unit with independent replicates of the treatment applied to different experimental units.
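The last mistake can be made concrete with a small sketch (hypothetical numbers): treating repeated measurements on the same unit as independent replicates shrinks the apparent standard error, because the effective sample size is the number of experimental units, not the number of measurements.

```python
import math

# Hypothetical experiment: 4 experimental units, 5 repeated measurements each.
units = [
    [10.1, 10.3, 10.2, 10.4, 10.2],
    [11.0, 11.2, 10.9, 11.1, 11.0],
    [ 9.6,  9.8,  9.7,  9.5,  9.6],
    [10.8, 10.7, 10.9, 10.8, 10.6],
]

def mean(xs):
    return sum(xs) / len(xs)

def se_of_mean(xs):
    m = mean(xs)
    var = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return math.sqrt(var / len(xs))

# Wrong: pool all 20 measurements as if they were independent replicates.
pooled = [x for unit in units for x in unit]
# Better: average within each unit first, then treat the 4 unit means as replicates.
unit_means = [mean(u) for u in units]

print(f"naive SE (20 'replicates'):   {se_of_mean(pooled):.3f}")
print(f"unit-level SE (4 replicates): {se_of_mean(unit_means):.3f}")
```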
Modes of inference
Different schools of statistical inference have become established. These schools (or 'paradigms') are not mutually exclusive, and methods which work well under one paradigm often have attractive interpretations under other paradigms. The two main paradigms in use are frequentist inference and Bayesian inference, which are both summarized below.
Frequentist inference
This paradigm calibrates the production of propositions by considering (notional) repeated sampling of datasets similar to the one at hand. By considering the procedure's characteristics under repeated sampling, the frequentist properties of any statistical inference procedure can be described, although in practice this quantification may be challenging.
Frequentist inference, objectivity, and decision theory
Frequentist inference calibrates procedures, such as tests of hypothesis and constructions of confidence intervals, in terms of frequency probability; that is, in terms of repeated sampling from a population. (In contrast, Bayesian inference calibrates procedures with regard to epistemological uncertainty, described as a probability measure.)
The frequentist calibration of procedures can be done without regard to utility functions. However, some elements of frequentist statistics, such as statistical decision theory, do incorporate utility functions. In particular, frequentist developments of optimal inference (such as minimum-variance unbiased estimators, or uniformly most powerful testing) make use of loss functions, which play the role of (negative) utility functions. Loss functions must be explicitly stated for statistical theorists to prove that a statistical procedure has an optimality property. For example, median-unbiased estimators are optimal under absolute-value loss functions, and least-squares estimators are optimal under squared-error loss functions.
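The connection between loss functions and estimators can be illustrated numerically. In the sketch below (hypothetical data), the sample mean minimizes the empirical squared-error loss over a grid of candidate values, while the sample median minimizes the empirical absolute-value loss.

```python
from statistics import mean, median

# Hypothetical observations.
data = [1.0, 2.0, 2.5, 3.0, 10.0]

def empirical_loss(c, loss):
    """Average loss of guessing the constant c for every observation."""
    return sum(loss(x - c) for x in data) / len(data)

def squared(e):
    return e * e

def absolute(e):
    return abs(e)

# Search a fine grid of candidate values for the minimizer of each loss.
candidates = [i / 100 for i in range(0, 1101)]
best_sq = min(candidates, key=lambda c: empirical_loss(c, squared))
best_abs = min(candidates, key=lambda c: empirical_loss(c, absolute))

print(f"squared-error minimizer: {best_sq:.2f}  (sample mean = {mean(data):.2f})")
print(f"absolute-loss minimizer: {best_abs:.2f}  (sample median = {median(data):.2f})")
```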
While statisticians using frequentist inference must choose for themselves the parameters of interest and the estimators or test statistics to be used, the absence of explicitly stated utilities and prior distributions has helped frequentist procedures to become widely viewed as 'objective'.
Bayesian inference
The Bayesian calculus describes degrees of belief using the 'language' of probability; beliefs are positive, integrate to one, and obey probability axioms. Bayesian inference uses the available posterior beliefs as the basis for making statistical propositions. There are several different justifications for using the Bayesian approach.
Examples of Bayesian inference
- Credible intervals for interval estimation
- Bayes factors for model comparison
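A minimal sketch of the first example, under an assumed conjugate Beta-Binomial model (the prior and data here are hypothetical): the posterior for a success probability is Beta(a + successes, b + failures), and an equal-tailed 95% credible interval is read off from its quantiles, approximated below by Monte Carlo simulation.

```python
import random

random.seed(3)

# Hypothetical data: 18 successes in 25 trials; Beta(1, 1) (uniform) prior assumed.
successes, failures = 18, 7
a, b = 1 + successes, 1 + failures   # conjugate Beta posterior parameters

# Approximate the posterior by Monte Carlo draws and take equal-tailed quantiles.
draws = sorted(random.betavariate(a, b) for _ in range(100_000))
lo = draws[int(0.025 * len(draws))]
hi = draws[int(0.975 * len(draws))]
post_mean = sum(draws) / len(draws)

print(f"posterior mean: {post_mean:.3f}")
print(f"95% credible interval: ({lo:.3f}, {hi:.3f})")
```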
Bayesian inference, subjectivity and decision theory
Many informal Bayesian inferences are based on "intuitively reasonable" summaries of the posterior. For example, the posterior mean, median and mode, highest posterior density intervals, and Bayes factors can all be motivated in this way. While a user's utility function need not be stated for this sort of inference, these summaries do all depend (to some extent) on stated prior beliefs, and are generally viewed as subjective conclusions. (Methods of prior construction which do not require external input have been proposed but not yet fully developed.)
Formally, Bayesian inference is calibrated with reference to an explicitly stated utility, or loss function; the 'Bayes rule' is the one which maximizes expected utility, averaged over the posterior uncertainty. Formal Bayesian inference therefore automatically provides optimal decisions in a decision-theoretic sense. Given assumptions, data and utility, Bayesian inference can be made for essentially any problem, although not every statistical inference need have a Bayesian interpretation. Analyses which are not formally Bayesian can be (logically) incoherent; a feature of Bayesian procedures which use proper priors (i.e., those integrable to one) is that they are guaranteed to be coherent. Some advocates of Bayesian inference assert that inference must take place in this decision-theoretic framework, and that Bayesian inference should not conclude with the evaluation and summarization of posterior beliefs.
Information and computational complexity
Other forms of statistical inference have been developed from ideas in information theory and the theory of Kolmogorov complexity. For example, the minimum description length (MDL) principle selects statistical models that maximally compress the data; inference proceeds without assuming counterfactual or non-falsifiable 'data-generating mechanisms' or probability models for the data, as might be done in frequentist or Bayesian approaches.
However, if a 'data-generating mechanism' does exist in reality, then according to Shannon's source coding theorem it provides the MDL description of the data, on average and asymptotically. In minimizing description length (or descriptive complexity), MDL estimation is similar to maximum likelihood estimation and maximum a posteriori estimation (using maximum-entropy Bayesian priors). However, MDL avoids assuming that the underlying probability model is known; the MDL principle can also be applied without assumptions that, e.g., the data arose from independent sampling. The MDL principle has been applied in communication-coding theory in information theory, in linear regression, and in time-series analysis (particularly for choosing the degrees of the polynomials in autoregressive moving average (ARMA) models).
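As a rough illustration of MDL-style model selection (this is a simplified two-part code length, akin to the BIC approximation, not the full MDL machinery), the sketch below fits polynomials of increasing degree to hypothetical data and picks the degree whose description length, a data-fit cost plus a parameter-cost penalty, is smallest.

```python
import math
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical data: a quadratic trend plus noise.
n = 60
x = np.linspace(-2, 2, n)
y = 1.0 - 0.5 * x + 0.8 * x**2 + rng.normal(0, 0.5, n)

def description_length(degree):
    """Crude two-part code length: n/2 * log(RSS/n) for the data given the model,
    plus (k/2) * log(n) for the k = degree + 1 fitted coefficients."""
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    rss = float(np.sum(residuals**2))
    k = degree + 1
    return 0.5 * n * math.log(rss / n) + 0.5 * k * math.log(n)

lengths = {d: description_length(d) for d in range(6)}
best = min(lengths, key=lengths.get)
for d, length in lengths.items():
    print(f"degree {d}: description length = {length:.2f}")
print(f"selected degree: {best}")
```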
Information-theoretic statistical inference has been popular in data mining, which has become a common approach for very large observational and heterogeneous datasets made possible by the computer revolution and the internet.
The evaluation of statistical inferential procedures often uses techniques or criteria from computational complexity theory or numerical analysis.
Fiducial inference
Fiducial inference was an approach to statistical inference based on fiducial probability, also known as a "fiducial distribution". In subsequent work, this approach has been called ill-defined, extremely limited in applicability, and even fallacious. However, this argument is the same as that which shows that a so-called confidence distribution is not a valid probability distribution and, since this has not invalidated the application of confidence intervals, it does not necessarily invalidate conclusions drawn from fiducial arguments.
Structural inference
Developing ideas of Fisher and of Pitman from 1938 to 1939, George A. Barnard developed "structural inference" or "pivotal inference", an approach using invariant probabilities on group families. Barnard reformulated the arguments behind fiducial inference on a restricted class of models on which "fiducial" procedures would be well-defined and useful.
Inference topics
The topics below are usually included in the area of statistical inference.
- Statistical assumptions
- Statistical decision theory
- Estimation theory
- Statistical hypothesis testing
- Revising opinions in statistics
- Design of experiments, the analysis of variance, and regression
- Survey sampling
- Summarizing statistical data
See also
- Predictive inference
- Induction (philosophy)
- Philosophy of statistics
- Algorithmic inference
Further reading
- Casella, G., Berger, R.L. (2001). Statistical Inference. Duxbury Press. ISBN 0534243126
- David A. Freedman. "Statistical Models and Shoe Leather" (1991). Sociological Methodology, vol. 21, pp. 291–313.
- David A. Freedman. Statistical Models and Causal Inferences: A Dialogue with the Social Sciences. 2010. Edited by David Collier, Jasjeet S. Sekhon, and Philip B. Stark. Cambridge University Press.
- Lenhard, Johannes (2006). "Models and Statistical Inference: The Controversy between Fisher and Neyman–Pearson," British Journal for the Philosophy of Science, Vol. 57 Issue 1, pp. 69–91.
- Lindley, D. (1958). "Fiducial distribution and Bayes' theorem", Journal of the Royal Statistical Society, Series B, 20, 102–7.
- Sudderth, William D. (1994). "Coherent Inference and Prediction in Statistics," in Dag Prawitz, Bryan Skyrms, and Westerstahl (eds.), Logic, Methodology and Philosophy of Science IX: Proceedings of the Ninth International Congress of Logic, Methodology and Philosophy of Science, Uppsala, Sweden, August 7–14, 1991, Amsterdam: Elsevier.
- Trusted, Jennifer (1979). The Logic of Scientific Inference: An Introduction, London: The Macmillan Press, Ltd.
- Young, G.A., Smith, R.L. (2005) Essentials of Statistical Inference, CUP. ISBN 0-521-83971-8
External links
- MIT OpenCourseWare: Statistical Inference