List of basic statistics topics - AbsoluteAstronomy.com

The following outline is provided as an overview and guide to the variety of topics included within the subject of statistics:

Statistics
Statistics
Statistics is the study of the collection, organization, analysis, and interpretation of data. It deals with all aspects of this, including the planning of data collection in terms of the design of surveys and experiments....

pertains to the collection, analysis, interpretation, and presentation of data

Data

The term data refers to qualitative or quantitative attributes of a variable or set of variables. Data are typically the results of measurements and can be the basis of graphs, images, or observations of a set of variables. Data are often viewed as the lowest level of abstraction from which...

. It is applicable to a wide variety of academic disciplines, from the physical and social science

Science

Science is a systematic enterprise that builds and organizes knowledge in the form of testable explanations and predictions about the universe...

s to the humanities

Humanities

The humanities are academic disciplines that study the human condition, using methods that are primarily analytical, critical, or speculative, as distinguished from the mainly empirical approaches of the natural sciences....

; it is also used and misused

Misuse of statistics

A misuse of statistics occurs when a statistical argument asserts a falsehood. In some cases, the misuse may be accidental. In others, it is purposeful and for the gain of the perpetrator. When the statistical reason involved is false or misapplied, this constitutes a statistical fallacy.The false...

for making informed decisions in all areas of business

Business

A business is an organization engaged in the trade of goods, services, or both to consumers. Businesses are predominant in capitalist economies, where most of them are privately owned and administered to earn profit to increase the wealth of their owners. Businesses may also be not-for-profit...

and government

Government

Government refers to the legislators, administrators, and arbitrators in the administrative bureaucracy who control a state at a given time, and to the system of government by which they are organized...

Nature of statistics

Statistics can be described as all of the following:

Academic discipline
Academic discipline
An academic discipline, or field of study, is a branch of knowledge that is taught and researched at the college or university level. Disciplines are defined , and recognized by the academic journals in which research is published, and the learned societies and academic departments or faculties to...

: one with academic departments, curricula and degrees; national and international societies; and specialized journals.
Scientific field (a branch of science
Science
Science is a systematic enterprise that builds and organizes knowledge in the form of testable explanations and predictions about the universe...

) – widely-recognized category of specialized expertise within science, and typically embodies its own terminology and nomenclature. Such a field will usually be represented by one or more scientific journals, where peer reviewed research is published.
- Formal science
  Formal science
  The formal sciences are the branches of knowledge that are concerned with formal systems, such as logic, mathematics, theoretical computer science, information theory, systems theory, decision theory, statistics, and some aspects of linguistics....
  
  – branch of knowledge concerned with formal systems.
  - Mathematical science
    Mathematical sciences
    Mathematical sciences is a broad term that refers to those academic disciplines that are primarily mathematical in nature but may not be universally considered subfields of mathematics proper...
    
    – field of science that is primarily mathematical in nature but may not be universally considered subfields of mathematics proper. Statistics, for example, is mathematical in its methods but grew out of political arithmetic which merged with inverse probability and grew through applications in the social sciences and some areas of physics and biometrics to become its own separate, though closely allied, field.

History of statistics

History of statistics
History of statistics
The history of statistics can be said to start around 1749 although, over time, there have been changes to the interpretation of what the word statistics means. In early times, the meaning was restricted to information about states...
Founders of statistics
Founders of statistics
Statistics is the theory and application of mathematics to the scientific method including hypothesis generation, experimental design, sampling, data collection, data summarization, estimation, prediction and inference from those results to the population from which the experimental sample was drawn...
History of probability
History of probability
Probability has a dual aspect: on the one hand the probability or likelihood of hypotheses given the evidence for them, and on the other hand the behavior of stochastic processes such as the throwing of dice or coins...
Timeline of probability and statistics
Timeline of probability and statistics
A timeline of probability and statistics-Before 1600:* 9th Century - Al-Kindi was the first to use statistics to decipher encrypted messages and developed the first code breaking algorithm in the House of Wisdom in Baghdad, based on frequency analysis...

Describing data

Descriptive statistics
Descriptive statistics
Descriptive statistics quantitatively describe the main features of a collection of data. Descriptive statistics are distinguished from inferential statistics , in that descriptive statistics aim to summarize a data set, rather than use the data to learn about the population that the data are...
Average
Average
In mathematics, an average, or central tendency of a data set is a measure of the "middle" value of the data set. Average is one form of central tendency. Not all central tendencies should be considered definitions of average....
- Mean
  Mean
  In statistics, mean has two related meanings:* the arithmetic mean .* the expected value of a random variable, which is also called the population mean....
- Median
  Median
  In probability theory and statistics, a median is described as the numerical value separating the higher half of a sample, a population, or a probability distribution, from the lower half. The median of a finite list of numbers can be found by arranging all the observations from lowest value to...
- Mode
  Mode (statistics)
  In statistics, the mode is the value that occurs most frequently in a data set or a probability distribution. In some fields, notably education, sample data are often called scores, and the sample mode is known as the modal score....
Measures of scale
Statistical dispersion
In statistics, statistical dispersion is variability or spread in a variable or a probability distribution...
- Variance
  Variance
  In probability theory and statistics, the variance is a measure of how far a set of numbers is spread out. It is one of several descriptors of a probability distribution, describing how far the numbers lie from the mean . In particular, the variance is one of the moments of a distribution...
- Standard deviation
  Standard deviation
  Standard deviation is a widely used measure of variability or diversity used in statistics and probability theory. It shows how much variation or "dispersion" there is from the average...
- Median absolute deviation
  Median absolute deviation
  In statistics, the median absolute deviation is a robust measure of the variability of a univariate sample of quantitative data. It can also refer to the population parameter that is estimated by the MAD calculated from a sample....
Correlation
Correlation
In statistics, dependence refers to any statistical relationship between two random variables or two sets of data. Correlation refers to any of a broad class of statistical relationships involving dependence....
Outlier
Outlier
In statistics, an outlier is an observation that is numerically distant from the rest of the data. Grubbs defined an outlier as: An outlying observation, or outlier, is one that appears to deviate markedly from other members of the sample in which it occurs....
Statistical graphics
Statistical graphics
Statistical graphics, also known as graphical techniques, are information graphics in the field of statistics used to visualize quantitative data.- Overview :...
- Histogram
  Histogram
  In statistics, a histogram is a graphical representation showing a visual impression of the distribution of data. It is an estimate of the probability distribution of a continuous variable and was first introduced by Karl Pearson...
- Frequency distribution
  Frequency distribution
  In statistics, a frequency distribution is an arrangement of the values that one or more variables take in a sample. Each entry in the table contains the frequency or count of the occurrences of values within a particular group or interval, and in this way, the table summarizes the distribution of...
- Quantile
  Quantile
  Quantiles are points taken at regular intervals from the cumulative distribution function of a random variable. Dividing ordered data into q essentially equal-sized data subsets is the motivation for q-quantiles; the quantiles are the data values marking the boundaries between consecutive subsets...
- Survival function
  Survival function
  The survival function, also known as a survivor function or reliability function, is a property of any random variable that maps a set of events, usually associated with mortality or failure of some system, onto time. It captures the probability that the system will survive beyond a specified time...
- Failure rate
  Failure rate
  Failure rate is the frequency with which an engineered system or component fails, expressed for example in failures per hour. It is often denoted by the Greek letter λ and is important in reliability engineering....

Experiments and surveys

Design of experiments
Design of experiments
In general usage, design of experiments or experimental design is the design of any information-gathering exercises where variation is present, whether under the full control of the experimenter or not. However, in statistics, these terms are usually used for controlled experiments...
- Optimal design
  Optimal design
  Optimal designs are a class of experimental designs that are optimal with respect to some statistical criterion.In the design of experiments for estimating statistical models, optimal designs allow parameters to be estimated without bias and with minimum-variance...
- factorial experiment
  Factorial experiment
  In statistics, a full factorial experiment is an experiment whose design consists of two or more factors, each with discrete possible values or "levels", and whose experimental units take on all possible combinations of these levels across all such factors. A full factorial design may also be...
- Restricted randomization
  Restricted randomization
  Many processes have more than one source of variation in them. In order to reduce variation in processes, these multiple sources must be understood, and that often leads to the concept of nested or hierarchical data structures. For example, in the semiconductor industry, a batch process may operate...
- Repeated measures design
  Repeated measures design
  The repeated measures design uses the same subjects with every condition of the research, including the control. For instance, repeated measures are collected in a longitudinal study in which change over time is assessed. Other studies compare the same measure under two or more different conditions...
- Randomized block design
  Randomized block design
  In the statistical theory of the design of experiments, blocking is the arranging of experimental units in groups that are similar to one another. Typically, a blocking factor is a source of variability that is not of primary interest to the experimenter...
Statistical survey
Statistical survey
Survey methodology is the field that studies surveys, that is, the sample of individuals from a population with a view towards making statistical inferences about the population using the sample. Polls about public opinion, such as political beliefs, are reported in the news media in democracies....
Opinion poll
Opinion poll
An opinion poll, sometimes simply referred to as a poll is a survey of public opinion from a particular sample. Opinion polls are usually designed to represent the opinions of a population by conducting a series of questions and then extrapolating generalities in ratio or within confidence...

Sampling

Sampling distribution
Sampling distribution
In statistics, a sampling distribution or finite-sample distribution is the probability distribution of a given statistic based on a random sample. Sampling distributions are important in statistics because they provide a major simplification on the route to statistical inference...
sampling
Sampling (statistics)
In statistics and survey methodology, sampling is concerned with the selection of a subset of individuals from within a population to estimate characteristics of the whole population....
- Stratified sampling
  Stratified sampling
  In statistics, stratified sampling is a method of sampling from a population.In statistical surveys, when subpopulations within an overall population vary, it is advantageous to sample each subpopulation independently. Stratification is the process of dividing members of the population into...
- Quota sampling
  Quota sampling
  Quota sampling is a method for selecting survey participants. In quota sampling, a population is first segmented into mutually exclusive sub-groups, just as in stratified sampling. Then judgment is used to select the subjects or units from each segment based on a specified proportion. For example,...
Biased sample
Biased sample
In statistics, sampling bias is when a sample is collected in such a way that some members of the intended population are less likely to be included than others. It results in a biased sample, a non-random sample of a population in which all individuals, or instances, were not equally likely to...
- Spectrum bias
  Spectrum bias
  Initially identified in 1978, spectrum bias refers to the phenomenon that the performance of a diagnostic test may change between different clinical settings owing to changes in the patient case-mix thereby affecting the transferability of study results in clinical practice...
- Survivorship bias
  Survivorship bias
  Survivorship bias is the logical error of concentrating on the people or things that "survived" some process and inadvertently overlooking those that didn't because of their lack of visibility. This can lead to false conclusions in several different ways...

Analysing data

Regression analysis
Regression analysis
In statistics, regression analysis includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables...
- Outline of regression analysis
  Outline of regression analysis
  In statistics, regression analysis includes any technique for learning about the relationship between one or more dependent variables Y and one or more independent variables X....
- Analysis of variance
  Analysis of variance
  In statistics, analysis of variance is a collection of statistical models, and their associated procedures, in which the observed variance in a particular variable is partitioned into components attributable to different sources of variation...
  
  (ANOVA)
- General linear model
  General linear model
  The general linear model is a statistical linear model.It may be written aswhere Y is a matrix with series of multivariate measurements, X is a matrix that might be a design matrix, B is a matrix containing parameters that are usually to be estimated and U is a matrix containing errors or...
- Generalized linear model
  Generalized linear model
  In statistics, the generalized linear model is a flexible generalization of ordinary linear regression. The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to...
Density estimation
Density estimation
In probability and statistics,density estimation is the construction of an estimate, based on observed data, of an unobservable underlying probability density function...
- Kernel density estimation
  Kernel density estimation
  In statistics, kernel density estimation is a non-parametric way of estimating the probability density function of a random variable. Kernel density estimation is a fundamental data smoothing problem where inferences about the population are made, based on a finite data sample...
- Multivariate kernel density estimation
  Multivariate kernel density estimation
  Kernel density estimation is a nonparametric technique for density estimation i.e., estimation of probability density functions, which is one of the fundamental questions in statistics. It can be viewed as a generalisation of histogram density estimation with improved statistical properties...
Time series
Time series
In statistics, signal processing, econometrics and mathematical finance, a time series is a sequence of data points, measured typically at successive times spaced at uniform time intervals. Examples of time series are the daily closing value of the Dow Jones index or the annual flow volume of the...
- Time series analysis
- Box–Jenkins
- Frequency domain
  Frequency domain
  In electronics, control systems engineering, and statistics, frequency domain is a term used to describe the domain for analysis of mathematical functions or signals with respect to frequency, rather than time....
- Time domain
  Time domain
  Time domain is a term used to describe the analysis of mathematical functions, physical signals or time series of economic or environmental data, with respect to time. In the time domain, the signal or function's value is known for all real numbers, for the case of continuous time, or at various...
Multivariate analysis
Multivariate analysis
Multivariate analysis is based on the statistical principle of multivariate statistics, which involves observation and analysis of more than one statistical variable at a time...
- Principal component analysis (PCA)
- Factor analysis
  Factor analysis
  Factor analysis is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved, uncorrelated variables called factors. In other words, it is possible, for example, that variations in three or four observed variables...
- Cluster analysis
Robust statistics
Robust statistics
Robust statistics provides an alternative approach to classical statistical methods. The motivation is to produce estimators that are not unduly affected by small departures from model assumptions.- Introduction :...

Filtering data

Recursive Bayesian estimation
Recursive Bayesian estimation
Recursive Bayesian estimation, also known as a Bayes filter, is a general probabilistic approach for estimating an unknown probability density function recursively over time using incoming measurements and a mathematical process model.-In robotics:...
- Kalman filter
  Kalman filter
  In statistics, the Kalman filter is a mathematical method named after Rudolf E. Kálmán. Its purpose is to use measurements observed over time, containing noise and other inaccuracies, and produce values that tend to be closer to the true values of the measurements and their associated calculated...
- Particle filter
  Particle filter
  In statistics, particle filters, also known as Sequential Monte Carlo methods , are sophisticated model estimation techniques based on simulation...
Moving average

Statistical inference

Statistical inference
Statistical inference
In statistics, statistical inference is the process of drawing conclusions from data that are subject to random variation, for example, observational errors or sampling variation...
Mathematical statistics
Mathematical statistics
Mathematical statistics is the study of statistics from a mathematical standpoint, using probability theory as well as other branches of mathematics such as linear algebra and analysis...
- Likelihood function
  Likelihood function
  In statistics, a likelihood function is a function of the parameters of a statistical model, defined as follows: the likelihood of a set of parameter values given some observed outcomes is equal to the probability of those observed outcomes given those parameter values...
- Exponential family
  Exponential family
  In probability and statistics, an exponential family is an important class of probability distributions sharing a certain form, specified below. This special form is chosen for mathematical convenience, on account of some useful algebraic properties, as well as for generality, as exponential...
Bayesian inference
Bayesian inference
In statistics, Bayesian inference is a method of statistical inference. It is often used in science and engineering to determine model parameters, make predictions about unknown variables, and to perform model selection...
- Bayes' theorem
  Bayes' theorem
  In probability theory and applications, Bayes' theorem relates the conditional probabilities P and P. It is commonly used in science and engineering. The theorem is named for Thomas Bayes ....
- Bayes estimator
  Bayes estimator
  In estimation theory and decision theory, a Bayes estimator or a Bayes action is an estimator or decision rule that minimizes the posterior expected value of a loss function . Equivalently, it maximizes the posterior expectation of a utility function...
- Prior distribution
- Posterior distribution
- Conjugate prior
  Conjugate prior
  In Bayesian probability theory, if the posterior distributions p are in the same family as the prior probability distribution p, the prior and posterior are then called conjugate distributions, and the prior is called a conjugate prior for the likelihood...
Frequentist inference
Frequentist inference
Frequentist inference is one of a number of possible ways of formulating generally applicable schemes for making statistical inferences: that is, for drawing conclusions from statistical samples. An alternative name is frequentist statistics...
- Statistical hypothesis testing
  Statistical hypothesis testing
  A statistical hypothesis test is a method of making decisions using data, whether from a controlled experiment or an observational study . In statistics, a result is called statistically significant if it is unlikely to have occurred by chance alone, according to a pre-determined threshold...
  - Null hypothesis
    Null hypothesis
    The practice of science involves formulating and testing hypotheses, assertions that are capable of being proven false using a test of observed data. The null hypothesis typically corresponds to a general or default position...
  - Alternative hypothesis
  - P-value
    P-value
    In statistical significance testing, the p-value is the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true. One often "rejects the null hypothesis" when the p-value is less than the significance level α ,...
  - Significance level
  - Statistical power
    Statistical power
    The power of a statistical test is the probability that the test will reject the null hypothesis when the null hypothesis is actually false . The power is in general a function of the possible distributions, often determined by a parameter, under the alternative hypothesis...
- Likelihood-ratio test
  Likelihood-ratio test
  In statistics, a likelihood ratio test is a statistical test used to compare the fit of two models, one of which is a special case of the other . The test is based on the likelihood ratio, which expresses how many times more likely the data are under one model than the other...
- Confidence interval
  Confidence interval
  In statistics, a confidence interval is a particular kind of interval estimate of a population parameter and is used to indicate the reliability of an estimate. It is an observed interval , in principle different from sample to sample, that frequently includes the parameter of interest, if the...
Decision theory
Decision theory
Decision theory in economics, psychology, philosophy, mathematics, and statistics is concerned with identifying the values, uncertainties and other issues relevant in a given decision, its rationality, and the resulting optimal decision...
- Optimal decision
  Optimal decision
  An optimal decision is a decision such that no other available decision options will lead to a better outcome. It is an important concept in decision theory. In order to compare the different decision outcomes, one commonly assigns a relative utility to each of them...
- Type I and type II errors
  Type I and type II errors
  In statistical test theory the notion of statistical error is an integral part of hypothesis testing. The test requires an unambiguous statement of a null hypothesis, which usually corresponds to a default "state of nature", for example "this person is healthy", "this accused is not guilty" or...
Estimation theory
Estimation theory
Estimation theory is a branch of statistics and signal processing that deals with estimating the values of parameters based on measured/empirical data that has a random component. The parameters describe an underlying physical setting in such a way that their value affects the distribution of the...
- Estimator
  Estimator
  In statistics, an estimator is a rule for calculating an estimate of a given quantity based on observed data: thus the rule and its result are distinguished....
- Bayes estimator
  Bayes estimator
  In estimation theory and decision theory, a Bayes estimator or a Bayes action is an estimator or decision rule that minimizes the posterior expected value of a loss function . Equivalently, it maximizes the posterior expectation of a utility function...
- Maximum likelihood
  Maximum likelihood
  In statistics, maximum-likelihood estimation is a method of estimating the parameters of a statistical model. When applied to a data set and given a statistical model, maximum-likelihood estimation provides estimates for the model's parameters....
- Trimmed estimator
  Trimmed estimator
  Given an estimator, a trimmed estimator is obtained by excluding some of the extreme values. This is generally done to obtain a more robust statistic: the extreme values are considered outliers....
- M-estimator
  M-estimator
  In statistics, M-estimators are a broad class of estimators, which are obtained as the minima of sums of functions of the data. Least-squares estimators and many maximum-likelihood estimators are M-estimators. The definition of M-estimators was motivated by robust statistics, which contributed new...
Non-parametric statistics
Non-parametric statistics
In statistics, the term non-parametric statistics has at least two different meanings:The first meaning of non-parametric covers techniques that do not rely on data belonging to any particular distribution. These include, among others:...
- Nonparametric regression
  Nonparametric regression
  Nonparametric regression is a form of regression analysis in which the predictor does not take a predetermined form but is constructed according to information derived from the data...
- Kernels
  Kernel (statistics)
  A kernel is a weighting function used in non-parametric estimation techniques. Kernels are used in kernel density estimation to estimate random variables' density functions, or in kernel regression to estimate the conditional expectation of a random variable. Kernels are also used in time-series,...

Probability distributions

Probability distribution
Probability distribution
In probability theory, a probability mass, probability density, or probability distribution is a function that describes the probability of a random variable taking certain values....
Conditional probability distribution
Probability density function
Probability density function
In probability theory, a probability density function , or density of a continuous random variable is a function that describes the relative likelihood for this random variable to occur at a given point. The probability for the random variable to fall within a particular region is given by the...
Cumulative distribution function
Cumulative distribution function
In probability theory and statistics, the cumulative distribution function , or just distribution function, describes the probability that a real-valued random variable X with a given probability distribution will be found at a value less than or equal to x. Intuitively, it is the "area so far"...
Characteristic function
Characteristic function (probability theory)
In probability theory and statistics, the characteristic function of any random variable completely defines its probability distribution. Thus it provides the basis of an alternative route to analytical results compared with working directly with probability density functions or cumulative...

Random variables

Random variable
Random variable
In probability and statistics, a random variable or stochastic variable is, roughly speaking, a variable whose value results from a measurement on some type of random process. Formally, it is a function from a probability space, typically to the real numbers, which is measurable functionmeasurable...
Central moment
Central moment
In probability theory and statistics, central moments form one set of values by which the properties of a probability distribution can be usefully characterised...
L-moment
L-moment
In statistics, L-moments are statistics used to summarize the shape of a probability distribution. They are analogous to conventional moments in that they can be used to calculate quantities analogous to standard deviation, skewness and kurtosis, termed the L-scale, L-skewness and L-kurtosis...
Algebra of random variables
Algebra of random variables
In the algebraic axiomatization of probability theory, the primary concept is not that of probability of an event, but rather that of a random variable. Probability distributions are determined by assigning an expectation to each random variable...

Probability theory

Probability
Probability
Probability is ordinarily used to describe an attitude of mind towards some proposition of whose truth we arenot certain. The proposition of interest is usually of the form "Will a specific event occur?" The attitude of mind is of the form "How certain are we that the event will occur?" The...
Conditional probability
Conditional probability
In probability theory, the "conditional probability of A given B" is the probability of A if B is known to occur. It is commonly notated P, and sometimes P_B. P can be visualised as the probability of event A when the sample space is restricted to event B...
Law of large numbers
Law of large numbers
In probability theory, the law of large numbers is a theorem that describes the result of performing the same experiment a large number of times...
Central limit theorem
Central limit theorem
In probability theory, the central limit theorem states conditions under which the mean of a sufficiently large number of independent random variables, each with finite mean and variance, will be approximately normally distributed. The central limit theorem has a number of variants. In its common...

Statistics publications

List of statistics journals
List of important publications in statistics
List of scientific journals in statistics