List of basic statistics topics
The following outline is provided as an overview and guide to the variety of topics included within the subject of statistics:

Statistics pertains to the collection, analysis, interpretation, and presentation of data. It is applicable to a wide variety of academic disciplines, from the physical and social sciences to the humanities; it is also used and misused for making informed decisions in all areas of business and government.

Nature of statistics

Statistics can be described as all of the following:
  • Academic discipline: one with academic departments, curricula and degrees; national and international societies; and specialized journals.
  • Scientific field (a branch of science) – a widely recognized category of specialized expertise within science, which typically embodies its own terminology and nomenclature. Such a field will usually be represented by one or more scientific journals, where peer-reviewed research is published.
    • Formal science – branch of knowledge concerned with formal systems.
      • Mathematical science – field of science that is primarily mathematical in nature but may not be universally considered a subfield of mathematics proper. Statistics, for example, is mathematical in its methods but grew out of political arithmetic, which merged with inverse probability and grew through applications in the social sciences and some areas of physics and biometrics to become its own separate, though closely allied, field.

History of statistics

  • History of statistics
    The history of statistics can be said to start around 1749 although, over time, there have been changes to the interpretation of what the word statistics means. In early times, the meaning was restricted to information about states...

  • Founders of statistics
    Statistics is the theory and application of mathematics to the scientific method including hypothesis generation, experimental design, sampling, data collection, data summarization, estimation, prediction and inference from those results to the population from which the experimental sample was drawn...

  • History of probability
    Probability has a dual aspect: on the one hand the probability or likelihood of hypotheses given the evidence for them, and on the other hand the behavior of stochastic processes such as the throwing of dice or coins...

  • Timeline of probability and statistics
    A timeline of probability and statistics. Before 1600: in the 9th century, Al-Kindi was the first to use statistics to decipher encrypted messages and developed the first code-breaking algorithm in the House of Wisdom in Baghdad, based on frequency analysis...


Describing data

  • Descriptive statistics
    Descriptive statistics quantitatively describe the main features of a collection of data. Descriptive statistics are distinguished from inferential statistics in that descriptive statistics aim to summarize a data set, rather than use the data to learn about the population that the data are...

  • Average
    In mathematics, an average, or central tendency, of a data set is a measure of the "middle" value of the data set. Average is one form of central tendency. Not all central tendencies should be considered definitions of average....

    • Mean
      In statistics, mean has two related meanings: the arithmetic mean, and the expected value of a random variable, which is also called the population mean....

    • Median
      In probability theory and statistics, a median is described as the numerical value separating the higher half of a sample, a population, or a probability distribution, from the lower half. The median of a finite list of numbers can be found by arranging all the observations from lowest value to...

    • Mode
      In statistics, the mode is the value that occurs most frequently in a data set or a probability distribution. In some fields, notably education, sample data are often called scores, and the sample mode is known as the modal score....

  • Measures of scale
    In statistics, statistical dispersion is variability or spread in a variable or a probability distribution...

    • Variance
      In probability theory and statistics, the variance is a measure of how far a set of numbers is spread out. It is one of several descriptors of a probability distribution, describing how far the numbers lie from the mean. In particular, the variance is one of the moments of a distribution...

    • Standard deviation
      Standard deviation is a widely used measure of variability or diversity used in statistics and probability theory. It shows how much variation or "dispersion" there is from the average...

    • Median absolute deviation
      In statistics, the median absolute deviation (MAD) is a robust measure of the variability of a univariate sample of quantitative data. It can also refer to the population parameter that is estimated by the MAD calculated from a sample....

  • Correlation
    In statistics, dependence refers to any statistical relationship between two random variables or two sets of data. Correlation refers to any of a broad class of statistical relationships involving dependence....

  • Outlier
    In statistics, an outlier is an observation that is numerically distant from the rest of the data. Grubbs defined an outlier as: An outlying observation, or outlier, is one that appears to deviate markedly from other members of the sample in which it occurs....

  • Statistical graphics
    Statistical graphics, also known as graphical techniques, are information graphics in the field of statistics used to visualize quantitative data.

    • Histogram
      In statistics, a histogram is a graphical representation showing a visual impression of the distribution of data. It is an estimate of the probability distribution of a continuous variable and was first introduced by Karl Pearson...

    • Frequency distribution
      In statistics, a frequency distribution is an arrangement of the values that one or more variables take in a sample. Each entry in the table contains the frequency or count of the occurrences of values within a particular group or interval, and in this way, the table summarizes the distribution of...

    • Quantile
      Quantiles are points taken at regular intervals from the cumulative distribution function of a random variable. Dividing ordered data into q essentially equal-sized data subsets is the motivation for q-quantiles; the quantiles are the data values marking the boundaries between consecutive subsets...

    • Survival function
      The survival function, also known as a survivor function or reliability function, is a property of any random variable that maps a set of events, usually associated with mortality or failure of some system, onto time. It captures the probability that the system will survive beyond a specified time...

    • Failure rate
      Failure rate is the frequency with which an engineered system or component fails, expressed for example in failures per hour. It is often denoted by the Greek letter λ and is important in reliability engineering....

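The measures of central tendency and dispersion listed above can be computed with Python's standard-library statistics module. The following is a minimal sketch on a small made-up sample; the helper names summarize and median_absolute_deviation are hypothetical, and the MAD computed here is the unscaled version (no consistency constant applied).

```python
import statistics


def median_absolute_deviation(data):
    """Unscaled MAD: the median of absolute deviations from the sample median."""
    med = statistics.median(data)
    return statistics.median(abs(x - med) for x in data)


def summarize(data):
    """Return basic descriptive statistics for a non-empty list of numbers."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),  # most frequent value
        # Sample variance and standard deviation divide by n - 1.
        "variance": statistics.variance(data),
        "stdev": statistics.stdev(data),
        "mad": median_absolute_deviation(data),
    }


sample = [2, 4, 4, 4, 5, 5, 7, 9]
for name, value in summarize(sample).items():
    print(name, value)
```

For this sample the mean is 5, the median 4.5, the mode 4 and the sample variance 32/7; the MAD (0.5) is much smaller than the standard deviation (about 2.14) because it is not pulled up by the largest values.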

Experiments and surveys

  • Design of experiments
    In general usage, design of experiments or experimental design is the design of any information-gathering exercises where variation is present, whether under the full control of the experimenter or not. However, in statistics, these terms are usually used for controlled experiments...

    • Optimal design
      Optimal designs are a class of experimental designs that are optimal with respect to some statistical criterion. In the design of experiments for estimating statistical models, optimal designs allow parameters to be estimated without bias and with minimum-variance...

    • Factorial experiment
      In statistics, a full factorial experiment is an experiment whose design consists of two or more factors, each with discrete possible values or "levels", and whose experimental units take on all possible combinations of these levels across all such factors. A full factorial design may also be...

    • Restricted randomization
      Many processes have more than one source of variation in them. In order to reduce variation in processes, these multiple sources must be understood, and that often leads to the concept of nested or hierarchical data structures. For example, in the semiconductor industry, a batch process may operate...

    • Repeated measures design
      The repeated measures design uses the same subjects with every condition of the research, including the control. For instance, repeated measures are collected in a longitudinal study in which change over time is assessed. Other studies compare the same measure under two or more different conditions...

    • Randomized block design
      In the statistical theory of the design of experiments, blocking is the arranging of experimental units in groups that are similar to one another. Typically, a blocking factor is a source of variability that is not of primary interest to the experimenter...

  • Statistical survey
    Survey methodology is the field that studies surveys, that is, the sample of individuals from a population with a view towards making statistical inferences about the population using the sample. Polls about public opinion, such as political beliefs, are reported in the news media in democracies....

  • Opinion poll
    An opinion poll, sometimes simply referred to as a poll, is a survey of public opinion from a particular sample. Opinion polls are usually designed to represent the opinions of a population by conducting a series of questions and then extrapolating generalities in ratio or within confidence...

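A full factorial design can be enumerated as the Cartesian product of the factor levels, and a randomized complete block design can be produced by shuffling the treatment order independently within each block. The sketch below is illustrative only: the function names, factor names, levels, treatments and blocks are all hypothetical.

```python
import itertools
import random


def full_factorial(factors):
    """Every combination of factor levels; `factors` maps factor name -> list of levels."""
    names = list(factors)
    return [dict(zip(names, levels))
            for levels in itertools.product(*(factors[name] for name in names))]


def randomized_complete_block(treatments, blocks, seed=None):
    """Apply every treatment once in each block, in an independently shuffled order."""
    rng = random.Random(seed)
    plan = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)  # randomization is restricted to within the block
        plan[block] = order
    return plan


runs = full_factorial({"temperature": [150, 180], "time": [10, 20], "catalyst": ["A", "B"]})
print(len(runs))  # 8 = 2 * 2 * 2 runs
print(randomized_complete_block(["T1", "T2", "T3"], ["batch 1", "batch 2"], seed=1))
```

Blocking on a nuisance source such as the production batch keeps that source of variation out of the treatment comparison, while shuffling within each block still protects against systematic run-order effects.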

Sampling

  • Sampling distribution
    In statistics, a sampling distribution or finite-sample distribution is the probability distribution of a given statistic based on a random sample. Sampling distributions are important in statistics because they provide a major simplification on the route to statistical inference...

  • Sampling
    In statistics and survey methodology, sampling is concerned with the selection of a subset of individuals from within a population to estimate characteristics of the whole population....

    • Stratified sampling
      In statistics, stratified sampling is a method of sampling from a population. In statistical surveys, when subpopulations within an overall population vary, it is advantageous to sample each subpopulation independently. Stratification is the process of dividing members of the population into...

    • Quota sampling
      Quota sampling is a method for selecting survey participants. In quota sampling, a population is first segmented into mutually exclusive sub-groups, just as in stratified sampling. Then judgment is used to select the subjects or units from each segment based on a specified proportion. For example,...

  • Biased sample
    In statistics, sampling bias is when a sample is collected in such a way that some members of the intended population are less likely to be included than others. It results in a biased sample, a non-random sample of a population in which all individuals, or instances, were not equally likely to...

    • Spectrum bias
      Initially identified in 1978, spectrum bias refers to the phenomenon that the performance of a diagnostic test may change between different clinical settings owing to changes in the patient case-mix thereby affecting the transferability of study results in clinical practice...

    • Survivorship bias
      Survivorship bias is the logical error of concentrating on the people or things that "survived" some process and inadvertently overlooking those that didn't because of their lack of visibility. This can lead to false conclusions in several different ways...

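Stratified sampling with proportional allocation draws a simple random sample of the same fraction from every stratum. In this sketch the population, the stratum labels and the helper stratified_sample are hypothetical, and rounding each stratum's sample size is just one simple allocation rule among several.

```python
import random
from collections import defaultdict


def stratified_sample(population, stratum_of, fraction, seed=None):
    """Proportional stratified sampling.

    Groups units by stratum_of(unit), then draws a simple random sample
    (without replacement) of round(fraction * N_h) units from each stratum h.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)
    chosen = []
    for members in strata.values():
        k = round(fraction * len(members))
        chosen.extend(rng.sample(members, k))
    return chosen


# Hypothetical population: 700 urban and 300 rural households.
population = [("urban", i) for i in range(700)] + [("rural", i) for i in range(300)]
chosen = stratified_sample(population, stratum_of=lambda unit: unit[0], fraction=0.1, seed=42)
urban = sum(1 for stratum, _ in chosen if stratum == "urban")
print(urban, len(chosen) - urban)  # 70 30
```

Quota sampling would fill the same per-stratum counts, but by judgment rather than by random selection within each stratum, which is one way a biased sample can arise.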

Analysing data

  • Regression analysis
    In statistics, regression analysis includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables...

    • Outline of regression analysis
      In statistics, regression analysis includes any technique for learning about the relationship between one or more dependent variables Y and one or more independent variables X....

    • Analysis of variance (ANOVA)
      In statistics, analysis of variance is a collection of statistical models, and their associated procedures, in which the observed variance in a particular variable is partitioned into components attributable to different sources of variation...

    • General linear model
      The general linear model is a statistical linear model. It may be written as Y = XB + U, where Y is a matrix with series of multivariate measurements, X is a matrix that might be a design matrix, B is a matrix containing parameters that are usually to be estimated and U is a matrix containing errors or...

    • Generalized linear model
      In statistics, the generalized linear model is a flexible generalization of ordinary linear regression. The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to...

  • Density estimation
    In probability and statistics, density estimation is the construction of an estimate, based on observed data, of an unobservable underlying probability density function...

    • Kernel density estimation
      In statistics, kernel density estimation is a non-parametric way of estimating the probability density function of a random variable. Kernel density estimation is a fundamental data smoothing problem where inferences about the population are made, based on a finite data sample...

    • Multivariate kernel density estimation
      Kernel density estimation is a nonparametric technique for density estimation, i.e., estimation of probability density functions, which is one of the fundamental questions in statistics. It can be viewed as a generalisation of histogram density estimation with improved statistical properties...

  • Time series
    In statistics, signal processing, econometrics and mathematical finance, a time series is a sequence of data points, measured typically at successive times spaced at uniform time intervals. Examples of time series are the daily closing value of the Dow Jones index or the annual flow volume of the...

    • Time series analysis
    • Box–Jenkins
    • Frequency domain
      In electronics, control systems engineering, and statistics, frequency domain is a term used to describe the domain for analysis of mathematical functions or signals with respect to frequency, rather than time....

    • Time domain
      Time domain is a term used to describe the analysis of mathematical functions, physical signals or time series of economic or environmental data, with respect to time. In the time domain, the signal or function's value is known for all real numbers, for the case of continuous time, or at various...

  • Multivariate analysis
    Multivariate analysis is based on the statistical principle of multivariate statistics, which involves observation and analysis of more than one statistical variable at a time...

    • Principal component analysis (PCA)
    • Factor analysis
      Factor analysis is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved, uncorrelated variables called factors. In other words, it is possible, for example, that variations in three or four observed variables...

    • Cluster analysis
  • Robust statistics
    Robust statistics provides an alternative approach to classical statistical methods. The motivation is to produce estimators that are not unduly affected by small departures from model assumptions.

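Two of the techniques above reduce to short formulas. The ordinary least-squares line y = a + bx has slope b = Sxy / Sxx and intercept a = mean(y) - b * mean(x), and a Gaussian kernel density estimate averages normal bumps of width h centred on the observations, f(x) = 1/(n*h) * sum K((x - x_i)/h). The sketch assumes a single predictor and a hand-picked bandwidth h; the function names are hypothetical.

```python
import math


def least_squares_line(xs, ys):
    """Fit y = a + b*x by ordinary least squares and return (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx  # slope
    a = mean_y - b * mean_x  # intercept: the line passes through (mean_x, mean_y)
    return a, b


def gaussian_kde(data, h):
    """Return the density estimate f(x) = 1/(n*h) * sum(K((x - x_i) / h)),
    where K is the standard normal density and h is the bandwidth."""
    n = len(data)
    scale = 1.0 / (n * h * math.sqrt(2 * math.pi))

    def f(x):
        return scale * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

    return f


a, b = least_squares_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 1.0 2.0 (the points lie exactly on y = 1 + 2x)

density = gaussian_kde([1.0, 2.0, 2.5, 4.0], h=0.5)
print(density(2.0))
```

A smaller h gives a bumpier estimate and a larger h a smoother one; in practice h is usually chosen by a rule of thumb or by cross-validation rather than by hand.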

Filtering data

  • Recursive Bayesian estimation
    Recursive Bayesian estimation, also known as a Bayes filter, is a general probabilistic approach for estimating an unknown probability density function recursively over time using incoming measurements and a mathematical process model.

    • Kalman filter
      In statistics, the Kalman filter is a mathematical method named after Rudolf E. Kálmán. Its purpose is to use measurements observed over time, containing noise and other inaccuracies, and produce values that tend to be closer to the true values of the measurements and their associated calculated...

    • Particle filter
      In statistics, particle filters, also known as Sequential Monte Carlo methods, are sophisticated model estimation techniques based on simulation...

  • Moving average

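For a linear model with Gaussian noise, recursive Bayesian estimation reduces to the Kalman filter. The sketch below is a scalar Kalman filter under an assumed random-walk state model, next to a simple moving average for comparison; the model, the function names and the noise variances are assumptions chosen for illustration.

```python
def kalman_1d(measurements, process_var, measurement_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter; returns the filtered state estimate after each measurement.

    Assumed model (illustrative):
        state:       x_k = x_{k-1} + w_k,  w_k ~ N(0, process_var)
        measurement: z_k = x_k + v_k,      v_k ~ N(0, measurement_var)
    x0, p0 are the prior mean and variance of the initial state.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state carries over and its variance grows by the process noise.
        p = p + process_var
        # Update: move toward the measurement by the Kalman gain k in [0, 1].
        k = p / (p + measurement_var)
        x = x + k * (z - x)
        p = (1 - k) * p
        estimates.append(x)
    return estimates


def moving_average(values, window):
    """Unweighted moving average: mean of each run of `window` consecutive values."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]


print(moving_average([1, 2, 3, 4, 5], 3))  # [2.0, 3.0, 4.0]
print(kalman_1d([5.1, 4.9, 5.2, 4.8, 5.0], process_var=1e-4, measurement_var=0.04)[-1])
```

With process_var set to zero the filter reduces to a precision-weighted running average of the prior mean and the measurements; a larger process_var lets the estimate follow a drifting state more closely.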
Statistical inference

  • Statistical inference
    In statistics, statistical inference is the process of drawing conclusions from data that are subject to random variation, for example, observational errors or sampling variation...

  • Mathematical statistics
    Mathematical statistics is the study of statistics from a mathematical standpoint, using probability theory as well as other branches of mathematics such as linear algebra and analysis...

    • Likelihood function
      In statistics, a likelihood function is a function of the parameters of a statistical model, defined as follows: the likelihood of a set of parameter values given some observed outcomes is equal to the probability of those observed outcomes given those parameter values...

    • Exponential family
      In probability and statistics, an exponential family is an important class of probability distributions sharing a certain form, specified below. This special form is chosen for mathematical convenience, on account of some useful algebraic properties, as well as for generality, as exponential...

  • Bayesian inference
    In statistics, Bayesian inference is a method of statistical inference. It is often used in science and engineering to determine model parameters, make predictions about unknown variables, and to perform model selection...

    • Bayes' theorem
      In probability theory and applications, Bayes' theorem relates the conditional probabilities P(A|B) and P(B|A). It is commonly used in science and engineering. The theorem is named for Thomas Bayes...

    • Bayes estimator
      In estimation theory and decision theory, a Bayes estimator or a Bayes action is an estimator or decision rule that minimizes the posterior expected value of a loss function. Equivalently, it maximizes the posterior expectation of a utility function...

    • Prior distribution
    • Posterior distribution
    • Conjugate prior
      In Bayesian probability theory, if the posterior distributions p(θ|x) are in the same family as the prior probability distribution p(θ), the prior and posterior are then called conjugate distributions, and the prior is called a conjugate prior for the likelihood...

  • Frequentist inference
    Frequentist inference is one of a number of possible ways of formulating generally applicable schemes for making statistical inferences: that is, for drawing conclusions from statistical samples. An alternative name is frequentist statistics...

    • Statistical hypothesis testing
      A statistical hypothesis test is a method of making decisions using data, whether from a controlled experiment or an observational study. In statistics, a result is called statistically significant if it is unlikely to have occurred by chance alone, according to a pre-determined threshold...

      • Null hypothesis
        The practice of science involves formulating and testing hypotheses, assertions that are capable of being proven false using a test of observed data. The null hypothesis typically corresponds to a general or default position...

      • Alternative hypothesis
      • P-value
        In statistical significance testing, the p-value is the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true. One often "rejects the null hypothesis" when the p-value is less than the significance level α...

      • Significance level
      • Statistical power
        The power of a statistical test is the probability that the test will reject the null hypothesis when the null hypothesis is actually false. The power is in general a function of the possible distributions, often determined by a parameter, under the alternative hypothesis...

    • Likelihood-ratio test
      In statistics, a likelihood ratio test is a statistical test used to compare the fit of two models, one of which is a special case of the other. The test is based on the likelihood ratio, which expresses how many times more likely the data are under one model than the other...

    • Confidence interval
      In statistics, a confidence interval is a particular kind of interval estimate of a population parameter and is used to indicate the reliability of an estimate. It is an observed interval, in principle different from sample to sample, that frequently includes the parameter of interest, if the...

  • Decision theory
    Decision theory in economics, psychology, philosophy, mathematics, and statistics is concerned with identifying the values, uncertainties and other issues relevant in a given decision, its rationality, and the resulting optimal decision...

    • Optimal decision
      An optimal decision is a decision such that no other available decision options will lead to a better outcome. It is an important concept in decision theory. In order to compare the different decision outcomes, one commonly assigns a relative utility to each of them...

    • Type I and type II errors
      In statistical test theory the notion of statistical error is an integral part of hypothesis testing. The test requires an unambiguous statement of a null hypothesis, which usually corresponds to a default "state of nature", for example "this person is healthy", "this accused is not guilty" or...

  • Estimation theory
    Estimation theory is a branch of statistics and signal processing that deals with estimating the values of parameters based on measured/empirical data that has a random component. The parameters describe an underlying physical setting in such a way that their value affects the distribution of the...

    • Estimator
      In statistics, an estimator is a rule for calculating an estimate of a given quantity based on observed data: thus the rule and its result are distinguished....

    • Bayes estimator
      In estimation theory and decision theory, a Bayes estimator or a Bayes action is an estimator or decision rule that minimizes the posterior expected value of a loss function. Equivalently, it maximizes the posterior expectation of a utility function...

    • Maximum likelihood
      In statistics, maximum-likelihood estimation is a method of estimating the parameters of a statistical model. When applied to a data set and given a statistical model, maximum-likelihood estimation provides estimates for the model's parameters....

    • Trimmed estimator
      Given an estimator, a trimmed estimator is obtained by excluding some of the extreme values. This is generally done to obtain a more robust statistic: the extreme values are considered outliers....

    • M-estimator
      In statistics, M-estimators are a broad class of estimators, which are obtained as the minima of sums of functions of the data. Least-squares estimators and many maximum-likelihood estimators are M-estimators. The definition of M-estimators was motivated by robust statistics, which contributed new...

  • Non-parametric statistics
    In statistics, the term non-parametric statistics has at least two different meanings:The first meaning of non-parametric covers techniques that do not rely on data belonging to any particular distribution. These include, among others:...

    • Nonparametric regression
      Nonparametric regression is a form of regression analysis in which the predictor does not take a predetermined form but is constructed according to information derived from the data...

    • Kernels
      A kernel is a weighting function used in non-parametric estimation techniques. Kernels are used in kernel density estimation to estimate random variables' density functions, or in kernel regression to estimate the conditional expectation of a random variable. Kernels are also used in time-series,...

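Two small calculations illustrate the Bayesian and frequentist entries above. The Beta distribution is a conjugate prior for a binomial likelihood, so updating it only adds the observed successes and failures to its parameters; a one-sample z-test gives a p-value and a matching confidence interval. The sketch assumes a known population standard deviation for the z-test; the function names and example numbers are invented for illustration.

```python
import math
from statistics import NormalDist


def beta_binomial_update(alpha, beta, successes, failures):
    """Beta(alpha, beta) prior + binomial data -> Beta(alpha + s, beta + f) posterior."""
    return alpha + successes, beta + failures


def z_test(sample_mean, mu0, sigma, n, confidence=0.95):
    """Two-sided one-sample z-test of H0: mu = mu0, with known population SD sigma.

    Returns the z statistic, the p-value and a confidence interval for mu.
    """
    se = sigma / math.sqrt(n)
    z = (sample_mean - mu0) / se
    std_normal = NormalDist()
    # p-value: probability under H0 of a statistic at least as extreme as |z|.
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    z_crit = std_normal.inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return z, p_value, (sample_mean - z_crit * se, sample_mean + z_crit * se)


# Uniform Beta(1, 1) prior, then 7 successes in 10 trials.
a, b = beta_binomial_update(1, 1, successes=7, failures=3)
print(a, b, a / (a + b))  # posterior Beta(8, 4), posterior mean 8/12

z, p, ci = z_test(sample_mean=103, mu0=100, sigma=15, n=100)
print(z, round(p, 4), ci)
```

With these numbers z = 2.0 and p is about 0.0455, and the 95% interval (roughly 100.06 to 105.94) excludes 100, which is consistent with rejecting H0 at significance level 0.05.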

Probability distributions

  • Probability distribution
    In probability theory, a probability mass, probability density, or probability distribution is a function that describes the probability of a random variable taking certain values....

  • Conditional probability distribution
  • Probability density function
    In probability theory, a probability density function, or density, of a continuous random variable is a function that describes the relative likelihood for this random variable to occur at a given point. The probability for the random variable to fall within a particular region is given by the...

  • Cumulative distribution function
    In probability theory and statistics, the cumulative distribution function, or just distribution function, describes the probability that a real-valued random variable X with a given probability distribution will be found at a value less than or equal to x. Intuitively, it is the "area so far"...

  • Characteristic function
    In probability theory and statistics, the characteristic function of any random variable completely defines its probability distribution. Thus it provides the basis of an alternative route to analytical results compared with working directly with probability density functions or cumulative...

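The density and the cumulative distribution function are linked by integration: P(a < X <= b) = F(b) - F(a) is the area under the density between a and b. This sketch checks that numerically for a standard normal variable using statistics.NormalDist, and builds an empirical CDF from data; the helper names empirical_cdf and integrate are hypothetical.

```python
import bisect
from statistics import NormalDist


def empirical_cdf(data):
    """Return F_n with F_n(x) = (number of observations <= x) / n."""
    ordered = sorted(data)
    n = len(ordered)

    def F(x):
        return bisect.bisect_right(ordered, x) / n

    return F


def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


X = NormalDist(mu=0.0, sigma=1.0)
# P(-1 < X <= 1) from the CDF and from the area under the density; both are about 0.6827.
print(X.cdf(1) - X.cdf(-1), integrate(X.pdf, -1, 1))

F = empirical_cdf([3, 1, 2, 2])
print(F(2))  # 0.75: three of the four observations are <= 2
```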

Random variables

  • Random variable
    In probability and statistics, a random variable or stochastic variable is, roughly speaking, a variable whose value results from a measurement on some type of random process. Formally, it is a function from a probability space, typically to the real numbers, which is measurable...

  • Central moment
    In probability theory and statistics, central moments form one set of values by which the properties of a probability distribution can be usefully characterised...

  • L-moment
    In statistics, L-moments are statistics used to summarize the shape of a probability distribution. They are analogous to conventional moments in that they can be used to calculate quantities analogous to standard deviation, skewness and kurtosis, termed the L-scale, L-skewness and L-kurtosis...

  • Algebra of random variables
    In the algebraic axiomatization of probability theory, the primary concept is not that of probability of an event, but rather that of a random variable. Probability distributions are determined by assigning an expectation to each random variable...

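Sample central moments and the first two sample L-moments can be computed directly. The L-moment part of this sketch uses the unbiased probability-weighted moments b0 and b1 of the sorted sample, with l1 = b0 and l2 = 2*b1 - b0; the function names are hypothetical and higher L-moments are omitted.

```python
def central_moment(data, k):
    """k-th sample central moment: (1/n) * sum((x_i - mean) ** k)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** k for x in data) / n


def sample_l_moments(data):
    """First two sample L-moments (l1, l2) via unbiased probability-weighted moments.

    With x_(1) <= ... <= x_(n) the sorted sample:
        b0 = mean,  b1 = (1/n) * sum_{j=1..n} (j - 1) / (n - 1) * x_(j)
        l1 = b0 (location),  l2 = 2*b1 - b0 (L-scale)
    Requires n >= 2.
    """
    x = sorted(data)
    n = len(x)
    b0 = sum(x) / n
    # enumerate starts at 0, so i equals j - 1 in the formula above.
    b1 = sum(i / (n - 1) * xi for i, xi in enumerate(x)) / n
    return b0, 2 * b1 - b0


data = [1, 2, 3, 4]
print(central_moment(data, 2), central_moment(data, 3))  # 1.25 0.0
print(sample_l_moments(data))
```

For this symmetric sample the third central moment is zero, and l2 = 5/6, which equals half the mean absolute difference between distinct observations.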

Probability theory

  • Probability
    Probability is ordinarily used to describe an attitude of mind towards some proposition of whose truth we are not certain. The proposition of interest is usually of the form "Will a specific event occur?" The attitude of mind is of the form "How certain are we that the event will occur?" The...

  • Conditional probability
    In probability theory, the "conditional probability of A given B" is the probability of A if B is known to occur. It is commonly notated P(A|B), and sometimes P_B(A). P(A|B) can be visualised as the probability of event A when the sample space is restricted to event B...

  • Law of large numbers
    In probability theory, the law of large numbers is a theorem that describes the result of performing the same experiment a large number of times...

  • Central limit theorem
    In probability theory, the central limit theorem states conditions under which the mean of a sufficiently large number of independent random variables, each with finite mean and variance, will be approximately normally distributed. The central limit theorem has a number of variants. In its common...

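Conditional probability, the law of large numbers and the central limit theorem can all be illustrated with fair dice. The sketch computes P(A|B) exactly by counting equally likely outcomes, then uses a seeded simulation (an illustration, not a proof) to show a long-run average approaching 3.5 and means of 30 rolls spreading with standard deviation close to σ/√30, where σ² = 35/12 is the variance of one roll. The helper names prob and sample_means are hypothetical.

```python
import random
import statistics
from fractions import Fraction
from itertools import product

# Sample space: ordered outcomes of two fair six-sided dice, all equally likely.
OUTCOMES = list(product(range(1, 7), repeat=2))


def prob(event, given=lambda o: True):
    """P(event | given), by counting equally likely outcomes in which `given` holds."""
    restricted = [o for o in OUTCOMES if given(o)]
    return Fraction(sum(1 for o in restricted if event(o)), len(restricted))


def sample_means(n, repetitions, rng):
    """Means of `repetitions` independent samples, each of n fair-die rolls."""
    return [statistics.fmean(rng.randint(1, 6) for _ in range(n)) for _ in range(repetitions)]


# Conditional probability: P(sum is 8 | first die is even) = 3/18.
print(prob(lambda o: o[0] + o[1] == 8, given=lambda o: o[0] % 2 == 0))  # 1/6

rng = random.Random(0)
# Law of large numbers: a long-run average of rolls approaches E[X] = 3.5.
print(statistics.fmean(rng.randint(1, 6) for _ in range(100_000)))

# Central limit theorem: means of 30 rolls are approximately normal with
# standard deviation sigma / sqrt(30), where sigma**2 = 35/12.
se = (35 / 12 / 30) ** 0.5
means = sample_means(30, 2_000, rng)
print(statistics.stdev(means), se)
share = sum(1 for m in means if abs(m - 3.5) <= 1.96 * se) / len(means)
print(share)  # should be near 0.95 if the normal approximation is good
```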

Statistics publications


See also

  • Index of statistics articles
  • Glossary of probability and statistics
    The following is a glossary of terms. It is not intended to be all-inclusive. Concerned fields: probability theory, algebra of random variables, statistics, measure theory, estimation theory...

  • Notation in probability and statistics
  • Outline of probability
    Probability is the likelihood or chance that something is the case or will happen. Probability theory is used extensively in statistics, mathematics, science and philosophy to draw conclusions about the likelihood of potential events and the underlying mechanics of complex systems.The following...

  • Combinatorics
    Combinatorics is a branch of mathematics concerning the study of finite or countable discrete structures. Aspects of combinatorics include counting the structures of a given kind and size, deciding when certain criteria can be met, and constructing and analyzing objects meeting the criteria...

  • Monte Carlo method
    Monte Carlo methods are a class of computational algorithms that rely on repeated random sampling to compute their results. Monte Carlo methods are often used in computer simulations of physical and mathematical systems...

  • Simulation
    Simulation is the imitation of some real thing available, state of affairs, or process. The act of simulating something generally entails representing certain key characteristics or behaviours of a selected physical or abstract system....

  • List of fields of application of statistics
  • List of graphical methods
  • Lists of statistics topics
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.