Outline of regression analysis
In statistics, regression analysis includes any technique for learning about the relationship between one or more dependent variables Y and one or more independent variables X. The following outline is an overview and guide to the variety of topics included within the subject of regression analysis.
Non-statistical articles related to regression
- Least squares – a standard approach to the approximate solution of overdetermined systems, i.e., sets of equations in which there are more equations than unknowns; "least squares" means that the overall solution minimizes the sum of the squares of the errors (see the sketch after this list)
- Linear least squares (mathematics)
- Non-linear least squares – the form of least squares analysis used to fit a set of m observations with a model that is non-linear in n unknown parameters; used in some forms of non-linear regression, the method approximates the model by a linear one
- Least absolute deviations – also known as least absolute errors, least absolute value, or the L1-norm problem; a mathematical optimization technique similar to least squares that attempts to find a function which closely approximates a set of data
- Curve fitting – the process of constructing a curve, or mathematical function, that has the best fit to a series of data points, possibly subject to constraints; it can involve either interpolation, where an exact fit to the data is required, or smoothing
- Smoothing – creating an approximating function that attempts to capture important patterns in the data while leaving out noise or other fine-scale structures
- Cross-sectional study – a class of research methods involving observation of all of a population, or a representative subset, at one specific point in time
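To make the least-squares idea concrete, here is a minimal sketch in Python with NumPy (the data points are invented for illustration) that solves an overdetermined system of five equations in two unknowns by minimizing the sum of squared errors:

```python
import numpy as np

# Overdetermined system: 5 equations, 2 unknowns (slope and intercept).
# We fit y ≈ a*x + b, which has no exact solution for noisy data.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Design matrix with a column of ones for the intercept.
A = np.column_stack([x, np.ones_like(x)])

# np.linalg.lstsq returns the coefficients minimizing ||A @ coef - y||^2.
coef, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)
a, b = coef
print(f"fitted line: y = {a:.3f} * x + {b:.3f}")
print("sum of squared errors:", residuals[0])
```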
Basic statistical ideas related to regression
- Conditional expectation – the expected value of a real random variable with respect to a conditional probability distribution
- Correlation – any of a broad class of statistical relationships involving dependence between two random variables or two sets of data
- Correlation coefficient (Pearson product-moment) – a measure of the correlation between two variables X and Y, giving a value between +1 and −1 inclusive
- Mean square error
- Residual sum of squares – the sum of squares of residuals, also known as the sum of squared errors of prediction; a measure of the discrepancy between the data and an estimation model (see the sketch after this list)
- Explained sum of squares – a quantity used in describing how well a model, often a regression model, represents the data being modelled
- Total sum of squares – a quantity that appears as part of a standard way of presenting results of statistical data analyses
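For a least-squares fit with an intercept, the three sums of squares above satisfy the identity TSS = ESS + RSS, and R² = ESS/TSS. A minimal NumPy sketch with invented data checks this numerically:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Least-squares line with intercept.
A = np.column_stack([x, np.ones_like(x)])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ coef

tss = np.sum((y - y.mean()) ** 2)       # total sum of squares
ess = np.sum((y_hat - y.mean()) ** 2)   # explained sum of squares
rss = np.sum((y - y_hat) ** 2)          # residual sum of squares

print(f"TSS={tss:.4f}  ESS={ess:.4f}  RSS={rss:.4f}")
print("TSS ≈ ESS + RSS:", np.isclose(tss, ess + rss))
print("R^2 =", ess / tss)
```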
Linear regression based on least squares
- General linear model – a statistical linear model that may be written Y = XB + U, where Y is a matrix of multivariate measurements, X is a design matrix, B is a matrix of parameters to be estimated, and U is a matrix of errors
- Ordinary least squares – a method for estimating the unknown parameters in a linear regression model by minimizing the sum of squared vertical distances between the observed responses in the dataset and the responses predicted by the linear approximation (see the sketch after this list)
- Generalized least squares – a technique for estimating the unknown parameters in a linear regression model, applied when the variances of the observations are unequal or when there is a certain degree of correlation between the observations
- Simple linear regression – the least squares estimator of a linear regression model with a single explanatory variable; it fits a straight line through the set of n points so as to make the sum of squared residuals as small as possible
- Trend estimation – a statistical technique to aid interpretation of data; when a series of measurements of a process is treated as a time series, trend estimation can be used to make and justify statements about tendencies in the data
- Ridge regression
- Polynomial regression – a form of linear regression in which the relationship between the independent variable x and the dependent variable y is modeled as an nth-order polynomial
- Segmented regression – a method in which the independent variable is partitioned into intervals and a separate line segment is fit to each interval; segmented or piecewise regression can also be performed on multivariate data by partitioning the independent variables
- Nonlinear regression – a form of regression analysis in which observational data are modeled by a function which is a nonlinear combination of the model parameters and depends on one or more independent variables
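As a concrete anchor for ordinary least squares, and for ridge regression as its penalized variant, the following NumPy sketch solves the normal equations on synthetic data; the penalty value lam is an arbitrary illustration, not a recommended default:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one predictor
beta_true = np.array([1.0, 2.5])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# OLS: beta = (X'X)^{-1} X'y, solved without explicit matrix inversion.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge regression: beta = (X'X + lam*I)^{-1} X'y, shrinking the coefficients.
lam = 1.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

print("OLS estimate:  ", beta_ols)
print("ridge estimate:", beta_ridge)
```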
Generalized linear models
- Generalized linear models
- Logistic regression – used to predict the probability of occurrence of an event by fitting data to a logistic curve; a generalized linear model used for binomial regression (see the sketch after this list)
- Ordered logit – a regression model for ordinal dependent variables
- Probit model – a type of regression where the dependent variable can take only two values, for example married or not married
- Ordered probit – a generalization of probit analysis to the case of more than two outcomes of an ordinal dependent variable; the logit method likewise has a counterpart in ordered logit
- Poisson regression – a form of regression analysis used to model count data and contingency tables; it assumes the response variable Y has a Poisson distribution and that the logarithm of its expected value can be modeled by a linear combination of unknown parameters
- Maximum likelihood – a method of estimating the parameters of a statistical model; when applied to a data set and given a statistical model, it provides estimates for the model's parameters
- Cochrane–Orcutt estimation
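To show how a generalized linear model is actually fitted, here is a small sketch of logistic regression estimated by iteratively reweighted least squares (Newton's method on the binomial log-likelihood), written in plain NumPy on synthetic data:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.5])
y = (rng.uniform(size=n) < sigmoid(X @ beta_true)).astype(float)

# Newton / IRLS iterations for the binomial log-likelihood.
beta = np.zeros(X.shape[1])
for _ in range(25):
    p = sigmoid(X @ beta)
    W = p * (1.0 - p)                  # IRLS weights
    grad = X.T @ (y - p)               # score vector
    hess = X.T @ (X * W[:, None])      # Fisher information matrix
    step = np.linalg.solve(hess, grad)
    beta += step
    if np.max(np.abs(step)) < 1e-10:   # stop once the update is negligible
        break

print("estimated coefficients:", beta)
```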
Inference for regression models
- F-test – any statistical test in which the test statistic has an F-distribution under the null hypothesis; most often used when comparing statistical models that have been fitted to a data set, in order to identify the model that best fits the population from which the data were sampled (see the sketch after this list)
- t-test
- Lack-of-fit sum of squares – one of the components of a partition of the sum of squares in an analysis of variance, used in the numerator of an F-test of the null hypothesis that a proposed model fits well
- Confidence band – used in statistical analysis to represent the uncertainty in an estimate of a curve or function based on limited or noisy data; often part of the graphical presentation of results
- Coefficient of determination – R², the proportion of variability in a data set that is accounted for by the statistical model; used in the context of models whose main purpose is the prediction of future outcomes
- Multiple correlation – a linear relationship among more than two variables, measured by the coefficient of multiple determination R², a measure of the fit of a linear regression
- Scheffé's method – named after Henry Scheffé, a method for adjusting significance levels in a linear regression analysis to account for multiple comparisons
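A worked example of the overall F-test, R², and the associated p-value for a linear regression; the data are synthetic, and SciPy is used only for the F-distribution's tail probability:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, k = 40, 2                                   # n observations, k predictors
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.8, -0.5]) + rng.normal(size=n)

beta = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta
rss = np.sum((y - y_hat) ** 2)
tss = np.sum((y - y.mean()) ** 2)
r2 = 1.0 - rss / tss

# Overall F-test: H0 says all slope coefficients are zero.
df1, df2 = k, n - k - 1
F = ((tss - rss) / df1) / (rss / df2)
p_value = stats.f.sf(F, df1, df2)              # upper-tail probability

print(f"R^2 = {r2:.3f}, F({df1},{df2}) = {F:.2f}, p = {p_value:.4g}")
```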
Challenges to regression modeling
- Autocorrelation – the cross-correlation of a signal with itself; informally, the similarity between observations as a function of the time separation between them
- Cointegration – a statistical property of time series variables; two or more time series are cointegrated if they share a common stochastic drift
- Multicollinearity – a phenomenon in which two or more predictor variables in a multiple regression model are highly correlated, so that coefficient estimates may change erratically in response to small changes in the model or the data
- Homoscedasticity and heteroscedasticity (see the sketch after this list)
- Lack of fit (goodness of fit) – how well a statistical model fits a set of observations; measures of goodness of fit typically summarize the discrepancy between observed values and the values expected under the model
- Non-normality of errors (normality tests) – normality tests are used to determine whether a data set is well modeled by a normal distribution
- Outliers – an outlier is an observation that is numerically distant from the rest of the data, one that appears to deviate markedly from other members of the sample in which it occurs
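One common way to probe heteroscedasticity is a Breusch–Pagan-style test: regress the squared OLS residuals on the predictors and compare n·R² of that auxiliary regression to a chi-squared distribution. The sketch below is a rough illustration on deliberately heteroscedastic synthetic data, not a full implementation:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 200
x = rng.uniform(1.0, 5.0, size=n)
X = np.column_stack([np.ones(n), x])
# Error variance grows with x: deliberately heteroscedastic data.
y = 1.0 + 2.0 * x + rng.normal(scale=0.5 * x, size=n)

beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta

# Auxiliary regression of squared residuals on the same predictors.
u = resid ** 2
gamma = np.linalg.solve(X.T @ X, X.T @ u)
u_hat = X @ gamma
r2_aux = 1.0 - np.sum((u - u_hat) ** 2) / np.sum((u - u.mean()) ** 2)

lm = n * r2_aux                       # Breusch-Pagan LM statistic
p_value = stats.chi2.sf(lm, df=1)     # df = number of slope regressors
print(f"LM = {lm:.2f}, p = {p_value:.4g}  (small p suggests heteroscedasticity)")
```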
Diagnostics for regression models
- Regression model validation
- Studentized residual – the quotient resulting from the division of a residual by an estimate of its standard deviation (see the sketch after this list, which also computes leverage, Cook's distance, and the Durbin–Watson statistic)
- Cook's distance – a commonly used estimate of the influence of a data point in least squares regression analysis; it can be used, for example, to indicate data points that are particularly worth checking for validity
- Variance inflation factor – quantifies the severity of multicollinearity in an ordinary least squares regression analysis
- DFFITS
- Partial residual plot – a graphical technique that attempts to show the relationship between a given independent variable and the response variable, given that other independent variables are also in the model
- Partial regression plot – attempts to show the effect of adding an additional variable to the model
- Leverage – used in regression analysis, in particular in analyses aimed at identifying observations that are far away from corresponding average predictor values
- Durbin–Watson statistic
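Several of the diagnostics above (leverage via the hat matrix, internally studentized residuals, Cook's distance, and the Durbin–Watson statistic) can be computed directly from their textbook formulas. A NumPy sketch on synthetic data with one planted outlier:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
y[0] += 5.0                                # plant one outlier

p = X.shape[1]
beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta

# Hat matrix H = X (X'X)^{-1} X'; its diagonal gives the leverages.
H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)

sigma2 = np.sum(resid ** 2) / (n - p)      # residual variance estimate
# Internally studentized residuals.
r = resid / np.sqrt(sigma2 * (1.0 - h))
# Cook's distance combines residual size and leverage.
cooks_d = r ** 2 * h / (p * (1.0 - h))
# Durbin-Watson statistic for first-order autocorrelation of residuals.
dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

print("largest |studentized residual| at i =", np.argmax(np.abs(r)))
print("largest Cook's distance at i =", np.argmax(cooks_d))
print(f"Durbin-Watson = {dw:.2f}  (values near 2 suggest no autocorrelation)")
```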
Formal aids to model selection
- Model selection – the task of selecting a statistical model from a set of candidate models, given data
- Mallows' Cp – named for Colin L. Mallows, used to assess the fit of a regression model estimated using ordinary least squares; applied in the context of model selection, where a number of predictor variables are available for predicting some outcome
- Akaike information criterion – a measure of the relative goodness of fit of a statistical model, developed by Hirotsugu Akaike and first published in 1974 (see the sketch after this list)
- Bayesian information criterion
- Hannan–Quinn information criterion
- Cross validation
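For a Gaussian linear model, AIC and BIC can be computed (up to an additive constant that is the same for all candidate models) from the residual sum of squares as n·ln(RSS/n) + 2k and n·ln(RSS/n) + k·ln(n), where k is the number of estimated coefficients. A sketch comparing polynomial degrees on synthetic data generated from a quadratic:

```python
import numpy as np

def fit_rss(X, y):
    """Least-squares fit; return the residual sum of squares."""
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    return np.sum((y - X @ beta) ** 2)

rng = np.random.default_rng(5)
n = 60
x = rng.uniform(-2, 2, size=n)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(scale=0.3, size=n)

# Compare polynomial models of increasing degree.
for degree in range(1, 5):
    X = np.column_stack([x ** d for d in range(degree + 1)])
    rss = fit_rss(X, y)
    k = degree + 1                       # number of estimated coefficients
    aic = n * np.log(rss / n) + 2 * k
    bic = n * np.log(rss / n) + k * np.log(n)
    print(f"degree {degree}: AIC={aic:7.2f}  BIC={bic:7.2f}")
```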
Terminology
- Linear model – relates to the meaning of "linear"; the term is most often used in connection with regression models, where it is often taken as synonymous with linear regression model, though it is also used in time series analysis with a different meaning
- Dependent and independent variables – terms used in similar but subtly different ways in mathematics and statistics as part of the standard terminology in those subjects
- Errors and residuals in statistics – two closely related and easily confused measures of the deviation of a sample from its "theoretical value"
- Hat matrix – the matrix H that maps the vector of observed values to the vector of fitted values; it describes the influence each observed value has on each fitted value
- Trend stationary
- Cross-sectional data – a type of one-dimensional data set collected by observing many subjects at the same point of time, or without regard to differences in time
- Time series – a sequence of data points, measured typically at successive times spaced at uniform intervals; an example is the daily closing value of the Dow Jones index
Methods for dependent data
- Mixed model – a statistical model containing both fixed effects and random effects, that is, mixed effects; useful in a wide variety of disciplines in the physical, biological and social sciences (see the sketch after this list)
- Random effects model
- Hierarchical linear models
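For a sense of what fitting a random-intercept (mixed) model looks like in practice, here is a hedged sketch using the MixedLM class from statsmodels; the grouping structure and data are invented for illustration:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n_groups, per_group = 10, 20
groups = np.repeat(np.arange(n_groups), per_group)

x = rng.normal(size=n_groups * per_group)
group_effect = rng.normal(scale=1.0, size=n_groups)[groups]  # random intercepts
y = 1.0 + 2.0 * x + group_effect + rng.normal(scale=0.5, size=x.size)

# Random-intercept model: fixed effect for x, random intercept per group.
exog = sm.add_constant(x)
model = sm.MixedLM(y, exog, groups=groups)
result = model.fit()
print(result.summary())
```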
Other forms of regression
- Total least squares regression
- Deming regression – named after W. Edwards Deming, an errors-in-variables model which tries to find the line of best fit for a two-dimensional dataset
- Errors-in-variables model – total least squares, also known as errors in variables, rigorous least squares, or orthogonal regression; a least squares data modeling technique in which observational errors on both dependent and independent variables are taken into account
- Instrumental variables regression
- Quantile regression – whereas least squares yields estimates that approximate the conditional mean of the response variable given values of the predictors, quantile regression yields estimates approximating its conditional quantiles
- Generalized additive model – a statistical model developed by Trevor Hastie and Rob Tibshirani for blending properties of generalized linear models with additive models
- Autoregressive model – a type of random process often used to model and predict various types of natural phenomena (see the sketch after this list)
- Moving average model – a common approach for modeling univariate time series; the notation MA(q) refers to the moving average model of order q
- Autoregressive moving average model – sometimes called Box–Jenkins models after the iterative Box–Jenkins methodology usually used to estimate them; typically applied to autocorrelated time series data
- Autoregressive integrated moving average – a generalization of an autoregressive moving average model; these models are fitted to time series data either to better understand the data or to predict future points
- Autoregressive conditional heteroskedasticity – models used in econometrics to characterize observed time series whenever there is reason to believe that, at any point in a series, the terms will have a characteristic size or variance
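As a minimal time-series example, an AR(1) model can be estimated by conditional least squares: regress x_t on x_{t-1}. A NumPy sketch on a simulated zero-mean series:

```python
import numpy as np

rng = np.random.default_rng(7)
n, phi_true = 500, 0.7

# Simulate an AR(1) process: x_t = phi * x_{t-1} + e_t.
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi_true * x[t - 1] + rng.normal()

# Conditional least squares: regress x_t on x_{t-1} (no intercept,
# since the simulated process has zero mean).
x_lag, x_now = x[:-1], x[1:]
phi_hat = (x_lag @ x_now) / (x_lag @ x_lag)
print(f"true phi = {phi_true}, estimated phi = {phi_hat:.3f}")
```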
See also
- Prediction – a statement about the way things will happen in the future, often but not always based on experience or knowledge
- Design of experiments – the design of any information-gathering exercise where variation is present, whether under the full control of the experimenter or not
- Data transformation (statistics) – the application of a deterministic mathematical function to each point in a data set, replacing each data point z_i with the transformed value y_i = f(z_i)
- Box–Cox transformation (power transform) – a family of functions applied to create a rank-preserving transformation of data using power functions; useful for stabilizing variance and making data more normal-distribution-like (see the sketch after this list)
- Machine learning – a branch of artificial intelligence concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data, such as from sensor data or databases
- Analysis of variance
- Causal inference (causality) – causality is the relationship between an event and a second event, where the second event is understood as a consequence of the first
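Finally, as a small illustration of the Box–Cox transformation mentioned above, scipy.stats.boxcox estimates the power parameter λ by maximum likelihood and returns the transformed data; the lognormal sample below is synthetic:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
# Positively skewed, strictly positive data (Box-Cox requires y > 0).
y = rng.lognormal(mean=0.0, sigma=0.8, size=500)

# scipy.stats.boxcox estimates lambda by maximum likelihood and returns
# the transformed data y(lambda) = (y**lambda - 1) / lambda.
y_transformed, lam = stats.boxcox(y)
print(f"estimated lambda = {lam:.3f}")
print(f"skewness before = {stats.skew(y):.2f}, after = {stats.skew(y_transformed):.2f}")
```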