Kernel regression
Encyclopedia
The kernel regression is a non-parametric technique in statistics to estimate the conditional expectation
of a random variable
. The objective is to find a non-linear relation between a pair of random variables X and Y.
In any nonparametric regression
, the conditional expectation
of a variable relative to a variable may be written:
where is an unknown function.
as a weighting function. The Nadaraya-Watson estimator is:
where is a kernel with a bandwidth . The fraction is a weighting term with sum 1.
Using the kernel density estimation
for the joint distribution f(x,y) and f(x) with a kernel K,
,
we obtain the Nadaraya-Watson estimator.
where
of a random sample taken from the 1971 Canadian Census Public Use
Tapes for male individuals having common education (grade 13). There
are 205 observations in total.
We consider estimating the unknown regression function using
Nadaraya-Watson kernel regression via the
R np package
that uses automatic (data-driven) bandwidth selection; see the np vignette for an introduction to the np package.
The figure below shows the estimated regression function using a
second order Gaussian kernel along with asymptotic variability bounds
Estimated Regression Function.
npreg function to deliver optimal smoothing and to create
the figure given above. These commands can be entered at the command
prompt via cut and paste.
library(np) /* non parametic library */
data(cps71)
attach(cps71)
m <- npreg(logwage~age)
plot(m,plot.errors.method="asymptotic",
plot.errors.style="band",
ylim=c(11,15.2))
points(age,logwage,cex=.25)
Conditional expectation
In probability theory, a conditional expectation is the expected value of a real random variable with respect to a conditional probability distribution....
of a random variable
Random variable
In probability and statistics, a random variable or stochastic variable is, roughly speaking, a variable whose value results from a measurement on some type of random process. Formally, it is a function from a probability space, typically to the real numbers, which is measurable functionmeasurable...
. The objective is to find a non-linear relation between a pair of random variables X and Y.
In any nonparametric regression
Nonparametric regression
Nonparametric regression is a form of regression analysis in which the predictor does not take a predetermined form but is constructed according to information derived from the data...
, the conditional expectation
Conditional expectation
In probability theory, a conditional expectation is the expected value of a real random variable with respect to a conditional probability distribution....
of a variable relative to a variable may be written:
where is an unknown function.
Nadaraya-Watson kernel regression
Nadaraya (1964) and Watson (1964) proposed to estimate as a locally weighted average, using a kernelKernel (statistics)
A kernel is a weighting function used in non-parametric estimation techniques. Kernels are used in kernel density estimation to estimate random variables' density functions, or in kernel regression to estimate the conditional expectation of a random variable. Kernels are also used in time-series,...
as a weighting function. The Nadaraya-Watson estimator is:
where is a kernel with a bandwidth . The fraction is a weighting term with sum 1.
Derivation
Using the kernel density estimation
Kernel density estimation
In statistics, kernel density estimation is a non-parametric way of estimating the probability density function of a random variable. Kernel density estimation is a fundamental data smoothing problem where inferences about the population are made, based on a finite data sample...
for the joint distribution f(x,y) and f(x) with a kernel K,
,
we obtain the Nadaraya-Watson estimator.
Priestley-Chao kernel estimator
Gasser-Müller kernel estimator
where
Example
This example is based upon Canadian cross-section wage data consistingof a random sample taken from the 1971 Canadian Census Public Use
Tapes for male individuals having common education (grade 13). There
are 205 observations in total.
We consider estimating the unknown regression function using
Nadaraya-Watson kernel regression via the
R np package
that uses automatic (data-driven) bandwidth selection; see the np vignette for an introduction to the np package.
The figure below shows the estimated regression function using a
second order Gaussian kernel along with asymptotic variability bounds
Script for example
The following commands of the R programming language use thenpreg function to deliver optimal smoothing and to create
the figure given above. These commands can be entered at the command
prompt via cut and paste.
library(np) /* non parametic library */
data(cps71)
attach(cps71)
m <- npreg(logwage~age)
plot(m,plot.errors.method="asymptotic",
plot.errors.style="band",
ylim=c(11,15.2))
points(age,logwage,cex=.25)
Related
According to Salsburg(2002, page 290-291), the algorithms used in kernel regression were independently developed and used in Fuzzy Systems: "Coming up with almost exactly the same computer algorithm, fuzzy systems and kernel density-based regressions appear to have been developed completely independently of one another."Statistical implementation
- StataStataStata is a general-purpose statistical software package created in 1985 by StataCorp. It is used by many businesses and academic institutions around the world...
kernreg2
kernreg2 y x, bwidth(.5) kercode(3) npoint(500) gen(kernelprediction gridofpoints)
- RR (programming language)R is a programming language and software environment for statistical computing and graphics. The R language is widely used among statisticians for developing statistical software, and R is widely used for statistical software development and data analysis....
: npreg (package np) - GNU/octaveGNU OctaveGNU Octave is a high-level language, primarily intended for numerical computations. It provides a convenient command-line interface for solving linear and nonlinear problems numerically, and for performing other numerical experiments using a language that is mostly compatible with MATLAB...
mathematical program package:
External links
- Scale-adaptive kernel regression (with Matlab software).
- An online kernel regression demonstration Requires .NET 3.0 or later.
- The np package An RR (programming language)R is a programming language and software environment for statistical computing and graphics. The R language is widely used among statisticians for developing statistical software, and R is widely used for statistical software development and data analysis....
package that provides a variety of nonparametric and semiparametric kernel methods that seamlessly handle a mix of continuous, unordered, and ordered factor data types.