Wolfe conditions
In the unconstrained minimization problem, the Wolfe conditions are a set of inequalities for performing inexact line search, especially in quasi-Newton methods.
In these methods the idea is to find $\min_{\mathbf{x}} f(\mathbf{x})$ for some smooth $f\colon \mathbb{R}^n \to \mathbb{R}$. Each step often involves approximately solving the subproblem $\min_{\alpha} f(\mathbf{x}_k + \alpha \mathbf{p}_k)$, where $\mathbf{x}_k$ is the current best guess, $\mathbf{p}_k \in \mathbb{R}^n$ is a search direction, and $\alpha \in \mathbb{R}$ is the step length.
Then inexact line searches provide an efficient way of computing an acceptable step length $\alpha$ that reduces the objective function 'sufficiently', rather than minimizing the objective function over $\alpha \in \mathbb{R}^{+}$ exactly. A line search algorithm can use the Wolfe conditions as a requirement for any guessed $\alpha$, before finding a new search direction $\mathbf{p}_k$.
Armijo rule and curvature

Denote a univariate function $\phi$ restricted to the direction $\mathbf{p}_k$ as $\phi(\alpha) = f(\mathbf{x}_k + \alpha \mathbf{p}_k)$. A step length $\alpha_k$ is said to satisfy the Wolfe conditions, restricted to the direction $\mathbf{p}_k$, if the following two inequalities hold:
- i) $f(\mathbf{x}_k + \alpha_k \mathbf{p}_k) \le f(\mathbf{x}_k) + c_1 \alpha_k\, \mathbf{p}_k^{\mathrm T} \nabla f(\mathbf{x}_k)$,
- ii) $-\mathbf{p}_k^{\mathrm T} \nabla f(\mathbf{x}_k + \alpha_k \mathbf{p}_k) \le -c_2\, \mathbf{p}_k^{\mathrm T} \nabla f(\mathbf{x}_k)$,

with $0 < c_1 < c_2 < 1$. (In examining condition (ii), recall that to ensure that $\mathbf{p}_k$ is a descent direction, we have $\mathbf{p}_k^{\mathrm T} \nabla f(\mathbf{x}_k) < 0$.)
$c_1$ is usually chosen to be quite small while $c_2$ is much larger; Nocedal gives example values of $c_1 = 10^{-4}$ and $c_2 = 0.9$ for Newton or quasi-Newton methods and $c_2 = 0.1$ for the nonlinear conjugate gradient method. Inequality i) is known as the Armijo rule and ii) as the curvature condition; i) ensures that the step length $\alpha_k$ decreases $f$ 'sufficiently', and ii) ensures that the slope has been reduced sufficiently.
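As a concrete illustration, here is a minimal Python sketch of the two tests; the function name wolfe_conditions_hold and its signature are illustrative, not from any particular library.

```python
import numpy as np

def wolfe_conditions_hold(f, grad_f, x, p, alpha, c1=1e-4, c2=0.9):
    """Check the Armijo rule i) and the curvature condition ii) for a trial step.

    f, grad_f : the objective and its gradient (callables on numpy arrays)
    x         : current iterate x_k
    p         : search direction p_k; must be a descent direction, grad_f(x) @ p < 0
    alpha     : trial step length
    """
    dphi0 = grad_f(x) @ p                    # phi'(0) = p_k^T grad f(x_k), negative for descent
    phi_alpha = f(x + alpha * p)             # phi(alpha) = f(x_k + alpha p_k)
    dphi_alpha = grad_f(x + alpha * p) @ p   # phi'(alpha)
    armijo = phi_alpha <= f(x) + c1 * alpha * dphi0   # condition i)
    curvature = dphi_alpha >= c2 * dphi0              # condition ii)
    return armijo and curvature

# Example: quadratic objective with the steepest-descent direction.
f = lambda x: 0.5 * x @ x
grad_f = lambda x: x
x0 = np.array([1.0, 2.0])
print(wolfe_conditions_hold(f, grad_f, x0, -grad_f(x0), alpha=0.5))  # True
```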
Strong Wolfe condition on curvature

The Wolfe conditions, however, can result in a value for the step length that is not close to a minimizer of $\phi$. If we modify the curvature condition to the following,
- iia) $\left|\mathbf{p}_k^{\mathrm T} \nabla f(\mathbf{x}_k + \alpha_k \mathbf{p}_k)\right| \le c_2 \left|\mathbf{p}_k^{\mathrm T} \nabla f(\mathbf{x}_k)\right|$,

then i) and iia) together form the so-called strong Wolfe conditions, and force $\alpha_k$ to lie close to a critical point of $\phi$.
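Under the same illustrative conventions as the sketch above, only the curvature test changes: taking absolute values also rejects steps at which the directional derivative is steeply positive.

```python
def strong_wolfe_holds(f, grad_f, x, p, alpha, c1=1e-4, c2=0.9):
    """Check conditions i) and iia); inputs as in wolfe_conditions_hold above."""
    dphi0 = grad_f(x) @ p                                            # phi'(0)
    armijo = f(x + alpha * p) <= f(x) + c1 * alpha * dphi0           # condition i)
    strong_curv = abs(grad_f(x + alpha * p) @ p) <= c2 * abs(dphi0)  # condition iia)
    return armijo and strong_curv
```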
The principal reason for imposing the Wolfe conditions in an optimization algorithm where $\mathbf{x}_{k+1} = \mathbf{x}_k + \alpha_k \mathbf{p}_k$ is to ensure convergence of the gradient to zero. In particular, if the cosine of the angle between $\mathbf{p}_k$ and the steepest-descent direction $-\nabla f(\mathbf{x}_k)$,

$$\cos \theta_k = \frac{-\nabla f(\mathbf{x}_k)^{\mathrm T} \mathbf{p}_k}{\|\nabla f(\mathbf{x}_k)\| \, \|\mathbf{p}_k\|},$$

is bounded away from zero and conditions i) and ii) hold, then $\nabla f(\mathbf{x}_k) \to 0$.
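This claim rests on Zoutendijk's theorem, which is not stated above; assuming $f$ is bounded below with a Lipschitz-continuous gradient, steps satisfying i) and ii) give

$$\sum_{k \ge 0} \cos^2 \theta_k \, \|\nabla f(\mathbf{x}_k)\|^2 < \infty,$$

and if $\cos \theta_k \ge \delta > 0$ for all $k$, this sum can only be finite if $\|\nabla f(\mathbf{x}_k)\| \to 0$.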
An additional motivation, in the case of a quasi-Newton method, is that if $\mathbf{p}_k = -B_k^{-1} \nabla f(\mathbf{x}_k)$, where the matrix $B_k$ is updated by the BFGS or DFP formula, then if $B_k$ is positive definite, condition ii) implies that $B_{k+1}$ is also positive definite.
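A short derivation of this fact, using the standard quasi-Newton abbreviations $s_k = \mathbf{x}_{k+1} - \mathbf{x}_k = \alpha_k \mathbf{p}_k$ and $y_k = \nabla f(\mathbf{x}_{k+1}) - \nabla f(\mathbf{x}_k)$ (not defined above): condition ii) yields the curvature inequality $y_k^{\mathrm T} s_k > 0$, which is exactly what the BFGS and DFP updates require in order to preserve positive definiteness,

$$y_k^{\mathrm T} s_k = \alpha_k \left( \nabla f(\mathbf{x}_{k+1})^{\mathrm T} \mathbf{p}_k - \nabla f(\mathbf{x}_k)^{\mathrm T} \mathbf{p}_k \right) \ge \alpha_k (c_2 - 1) \, \nabla f(\mathbf{x}_k)^{\mathrm T} \mathbf{p}_k > 0,$$

since $\alpha_k > 0$, $0 < c_2 < 1$, and $\mathbf{p}_k^{\mathrm T} \nabla f(\mathbf{x}_k) < 0$ for a descent direction.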