2. INTRODUCTION
Gradient descent is an algorithm for finding the nearest local minimum
of a function; it presupposes that the gradient of
the function can be computed.
The method of steepest descent, also called the
gradient descent method, starts at a point P(0) and,
as many times as needed,
moves from point P(i) to P(i+1) by minimizing
along the line extending from P(i) in the direction of
-∇f(P(i)), the local downhill gradient.
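A minimal sketch of this update rule in Python (the quadratic objective, the starting point P(0) = (3, 4), the fixed step size, and the iteration count are illustrative choices of mine; the slides describe minimizing along each line exactly, which is replaced here by a fixed step for brevity):

import numpy as np

def f(p):
    # Example objective: a simple bowl-shaped quadratic with minimum at (0, 0).
    return p[0] ** 2 + p[1] ** 2

def grad_f(p):
    # Gradient of f, computed analytically.
    return np.array([2.0 * p[0], 2.0 * p[1]])

# Start at P(0) and repeatedly move from P(i) to P(i+1) along -grad f(P(i)).
p = np.array([3.0, 4.0])   # P(0)
step = 0.1                 # fixed step size
for i in range(50):
    p = p - step * grad_f(p)

print(p, f(p))   # p is now close to the minimum at (0, 0)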
4. A DRAWBACK IN THE METHOD
This method has the severe drawback of requiring
a great many iterations for functions which have
long, narrow valley structures. In such cases, a conjugate
gradient method is preferable, as sketched below.
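As a hedged illustration of that alternative, SciPy's conjugate gradient routine applied to the Rosenbrock function, a standard long-narrow-valley test case (both the library and the test function are my choices, not the slides'):

from scipy.optimize import minimize, rosen, rosen_der

# The Rosenbrock function has a long, curved valley in which plain
# gradient descent needs very many small zig-zag steps.
x0 = [-1.2, 1.0]   # a customary starting point for this test function
result = minimize(rosen, x0, method='CG', jac=rosen_der)
print(result.x)    # close to the true minimum at (1, 1)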
To find a local minimum of a function using gradient
descent, one takes steps proportional to the
negative of the gradient (or of the approximate
gradient) of the function at the current point.
If instead one takes steps proportional to the
positive of the gradient, one approaches a local
maximum of that function; the procedure is then
known as gradient ascent.
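A minimal sketch of the sign flip (the concave example function is mine): stepping along the positive gradient climbs toward a maximum.

def f(x):
    # Concave example: a downward parabola with its maximum at x = 3.
    return -(x - 3.0) ** 2

def df(x):
    return -2.0 * (x - 3.0)

x = 0.0
step = 0.1
for i in range(100):
    x = x + step * df(x)   # note the plus sign: gradient ascent

print(x)   # close to 3, the local (here global) maximum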
6. A GOOD EXAMPLE
The accompanying plot (not reproduced here) shows the
function to be minimized and the point reached at each
iteration of the gradient descent. If you increase
the step size too much, the method will not converge.
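In place of the missing plot, a small numerical rendition of the same experiment (the polynomial, starting point, and step size are illustrative choices of mine):

def df(x):
    # Derivative of f(x) = x**4 - 3*x**3 + 2, whose local minimum is at x = 9/4.
    return 4.0 * x ** 3 - 9.0 * x ** 2

x = 6.0        # starting point
step = 0.01    # small enough to converge from here
for i in range(200):
    x = x - step * df(x)
    if i % 25 == 0:
        print(i, x)

# The iterates settle at 2.25. With a much larger step (try 0.05) the very
# first update overshoots to x = -21 and the method does not converge.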
7. THE BAD EXAMPLE
There is a chronic problem with gradient
descent for functions
that have valleys (in the
case of descent) or
saddle points (in the
case of ascent). The
gradient descent/ascent
algorithm zig-zags,
because the gradient is
nearly orthogonal to the direction of the local minimum.
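The zig-zag is easy to reproduce on an elongated quadratic valley (an illustrative example of mine): with a fixed step the iterates bounce between the valley walls while creeping along its floor.

import numpy as np

def grad(p):
    # Gradient of f(x, y) = x**2 + 10*y**2, a long narrow valley along the x-axis.
    return np.array([2.0 * p[0], 20.0 * p[1]])

p = np.array([10.0, 1.0])
step = 0.095   # near the stability limit of the steep direction
for i in range(10):
    p = p - step * grad(p)
    print(i, p)

# x shrinks slowly (factor 0.81 per step) while y flips sign every step
# (factor -0.90): the path zig-zags because the gradient points almost
# straight across the valley, nearly orthogonal to the direction of the minimum.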
8. THE UGLY
Imagine the ugliest example you can think of.
Draw it on your notebook.
Compare it to the guy next to you.
Ugliest example wins.
9. ESTIMATING STEP SIZE
A wrong step size may prevent convergence, so a
careful selection of the step size is important:
too large and it will diverge, too small and it will take a long
time to converge.
One option is to choose a fixed step size that will
assure convergence wherever you start gradient
descent.
Another option is to choose a different step size at
each iteration (adaptive step size), as in the sketch below.
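One common adaptive scheme, and my choice for this sketch (the slides do not name one), is backtracking line search: start from a large trial step and shrink it until the Armijo sufficient-decrease condition holds.

import numpy as np

def backtracking_step(f, grad_f, x, step0=1.0, beta=0.5, c=1e-4):
    # Halve the trial step until f decreases enough (Armijo condition).
    g = grad_f(x)
    step = step0
    while f(x - step * g) > f(x) - c * step * np.dot(g, g):
        step *= beta
    return x - step * g

f = lambda x: x[0] ** 2 + 10.0 * x[1] ** 2
grad_f = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])

x = np.array([10.0, 1.0])
for i in range(30):
    x = backtracking_step(f, grad_f, x)
print(x)   # close to the minimum at (0, 0)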
10. MAXIMUM STEP SIZE FOR CONVERGENCE
Any differentiable function has a maximum derivative
value, the maximum of the (absolute) derivative over all points. If
this maximum is not infinite, the value is known as the
Lipschitz constant and the function is Lipschitz
continuous.
This constant is important because it says that,
for a given function, the derivative at any point is
bounded by the Lipschitz constant:
||f(x) - f(y)|| / ||x - y|| <= L(f), for any x, y
11. The same can be said for the gradient of the
function: if the maximum second derivative is finite,
the function has a Lipschitz continuous gradient, and that
value is the Lipschitz constant of the gradient, L(grad f).
For the f(x) = x^2 example, the derivative is
df(x)/dx = 2x, which is unbounded, and therefore the function is not
Lipschitz continuous.
But the second derivative is d^2f(x)/dx^2 = 2, and the
function has a Lipschitz continuous gradient with
Lipschitz constant L(grad f) = 2.
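A quick numerical check (the standard convergence threshold of 2/L for a fixed step is a known result I am adding, not stated on the slide): for f(x) = x^2 the gradient's Lipschitz constant is L = 2, so the critical step size is 2/L = 1.

def df(x):
    return 2.0 * x   # gradient of f(x) = x**2; its Lipschitz constant is L = 2

for step in (0.5, 0.9, 1.1):   # the threshold is 2/L = 1
    x = 1.0
    for i in range(20):
        x = x - step * df(x)
    print(step, x)

# Each update multiplies x by (1 - 2*step): step 0.5 = 1/L lands on the
# minimum in one move, 0.9 still converges, and 1.1 blows up.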