Ali Madani
https://www.linkedin.com/in/amlearning/
Introduction to optimization for deep learning
Gradient descent
Obtaining the parameters by stepping against the direction of maximum variation of the cost (its gradient):

θ = θ − η · ∇θ J(θ)

with cost function J(θ) and learning rate η.
Changing the parameters to minimize the cost.
The gradient involves a summation over all data points, so every update is:
● slow
● intractable for large datasets
(see the sketch below)
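To make the update rule concrete, here is a minimal NumPy sketch of full-batch gradient descent on a linear-regression mean-squared-error cost; the model, toy data, and hyperparameters (lr, n_steps) are illustrative assumptions, not from the slides.

```python
import numpy as np

def batch_gradient_descent(X, y, lr=0.1, n_steps=100):
    """Full-batch gradient descent: theta = theta - lr * grad J(theta).

    The gradient is summed over ALL data points on every step,
    which is exactly what makes each update slow on large datasets.
    """
    theta = np.zeros(X.shape[1])
    for _ in range(n_steps):
        # Gradient of J(theta) = (1/2m) ||X @ theta - y||^2 over the full dataset
        grad = X.T @ (X @ theta - y) / len(y)
        theta -= lr * grad  # step against the gradient
    return theta

# Toy usage on assumed synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)
print(batch_gradient_descent(X, y))  # approximately [1.0, -2.0, 0.5]
```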
Stochastic gradient descent
The same rule, but the gradient is computed from a single training example (x^(i), y^(i)) at a time:

θ = θ − η · ∇θ J(θ; x^(i), y^(i))

with cost function J and learning rate η.
Issue solved: a parameter update is made for each training example.
● The objective function fluctuates
○ The jumps may reach a better local minimum faster
(see the sketch below)
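A minimal sketch of the stochastic variant, under the same assumed linear-regression setup as above; the only change is that each update uses one shuffled example, which is why the objective fluctuates between steps.

```python
import numpy as np

def stochastic_gradient_descent(X, y, lr=0.01, n_epochs=10):
    """SGD: theta = theta - lr * grad J(theta; x_i, y_i).

    One cheap, noisy update per training example; the fluctuations
    can jump the parameters toward a better local minimum faster.
    """
    theta = np.zeros(X.shape[1])
    rng = np.random.default_rng(0)
    for _ in range(n_epochs):
        for i in rng.permutation(len(y)):  # reshuffle every epoch
            # Gradient from the single example (x_i, y_i)
            grad = (X[i] @ theta - y[i]) * X[i]
            theta -= lr * grad
    return theta
```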
Comparison of gradient descent and stochastic gradient descent
Mini-batch gradient descent
Let's take the middle ground between the two:

θ = θ − η · ∇θ J(θ; x^(i:i+n), y^(i:i+n))

with cost function J and learning rate η.
Update over mini-batches of n training examples (see the sketch below).
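A minimal sketch of the mini-batch variant under the same assumed setup; the batch size n = 32 is an illustrative choice, not from the slides.

```python
import numpy as np

def minibatch_gradient_descent(X, y, lr=0.05, n=32, n_epochs=20):
    """Mini-batch GD: theta = theta - lr * grad J(theta; x_(i:i+n), y_(i:i+n)).

    The middle ground: averaging the gradient over n examples keeps
    updates cheap while damping the fluctuations of pure SGD.
    """
    theta = np.zeros(X.shape[1])
    rng = np.random.default_rng(0)
    for _ in range(n_epochs):
        idx = rng.permutation(len(y))  # reshuffle every epoch
        for start in range(0, len(y), n):
            batch = idx[start:start + n]
            Xb, yb = X[batch], y[batch]
            # Average gradient over the current mini-batch
            grad = Xb.T @ (Xb @ theta - yb) / len(batch)
            theta -= lr * grad
    return theta
```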
Please share what you learned with others
