2. Optimization?
• Selecting the “best” element of some set for a given problem
• E.g., the number of hours spent studying for class1, class2, and class3 to maximize GPA
3. Formally
• 𝑦 = 𝑓(𝑥1, 𝑥2, … , 𝑥n)
• Find the 𝑥1, 𝑥2, … , 𝑥n that maximize or minimize 𝑦
• 𝑓 is usually a cost/loss function (to minimize) or a profit/likelihood function (to maximize)
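A minimal sketch of this formulation: the function below and its search grid are hypothetical, chosen only to illustrate “find the inputs that minimize 𝑦.”

```python
# Hypothetical loss function y = f(x1, x2); its minimum is at (3, -1).
def f(x1, x2):
    return (x1 - 3.0) ** 2 + (x2 + 1.0) ** 2

# Naive approach: brute-force search over a coarse grid of candidates.
# (Later slides replace this with gradient descent.)
candidates = [(i / 10.0, j / 10.0)
              for i in range(-50, 51)
              for j in range(-50, 51)]
best = min(candidates, key=lambda p: f(*p))
print(best)  # (3.0, -1.0)
```

Brute force only works in low dimensions with small grids; it motivates the gradient-based methods that follow.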
8. Gradient
• Multivariable:
– 𝛻𝑓 = (𝜕𝑓/𝜕𝑥1, 𝜕𝑓/𝜕𝑥2, … , 𝜕𝑓/𝜕𝑥n)
– A vector of partial derivatives with respect to each of the independent variables
• 𝛻𝑓 points in the direction of greatest rate of change, or “steepest ascent”
• The magnitude (length) of 𝛻𝑓 is that greatest rate of change
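The vector of partial derivatives can be approximated numerically with central differences; the example function here is an assumption for illustration.

```python
def grad(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at point x."""
    g = []
    for i in range(len(x)):
        xp = list(x); xp[i] += h  # nudge coordinate i up
        xm = list(x); xm[i] -= h  # nudge coordinate i down
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

def f(x):
    # Example: f(x1, x2) = x1^2 + 3*x2, so grad f = (2*x1, 3)
    return x[0] ** 2 + 3 * x[1]

print(grad(f, [2.0, 0.0]))  # approximately [4.0, 3.0]
```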
10. The general idea
• We have k parameters 𝜃1, 𝜃2, … , 𝜃k we’d like to train for a model, with respect to some error/loss function 𝐽(𝜃1, … , 𝜃k) to be minimized
• Gradient descent is one way to iteratively determine the optimal set of parameter values:
1. Initialize the parameters
2. Keep changing the values to reduce 𝐽(𝜃1, … , 𝜃k)
– 𝛻𝐽 tells us which direction increases 𝐽 the most
– We step in the opposite direction of 𝛻𝐽
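The two steps above can be sketched directly; the loss 𝐽 and its gradient here are illustrative assumptions, not from the slides.

```python
def gradient_descent(grad_J, theta, alpha=0.1, iters=100):
    """Repeatedly step opposite the gradient to reduce J(theta)."""
    for _ in range(iters):
        g = grad_J(theta)
        theta = [t - alpha * gi for t, gi in zip(theta, g)]
    return theta

# Example: J(theta1, theta2) = (theta1 - 1)^2 + (theta2 + 2)^2,
# whose gradient is (2*(theta1 - 1), 2*(theta2 + 2)); minimum at (1, -2).
grad_J = lambda th: [2 * (th[0] - 1), 2 * (th[1] + 2)]
print(gradient_descent(grad_J, [0.0, 0.0]))  # approximately [1.0, -2.0]
```

Note the minus sign in the update: 𝛻𝐽 points uphill, so we subtract it to go downhill.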
20. Issues
• A convex objective function guarantees convergence to the global minimum
• Non-convexity brings the possibility of getting stuck in a local minimum
– Different, randomized starting values can mitigate this
21. Initial Values and Convergence
Picture credit: Andrew Ng, Stanford University, Coursera Machine Learning, Lecture 2 Slides
23. Issues cont.
• Convergence can be slow
– A larger learning rate α can speed things up, but with too large an α the optimum can be ‘jumped’ or skipped over, requiring more iterations
– Too small a step size keeps convergence slow
– Gradient descent can be combined with a line search to find the optimal α on every iteration
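The learning-rate trade-off can be seen on a 1-D toy problem; the function 𝐽(θ) = θ² and the particular α values are illustrative assumptions.

```python
def descend(alpha, steps=50, theta=5.0):
    """1-D gradient descent on J(theta) = theta^2 (gradient is 2*theta)."""
    for _ in range(steps):
        theta = theta - alpha * 2 * theta
    return theta

print(abs(descend(0.4)))    # well-chosen alpha: essentially 0
print(abs(descend(0.001)))  # too small: still ~4.5, barely moved from 5
print(abs(descend(1.1)))    # too large: each step overshoots and grows
```

Each update multiplies θ by (1 − 2α), so α > 1 makes the iterates diverge rather than converge; a line search would instead pick a safe α at every step.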
Editor's Notes
Note: Draw simple 1 variable graph on board to illustrate