Deep Learning
Optimization
Rookie’s Seminar, Jan 2018
Jaehyun Jun
Biointelligence Laboratory
Interdisciplinary Program of Neuroscience, Seoul National University
http://bi.snu.ac.kr
Contents
8.1 Pure Optimization
8.2 Challenges
8.3 Basic Algorithms
SGD, Momentum, Nesterov momentum
8.5 Adaptive Learning Rates
AdaGrad, RMSProp, AdaDelta,
RMSProp with Nesterov momentum, Adam
8.4 & 8.7 Strategies
Reference: Ian Goodfellow, Yoshua Bengio and Aaron Courville, Deep Learning, MIT Press, 2016
© 2017, SNU Biointelligence Lab., http://bi.snu.ac.kr 2
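Pure Optimization
 In pure optimization, minimizing J is a goal in and of itself; in learning, J is only a proxy for the performance measure we actually care about.
 The true distribution p_data is unknown, so the expected risk is intractable
-> we minimize the empirical risk on the training set instead
-> empirical risk minimization is prone to overfitting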
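As a rough sketch of the empirical-risk idea above, the snippet below averages a per-example loss over a finite training set instead of taking an expectation under p_data. The squared-error loss, the two toy models and the data are illustrative assumptions, not taken from the slides.

```python
def empirical_risk(loss, model, examples):
    """Average per-example loss over a finite training set."""
    return sum(loss(model(x), y) for x, y in examples) / len(examples)


def squared_error(prediction, target):
    return (prediction - target) ** 2


# Hypothetical toy data generated by y = 2x.
train_set = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]


def perfect_model(x):
    return 2.0 * x


def poor_model(x):
    return 1.0 * x


risk_perfect = empirical_risk(squared_error, perfect_model, train_set)
risk_poor = empirical_risk(squared_error, poor_model, train_set)
```

A model can drive this training-set average to zero and still do badly on new data, which is the overfitting risk noted on the slide.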
How Learning Differs from Pure Optimization
 Types of optimization algorithms
 batch (deterministic) algorithm: uses the entire training set
 stochastic (online) algorithm: uses a single example at a time
 minibatch (minibatch stochastic) algorithm: uses more than one but
fewer than all of the training examples
 select examples randomly -> avoids biased gradient estimates
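The three algorithm types differ only in how many examples feed each gradient estimate. A minimal sketch, assuming an in-memory dataset and a hypothetical helper named `minibatches`, could look like this:

```python
import random


def minibatches(dataset, batch_size, seed=0):
    """Yield minibatches drawn from a shuffled copy of the dataset."""
    rng = random.Random(seed)
    indices = list(range(len(dataset)))
    rng.shuffle(indices)  # random order avoids bias from the storage order
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]


data = list(range(10))
full_batch = list(minibatches(data, batch_size=len(data)))  # batch: one batch of 10
online = list(minibatches(data, batch_size=1))              # stochastic: 10 batches of 1
mini = list(minibatches(data, batch_size=4))                # minibatch: sizes 4, 4, 2
```

Shuffling once per pass is one common choice; sampling with replacement at every step is another.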
Challenges
 Ill-conditioning
 Second-order Taylor series expansion
 Local minima
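For reference, the second-order Taylor argument for ill-conditioning in the cited textbook (Section 8.2) can be written as follows, with g the gradient, H the Hessian and ε the learning rate. This is a restatement from the reference, not a formula recovered from the slide itself.

```latex
J(\theta - \epsilon g) \approx J(\theta) - \epsilon\, g^\top g + \tfrac{1}{2}\, \epsilon^2\, g^\top H g
```

Ill-conditioning becomes a problem when the curvature term \tfrac{1}{2}\epsilon^2 g^\top H g exceeds \epsilon g^\top g, because a gradient step then increases the cost unless the learning rate is shrunk.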
Challenges
 Cliffs and exploding gradients
 Long-term dependencies
-> vanishing and exploding gradient problem
 Poor correspondence between local and global structure
 following the local gradient downhill may never reach a global solution, even when no local minimum is in the way.
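The link between long-term dependencies and vanishing or exploding gradients can be illustrated with a scalar sketch. It assumes the same weight matrix is applied at every time step, so a gradient component along an eigenvector with eigenvalue λ is scaled by λ^t after t steps; the eigenvalues 0.9 and 1.1 are arbitrary illustrative values.

```python
def scale_after_steps(eigenvalue, steps):
    """Magnitude of a gradient component after repeated scaling by one eigenvalue."""
    return abs(eigenvalue) ** steps


vanishing = scale_after_steps(0.9, 100)  # magnitude below 1: shrinks toward zero
exploding = scale_after_steps(1.1, 100)  # magnitude above 1: grows very large
```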
Basic Algorithms
 Stochastic Gradient Descent (SGD)
 Obtain an unbiased estimate of the gradient by averaging the
gradients on a minibatch of m examples drawn i.i.d. from the
data-generating distribution
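A minimal sketch of this update, assuming a scalar parameter, a fixed learning rate and a finite training set that stands in for the data-generating distribution (the toy loss 0.5 * (theta - x)^2 is an assumption for illustration):

```python
import random


def sgd(grad_fn, theta, data, lr=0.1, batch_size=4, steps=200, seed=0):
    """Minimize an average loss using minibatch gradient estimates."""
    rng = random.Random(seed)
    for _ in range(steps):
        # Draw m examples independently (with replacement) from the training set.
        batch = rng.choices(data, k=batch_size)
        # The average per-example gradient estimates the full gradient without bias.
        g = sum(grad_fn(theta, x) for x in batch) / batch_size
        theta = theta - lr * g
    return theta


# Toy problem: the average of 0.5 * (theta - x)^2 is minimized at mean(data) = 3.5.
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
theta = sgd(lambda t, x: t - x, theta=0.0, data=data)
```

With a constant learning rate the estimate keeps fluctuating around the minimum; in practice the learning rate is usually decayed over time.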
Basic Algorithms
 Momentum
 The method of momentum is designed to accelerate learning
in the face of poor conditioning of the Hessian matrix
and variance in the stochastic gradient
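The momentum update keeps a velocity v that accumulates an exponentially decaying sum of past gradients: v <- alpha * v - lr * g, then theta <- theta + v. The sketch below applies it to a poorly conditioned quadratic; the test function and hyperparameter values are illustrative assumptions.

```python
def momentum_descent(grad_fn, theta, lr=0.01, alpha=0.9, steps=500):
    """Gradient descent with classical momentum on a list of parameters."""
    velocity = [0.0] * len(theta)
    for _ in range(steps):
        g = grad_fn(theta)
        # Decay the old velocity and add the new (negative) gradient step.
        velocity = [alpha * v - lr * gi for v, gi in zip(velocity, g)]
        theta = [t + v for t, v in zip(theta, velocity)]
    return theta


# J(x, y) = 0.5 * (x^2 + 100 * y^2): the Hessian has condition number 100.
def grad(th):
    return [th[0], 100.0 * th[1]]


theta = momentum_descent(grad, [1.0, 1.0])
```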
Basic Algorithms
 Nesterov momentum
 Differs from standard momentum in where the gradient is evaluated:
with Nesterov momentum, it is evaluated after the current velocity is applied
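In code, the only change from standard momentum is the point at which the gradient is taken. A minimal sketch, using an assumed quadratic test function and illustrative hyperparameters:

```python
def nesterov_descent(grad_fn, theta, lr=0.01, alpha=0.9, steps=500):
    """Gradient descent with Nesterov momentum on a list of parameters."""
    velocity = [0.0] * len(theta)
    for _ in range(steps):
        # Apply the current velocity first, then evaluate the gradient there.
        lookahead = [t + alpha * v for t, v in zip(theta, velocity)]
        g = grad_fn(lookahead)
        velocity = [alpha * v - lr * gi for v, gi in zip(velocity, g)]
        theta = [t + v for t, v in zip(theta, velocity)]
    return theta


# J(x, y) = 0.5 * (x^2 + 100 * y^2), an assumed poorly conditioned test problem.
def grad(th):
    return [th[0], 100.0 * th[1]]


theta = nesterov_descent(grad, [1.0, 1.0])
```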
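Adaptive Learning Rates
 AdaGrad
 AdaGrad scales each parameter's learning rate inversely to the square root
of the sum of its past squared gradients: parameters tied to rare features get
larger updates, and those tied to common features get smaller ones.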
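A minimal sketch of the AdaGrad step, assuming per-parameter Python lists and an invented gradient stream in which coordinate 0 is "common" (non-zero every step) and coordinate 1 is "rare" (non-zero only once):

```python
import math


def adagrad_step(theta, g, accum, lr=0.1, delta=1e-7):
    """One AdaGrad update; accum holds the running sum of squared gradients."""
    accum = [r + gi * gi for r, gi in zip(accum, g)]
    theta = [t - lr / (delta + math.sqrt(r)) * gi
             for t, r, gi in zip(theta, accum, g)]
    return theta, accum


theta, accum = [0.0, 0.0], [0.0, 0.0]
step_sizes = []
for g in [[1.0, 0.0]] * 9 + [[1.0, 1.0]]:
    new_theta, accum = adagrad_step(theta, g, accum)
    step_sizes.append([abs(n - t) for n, t in zip(new_theta, theta)])
    theta = new_theta

# On the last step both coordinates see the same gradient value.
common_step, rare_step = step_sizes[-1]
```

On that last step the rare coordinate moves by about 0.1 while the common one moves by about 0.1 / sqrt(10), because its accumulated squared gradient is ten times larger.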
Adaptive Learning Rates
 RMSProp
 Modifies AdaGrad to perform better in the non-convex setting
 Uses an exponentially decaying average of squared gradients to discard
history from the extreme past, so that it can converge rapidly after finding
a convex bowl
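The effect of the decaying average can be sketched as follows. The gradient stream (one very large gradient followed by many small ones) and the hyperparameter values are assumptions chosen to show the accumulator forgetting old history.

```python
import math


def rmsprop_step(theta, g, accum, lr=0.01, rho=0.9, delta=1e-6):
    """One RMSProp update; accum is a decaying average of squared gradients."""
    accum = [rho * r + (1.0 - rho) * gi * gi for r, gi in zip(accum, g)]
    theta = [t - lr / math.sqrt(delta + r) * gi
             for t, r, gi in zip(theta, accum, g)]
    return theta, accum


theta, accum = [0.0], [0.0]
for g in [[100.0]] + [[1.0]] * 100:
    theta, accum = rmsprop_step(theta, g, accum)
```

After the 100 small gradients the accumulator is back near 1, whereas an AdaGrad-style running sum would still be dominated by the early gradient of 100.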
Adaptive Learning Rates
 RMSProp with Nesterov Momentum
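The slide's algorithm listing did not survive extraction. A hedged sketch of how RMSProp might be combined with Nesterov momentum, following the general shape of the algorithm in the cited textbook, is given below; the small constant `delta` (added to avoid division by zero), the hyperparameter values and the quadratic test function are assumptions.

```python
import math


def rmsprop_nesterov_step(grad_fn, theta, velocity, accum,
                          lr=0.001, rho=0.9, alpha=0.9, delta=1e-8):
    """One RMSProp step with the gradient taken at the look-ahead point."""
    lookahead = [t + alpha * v for t, v in zip(theta, velocity)]
    g = grad_fn(lookahead)
    # Decaying average of squared gradients, as in plain RMSProp.
    accum = [rho * r + (1.0 - rho) * gi * gi for r, gi in zip(accum, g)]
    # Momentum update with a per-parameter rescaled gradient.
    velocity = [alpha * v - lr / math.sqrt(r + delta) * gi
                for v, r, gi in zip(velocity, accum, g)]
    theta = [t + v for t, v in zip(theta, velocity)]
    return theta, velocity, accum


# J(x) = 0.5 * x^2, an assumed test problem with its minimum at 0.
def grad(th):
    return [th[0]]


theta, velocity, accum = [1.0], [0.0], [0.0]
for _ in range(1000):
    theta, velocity, accum = rmsprop_nesterov_step(grad, theta, velocity, accum)
```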
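Adaptive Learning Rates
 AdaDelta
 Removes the global learning rate: each update is scaled by the ratio of the
running RMS of past updates to the running RMS of gradients, which approximates
second-order (curvature) information while computing only first-order gradients.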
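A minimal sketch of one AdaDelta step, assuming the commonly published form of the update with decay rate `rho` and a small constant `eps`; note that no learning rate appears anywhere.

```python
import math


def adadelta_step(theta, g, avg_sq_grad, avg_sq_update, rho=0.95, eps=1e-6):
    """One AdaDelta update on lists of parameters and running averages."""
    avg_sq_grad = [rho * a + (1.0 - rho) * gi * gi
                   for a, gi in zip(avg_sq_grad, g)]
    # Step = -(RMS of past updates / RMS of gradients) * gradient.
    update = [-math.sqrt(u + eps) / math.sqrt(a + eps) * gi
              for u, a, gi in zip(avg_sq_update, avg_sq_grad, g)]
    avg_sq_update = [rho * u + (1.0 - rho) * d * d
                     for u, d in zip(avg_sq_update, update)]
    theta = [t + d for t, d in zip(theta, update)]
    return theta, avg_sq_grad, avg_sq_update


# One step on J(x) = 0.5 * x^2 starting from x = 1, where the gradient is 1.
theta, avg_sq_grad, avg_sq_update = adadelta_step([1.0], [1.0], [0.0], [0.0])
```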
Adaptive Learning Rates
 Adam (RMSProp + momentum)
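Read as "RMSProp + momentum", Adam keeps a decaying average of gradients (first moment) and of squared gradients (second moment) and corrects both for their zero initialization. A minimal sketch, with the commonly used default hyperparameters taken as assumptions:

```python
import math


def adam_step(theta, g, m, v, step, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; step counts from 1 and drives the bias correction."""
    m = [beta1 * mi + (1.0 - beta1) * gi for mi, gi in zip(m, g)]
    v = [beta2 * vi + (1.0 - beta2) * gi * gi for vi, gi in zip(v, g)]
    m_hat = [mi / (1.0 - beta1 ** step) for mi in m]
    v_hat = [vi / (1.0 - beta2 ** step) for vi in v]
    theta = [t - lr * mh / (math.sqrt(vh) + eps)
             for t, mh, vh in zip(theta, m_hat, v_hat)]
    return theta, m, v


# First step with gradient 4.0: after bias correction the step is close to lr.
theta, m, v = adam_step([1.0], [4.0], [0.0], [0.0], step=1)
```

Because of the bias correction, the first update has magnitude close to `lr` regardless of the gradient's scale.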
Optimizers
Ref: http://shuuki4.github.io/deep%20learning/2016/05/20/Gradient-Descent-Algorithm-Overview.html
Strategies
 Initialization
 Key point: the initial parameters must break symmetry between units
 random initialization
 large initial weights: stronger symmetry-breaking effect, but risk of exploding
gradients
 random Gram-Schmidt orthogonalization
 sparse initialization -> imposes a strong prior on the weights
 initialize from a model trained with an unsupervised objective or on a different task
(Glorot and Bengio, 2010)
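The formula that accompanied this citation was lost in extraction. Assuming it was the normalized initialization of Glorot and Bengio, which draws each weight uniformly from [-sqrt(6 / (m + n)), +sqrt(6 / (m + n))] for a layer with m inputs and n outputs, a minimal sketch is:

```python
import math
import random


def glorot_uniform(fan_in, fan_out, seed=0):
    """Return a fan_in x fan_out weight matrix with normalized initialization."""
    rng = random.Random(seed)
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]


W = glorot_uniform(fan_in=300, fan_out=100)
limit = math.sqrt(6.0 / 400)
```

Random draws give every unit different incoming weights, which is what breaks the symmetry; the fan-dependent limit keeps the initial scale from growing with layer width.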
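Strategies
 Batch Normalization
 Motivation: it is hard to choose an appropriate learning rate for very deep models
 Batch normalization is a method of adaptive reparameterization
 When many layers are updated simultaneously, second-order terms of the update can be very small or very large depending on the weights w_i
 Batch normalization transform:
$$BN(\mathbf{h}, \gamma, \beta) = \beta + \gamma \, \frac{\mathbf{h} - E(\mathbf{h})}{\sqrt{\mathrm{Var}(\mathbf{h}) + \epsilon}}$$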
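A minimal sketch of the batch normalization transform for a single unit, assuming the minibatch activations are given as a plain list and that the statistics are computed over that minibatch only (no running averages for inference):

```python
import math


def batch_norm(h, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one unit's activations across a minibatch, then scale and shift."""
    mean = sum(h) / len(h)
    var = sum((x - mean) ** 2 for x in h) / len(h)
    return [beta + gamma * (x - mean) / math.sqrt(var + eps) for x in h]


out = batch_norm([1.0, 2.0, 3.0, 4.0])
out_mean = sum(out) / len(out)
out_var = sum((x - out_mean) ** 2 for x in out) / len(out)
```

With the default `gamma` and `beta` the output has roughly zero mean and unit variance; the learnable `gamma` and `beta` let the network restore any other scale and shift.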
Strategies
 Batch Normalization
 Advantages
 reduces the problem of coordinating updates across many layers
 mitigates the exploding or vanishing gradient problem
 allows higher learning rates
 reduces the strong dependence on initialization
 acts as a regularization method
