Interaction Lab. Kumoh National Institute of Technology
Deep Learning from Scratch
Chapter 6. Learning-Related Techniques
JaeYeop Jeong
Agenda
■ Intro
■ Optimizers
■ Initial Weight Values
■ Overcoming Overfitting
■ Hyperparameters
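Intro (1/4)
■ Optimization
  - Finding the parameter values that make the loss function as small as possible
    • The gradient of the loss with respect to the parameters is used as a clue
  - SGD (stochastic gradient descent)
    • $W \leftarrow W - \eta \frac{\partial L}{\partial W}$ (η: learning rate)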
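The SGD update above can be sketched in Python as follows. This is a minimal sketch, not the book's code. It assumes parameters and gradients are kept in dicts keyed by parameter name, and the class layout is illustrative. It uses plain floats so that it runs without NumPy; the same loop also works element-wise on NumPy arrays.

```python
# Illustrative sketch; the class and method names are hypothetical.
class SGD:
    """Plain (stochastic) gradient descent: W <- W - lr * dL/dW."""

    def __init__(self, lr=0.01):
        self.lr = lr  # learning rate (eta on the slide)

    def update(self, params, grads):
        # params and grads are dicts with the same keys (one entry per parameter)
        for key in params:
            params[key] -= self.lr * grads[key]


params = {"W": 1.0}
grads = {"W": 0.5}
SGD(lr=0.1).update(params, grads)
print(params["W"])  # 0.95
```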
Intro (2/4)
■ SGD
  - Example function: $f(x, y) = \frac{1}{20}x^2 + y^2$
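Writing out the partial derivatives of this function shows why it is a hard case for SGD:

```latex
\frac{\partial f}{\partial x} = \frac{x}{10}, \qquad
\frac{\partial f}{\partial y} = 2y, \qquad
\nabla f(-7, 2) = \left(-0.7,\ 4\right)
```

The curvature along y ($\partial^2 f / \partial y^2 = 2$) is 20 times the curvature along x ($\partial^2 f / \partial x^2 = 1/10$). At most points the gradient is therefore dominated by its y component, and it does not point toward the minimum at (0, 0).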
Intro (3/4)
■ SGD
  - Search starts at (x, y) = (-7, 2)
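A short simulation reproduces the zigzag path that SGD follows on this function. The learning rate of 0.95 and the 30 steps are assumed values, chosen to make the effect easy to see. The function names are hypothetical.

```python
# Illustrative sketch; lr=0.95 and steps=30 are assumed values.
def grad_f(x, y):
    """Gradient of f(x, y) = x**2 / 20 + y**2."""
    return x / 10.0, 2.0 * y


def sgd_path(start=(-7.0, 2.0), lr=0.95, steps=30):
    """Return every (x, y) point visited by plain gradient descent on f."""
    x, y = start
    path = [(x, y)]
    for _ in range(steps):
        dx, dy = grad_f(x, y)
        x -= lr * dx
        y -= lr * dy
        path.append((x, y))
    return path


# Each step multiplies y by 1 - 0.95 * 2 = -0.9, so y flips sign every step
# (the zigzag), while x only shrinks by a factor of 1 - 0.095 = 0.905.
for x, y in sgd_path()[:5]:
    print(f"x = {x:7.3f}   y = {y:7.3f}")
```

y overshoots the minimum on every step, while x crawls slowly toward 0.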
Intro (4/4)
■ SGD
Optimizers (1/3)
■ Momentum
  - $v \leftarrow \alpha v - \eta \frac{\partial L}{\partial W}$
  - $W \leftarrow W + v$
  - v: velocity, α: momentum coefficient
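A minimal sketch of the Momentum update, using the same assumed dict-of-floats layout. The default α = 0.9 is a common choice and is an assumption, not taken from the slide. The velocity starts at 0.0 because the parameters here are floats; with arrays it would be zeros of the same shape.

```python
# Illustrative sketch; names and defaults are assumptions.
class Momentum:
    """Momentum: v <- alpha * v - lr * dL/dW, then W <- W + v."""

    def __init__(self, lr=0.01, momentum=0.9):
        self.lr = lr
        self.momentum = momentum  # alpha on the slide
        self.v = None             # velocity, created on the first update

    def update(self, params, grads):
        if self.v is None:
            # Start at rest: zero velocity for every parameter
            self.v = {key: 0.0 for key in params}
        for key in params:
            self.v[key] = self.momentum * self.v[key] - self.lr * grads[key]
            params[key] += self.v[key]


opt = Momentum(lr=0.1, momentum=0.9)
params = {"W": 1.0}
for _ in range(3):
    opt.update(params, {"W": 1.0})  # the same gradient every step
    print(round(params["W"], 3))    # 0.9, then 0.71, then 0.439
```

With a constant gradient the steps grow (0.1, 0.19, 0.271) as the velocity builds up, like the marble described in the speaker notes.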
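Optimizers (2/3)
■ AdaGrad
  - $h \leftarrow h + \frac{\partial L}{\partial W} \odot \frac{\partial L}{\partial W}$ (⊙: element-wise product)
  - $W \leftarrow W - \eta \frac{1}{\sqrt{h}} \frac{\partial L}{\partial W}$
  - Learning rate decay, adapted separately for each parameter
  - RMSProp: gives recent gradients more weight than older ones, so the updates do not shrink to 0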
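A minimal sketch of AdaGrad and RMSProp under the same dict-of-floats assumption. Two values are assumptions rather than content from the slide. The small constant `eps` keeps the division safe while h is still 0. `decay_rate=0.99` is an assumed RMSProp setting. The helper `last_step_size` is hypothetical and only serves to compare the two optimizers.

```python
import math

# Illustrative sketch; eps, decay_rate and the helper are assumptions.


class AdaGrad:
    """AdaGrad: h <- h + g * g, then W <- W - lr * g / sqrt(h)."""

    def __init__(self, lr=0.01, eps=1e-7):
        self.lr = lr
        self.eps = eps  # keeps the division safe while h is still 0
        self.h = None

    def update(self, params, grads):
        if self.h is None:
            self.h = {key: 0.0 for key in params}
        for key in params:
            g = grads[key]
            self.h[key] += g * g  # h only grows: sum of all past squared gradients
            params[key] -= self.lr * g / (math.sqrt(self.h[key]) + self.eps)


class RMSProp:
    """RMSProp: like AdaGrad, but h is an exponential moving average of g * g."""

    def __init__(self, lr=0.01, decay_rate=0.99, eps=1e-7):
        self.lr = lr
        self.decay_rate = decay_rate  # fraction of the old h that is kept each step
        self.eps = eps
        self.h = None

    def update(self, params, grads):
        if self.h is None:
            self.h = {key: 0.0 for key in params}
        for key in params:
            g = grads[key]
            # Old squared gradients fade out, so recent ones dominate h
            self.h[key] = self.decay_rate * self.h[key] + (1 - self.decay_rate) * g * g
            params[key] -= self.lr * g / (math.sqrt(self.h[key]) + self.eps)


def last_step_size(optimizer, n_steps, grad=1.0):
    """Apply n_steps updates with a constant gradient; return the size of the last one."""
    params = {"W": 0.0}
    step = 0.0
    for _ in range(n_steps):
        before = params["W"]
        optimizer.update(params, {"W": grad})
        step = before - params["W"]
    return step


print(last_step_size(AdaGrad(lr=0.1), 1000))  # about 0.1 / sqrt(1000), i.e. ~0.0032
print(last_step_size(RMSProp(lr=0.1), 1000))  # about 0.1
```

With a constant gradient, AdaGrad's k-th step is about lr / √k, so it keeps shrinking toward 0. RMSProp's step settles near lr because old squared gradients are forgotten.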
Optimizers (3/3)
■ Adam
  - Combines the ideas of AdaGrad and Momentum
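A minimal sketch of Adam, again on a dict of floats. It contains two pieces that the slide does not show, and both are labeled here as additions. The bias-correction step is the standard part of Adam. The defaults (β1 = 0.9, β2 = 0.999, ε = 1e-8) are the commonly used values. The class layout is hypothetical.

```python
import math

# Illustrative sketch; class layout is hypothetical, defaults are the common ones.


class Adam:
    """Adam: a Momentum-like first moment plus an AdaGrad/RMSProp-like second moment."""

    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr = lr
        self.beta1 = beta1  # decay rate of the gradient average (Momentum part)
        self.beta2 = beta2  # decay rate of the squared-gradient average (AdaGrad part)
        self.eps = eps
        self.t = 0          # number of updates so far
        self.m = None
        self.v = None

    def update(self, params, grads):
        if self.m is None:
            self.m = {key: 0.0 for key in params}
            self.v = {key: 0.0 for key in params}
        self.t += 1
        for key in params:
            g = grads[key]
            self.m[key] = self.beta1 * self.m[key] + (1 - self.beta1) * g
            self.v[key] = self.beta2 * self.v[key] + (1 - self.beta2) * g * g
            # Bias correction: m and v start at 0, so early averages are too small
            m_hat = self.m[key] / (1 - self.beta1 ** self.t)
            v_hat = self.v[key] / (1 - self.beta2 ** self.t)
            params[key] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


# The first step has size close to lr whatever the scale of the gradient
for g in (100.0, 0.01):
    opt = Adam(lr=0.1)
    params = {"W": 0.0}
    opt.update(params, {"W": g})
    print(g, round(params["W"], 4))  # W is about -0.1 in both cases
```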
Initial Weight Values (1/11)
■ Initializing all weights to 0
  - Bad idea
    • In backpropagation, all weights receive the same update
    • As a result, learning does not work effectively
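The symmetry problem can be demonstrated with a tiny network. The setup is hypothetical:

- a 2-3-1 network (sigmoid hidden layer, linear output, squared error),
- trained with one gradient step per sample,
- on random toy data,
- with every weight starting at the same value.

Because every hidden unit sees exactly the same numbers, the units stay identical clones of each other. The same happens for any shared constant, not only 0.

```python
import math
import random

# Illustrative sketch; network size, data and learning rate are assumptions.


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_constant_init(init_value=0.0, hidden=3, epochs=50, lr=0.5, seed=0):
    """Train a tiny 2-input, 1-output network whose weights all start at init_value.

    Returns (W1, W2) after training.
    """
    rng = random.Random(seed)
    data = [([rng.uniform(-1, 1), rng.uniform(-1, 1)], rng.uniform(-1, 1))
            for _ in range(8)]
    W1 = [[init_value, init_value] for _ in range(hidden)]  # input weights of each hidden unit
    W2 = [init_value] * hidden                              # hidden -> output weights
    for _ in range(epochs):
        for x, t in data:
            # Forward pass
            h = [sigmoid(w[0] * x[0] + w[1] * x[1]) for w in W1]
            y = sum(W2[j] * h[j] for j in range(hidden))
            dy = y - t  # dL/dy for L = 0.5 * (y - t)**2
            # Backward pass: every hidden unit receives exactly the same numbers
            for j in range(hidden):
                dz = dy * W2[j] * h[j] * (1.0 - h[j])  # uses W2[j] before its update
                W2[j] -= lr * dy * h[j]
                W1[j][0] -= lr * dz * x[0]
                W1[j][1] -= lr * dz * x[1]
    return W1, W2


W1, W2 = train_constant_init(0.0)
for row in W1:
    print([round(w, 4) for w in row])  # all rows are identical
```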
Initial Weight Values (2/11)
■ Weights from a normal distribution with standard deviation 1
  - Activation function: sigmoid
  - Activations pile up near 0 and 1, where the sigmoid's gradient is almost 0 → vanishing gradients
Initial Weight Values (3/11)
■ Weights from a normal distribution with standard deviation 0.01
  - Activation function: sigmoid
  - Activations cluster around 0.5 → limited representational power
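The effect of the two standard deviations can be checked with a small experiment. Random inputs are pushed through five fully connected sigmoid layers, and the last layer's activations are summarized. All of the settings are assumptions chosen for illustration, not the book's experiment: 50 nodes per layer, 100 samples, no biases, and "near 0 or 1" meaning below 0.1 or above 0.9. The function names are hypothetical.

```python
import math
import random

# Illustrative sketch; layer sizes, sample count and thresholds are assumptions.


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def last_layer_activations(weight_std, n_samples=100, n_nodes=50, n_layers=5, seed=0):
    """Feed random inputs through n_layers sigmoid layers (weights ~ N(0, weight_std**2),
    no biases) and return the last layer's activations as one flat list."""
    rng = random.Random(seed)
    x = [[rng.gauss(0.0, 1.0) for _ in range(n_nodes)] for _ in range(n_samples)]
    for _ in range(n_layers):
        W = [[rng.gauss(0.0, weight_std) for _ in range(n_nodes)] for _ in range(n_nodes)]
        x = [[sigmoid(sum(row[i] * W[i][j] for i in range(n_nodes))) for j in range(n_nodes)]
             for row in x]
    return [a for row in x for a in row]


def summarize(acts):
    """Mean, standard deviation, and share of activations below 0.1 or above 0.9."""
    mean = sum(acts) / len(acts)
    std = math.sqrt(sum((a - mean) ** 2 for a in acts) / len(acts))
    saturated = sum(1 for a in acts if a < 0.1 or a > 0.9) / len(acts)
    return mean, std, saturated


for weight_std in (1.0, 0.01):
    mean, std, saturated = summarize(last_layer_activations(weight_std))
    print(f"weight std {weight_std}: mean {mean:.3f}, std {std:.3f}, "
          f"near 0 or 1: {saturated:.0%}")
```

With standard deviation 1, a large share of the activations sits near 0 or 1, where gradients vanish. With 0.01, they all cluster tightly around 0.5.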
Initial Weight Values (4/11)
■ Xavier initialization
  - Activation function: sigmoid
  - With n nodes in the previous layer, draw weights from a normal distribution with standard deviation $\frac{1}{\sqrt{n}}$
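A sketch of the Xavier rule as stated on the slide, with a check of why 1/√n is the right scale. The original paper by Glorot and Bengio also takes the number of nodes in the next layer into account; the slide's version uses only the previous layer. The function name and the layer size of 100 are assumptions.

```python
import math
import random

# Illustrative sketch; the function name and sizes are hypothetical.


def xavier_init(n_in, n_out, rng):
    """Weights ~ N(0, (1 / sqrt(n_in))**2); n_in = number of nodes in the previous layer."""
    std = 1.0 / math.sqrt(n_in)
    return [[rng.gauss(0.0, std) for _ in range(n_out)] for _ in range(n_in)]


rng = random.Random(0)
n = 100
W = xavier_init(n, n, rng)

# Each pre-activation z_j = sum_i x_i * W[i][j] adds up n terms, so
# Var(z) = n * Var(w) * E[x**2] = E[x**2]: the scale of the signal is preserved.
xs = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(100)]
zs = [sum(x[i] * W[i][j] for i in range(n)) for x in xs for j in range(n)]
var_z = sum(z * z for z in zs) / len(zs)
print(f"Var(x) = 1.0, Var(z) = {var_z:.3f}")  # close to 1
```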
Initial Weight Values (5/11)
■ He initialization
  - Activation function: ReLU
  - With n nodes in the previous layer, draw weights from a normal distribution with standard deviation $\sqrt{\frac{2}{n}}$
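The factor 2 in He initialization compensates for ReLU setting roughly half of its inputs to 0. The sketch below tracks the mean of a² through five ReLU layers and compares He with Xavier. The sizes (64 nodes, 40 samples) and the function names are assumptions.

```python
import math
import random

# Illustrative sketch; sizes and names are hypothetical.


def he_std(n_in):
    return math.sqrt(2.0 / n_in)  # He initialization (for ReLU)


def xavier_std(n_in):
    return math.sqrt(1.0 / n_in)  # Xavier initialization, for comparison


def relu_mean_squares(std_fn, n=64, n_samples=40, n_layers=5, seed=0):
    """Mean of a**2 after each ReLU layer when weights ~ N(0, std_fn(n)**2)."""
    rng = random.Random(seed)
    x = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n_samples)]
    std = std_fn(n)
    result = []
    for _ in range(n_layers):
        W = [[rng.gauss(0.0, std) for _ in range(n)] for _ in range(n)]
        x = [[max(0.0, sum(row[i] * W[i][j] for i in range(n))) for j in range(n)]
             for row in x]
        result.append(sum(a * a for row in x for a in row) / (n * n_samples))
    return result


print("He:    ", [round(v, 3) for v in relu_mean_squares(he_std)])
print("Xavier:", [round(v, 3) for v in relu_mean_squares(xavier_std)])
```

With He initialization the values stay roughly around 1 from layer to layer. With Xavier initialization and ReLU they roughly halve at every layer, so the signal fades as the network gets deeper.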
Initial Weight Values (6/11)
■ Batch normalization
  - Forcibly adjusts the distribution of the activation values in each layer
  - Speeds up learning
  - Makes learning less dependent on the initial weight values
  - Suppresses overfitting
Initial Weight Values (7/11)
■ Batch normalization
  - Insert a "Batch Norm" layer between layers
    • It adjusts the activation values so that they are properly distributed
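Initial Weight Values (8/11)
■ Batch normalization
  - Input mini-batch $\{x_1, x_2, \dots, x_n\}$ → normalized $\{\hat{x}_1, \hat{x}_2, \dots, \hat{x}_n\}$
  - Mini-batch mean: $\mu_B = \frac{1}{n}\sum_{i=1}^{n} x_i$
  - Mini-batch variance: $\sigma_B^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu_B)^2$
  - Normalize: $\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \varepsilon}}$ (ε: a small constant that prevents division by zero)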
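The mean, variance and normalize steps can be sketched for a mini-batch stored as a list of rows. This is a training-time sketch only. The learnable scale and shift (`gamma`, `beta`, here plain scalars) are the usual final step of batch normalization but do not appear on this slide. `eps` is an assumed small constant. The running averages used at inference time are omitted, and the function name is hypothetical.

```python
import math

# Illustrative sketch; the function name and eps value are assumptions.


def batch_norm_forward(batch, gamma=1.0, beta=0.0, eps=1e-7):
    """Training-time batch normalization of a mini-batch.

    batch: list of n samples, each a list of d feature values.
    Each feature (column) is normalized with its own mini-batch mean and
    variance, then scaled by gamma and shifted by beta.
    """
    n, d = len(batch), len(batch[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        column = [row[j] for row in batch]
        mu = sum(column) / n                          # mini-batch mean
        var = sum((v - mu) ** 2 for v in column) / n  # mini-batch variance
        for i in range(n):
            x_hat = (batch[i][j] - mu) / math.sqrt(var + eps)  # normalize
            out[i][j] = gamma * x_hat + beta                    # scale and shift
    return out


batch = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0], [4.0, 800.0]]
for row in batch_norm_forward(batch):
    print([round(v, 3) for v in row])
# Both columns become -1.342, -0.447, 0.447, 1.342 (mean 0, variance 1),
# even though the second column was on a scale 200 times larger.
```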
Initial Weight Values (8/11)
Initial Weight Values (9/11)
Initial Weight Values (10/11)
■ Batch normalization
Initial Weight Values (11/11)
■ Batch normalization
Overcoming Overfitting (1/3)
■ Overfitting mainly occurs when:
  - the model has many parameters and high expressive power
  - there is little training data
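Overcoming Overfitting (2/3)
■ Weight decay
  - Penalizes large weights during learning
  - Adds $\frac{1}{2}\lambda W^2$ to the loss: $L + \frac{1}{2}\lambda W^2$
  - λ: hyperparameter
    • The larger λ is, the more strongly large weights are penalized
  - In backpropagation, the derivative of the penalty is added to the gradient: $\frac{\partial}{\partial W}\left(\frac{1}{2}\lambda W^2\right) = \lambda W$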
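A minimal sketch of the two places where weight decay enters: the loss and the gradient. The weights are represented as a flat list of floats for simplicity. The helper names and λ = 0.1 are assumptions chosen for illustration.

```python
# Illustrative sketch; helper names and lambda = 0.1 are hypothetical.
def loss_with_weight_decay(data_loss, weights, weight_decay_lambda):
    """Loss + 0.5 * lambda * (sum of squared weights)."""
    penalty = 0.5 * weight_decay_lambda * sum(w * w for w in weights)
    return data_loss + penalty


def grads_with_weight_decay(data_grads, weights, weight_decay_lambda):
    """Backprop adds the penalty's derivative, lambda * W, to each weight's gradient."""
    return [g + weight_decay_lambda * w for g, w in zip(data_grads, weights)]


weights = [0.5, -2.0, 3.0]
print(loss_with_weight_decay(1.0, weights, 0.1))               # about 1.6625 = 1.0 + 0.05 * 13.25
print(grads_with_weight_decay([0.0, 0.0, 0.0], weights, 0.1))  # about [0.05, -0.2, 0.3]
```

Combined with plain SGD, the update becomes W ← W − η(∂L/∂W + λW). Each step therefore also shrinks every weight by a factor of (1 − ηλ), and large weights are pulled toward 0 the most.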
Overcoming Overfitting (3/3)
■ Dropout
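A minimal sketch of a dropout layer on a flat list of activations, following the behavior described in the speaker notes. During training, a fresh random subset of neurons is dropped on every call. At test time, all neurons are used, with outputs scaled by the fraction kept during training. The class, the `train_flg` switch and the list representation are assumptions for illustration.

```python
import random

# Illustrative sketch; class and parameter names are hypothetical.


class Dropout:
    """Dropout on a flat list of activations."""

    def __init__(self, dropout_ratio=0.5, seed=None):
        self.dropout_ratio = dropout_ratio
        self.rng = random.Random(seed)
        self.mask = None

    def forward(self, x, train_flg=True):
        if train_flg:
            # New random mask on every call: each step trains a different sub-network
            self.mask = [self.rng.random() > self.dropout_ratio for _ in x]
            return [v if keep else 0.0 for v, keep in zip(x, self.mask)]
        # Test time: use every neuron, scaled by the fraction kept during training
        return [v * (1.0 - self.dropout_ratio) for v in x]

    def backward(self, dout):
        # Gradients flow back only through the neurons that were kept
        return [d if keep else 0.0 for d, keep in zip(dout, self.mask)]


layer = Dropout(dropout_ratio=0.5, seed=0)
print(layer.forward([1.0, 2.0, 3.0, 4.0]))                   # some entries zeroed
print(layer.forward([1.0, 2.0, 3.0, 4.0], train_flg=False))  # [0.5, 1.0, 1.5, 2.0]
```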
Hyperparameters (1/3)
■ Hyperparameters
  - Number of neurons in each layer
  - Batch size
  - Learning rate
  - Etc.
Hyperparameters (2/3)
■ Training data
  - Used only for training
■ Test data
  - Used only for the final test
■ Validation data
  - Used to tune the hyperparameters
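A minimal sketch of splitting one dataset into the three roles above. The 60/20/20 ratios, the fixed seed and the function name are assumptions. In practice the validation set is often carved out of the original training data.

```python
import random

# Illustrative sketch; ratios, seed and function name are hypothetical.


def split_train_val_test(data, val_ratio=0.2, test_ratio=0.2, seed=0):
    """Shuffle a copy of data and split it into (train, validation, test)."""
    items = list(data)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_ratio)
    n_val = int(len(items) * val_ratio)
    test = items[:n_test]                   # touched only for the final evaluation
    val = items[n_test:n_test + n_val]      # used to compare hyperparameter settings
    train = items[n_test + n_val:]          # used to learn the weights
    return train, val, test


train, val, test = split_train_val_test(range(100))
print(len(train), len(val), len(test))  # 60 20 20
```

Keeping the test set out of hyperparameter tuning matters. If the hyperparameters were tuned on the test data, the test score would overfit to it and overstate the model's real performance.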
Hyperparameters (3/3)
■ Hyperparameter optimization
  - Set a range of values
  - Randomly sample values from that range
  - Train with the sampled values and evaluate the result
  - Repeat, narrowing down the range
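A sketch of the loop above for a single hyperparameter, the learning rate. The initial range 0.001 to 1000 comes from the speaker notes. The remaining choices are assumptions: sampling uniformly on a log scale, 30 trials per round, narrowing to the span of the 5 best values, and the function names. `toy_evaluate` is a made-up stand-in for "train, then measure validation accuracy" and does no training.

```python
import math
import random

# Illustrative sketch; all names, trial counts and the toy score are hypothetical.


def random_search(evaluate, low, high, n_trials=30, rng=None):
    """Sample learning rates log-uniformly from [low, high] and score each one.

    evaluate(lr) should train a model and return its validation accuracy
    (higher is better). Returns (score, lr) pairs, best first.
    """
    if rng is None:
        rng = random.Random(0)
    results = []
    for _ in range(n_trials):
        lr = 10 ** rng.uniform(math.log10(low), math.log10(high))  # log scale
        results.append((evaluate(lr), lr))
    return sorted(results, reverse=True)


def narrow_range(results, top_k=5):
    """New search range spanned by the top_k best learning rates."""
    best = [lr for _, lr in results[:top_k]]
    return min(best), max(best)


def toy_evaluate(lr):
    # Stand-in score that peaks at lr = 0.1 (no model is actually trained)
    return -abs(math.log10(lr) + 1.0)


low, high = 1e-3, 1e3
rng = random.Random(0)
for round_no in range(3):
    results = random_search(toy_evaluate, low, high, rng=rng)
    low, high = narrow_range(results)
    print(f"round {round_no}: best lr = {results[0][1]:.4g}, "
          f"next range = ({low:.3g}, {high:.3g})")
```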
Q&A


Editor's Notes

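  1. Like a marble rolling around in a bowl, the search has a direction: it keeps moving a certain amount further in the current direction.
  2. AdaGrad searches using learning rate decay. A weight that has been updated a lot is judged to be close to its optimal value, so from then on it is searched in smaller steps. Because h only grows, the factor 1/√h keeps shrinking, and the updates eventually approach 0, so learning stalls. RMSProp addresses this by making recent gradients count more than older ones.
  3. Adam combines the strengths of both. Like AdaGrad, it adjusts the size of each update, so the steps are large at first and gradually get smaller. Like Momentum, it searches with a sense of direction.
  4. Among the techniques related to learning, choosing the initial weight values is important.
  5. The weights are initialized from a normal distribution with standard deviation 1. A standard deviation of 1 is large, so the weights are widely spread (the variance is large). After the sigmoid function, most activations therefore lie near 0 or 1. The sigmoid's gradient is almost 0 there, which causes vanishing gradients.
  6. The weights are initialized from a normal distribution with standard deviation 0.01. The activations cluster in the middle (around 0.5). If most nodes output almost the same value, the network's representational power is limited.
  7. Xavier initialization: when the previous layer has n nodes, the weights are drawn from a normal distribution with standard deviation √(1/n). The distribution becomes somewhat distorted as the network gets deeper, but the result is still reasonably good. It works well with sigmoid.
  8. He initialization, used with ReLU: the weights are drawn from a normal distribution with standard deviation √(2/n). Many activations pile up at 0 because ReLU maps every negative input to 0.
  9. So far we chose initial weight values so that the activation values are well distributed. Batch normalization instead forcibly adjusts the distribution of the activation values at each node. It has three benefits:
     - Learning is faster: a larger learning rate can be used, because the values are normalized.
     - The result depends much less on the initial weight values.
     - It suppresses overfitting: because the inputs are normalized to mean 0 and variance 1, no single value has an outsized effect on the weight updates.
  10. A batch normalization layer is inserted between layers.
  11. Compute the mean and variance of the input mini-batch x and normalize it.
  12. For each feature, the mean and variance are computed across the data in the mini-batch, and the values are normalized with the formula on the slide, i.e., transformed to have mean 0 and variance 1.
  13. As the figure shows, whatever shape the distribution of the data has, it can be normalized into the distribution on the right.
  14. The difference between training with and without batch normalization.
  15. Comparison between setting the initial weight values and using batch normalization.
  16. Overfitting occurs with models that have many parameters or high expressive power, and when there is little training data.
  17. Weight decay penalizes large weights, i.e., weights with a large influence on learning.
     - The term ½λW² is added to the loss function.
     - λ is a hyperparameter chosen by the user: the larger it is, the stronger the penalty.
     - The constant factor in front scales the overall penalty; the ½ makes the derivative simply λW.
     - Adding this term to the loss expresses that large weights are undesirable.
     - In backpropagation, its derivative λW is added to the gradient, so the penalty also affects the weight updates.
  18. Dropout trains the network while randomly deleting nodes. Instead of using all nodes, it changes which nodes are deleted every time, so a different model is effectively trained each time. At test time, all neurons are used, with their outputs scaled by the fraction that was kept during training.
  19. Hyperparameter search proceeds in four steps:
     - Set the range of values (e.g. 0.001 to 1000).
     - Sample values randomly from the range.
     - Train with the sampled values and evaluate on the validation data.
     - Repeat, adjusting the range.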