Copyright © 2017 DeepScale 1
A Shallow Dive into Training Deep Neural Networks
Sammy Sidhu
May 2017
About DeepScale
• Perception systems for autonomous vehicles
• Focusing on enabling technologies for mass-produced autonomous vehicles
• Working with a number of OEMs and automotive suppliers
• Open Source ☺
• Visit http://deepscale.ai
Overview
• Feature Engineering vs. Learned Features
• Neural Network Review
• Loss Function (Objective Function)
• Gradients
• Optimization Techniques
• Datasets
• Overfitting and Underfitting
Feature Engineering vs. Learned Features
Example of hand-crafted features for face detection
Feature Engineering vs. Learned Features (Cont’d.)
• Feature engineering for computer vision can work well
• Finding useful features is very time-consuming
• Requires BOTH domain expertise and programming know-how
• Hard to generalize to all cases (illumination, pose and variations in domain)
• Can use generalized features like HOG/SIFT, but accuracy suffers
Feature Engineering vs. Learned Features (Cont’d.)
Example of learned features of a CNN for facial classification [DeepFace CVPR14]
Feature Engineering vs. Learned Features (Cont’d.)
• Learned features for computer vision can work extremely well
• Image classification: 5.71% vs. 26.2% error [ResNet-152 vs. SIFT sparse]
• Only requires labeled data, deep learning expertise and computing power
• “Training” the network is essentially learning features layer by layer
• The deeper you go, the more complex the features become
• Hard to perform validation other than by putting in data and seeing what happens
Neural Networks: Quick Review
A neural network can be seen as a function approximator
$y = f_w(x)$
where $w$ is a set of parameters we can learn and $f$ is a nonlinear function
Typical nonlinear functions in DNNs
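To make the "network as a function" view concrete, here is a minimal sketch in plain Python. The two-layer shape, the choice of ReLU and sigmoid as the nonlinearities, and the parameter values are all assumptions made for illustration; they are not taken from the slides.

```python
import math

def relu(z):
    """ReLU nonlinearity: pass positive values, zero out the rest."""
    return max(0.0, z)

def sigmoid(z):
    """Sigmoid nonlinearity: squash any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def dense(x, weights, biases, activation):
    """One layer: activation(W x + b), with W given as a list of rows."""
    return [activation(sum(w_i * x_i for w_i, x_i in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def f_w(x, w):
    """y = f_w(x): a two-layer network; w holds all learnable parameters."""
    hidden = dense(x, w["W1"], w["b1"], relu)
    return dense(hidden, w["W2"], w["b2"], sigmoid)

# Hypothetical hand-picked parameters, only for illustration.
w = {
    "W1": [[1.0, -1.0], [0.5, 0.5]],
    "b1": [0.0, -1.0],
    "W2": [[1.0, -2.0]],
    "b2": [0.0],
}
y = f_w([2.0, 1.0], w)
```

Training, covered in the following slides, amounts to choosing the numbers inside `w` so that `f_w` maps inputs to the desired outputs.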
Loss Function (Objective Function)
• Take the example of a linear regression
• Given data, we fit a line ($y = mx + b$) that minimizes the sum of the squares of differences (Euclidean distance loss function)
• The function that we minimize is the loss function
• An example would be to predict house value given square footage and median income
• f(sqft, income) --> value, where value is in [0, inf) dollars
• We want to minimize L(actual_value, predicted_value), where L is the loss function
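The house-value example can be sketched in a few lines of Python. The linear form of `predict_value`, the weights, and the two houses are made-up assumptions; only the idea of summing squared differences between actual and predicted values comes from the slide.

```python
def predict_value(sqft, income, weights):
    """Linear model f(sqft, income) -> predicted value in dollars."""
    w_sqft, w_income, bias = weights
    return w_sqft * sqft + w_income * income + bias

def squared_loss(actual_values, predicted_values):
    """Sum of squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual_values, predicted_values))

# Made-up houses: (square footage, median income, actual value).
houses = [(1000, 50000, 250000), (2000, 80000, 480000)]
weights = (150.0, 2.0, 0.0)   # hypothetical weights, not fitted to anything

predicted = [predict_value(s, i, weights) for s, i, _ in houses]
actual = [v for _, _, v in houses]
loss = squared_loss(actual, predicted)
```

With these numbers the first house is predicted exactly and the second is off by 20,000 dollars, so the whole loss comes from the second house. A better set of weights would shrink that value.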
Loss Function (Objective Function) (Cont’d.)
• Another loss function is the Softmax loss for classification
• This is useful when we want to predict the probability of an event
• For example: predict if an image is of a cat or a dog
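A small sketch of the Softmax loss for the cat-vs-dog case, in plain Python. The raw scores are made-up stand-ins for network outputs, and the class indices `CAT` and `DOG` are hypothetical names introduced for this example.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_loss(scores, true_class):
    """Negative log of the probability assigned to the correct class."""
    return -math.log(softmax(scores)[true_class])

CAT, DOG = 0, 1
scores = [2.0, 0.0]           # hypothetical network outputs for one image
probs = softmax(scores)
loss_if_cat = softmax_loss(scores, CAT)
loss_if_dog = softmax_loss(scores, DOG)
```

The loss is small when the network puts high probability on the true class (here, if the image really is a cat) and large when it is confidently wrong.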
Loss Function (Objective Function) (Cont’d.)
• Loss functions can be used for either classification or regression
• The goal is to pick a set of weights that makes this loss value as small as possible
• It is crucial to pick the right objective function for the task; e.g., one technically can use a squared loss for predicting probability, but a loss designed for probabilities, such as the Softmax loss, is usually the better fit
Gradients
• Now if we have a loss function and a neural network, how do we know what part of the network is “responsible” for causing that error?
• Let’s go back to the simple linear regression!
Gradients (Cont’d.)
• Let’s define the loss function
• $L = \frac{1}{2}(Y - \hat{Y})^2$, where $\hat{Y}$ is the predicted value
• Let’s then take the derivative to see how $\hat{Y}$ contributes to the loss $L$
• $\frac{dL}{d\hat{Y}} = -(Y - \hat{Y}) = \hat{Y} - Y$
• We’re fitting a line
• $\hat{Y} = mX + b$
• Two weights to optimize (slope and bias)
• $\frac{d\hat{Y}}{dm} = X$, $\frac{d\hat{Y}}{db} = 1$
Gradients (Cont’d.)
Figures: a line with noise to fit; the surface of the loss w.r.t. slope and bias (m, b)
https://spin.atomicobject.com/2014/06/24/gradient-descent-linear-regression/
Gradients (Cont’d.)
• We know $\frac{dL}{d\hat{Y}} = \hat{Y} - Y$ and $\frac{d\hat{Y}}{dm} = X$, $\frac{d\hat{Y}}{db} = 1$
• To optimize our line [slope and bias] we use the chain rule!
• $\frac{dL}{dm} = \frac{dL}{d\hat{Y}} \frac{d\hat{Y}}{dm} = X(\hat{Y} - Y)$ and $\frac{dL}{db} = \frac{dL}{d\hat{Y}} \frac{d\hat{Y}}{db} = (\hat{Y} - Y)$
• Together, these two derivatives make a gradient!
• We update our weights by stepping against the gradient
• $m = m - \alpha \frac{dL}{dm}$ and $b = b - \alpha \frac{dL}{db}$
• where $\alpha$ is the learning rate
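The chain-rule derivatives and a single weight update can be checked numerically. The sketch below uses one made-up data point, an arbitrary starting slope and bias, and an assumed learning rate of 0.1. It steps against the gradient (subtracting the learning rate times each derivative), which is the direction that lowers the loss.

```python
def loss(m, b, x, y):
    """L = 1/2 (Y - Y_hat)^2 for a single sample, with Y_hat = m X + b."""
    y_hat = m * x + b
    return 0.5 * (y - y_hat) ** 2

def gradient(m, b, x, y):
    """Chain rule: dL/dm = X (Y_hat - Y), dL/db = (Y_hat - Y)."""
    y_hat = m * x + b
    dL_dyhat = y_hat - y
    return x * dL_dyhat, dL_dyhat

m, b = 0.5, 0.0          # arbitrary starting slope and bias
x, y = 2.0, 3.0          # one made-up data point
alpha = 0.1              # learning rate (assumed value)

dL_dm, dL_db = gradient(m, b, x, y)

# Sanity check of dL/dm against a central finite difference.
eps = 1e-6
numeric_dm = (loss(m + eps, b, x, y) - loss(m - eps, b, x, y)) / (2 * eps)

loss_before = loss(m, b, x, y)
m, b = m - alpha * dL_dm, b - alpha * dL_db   # step against the gradient
loss_after = loss(m, b, x, y)
```

After the step the loss on this sample is lower than before. Repeating such steps over many samples is what the following slides build on.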
Gradients (Cont’d.)
• How to minimize loss?
• Walk down the surface via gradient steps until you reach the minimum!
https://github.com/mattnedrich/GradientDescentExample
Gradients (Cont’d.)
• Gradient descent is not limited to linear regression
• We can take derivatives with respect to any parameter in the neural network
• To avoid math complexity and recomputation, we can use the chain rule again
• We can even do this through our nonlinear functions that are not differentiable everywhere (for example, ReLU at zero)
Gradients (Cont’d.)
• This process of computing gradients and applying the resulting updates to a neural network layer by layer is called backpropagation
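A minimal sketch of the idea on a toy network with one ReLU hidden unit and one linear output, in plain Python. The architecture and parameter values are assumptions made for illustration. The backward pass reuses the derivative from the layer above instead of recomputing it, which is the point of applying the chain rule layer by layer.

```python
def forward(params, x):
    """Forward pass; returns the prediction and cached intermediates."""
    z = params["w1"] * x + params["b1"]      # layer 1 pre-activation
    h = max(0.0, z)                          # ReLU
    y_hat = params["w2"] * h + params["b2"]  # layer 2 (linear output)
    return y_hat, (z, h)

def backward(params, x, y, y_hat, cache):
    """Backward pass: apply the chain rule from the loss back to layer 1."""
    z, h = cache
    d_yhat = y_hat - y                       # dL/dY_hat for L = 1/2 (Y - Y_hat)^2
    grads = {"w2": d_yhat * h, "b2": d_yhat}
    d_h = d_yhat * params["w2"]              # reuse d_yhat rather than recomputing
    d_z = d_h if z > 0 else 0.0              # ReLU passes gradient only where z > 0
    grads["w1"] = d_z * x
    grads["b1"] = d_z
    return grads

params = {"w1": 1.5, "b1": 0.5, "w2": -2.0, "b2": 1.0}  # hypothetical values
x, y = 1.0, 0.0                                         # one made-up sample
y_hat, cache = forward(params, x)
grads = backward(params, x, y, y_hat, cache)
```

Each entry of `grads` says how the loss would change if the matching parameter were nudged, so an optimizer can update every weight in the network from a single backward pass.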
Optimization Techniques
• Given that we now have the gradients and the weights, what's the best way to apply the updates?
• In the previous linear regression example
• Grab a random sample and apply updates to slope and bias
• Repeat until convergence
• Known as Stochastic Gradient Descent (SGD)
• Can we do better to find the best possible set of weights to minimize loss? (Optimization)
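The SGD recipe for the line-fitting example can be sketched as a short loop. The learning rate, the fixed step count (used here in place of a real convergence test), the seed, and the noise-free data are assumptions chosen to keep the example small and repeatable.

```python
import random

def sgd_fit_line(points, alpha=0.05, steps=5000, seed=0):
    """Fit Y_hat = m X + b with stochastic gradient descent."""
    rng = random.Random(seed)       # fixed seed so the run is repeatable
    m, b = 0.0, 0.0
    for _ in range(steps):
        x, y = rng.choice(points)   # grab one random sample
        error = (m * x + b) - y     # Y_hat - Y
        m -= alpha * x * error      # dL/dm = X (Y_hat - Y)
        b -= alpha * error          # dL/db = (Y_hat - Y)
    return m, b

# Noise-free points on the line y = 2x + 1 (made-up data).
points = [(x / 10.0, 2.0 * (x / 10.0) + 1.0) for x in range(-10, 11)]
m, b = sgd_fit_line(points)
```

On this clean data the loop recovers a slope close to 2 and a bias close to 1. With noisy data the estimates would keep jittering around the best fit unless the learning rate is reduced over time.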
Optimization Techniques (Cont’d.)
• Momentum
• Keep a running average of previous updates and add it to each update
Figures: steps without momentum; steps with momentum
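A rough sketch of the momentum update, using the common "velocity" formulation (a decaying sum of past updates). The toy loss, a narrow valley that is much steeper in one direction than the other, and the values of `alpha` and `mu` are assumptions picked for illustration.

```python
def grad(w):
    """Gradient of the toy loss L(w) = 1/2 (w0^2 + 25 w1^2), a narrow valley."""
    return [w[0], 25.0 * w[1]]

def toy_loss(w):
    return 0.5 * (w[0] ** 2 + 25.0 * w[1] ** 2)

def descend(steps, alpha, mu):
    """Gradient descent with momentum; mu = 0 gives plain gradient descent."""
    w = [1.0, 1.0]
    velocity = [0.0, 0.0]
    for _ in range(steps):
        g = grad(w)
        # Decaying memory of past updates, plus the current gradient step.
        velocity = [mu * v - alpha * g_i for v, g_i in zip(velocity, g)]
        w = [w_i + v for w_i, v in zip(w, velocity)]
    return w

plain = descend(steps=100, alpha=0.02, mu=0.0)
with_momentum = descend(steps=100, alpha=0.02, mu=0.9)
```

For these particular settings the run with momentum ends at a lower loss than plain gradient descent after the same number of steps, because the accumulated velocity speeds up progress along the shallow direction of the valley.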
Optimization Techniques (Cont’d.)
• AdaGrad, AdaProp, RMSProp, Adam
• Automatically tune the learning rate to reach convergence in fewer updates
• Great for fast convergence
• Sometimes finicky for reaching the lowest possible loss for a network
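As one example of this family, here is a sketch of a single Adam update in plain Python, following the commonly published form of the algorithm with its usual default hyperparameters. The `state` dictionary layout and the gradient values are assumptions made for this example.

```python
import math

def adam_step(w, g, state, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a list of weights w and gradients g."""
    state["t"] += 1
    t = state["t"]
    new_w = []
    for i, (w_i, g_i) in enumerate(zip(w, g)):
        # Running averages of the gradient and of the squared gradient.
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g_i
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g_i ** 2
        # Bias correction for the zero-initialized averages.
        m_hat = state["m"][i] / (1 - beta1 ** t)
        v_hat = state["v"][i] / (1 - beta2 ** t)
        new_w.append(w_i - alpha * m_hat / (math.sqrt(v_hat) + eps))
    return new_w

w = [1.0, 1.0]
g = [0.001, 1000.0]      # gradients of very different scale (made up)
state = {"t": 0, "m": [0.0, 0.0], "v": [0.0, 0.0]}
w_next = adam_step(w, g, state)
step_sizes = [abs(a - b) for a, b in zip(w, w_next)]
```

Even though the two gradients differ by six orders of magnitude, both weights move by roughly `alpha` on the first step. This per-parameter rescaling is what "automatically tuning the learning rate" refers to.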
Datasets
• When it comes to neural networks, you want a diverse dataset that is large enough to train your network without overfitting (more on this later)
• You can also augment your data to generate more samples
• Rotations / reflections when they make sense
• Add noise / hue / contrast changes
• This is extremely useful when you have rare sample classes
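A few of these augmentations can be sketched on a tiny grayscale image stored as nested lists with pixel values in [0, 1]. The image, the noise level, and the contrast factor are made-up assumptions. Real pipelines would normally use an image library rather than plain lists.

```python
import random

def flip_horizontal(image):
    """Mirror each row (a reflection)."""
    return [row[::-1] for row in image]

def add_noise(image, sigma, rng):
    """Add Gaussian noise and clamp pixel values to [0, 1]."""
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in image]

def adjust_contrast(image, factor):
    """Scale pixel values around mid-gray (0.5) and clamp to [0, 1]."""
    return [[min(1.0, max(0.0, 0.5 + factor * (p - 0.5))) for p in row]
            for row in image]

rng = random.Random(0)           # fixed seed so the noise is repeatable
image = [[0.0, 0.2, 0.4],
         [0.6, 0.8, 1.0]]        # tiny made-up grayscale image

augmented = [
    flip_horizontal(image),
    add_noise(image, sigma=0.05, rng=rng),
    adjust_contrast(image, factor=1.5),
]
```

Each augmented copy keeps the original label, so one labeled sample of a rare class yields several training samples. Whether a given transform "makes sense" depends on the task: a mirrored digit, for instance, may no longer be a valid digit.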
Datasets (Cont’d.)
Example datasets shown: MNIST, CIFAR-10, ImageNet
Overfitting and Underfitting
• What is Overfitting?
• Fitting to the training data but not generalizing well
• What is Underfitting?
• The model does not capture the trends in the data
• How to tell?
Overfitting and Underfitting (Cont’d.)
• We can split the available data into 3 disjoint parts
• Training set, validation set, test set
• During training
• “Learn” via the training set
• Evaluate the model every epoch with the validation set
• After training
• Test the model with the test set, which the model hasn’t seen before
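One simple way to produce the three disjoint parts is to shuffle once and slice. The 80/10/10 proportions, the fixed seed, and the function name `split_dataset` are assumptions for this sketch, not something the slides prescribe.

```python
import random

def split_dataset(samples, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle once, then cut into disjoint train / validation / test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # fixed seed so the split is repeatable
    n_val = int(len(shuffled) * val_fraction)
    n_test = int(len(shuffled) * test_fraction)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test

# Stand-in dataset: 100 sample ids instead of real images.
train, val, test = split_dataset(range(100))
```

Because every sample lands in exactly one part, the test set stays unseen until the very end and gives an honest estimate of how the model generalizes.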
Overfitting and Underfitting (Cont’d.)
• Overfitting when
• Training loss is low but validation and test losses are high
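A rough way to read this off recorded loss curves is to track the gap between validation and training loss per epoch. The curves below are made-up numbers and the helper names are hypothetical. This is a sketch of the idea, not a standard diagnostic.

```python
def generalization_gap(train_losses, val_losses):
    """Per-epoch difference between validation and training loss."""
    return [v - t for t, v in zip(train_losses, val_losses)]

def best_epoch(val_losses):
    """Index of the epoch with the lowest validation loss."""
    return min(range(len(val_losses)), key=lambda e: val_losses[e])

# Made-up loss curves: training keeps improving, validation turns upward.
train_losses = [1.00, 0.60, 0.40, 0.25, 0.15, 0.08]
val_losses = [1.05, 0.70, 0.55, 0.58, 0.66, 0.80]

gaps = generalization_gap(train_losses, val_losses)
epoch = best_epoch(val_losses)
```

In these invented curves the validation loss bottoms out early while the training loss keeps falling, so the gap widens with every later epoch. That widening gap is the overfitting signature described above.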
Overfitting and Underfitting (Cont’d.)
• How to combat overfitting?
• More data
• Data augmentation
• Regularization (weight decay)
• Add the magnitude of the weights to the loss function
• Randomly ignore some units, and so their weight updates, during training (Dropout)
• Simpler model?
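Two of these remedies can be sketched in plain Python. The weight-decay term below uses the common L2 form (half the decay coefficient times the sum of squared weights), and the dropout function uses the "inverted" variant that rescales surviving units. Both choices, along with all numeric values, are assumptions made for illustration.

```python
import random

def l2_penalty(weights, weight_decay):
    """Weight-decay term added to the loss: (lambda / 2) * sum of w^2."""
    return 0.5 * weight_decay * sum(w * w for w in weights)

def regularized_loss(data_loss, weights, weight_decay):
    """Data loss plus the penalty on weight magnitude."""
    return data_loss + l2_penalty(weights, weight_decay)

def dropout(activations, drop_prob, rng, training=True):
    """Inverted dropout: zero units at random, rescale the survivors."""
    if not training or drop_prob == 0.0:
        return list(activations)
    keep = 1.0 - drop_prob
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

weights = [3.0, -4.0]                 # made-up weights
total = regularized_loss(data_loss=1.0, weights=weights, weight_decay=0.01)

rng = random.Random(0)                # fixed seed so the mask is repeatable
dropped = dropout([1.0] * 1000, drop_prob=0.5, rng=rng)
```

The penalty nudges the optimizer toward smaller weights, while dropout stops the network from leaning too heavily on any single unit. At evaluation time dropout is switched off (`training=False`) and activations pass through unchanged.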
Overfitting and Underfitting (Cont’d.)
• Underfitting when
• Training loss drops at first, then stops
• Training loss is still high
• Training loss tracks validation loss
• More complex model?
• Turn down regularization
Takeaways
• Neural nets are function approximators
• Deep learning can work surprisingly well
• Optimizing nets is an art that requires intuition
• Making good datasets is hard
• Overfitting makes it hard to generalize for applications
• We can find out how robust our models are with validation testing
Thank you!
Questions?

More Related Content

What's hot

Ashfaq Munshi, ML7 Fellow, Pepperdata
Ashfaq Munshi, ML7 Fellow, PepperdataAshfaq Munshi, ML7 Fellow, Pepperdata
Ashfaq Munshi, ML7 Fellow, Pepperdata
MLconf
 

What's hot (9)

GluonCV
GluonCVGluonCV
GluonCV
 
A More Scaleable Way of Making Recommendations with MLlib-(Xiangrui Meng, Dat...
A More Scaleable Way of Making Recommendations with MLlib-(Xiangrui Meng, Dat...A More Scaleable Way of Making Recommendations with MLlib-(Xiangrui Meng, Dat...
A More Scaleable Way of Making Recommendations with MLlib-(Xiangrui Meng, Dat...
 
Score-Based Generative Modeling through Stochastic Differential Equations
Score-Based Generative Modeling through Stochastic Differential EquationsScore-Based Generative Modeling through Stochastic Differential Equations
Score-Based Generative Modeling through Stochastic Differential Equations
 
Linear regression with gradient descent
Linear regression with gradient descentLinear regression with gradient descent
Linear regression with gradient descent
 
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
 
1118_Seminar_Continuous_Deep Q-Learning with Model based acceleration
1118_Seminar_Continuous_Deep Q-Learning with Model based acceleration1118_Seminar_Continuous_Deep Q-Learning with Model based acceleration
1118_Seminar_Continuous_Deep Q-Learning with Model based acceleration
 
Ashfaq Munshi, ML7 Fellow, Pepperdata
Ashfaq Munshi, ML7 Fellow, PepperdataAshfaq Munshi, ML7 Fellow, Pepperdata
Ashfaq Munshi, ML7 Fellow, Pepperdata
 
Introductions to Online Machine Learning Algorithms
Introductions to Online Machine Learning AlgorithmsIntroductions to Online Machine Learning Algorithms
Introductions to Online Machine Learning Algorithms
 
Machine learning and_nlp
Machine learning and_nlpMachine learning and_nlp
Machine learning and_nlp
 

Similar to "A Shallow Dive into Training Deep Neural Networks," a Presentation from DeepScale

"Performing Multiple Perceptual Tasks With a Single Deep Neural Network," a P...
"Performing Multiple Perceptual Tasks With a Single Deep Neural Network," a P..."Performing Multiple Perceptual Tasks With a Single Deep Neural Network," a P...
"Performing Multiple Perceptual Tasks With a Single Deep Neural Network," a P...
Edge AI and Vision Alliance
 
“High-fidelity Conversion of Floating-point Networks for Low-precision Infere...
“High-fidelity Conversion of Floating-point Networks for Low-precision Infere...“High-fidelity Conversion of Floating-point Networks for Low-precision Infere...
“High-fidelity Conversion of Floating-point Networks for Low-precision Infere...
Edge AI and Vision Alliance
 
Computer Vision for Beginners
Computer Vision for BeginnersComputer Vision for Beginners
Computer Vision for Beginners
Sanghamitra Deb
 

Similar to "A Shallow Dive into Training Deep Neural Networks," a Presentation from DeepScale (20)

"Performing Multiple Perceptual Tasks With a Single Deep Neural Network," a P...
"Performing Multiple Perceptual Tasks With a Single Deep Neural Network," a P..."Performing Multiple Perceptual Tasks With a Single Deep Neural Network," a P...
"Performing Multiple Perceptual Tasks With a Single Deep Neural Network," a P...
 
An overview of gradient descent optimization algorithms.pdf
An overview of gradient descent optimization algorithms.pdfAn overview of gradient descent optimization algorithms.pdf
An overview of gradient descent optimization algorithms.pdf
 
Stop the Chaos! Get Real Oracle Performance by Query Tuning Part 2
Stop the Chaos! Get Real Oracle Performance by Query Tuning Part 2Stop the Chaos! Get Real Oracle Performance by Query Tuning Part 2
Stop the Chaos! Get Real Oracle Performance by Query Tuning Part 2
 
Amazon SageMaker 內建機器學習演算法 (Level 400)
Amazon SageMaker 內建機器學習演算法 (Level 400)Amazon SageMaker 內建機器學習演算法 (Level 400)
Amazon SageMaker 內建機器學習演算法 (Level 400)
 
BSSML17 - Deepnets
BSSML17 - DeepnetsBSSML17 - Deepnets
BSSML17 - Deepnets
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Ironwood4_Tuesday_Medasani_1PM
Ironwood4_Tuesday_Medasani_1PMIronwood4_Tuesday_Medasani_1PM
Ironwood4_Tuesday_Medasani_1PM
 
Parallel/Distributed Deep Learning and CDSW
Parallel/Distributed Deep Learning and CDSWParallel/Distributed Deep Learning and CDSW
Parallel/Distributed Deep Learning and CDSW
 
Machine Learning for Capacity Management
 Machine Learning for Capacity Management Machine Learning for Capacity Management
Machine Learning for Capacity Management
 
“High-fidelity Conversion of Floating-point Networks for Low-precision Infere...
“High-fidelity Conversion of Floating-point Networks for Low-precision Infere...“High-fidelity Conversion of Floating-point Networks for Low-precision Infere...
“High-fidelity Conversion of Floating-point Networks for Low-precision Infere...
 
"Deep Learning Beyond Cats and Cars: Developing a Real-life DNN-based Embedde...
"Deep Learning Beyond Cats and Cars: Developing a Real-life DNN-based Embedde..."Deep Learning Beyond Cats and Cars: Developing a Real-life DNN-based Embedde...
"Deep Learning Beyond Cats and Cars: Developing a Real-life DNN-based Embedde...
 
DAT320_Moving a Galaxy into Cloud
DAT320_Moving a Galaxy into CloudDAT320_Moving a Galaxy into Cloud
DAT320_Moving a Galaxy into Cloud
 
Java 8 - Gateway Drug or End of Line?
Java 8 - Gateway Drug or End of Line?Java 8 - Gateway Drug or End of Line?
Java 8 - Gateway Drug or End of Line?
 
Practical Artificial Intelligence: Deep Learning Beyond Cats and Cars
Practical Artificial Intelligence: Deep Learning Beyond Cats and CarsPractical Artificial Intelligence: Deep Learning Beyond Cats and Cars
Practical Artificial Intelligence: Deep Learning Beyond Cats and Cars
 
Implement DevOps Like a Unicorn—Even If You’re Not One
Implement DevOps Like a Unicorn—Even If You’re Not OneImplement DevOps Like a Unicorn—Even If You’re Not One
Implement DevOps Like a Unicorn—Even If You’re Not One
 
MCL310_Building Deep Learning Applications with Apache MXNet and Gluon
MCL310_Building Deep Learning Applications with Apache MXNet and GluonMCL310_Building Deep Learning Applications with Apache MXNet and Gluon
MCL310_Building Deep Learning Applications with Apache MXNet and Gluon
 
Workshop on Advanced Design Patterns for Amazon DynamoDB - DAT405 - re:Invent...
Workshop on Advanced Design Patterns for Amazon DynamoDB - DAT405 - re:Invent...Workshop on Advanced Design Patterns for Amazon DynamoDB - DAT405 - re:Invent...
Workshop on Advanced Design Patterns for Amazon DynamoDB - DAT405 - re:Invent...
 
PostgreSQL at 20TB and Beyond
PostgreSQL at 20TB and BeyondPostgreSQL at 20TB and Beyond
PostgreSQL at 20TB and Beyond
 
Computer Vision for Beginners
Computer Vision for BeginnersComputer Vision for Beginners
Computer Vision for Beginners
 
Parallel & Distributed Deep Learning - Dataworks Summit
Parallel & Distributed Deep Learning - Dataworks SummitParallel & Distributed Deep Learning - Dataworks Summit
Parallel & Distributed Deep Learning - Dataworks Summit
 

More from Edge AI and Vision Alliance

“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
Edge AI and Vision Alliance
 
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
Edge AI and Vision Alliance
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...
Edge AI and Vision Alliance
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
Edge AI and Vision Alliance
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
Edge AI and Vision Alliance
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
Edge AI and Vision Alliance
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara
Edge AI and Vision Alliance
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
Edge AI and Vision Alliance
 

More from Edge AI and Vision Alliance (20)

“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
 
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
 
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
 
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
 
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...
 
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
 
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
 
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara
 
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
 
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
 
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
 

Recently uploaded

Recently uploaded (20)

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 

"A Shallow Dive into Training Deep Neural Networks," a Presentation from DeepScale

  • 1. Copyright © 2017 DeepScale 1 A Shallow Dive into Training Deep Neural Networks Sammy Sidhu May 2017
  • 2. Copyright © 2017 DeepScale 2 • Perception systems for autonomous vehicles • Focusing on enabling technologies for mass-produced autonomous vehicles • Working with a number of OEMs and automotive suppliers • Open Source ☺ • Visit http://deepscale.ai About DeepScale
  • 3. Copyright © 2017 DeepScale 3 • Feature Engineering vs. Learned Features • Neural Network Review • Loss Function (Objective Function) • Gradients • Optimization Techniques • Datasets • Overfitting and Underfitting Overview
  • 4. Copyright © 2017 DeepScale 4 Feature Engineering vs. Learned Features Example of hand written features for face detection
  • 5. Copyright © 2017 DeepScale 5 • Feature Engineering for computer vision can work well • Very time consuming to find useful features • Requires BOTH domain expertise and programming know-how • Hard to generalize all cases (lumination, pose and variations in domain) • Can use generalized features like HOG/SIFT but accuracy suffers Feature Engineering vs. Learned Features (Cont’d.)
  • 6. Copyright © 2017 DeepScale 6 Feature Engineering vs. Learned Features (Cont’d.) Example of learned features of a CNN for facial classification [DeepFace CVPR14]
  • 7. Copyright © 2017 DeepScale 7 • Learned Features for computer vision can work extremely well • Image Classification: 5.71% vs. 26.2% error [ResNet-152 vs. SIFT sparse] • Only requires labeled data, deep learning expertise and computing power • “Training” the network is essentially learning features layer by layer • The deeper you go, the features become much more complex • Hard to perform validation outside of putting in data and seeing what happens Feature Engineering vs. Learned Features (Cont’d.)
  • 8. Copyright © 2017 DeepScale 8 y = fw(x) where w is a set of parameters we can learn and f is a nonlinear function A neural network can be seen as a function approximation Neural Networks — Quick Review 8 Typical nonlinear functions in DNN
  • 9. Loss Function (Objective Function)
      • Take the example of linear regression
      • Given data, we fit a line ($y = mx + b$) that minimizes the sum of the squares of the differences (Euclidean distance loss function)
      • The function that we minimize is the loss function
      • An example would be to predict house value given square footage and median income
          • f(sqft, income) → value, where value is in [0, ∞) dollars
          • We want to minimize L(actual_value, predicted_value), where L is the loss function
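The house-value example can be sketched as follows. The linear form of `predict`, the weight tuples, and the two data rows are made-up assumptions for illustration; the factor of 1/2 in the squared loss is a common convention that simplifies the derivative.

```python
def predict(sqft, income, w):
    # Hypothetical linear model: value = w0 * sqft + w1 * income + w2.
    return w[0] * sqft + w[1] * income + w[2]

def squared_loss(actual_value, predicted_value):
    # L(actual, predicted) = 1/2 * (actual - predicted)^2
    return 0.5 * (actual_value - predicted_value) ** 2

def total_loss(data, w):
    # Sum of the per-sample losses over the whole dataset.
    return sum(squared_loss(value, predict(sqft, income, w))
               for sqft, income, value in data)

# Made-up rows: (square footage, median income in $1000s, value in dollars).
houses = [(1000, 50, 200000), (2000, 80, 380000)]

w_good = (150.0, 1000.0, 0.0)  # fits both rows exactly
w_bad = (100.0, 1000.0, 0.0)   # underestimates both rows

print(total_loss(houses, w_good), total_loss(houses, w_bad))
```

A lower total means the weights explain the data better, which is the sense in which training "minimizes the loss".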
  • 10. Loss Function (Objective Function) (Cont’d.)
  • 11. Loss Function (Objective Function) (Cont’d.)
      • Another loss function is the Softmax loss for classification
      • This is useful when we want to predict the probability of an event
      • For example: predict whether an image is of a cat or a dog
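A minimal sketch of the Softmax loss for the cat-vs-dog case, assuming the network outputs one raw score per class and that the loss is the negative log of the probability assigned to the true class (the usual cross-entropy form). The class order and score values are hypothetical.

```python
import math

def softmax(scores):
    # Subtract the max score first for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_loss(scores, true_class):
    # Negative log-probability of the correct class.
    return -math.log(softmax(scores)[true_class])

CAT, DOG = 0, 1
scores = [2.0, 0.0]  # hypothetical network outputs favoring "cat"

print(softmax(scores))
print(softmax_loss(scores, CAT), softmax_loss(scores, DOG))
```

The loss is small when the network puts high probability on the correct class and grows quickly when it is confidently wrong.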
  • 12. Loss Function (Objective Function) (Cont’d.)
      • Loss functions can be used for either classification or regression
      • The goal is to pick a set of weights that makes the loss value as small as possible
      • It is crucial to pick the right objective function for the task; e.g., one technically can use a squared loss for predicting a probability, but a loss designed for classification (such as the Softmax loss) is usually a better fit
  • 13. Gradients
      • Now if we have a loss function and a neural network, how do we know what part of the network is “responsible” for causing that error?
      • Let’s go back to the simple linear regression!
  • 14. Gradients (Cont’d.)
      • Let’s define the loss function
          • $L = \frac{1}{2}(Y - \hat{Y})^2$, where $\hat{Y}$ is the predicted value
      • Let’s then take the derivative to see how $\hat{Y}$ contributes to the loss $L$
          • $\frac{dL}{d\hat{Y}} = -(Y - \hat{Y}) = \hat{Y} - Y$
      • We’re fitting a line
          • $\hat{Y} = mX + b$
      • Two weights to optimize (slope and bias)
          • $\frac{d\hat{Y}}{dm} = X$, $\frac{d\hat{Y}}{db} = 1$
  • 15. Gradients (Cont’d.)
      • Figure: line with noise to fit
      • Figure: surface of loss w.r.t. slope and bias (m, b)
      • https://spin.atomicobject.com/2014/06/24/gradient-descent-linear-regression/
  • 16. Gradients (Cont’d.)
      • We know $\frac{dL}{d\hat{Y}} = \hat{Y} - Y$ and $\frac{d\hat{Y}}{dm} = X$, $\frac{d\hat{Y}}{db} = 1$
      • To optimize our line (slope and bias) we use the chain rule!
          • $\frac{dL}{dm} = \frac{dL}{d\hat{Y}} \frac{d\hat{Y}}{dm} = X(\hat{Y} - Y)$ and $\frac{dL}{db} = \frac{dL}{d\hat{Y}} \frac{d\hat{Y}}{db} = \hat{Y} - Y$
      • Together, these two derivatives make a gradient!
      • We update our weights by stepping against the gradient, so that the loss decreases
          • $m = m - \alpha \frac{dL}{dm}$ and $b = b - \alpha \frac{dL}{db}$
          • where $\alpha$ is the learning rate
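The chain-rule gradients for the line fit can be checked numerically. This is a sketch under the definitions $L = \frac{1}{2}(Y - \hat{Y})^2$ and $\hat{Y} = mX + b$; the function names, the sample point, and the value of $\alpha$ are illustrative assumptions. Note that the update subtracts $\alpha$ times the gradient, since the gradient points in the direction of increasing loss.

```python
def loss(m, b, x, y):
    y_hat = m * x + b
    return 0.5 * (y - y_hat) ** 2

def gradients(m, b, x, y):
    # Chain rule: dL/dm = dL/dY_hat * dY_hat/dm, dL/db = dL/dY_hat * dY_hat/db.
    y_hat = m * x + b
    dL_dyhat = y_hat - y
    return x * dL_dyhat, dL_dyhat

def numeric_gradients(m, b, x, y, eps=1e-6):
    # Central finite differences, used only to sanity-check the formulas.
    dm = (loss(m + eps, b, x, y) - loss(m - eps, b, x, y)) / (2 * eps)
    db = (loss(m, b + eps, x, y) - loss(m, b - eps, x, y)) / (2 * eps)
    return dm, db

def update(m, b, x, y, alpha):
    # Step against the gradient to reduce the loss.
    dm, db = gradients(m, b, x, y)
    return m - alpha * dm, b - alpha * db

m, b, x, y = 1.0, 0.0, 2.0, 5.0
print(gradients(m, b, x, y), numeric_gradients(m, b, x, y))
new_m, new_b = update(m, b, x, y, alpha=0.1)
print(loss(m, b, x, y), loss(new_m, new_b, x, y))
```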
  • 17. Gradients (Cont’d.)
      • How to minimize loss?
      • Walk down the loss surface via gradient steps until you reach the minimum!
      • https://github.com/mattnedrich/GradientDescentExample
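Walking down the loss surface can be sketched as a loop that repeats the gradient step over all points. The data here is a noise-free line ($y = 2x + 1$) chosen so the result is easy to verify, whereas the slide's figure uses noisy data; the learning rate and step count are assumptions that happen to work for this data.

```python
def fit_line(points, alpha=0.05, steps=2000):
    # Start from an arbitrary guess and repeatedly step against the gradient.
    m, b = 0.0, 0.0
    n = len(points)
    for _ in range(steps):
        # Average the per-point gradients X * (Y_hat - Y) and (Y_hat - Y).
        dm = sum(x * ((m * x + b) - y) for x, y in points) / n
        db = sum(((m * x + b) - y) for x, y in points) / n
        m -= alpha * dm
        b -= alpha * db
    return m, b

points = [(x, 2.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]
m, b = fit_line(points)
print(m, b)
```

If `alpha` is set too large for the data, the steps overshoot and the loop diverges instead of settling at the minimum.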
  • 18. Gradients (Cont’d.)
      • Gradient descent is not limited to linear regression
      • We can take derivatives with respect to any parameter in the neural network
      • To avoid math complexity and recomputation, we can use the chain rule again
      • We can even do this through nonlinear functions that are not differentiable everywhere (e.g., ReLU at zero)
  • 19. Gradients (Cont’d.)
      • This process of computing gradients layer by layer, from the loss back through the network, so that the updates can be applied is called backpropagation
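A hand-written backward pass for a tiny network shows how the chain rule is applied layer by layer. The architecture (one ReLU hidden unit, scalar input and output), the parameter order, and the numbers are assumptions chosen to keep the arithmetic easy to follow.

```python
def forward(params, x):
    w1, b1, w2, b2 = params
    z = w1 * x + b1      # first layer, pre-activation
    h = max(0.0, z)      # ReLU nonlinearity
    y_hat = w2 * h + b2  # output layer
    return z, h, y_hat

def loss(params, x, y):
    return 0.5 * (y - forward(params, x)[2]) ** 2

def backward(params, x, y):
    w1, b1, w2, b2 = params
    z, h, y_hat = forward(params, x)
    # Start at the loss and walk backwards, reusing each intermediate result.
    dL_dyhat = y_hat - y
    dL_dw2 = dL_dyhat * h
    dL_db2 = dL_dyhat
    dL_dh = dL_dyhat * w2
    dL_dz = dL_dh * (1.0 if z > 0 else 0.0)  # ReLU passes gradient only where z > 0
    dL_dw1 = dL_dz * x
    dL_db1 = dL_dz
    return [dL_dw1, dL_db1, dL_dw2, dL_db2]

params = [0.5, 0.1, -2.0, 0.3]
print(backward(params, x=2.0, y=1.0))
```

Each line of `backward` reuses the gradient computed on the line before it, which is what avoids recomputation in deeper networks.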
  • 20. Optimization Techniques
      • Given that we now have the gradients and the weights, what’s the best way to apply the updates?
      • In the previous linear regression example
          • Grab a random sample and apply updates to slope and bias
          • Repeat until convergence
          • Known as stochastic gradient descent (SGD)
      • Can we do better at finding the best possible set of weights to minimize loss? (Optimization)
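The SGD recipe for the line fit (grab one random sample, update slope and bias, repeat) can be sketched as below. The fixed step count stands in for a real convergence check, and the seed, learning rate, and noise-free data are assumptions for a reproducible illustration.

```python
import random

def sgd_fit_line(points, alpha=0.05, steps=5000, seed=0):
    rng = random.Random(seed)  # seeded so the run is reproducible
    m, b = 0.0, 0.0
    for _ in range(steps):
        x, y = rng.choice(points)  # grab one random sample
        error = (m * x + b) - y    # dL/dY_hat for this sample
        m -= alpha * x * error     # dL/dm = X * (Y_hat - Y)
        b -= alpha * error         # dL/db = (Y_hat - Y)
    return m, b

points = [(x, 2.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]
print(sgd_fit_line(points))
```

With noisy data the estimates would keep jittering around the best fit, which is one reason the learning rate is often reduced over time in practice.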
  • 21. Optimization Techniques (Cont’d.)
      • Momentum
          • Keep a running average of previous updates and add it to each update
      • Figure: steps without momentum vs. steps with momentum
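One common way to write momentum is to keep a velocity that decays by a factor `mu` and accumulates the negative gradient. The sketch below uses that form on a made-up, elongated quadratic bowl (curvature 1 in one direction, 50 in the other); the loss, `alpha`, `mu`, and step count are all illustrative assumptions. Setting `mu=0.0` gives plain gradient descent for comparison.

```python
def loss(w):
    # Hypothetical elongated bowl: shallow along w[0], steep along w[1].
    return 0.5 * (1.0 * w[0] ** 2 + 50.0 * w[1] ** 2)

def grad(w):
    return [1.0 * w[0], 50.0 * w[1]]

def descend(w, alpha, mu, steps):
    w = list(w)
    velocity = [0.0, 0.0]
    for _ in range(steps):
        g = grad(w)
        # Running combination of past updates plus the current gradient step.
        velocity = [mu * v - alpha * gi for v, gi in zip(velocity, g)]
        w = [wi + v for wi, v in zip(w, velocity)]
    return w

start = [1.0, 1.0]
plain = descend(start, alpha=0.02, mu=0.0, steps=200)
with_momentum = descend(start, alpha=0.02, mu=0.9, steps=200)
print(loss(plain), loss(with_momentum))
```

On this particular surface the momentum run reaches a much lower loss in the same number of steps, because the velocity builds up along the shallow direction; that outcome is specific to this setup and not a general guarantee.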
  • 22. Optimization Techniques (Cont’d.)
      • AdaGrad, AdaProp, RMSProp, Adam
      • Automatically tune the learning rate to reach convergence in fewer updates
      • Great for fast convergence
      • Sometimes finicky for reaching the lowest possible loss for a network
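As one example of an adaptive method, here is a sketch of a single Adam update written from its standard formulation (running averages of the gradient and of its square, with bias correction). The `state` dictionary layout and the step size `alpha=0.1` are assumptions for illustration. The example shows the "automatic tuning": two weights whose gradients differ by 50x still move by about the same amount on the first step.

```python
import math

def new_state(n):
    # Step counter plus running averages of the gradient and its square.
    return {"t": 0, "m": [0.0] * n, "v": [0.0] * n}

def adam_step(w, g, state, alpha=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    state["t"] += 1
    t = state["t"]
    new_w = []
    for i, (wi, gi) in enumerate(zip(w, g)):
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * gi
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * gi * gi
        # Bias correction compensates for the zero initialization.
        m_hat = state["m"][i] / (1 - beta1 ** t)
        v_hat = state["v"][i] / (1 - beta2 ** t)
        # Dividing by sqrt(v_hat) rescales the step per weight.
        new_w.append(wi - alpha * m_hat / (math.sqrt(v_hat) + eps))
    return new_w

w = [1.0, 1.0]
g = [1.0, 50.0]  # hypothetical gradients with very different scales
print(adam_step(w, g, new_state(2)))
```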
  • 23. Optimization Techniques (Cont’d.)
  • 24. Datasets
      • When it comes to neural networks, you want a diverse dataset that is large enough to train your network without overfitting (more on this later)
      • You can also augment your data to generate more samples
          • Rotations / reflections when they make sense
          • Add noise / hue / contrast
      • This is extremely useful when you have rare sample classes
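A minimal sketch of the augmentations listed above, assuming a grayscale image stored as a list of rows with pixel values in [0, 1]. The function names, the 50% flip probability, and the noise scale are illustrative assumptions; real pipelines typically use an image library instead.

```python
import random

def flip_horizontal(image):
    # Mirror each row left to right (a reflection).
    return [list(reversed(row)) for row in image]

def rotate_90(image):
    # Rotate clockwise by 90 degrees.
    return [list(row) for row in zip(*image[::-1])]

def add_noise(image, scale, rng):
    # Add uniform noise to each pixel and clamp back into [0, 1].
    return [[min(1.0, max(0.0, p + rng.uniform(-scale, scale))) for p in row]
            for row in image]

def augment(image, rng):
    # Randomly combine the transforms to produce a new training sample.
    out = flip_horizontal(image) if rng.random() < 0.5 else image
    return add_noise(out, 0.05, rng)

image = [[0.0, 0.5], [1.0, 0.25]]
rng = random.Random(0)
print(augment(image, rng))
```

Whether a transform "makes sense" depends on the task: a horizontal flip is usually fine for cats and dogs but would change the meaning of a handwritten digit or a road sign.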
  • 25. Datasets (Cont’d.)
      • MNIST
  • 26. Datasets (Cont’d.)
      • CIFAR-10
  • 27. Datasets (Cont’d.)
      • ImageNet
  • 28. Overfitting and Underfitting
      • What is overfitting?
          • Fitting to the training data but not generalizing well
      • What is underfitting?
          • The model does not capture the trends in the data
      • How to tell?
  • 29. Overfitting and Underfitting (Cont’d.)
  • 30. Overfitting and Underfitting (Cont’d.)
      • We can split the data into 3 disjoint parts
          • Training set, validation set, test set
      • During training
          • “Learn” via the training set
          • Evaluate the model every epoch with the validation set
      • After training
          • Test the model with the test set, which the model hasn’t seen before
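The three-way split can be sketched as a shuffle followed by slicing. The 80/10/10 proportions, the function name, and the fixed seed are assumptions for illustration; the slides do not prescribe specific fractions.

```python
import random

def split_dataset(samples, val_fraction=0.1, test_fraction=0.1, seed=0):
    # Shuffle a copy so the three parts are random but reproducible.
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    n_test = round(len(shuffled) * test_fraction)
    # Slices of one shuffled list cannot overlap, so the parts are disjoint.
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
```

The test part should stay untouched until training and model selection are finished, otherwise it stops being data "the model hasn’t seen before".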
  • 31. Overfitting and Underfitting (Cont’d.)
      • Overfitting when
          • Training loss is low but validation and test losses are high
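One simple way to spot this pattern is to compare the per-epoch training and validation losses. The sketch below uses made-up loss curves and hypothetical helper names; it reports the gap between the two curves and the epoch where validation loss was lowest, after which further training mostly fits the training set.

```python
def generalization_gap(train_losses, val_losses):
    # Validation loss minus training loss for each epoch.
    return [v - t for t, v in zip(train_losses, val_losses)]

def best_epoch(val_losses):
    # Index of the epoch with the lowest validation loss.
    return min(range(len(val_losses)), key=lambda i: val_losses[i])

# Made-up curves: training loss keeps falling while validation loss turns upward.
train_losses = [1.0, 0.6, 0.4, 0.2, 0.1]
val_losses = [1.1, 0.8, 0.7, 0.9, 1.2]

print(best_epoch(val_losses))
print(generalization_gap(train_losses, val_losses))
```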
  • 32. Overfitting and Underfitting (Cont’d.)
      • How to combat overfitting?
          • More data
          • Data augmentation
          • Regularization (weight decay)
              • Add the magnitude of the weights to the loss function
          • Randomly ignore some of the units during training (Dropout)
          • Simpler model?
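Two of these remedies are short enough to sketch. The weight decay here is the common L2 form (half the decay coefficient times the sum of squared weights), and the dropout is the common "inverted" variant that zeroes random activations during training and rescales the survivors; both are assumptions about the exact variants, since the slides do not specify them.

```python
import random

def l2_penalty(weights, decay):
    # Grows with the magnitude of the weights.
    return 0.5 * decay * sum(w * w for w in weights)

def regularized_loss(data_loss, weights, decay):
    # The optimizer now trades off fitting the data against keeping weights small.
    return data_loss + l2_penalty(weights, decay)

def dropout(activations, drop_prob, rng, training=True):
    # At test time (training=False) the activations pass through unchanged.
    if not training or drop_prob == 0.0:
        return list(activations)
    keep = 1.0 - drop_prob
    # Zero each unit with probability drop_prob; rescale the rest by 1 / keep.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

print(regularized_loss(1.0, [3.0, 4.0], decay=0.1))
print(dropout([1.0, 1.0, 1.0, 1.0], 0.5, random.Random(0)))
```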
  • 33. Overfitting and Underfitting (Cont’d.)
      • Underfitting when
          • Training loss drops at first, then stops
          • Training loss is still high
          • Training loss tracks validation loss
      • More complex model?
      • Turn down regularization
  • 34. Takeaways
      • Neural nets are function approximators
      • Deep learning can work surprisingly well
      • Optimizing nets is an art that requires intuition
      • Making good datasets is hard
      • Overfitting makes it hard to generalize for applications
      • We can find out how robust our models are with validation testing
  • 35. Thank you! Questions?