Neural Networks: Universal Function Approximators
- Prakhar Mishra
Agenda
● Machine Learning Refresher
○ An Example
○ Hierarchical Division
○ Split Ratio
○ Evaluation Metric
● Neural Networks
○ Inspiration
○ Computation Graph
○ Architecture
○ Hyperparameters
○ Regularization
○ Backpropagation
Machine Learning - Quick Refresher
[Figure slides: a worked example, the hierarchical division of ML, and Feature Engineering ("figure out yourself")]
Split Ratio: 70%-80% training / 30%-20% testing
Machine Learning - Evaluation Metrics
● Confusion Matrix
○ Evaluates the performance of a classification model.
● Accuracy = (TP + TN) / Total Samples
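As an illustration (not from the slides), a minimal Python sketch of the accuracy formula above, assuming a 2x2 confusion matrix laid out as [[TN, FP], [FN, TP]]:

    import numpy as np

    def accuracy_from_confusion(cm: np.ndarray) -> float:
        # Correct predictions sit on the diagonal (TN and TP);
        # the denominator is the total number of samples.
        return float(np.trace(cm) / cm.sum())

    cm = np.array([[50, 10],   # TN, FP
                   [ 5, 35]])  # FN, TP
    print(accuracy_from_confusion(cm))  # (50 + 35) / 100 = 0.85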
Machine Learning - Evaluation Metrics
● Root Mean Squared Error
○ Spread of the predicted y-values about the actual y-values.
RMSE = √( (1/N) ∑ᵢ (Ŷᵢ - Yᵢ)² )
N = Total Samples, Ŷᵢ = Predicted, Yᵢ = Actual
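A minimal Python sketch (not from the slides) of the RMSE formula above:

    import numpy as np

    def rmse(y_pred: np.ndarray, y_true: np.ndarray) -> float:
        # Square the prediction errors, average them, and take the root.
        return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

    print(rmse(np.array([2.5, 0.0, 2.1]), np.array([3.0, -0.5, 2.0])))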
Rise of Neural Nets
Scale drives Deep Learning.
Learning from Data: Structured and Unstructured
Neural Nets - Supervised
Input                 | Output           | Application
Home Features         | Cost             | Real Estate
Ad, User Information  | Click on Ad?     | Online Advertising
Image                 | Class (1...1000) | Photo Tagging
Audio                 | Text             | Speech Recognition
English               | Chinese          | Machine Translation
Computation Graph
J(a, b, c) = 3(a + bc)
Substitution: U = b·c, V = a + U, J = 3V
[Figure: graph with b and c feeding U = b·c, a and U feeding V = a + U, and V feeding J = 3V]
Input: a = 5, b = 3, c = 2, so U = 6, V = 11, J = 33
How does J change if we change V a bit?
How does J change if we change a a bit? Path a → V → J, so ∂J/∂a = (∂J/∂V) x (∂V/∂a)
How does J change if we change b a bit? Path b → U → V → J, so ∂J/∂b = (∂J/∂V) x (∂V/∂U) x (∂U/∂b)
Forward → computes the values; Backward ← computes the derivatives.
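A minimal Python sketch (not from the slides) of the forward and backward passes over this computation graph, reproducing U = 6, V = 11, J = 33 for a = 5, b = 3, c = 2:

    def forward(a, b, c):
        U = b * c   # U = b*c
        V = a + U   # V = a + U
        J = 3 * V   # J = 3V
        return U, V, J

    def backward(a, b, c):
        U, V, J = forward(a, b, c)
        dJ_dV = 3.0                      # from J = 3V
        dV_dU, dV_da = 1.0, 1.0          # from V = a + U
        dU_db, dU_dc = c, b              # from U = b*c
        dJ_da = dJ_dV * dV_da            # chain rule along a -> V -> J
        dJ_db = dJ_dV * dV_dU * dU_db    # chain rule along b -> U -> V -> J
        dJ_dc = dJ_dV * dV_dU * dU_dc    # chain rule along c -> U -> V -> J
        return dJ_da, dJ_db, dJ_dc

    print(forward(5, 3, 2))    # (6, 11, 33)
    print(backward(5, 3, 2))   # (3.0, 6.0, 9.0)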
Architecture
[Figure: inputs i1, i2, ..., in with weights w1, ..., wn feeding a unit that applies F to X; outputs o1, ..., on; a 3-Layer NN]
F = Activation Function
X = w1*i1 + w2*i2 + ... + wn*in + b
Output = F(X)
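A minimal Python sketch (not from the slides) of a single unit computing F(X) with X = w1*i1 + ... + wn*in + b; tanh is used here only as a placeholder activation:

    import numpy as np

    def neuron(inputs: np.ndarray, weights: np.ndarray, bias: float, F=np.tanh):
        X = np.dot(weights, inputs) + bias   # weighted sum plus bias
        return F(X)                          # apply the activation function F

    i = np.array([0.5, -1.0, 2.0])
    w = np.array([0.1, 0.4, -0.2])
    print(neuron(i, w, bias=0.05))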
Hyperparameters
● There are a number of hyperparameters that can be tuned while building your neural network:
○ Number of Hidden Layers
○ Epochs
○ Loss Function
○ Optimization Function
○ Weight Initialization
○ Activation Functions
○ Batch Size
○ Learning Rate
Weight Initialization
● If the weights in a network start too small, the signal shrinks as it passes through each layer until it is too tiny to be useful.
● If the weights in a network start too large, the signal grows as it passes through each layer until it is too massive to be useful.
● Xavier Initialization keeps the signal's scale roughly constant across layers, avoiding both extremes.
Weight Initialization
Wᵢ = √(2 / nᵢ), where nᵢ is the number of input units feeding layer i.
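A minimal Python sketch (not from the slides) of weight initialization scaled by √(2 / nᵢ), the factor shown above:

    import numpy as np

    def init_weights(n_in: int, n_out: int) -> np.ndarray:
        # Draw from a standard normal, then scale by sqrt(2 / n_in) so the
        # signal neither shrinks nor explodes as it passes through the layer.
        return np.random.randn(n_in, n_out) * np.sqrt(2.0 / n_in)

    W = init_weights(256, 128)
    print(W.std())  # roughly sqrt(2 / 256) ≈ 0.088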
Loss Functions
● Binary Cross Entropy
● Categorical Cross Entropy
● Root Mean Squared Error
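A minimal Python sketch (not from the slides) of one of the losses listed above, binary cross-entropy:

    import numpy as np

    def binary_cross_entropy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
        eps = 1e-12                             # avoid log(0)
        y_pred = np.clip(y_pred, eps, 1 - eps)
        return float(-np.mean(y_true * np.log(y_pred)
                              + (1 - y_true) * np.log(1 - y_pred)))

    print(binary_cross_entropy(np.array([1, 0, 1]), np.array([0.9, 0.2, 0.7])))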
Optimization Functions
● Adagrad Optimizer
● Gradient Descent Optimizer
● Adam Optimizer
● Stochastic Gradient Descent Optimizer
● RMSProp Optimizer
Optimization Functions - Adam
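The Adam equations were shown as figures; as an illustration (not from the slides), a minimal Python sketch of one Adam update step with the standard defaults beta1 = 0.9, beta2 = 0.999, eps = 1e-8:

    import numpy as np

    def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        m = beta1 * m + (1 - beta1) * grad        # first-moment (mean) estimate
        v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate
        m_hat = m / (1 - beta1 ** t)              # bias correction
        v_hat = v / (1 - beta2 ** t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
        return theta, m, v

    theta = np.array([0.5, -0.3])
    m = v = np.zeros_like(theta)
    theta, m, v = adam_step(theta, grad=np.array([0.1, -0.2]), m=m, v=v, t=1)
    print(theta)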
Learning Rate
● Decaying the learning rate over time is seen to speed up the learning process/convergence.
Learning Rate - Intuition
Learning Rate - Formula
α₁ = α₀ × 1 / (1 + decay × epoch_number)
Learning Rate - Special Case
Wᵢ = Wᵢ₋₁ - α × slope
Pseudo self-adaptive on a convex curve: as the slope flattens near the minimum, the update shrinks automatically even with a fixed α.
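A minimal Python sketch of the decay schedule above, assuming the decay is applied per epoch as reconstructed:

    def decayed_lr(alpha0: float, decay: float, epoch: int) -> float:
        # alpha1 = alpha0 * 1 / (1 + decay * epoch)
        return alpha0 / (1.0 + decay * epoch)

    for epoch in range(5):
        print(epoch, round(decayed_lr(alpha0=0.1, decay=0.5, epoch=epoch), 4))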
Activation Functions
Biologically inspired by the activity of our brain, where different neurons are activated by different stimuli.
Activation Functions - Sigmoid
Activation Functions - Tanh
Activation Functions - ReLU
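A minimal Python sketch (not from the slides) of the three activation functions named above:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))   # squashes to (0, 1)

    def tanh(x):
        return np.tanh(x)                 # squashes to (-1, 1), zero-centered

    def relu(x):
        return np.maximum(0.0, x)         # passes positives, zeroes out negatives

    x = np.array([-2.0, 0.0, 2.0])
    print(sigmoid(x), tanh(x), relu(x))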
Activation Functions - Standards
● In practice, Tanh outperforms Sigmoid for internal (hidden) layers.
○ Tanh outputs are centered around 0; Sigmoid outputs are centered around 0.5.
○ In ML, we tend to center our data to avoid any kind of bias behaviour, and Tanh preserves that centering.
● Rule of thumb: ReLU generally performs well for hidden layers.
● Avoid Sigmoid for hidden layers.
● Sigmoid is a good candidate for the output of a binary classification problem.
● The identity function makes no sense for hidden layers (stacked linear layers collapse; see below).
Activation Functions - ReLU or Tanh?
ReLU > Tanh: ReLU avoids the vanishing-gradient problem.
Is it the best? [No]
Activation Functions - Why?
Because f_linear(f_linear(x)) is still f_linear(x): a network of N linear layers collapses to an equivalent network with fewer (N - X) layers, so only trivial functions are learned.
Activation Functions - Why?
● To learn more advanced functions, the activation must be nonlinear.
● It should be differentiable, so it can be used with backpropagation.
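A minimal Python sketch (not from the slides) showing why nonlinearity matters: two stacked linear layers collapse into a single equivalent linear layer.

    import numpy as np

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
    W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
    x = rng.normal(size=3)

    two_layers = W2 @ (W1 @ x + b1) + b2       # "deep" network with no activation
    W, b = W2 @ W1, W2 @ b1 + b2               # equivalent single linear layer
    one_layer = W @ x + b

    print(np.allclose(two_layers, one_layer))  # True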
Batch Size
● The batch size is the number of samples passed through the network at a time.
● Advantages
○ Your machine might not fit all the data in memory at any given instant.
○ You want your model to generalize quickly.
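A minimal Python sketch (not from the slides) of iterating over a dataset in mini-batches of a fixed batch size:

    import numpy as np

    def iterate_minibatches(X, y, batch_size, shuffle=True):
        idx = np.arange(len(X))
        if shuffle:
            np.random.shuffle(idx)                 # reshuffle each epoch
        for start in range(0, len(X), batch_size):
            batch = idx[start:start + batch_size]
            yield X[batch], y[batch]

    X, y = np.random.randn(10, 3), np.random.randint(0, 2, size=10)
    for xb, yb in iterate_minibatches(X, y, batch_size=4):
        print(xb.shape, yb.shape)   # (4, 3) (4,) ... the last batch may be smaller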
Training - Prerequisites
● Derivative
● Partial Derivative
● Chain Rule
Training - Example
[Figure: Input layer X1 = 0.05, X2 = 0.10, X3 = 0.02 (Xi = input); hidden layer H1, H2, H3; output layer O1 = Y. Weights into H1: w1 = 0.15, w2 = 0.20, w3 = 0.30; weight from H1 to O1: 0.33.]
Training - Forward Propagation
Hᵢ = ∑ᵢ wᵢ·xᵢ (compact representation)
H₁ = w₁·x₁ + w₂·x₂ + w₃·x₃ (expanded representation)
H₁ = 0.15·0.05 + 0.20·0.10 + 0.30·0.02 = 0.0335
Hσ₁ = σ(0.0335)
O₁ = ∑ᵢ Hᵢ·wᵢ (compact representation)
O₁ = H₁·0.33 = 0.0335·0.33 = 0.011055
Oσ₁ = σ(0.011055)
σ(H) = 1 / (1 + e⁻ᴴ)
Error = |Y - Ŷ|, where Y is the target and Ŷ the predicted output.
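A minimal Python sketch (not from the slides) reproducing the forward-pass numbers above, H₁ = 0.0335 and O₁ = 0.011055:

    import math

    def sigmoid(h):
        return 1.0 / (1.0 + math.exp(-h))

    x = [0.05, 0.10, 0.02]          # inputs
    w_hidden = [0.15, 0.20, 0.30]   # weights into H1
    w_out = 0.33                    # weight from H1 to O1

    H1 = sum(w * xi for w, xi in zip(w_hidden, x))   # 0.0335
    O1 = H1 * w_out                                  # 0.011055
    print(H1, sigmoid(H1), O1, sigmoid(O1))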
Training - Backward Propagation
The goal is to update each of the weights in the network so that they cause the actual output to be closer to the target output.
Training - Backward Propagation
∂Error/∂w₄ = (∂Error/∂Oσ₁) x (∂Oσ₁/∂O₁) x (∂O₁/∂w₄)
∂Error/∂w₁ = (∂Error/∂Hσ₁) x (∂Hσ₁/∂H₁) x (∂H₁/∂w₁)
∂Error/∂wᵢ = partial derivative of the error w.r.t. wᵢ, with Error = |Y - Ŷ|
[Figure: the chains w₁ → H₁ → Hσ₁ and w₄ → O₁ → Oσ₁ → Error]
Training - Backward Propagation
[Figure: hidden unit H₁ feeding two outputs with weights 0.33 and 0.04, contributing errors E₀ and E₁]
E_total = E_o1 + E_o2
∂E_total/∂Hσ₁ = ∂E_o1/∂Hσ₁ + ∂E_o2/∂Hσ₁
∂E_o1/∂Hσ₁ = (∂E_o1/∂H₁) x (∂H₁/∂Hσ₁)
∂E_o2/∂Hσ₁ = (∂E_o2/∂H₁) x (∂H₁/∂Hσ₁)
∂E_total/∂w₁ = (∂E_total/∂Hσ₁) x (∂Hσ₁/∂H₁) x (∂H₁/∂w₁)
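As an illustration (not the slides' exact computation), a minimal Python sketch of one backward pass through the tiny example network, assuming a squared-error loss E = 0.5·(Y - Oσ₁)² and a made-up target Y = 1.0, applying the chain rule as written above:

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    x = [0.05, 0.10, 0.02]
    w = [0.15, 0.20, 0.30]   # input -> hidden weights
    w4 = 0.33                # hidden -> output weight
    Y = 1.0                  # assumed target value (not given on the slides)

    # Forward pass
    H1 = sum(wi * xi for wi, xi in zip(w, x))
    Hs1 = sigmoid(H1)
    O1 = Hs1 * w4
    Os1 = sigmoid(O1)
    E = 0.5 * (Y - Os1) ** 2

    # Backward pass (chain rule)
    dE_dOs1 = -(Y - Os1)
    dOs1_dO1 = Os1 * (1 - Os1)    # derivative of the sigmoid
    dO1_dw4 = Hs1
    dE_dw4 = dE_dOs1 * dOs1_dO1 * dO1_dw4

    dO1_dHs1 = w4
    dHs1_dH1 = Hs1 * (1 - Hs1)
    dH1_dw1 = x[0]
    dE_dw1 = dE_dOs1 * dOs1_dO1 * dO1_dHs1 * dHs1_dH1 * dH1_dw1

    print(dE_dw4, dE_dw1)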
Works perfectly on training data?
Regularization
A technique for preventing overfitting: regularization reduces overfitting by adding a penalty to the loss function.
Regularization - Dropout
● Dropout refers to ignoring a randomly chosen set of units (i.e. neurons) during the training phase.
● It avoids co-dependency amongst neurons during training.
● Units are dropped with a given probability (20%-50%) in each weight-update cycle.
● Applying dropout at each layer of the network has shown good results.
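A minimal Python sketch (not from the slides) of inverted dropout applied to a layer's activations with drop probability p during training:

    import numpy as np

    def dropout(activations: np.ndarray, p: float = 0.5, training: bool = True) -> np.ndarray:
        if not training or p == 0.0:
            return activations                           # no dropout at test time
        mask = np.random.rand(*activations.shape) >= p   # randomly keep units
        return activations * mask / (1.0 - p)            # rescale to preserve the expected value

    a = np.ones((2, 4))
    print(dropout(a, p=0.5))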
References
● Adam Optimization
● Andrew Ng Youtube
● Siraj Raval Youtube
● Adam Optimization
● Cross Entropy
● Deep Learning Basics
● BackPropagation