Deep Learning
DSE Module 2
Sivagami R
BITS Pilani
The author of this deck, Prof. Seetha Parameswaran, gratefully acknowledges the authors who made their course materials freely available online.
Single Perceptron for Regression
Real Valued Output
Linear Regression Example
● Suppose that we wish to estimate the prices of houses (in dollars)
based on their area (in square feet) and age (in years).
● The linearity assumption just says that the target (price) can be
expressed as a weighted sum of the features (area and age):
price = w_area * area + w_age * age + b
● w_area and w_age are called weights, and b is called the bias.
● The weights determine the influence of each feature on our prediction
and the bias just says what value the predicted price should take
when all of the features take value 0.
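As a concrete sketch (the weight values here are made up for illustration, not taken from the deck):

```python
# Hypothetical learned parameters for the house-price model.
w_area, w_age, b = 120.0, -1500.0, 50000.0

def predict_price(area_sqft, age_years):
    """Weighted sum of the features plus the bias (the linearity assumption)."""
    return w_area * area_sqft + w_age * age_years + b

print(predict_price(1500.0, 10.0))  # 1500*120 - 10*1500 + 50000 = 215000.0
```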
Data
● Data
○ The dataset is called a training dataset or training set.
○ Each row is called an example (or data point, data instance, sample).
○ The thing we are trying to predict is called a label (or target). 
○ The independent variables upon which the predictions are based are called
features (or covariates).
Affine transformations and Linear Models
● An equation of the form ŷ = wᵀx + b (see below) is an affine transformation of the
input features: it combines a linear transformation of the features via a weighted
sum with a translation via the added bias.
● Models whose output prediction is determined by the affine
transformation of input features are linear models.
● The affine transformation is specified by the chosen weights (w) and bias (b).
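In full form (a reconstruction, since the slide shows the equation as an image):

```latex
\hat{y} = w_1 x_1 + \dots + w_d x_d + b = \mathbf{w}^\top \mathbf{x} + b
```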
Loss Function
● A loss function is a quality measure for a given model: a measure of its
fitness.
● The loss function quantifies the distance between the real and
predicted value of the target.
● The loss will usually be a non-negative number where smaller values
are better and perfect predictions incur a loss of 0.
● The most popular loss function in regression problems is the squared
error.
Squared Error Loss Function
● The most popular loss function in regression problems is the squared
error.
● For each example, the loss is the squared difference between the predicted and
true values (see the formulas below).
● For the entire dataset of n examples
○ average (or equivalently, sum) the losses
● When training the model, find parameters (w*, b*) that minimize the
total loss across all training examples.
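The formulas referenced above (shown as images on the slide), reconstructed in the notation of the source text:

```latex
l^{(i)}(\mathbf{w}, b) = \frac{1}{2}\left(\hat{y}^{(i)} - y^{(i)}\right)^2,
\qquad
L(\mathbf{w}, b) = \frac{1}{n}\sum_{i=1}^{n} l^{(i)}(\mathbf{w}, b),
\qquad
\mathbf{w}^*, b^* = \operatorname*{argmin}_{\mathbf{w}, b} L(\mathbf{w}, b)
```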
Minibatch Stochastic Gradient Descent (SGD)
● Apply the gradient descent algorithm to a random minibatch of examples
each time we need to compute the update.
● In each iteration,
○ Step 1: randomly sample a minibatch B consisting of a fixed number of training
examples.
○ Step 2: compute the derivative (gradient) of the average loss on the minibatch with
regard to the model parameters.
○ Step 3: multiply the gradient by a predetermined positive value η and subtract the
resulting term from the current parameter values.
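Written out, with |B| the batch size and η the learning rate (a reconstruction in the notation of the source text):

```latex
(\mathbf{w}, b) \leftarrow (\mathbf{w}, b) - \frac{\eta}{|B|} \sum_{i \in B} \partial_{(\mathbf{w}, b)}\, l^{(i)}(\mathbf{w}, b)
```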
Training using SGD Algorithm
PS: The number of epochs and the learning rate are both hyperparameters. Setting
hyperparameters requires some adjustment by trial and error.
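The algorithm itself appears as an image on the slide; below is a minimal sketch of the loop it describes, for the linear model with squared-error loss (the function name, default values, and array shapes are assumptions):

```python
import numpy as np

def train_sgd(X, y, lr=0.03, num_epochs=3, batch_size=10):
    """Minibatch SGD for linear regression with squared-error loss."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for epoch in range(num_epochs):
        # Step 1: shuffle and split into random minibatches.
        for idx in np.array_split(np.random.permutation(n), max(1, n // batch_size)):
            Xb, yb = X[idx], y[idx]
            err = Xb @ w + b - yb              # prediction error on the minibatch
            # Steps 2-3: gradient of the average loss, scaled by lr, subtracted.
            w -= lr * Xb.T @ err / len(idx)
            b -= lr * err.mean()
    return w, b
```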
Prediction
● Estimating targets given features is commonly called prediction or
inference.
● Given the learned model, target values can be predicted for any set of
features.
Single-Layer Neural Network
Linear regression is a single-layer neural network
○ Number of inputs (or feature dimensionality) in the input layer is d.
○ The inputs are x1 , . . . , xd.
○ The output is o1.
○ Number of outputs in the output layer is 1.
○ Number of layers for the neural network is 1. (conventionally we do not consider the
input layer when counting layers.)
○ Every input is connected to every output. This transformation is a fully-connected
layer or dense layer.
Multiple Perceptrons for Classification
Binary Outputs
Classification Example
● Each input consists of a 2 × 2 grayscale image.
● Represent each pixel value with a single scalar, giving four features
x1 , x2, x3 , x4.
● Assume that each image belongs to one among the categories
“square”, “triangle”, and “circle”.
● How to represent the labels?
○ Use label encoding. y ∈ {1, 2, 3}, where the integers represent {circle, square,
triangle} respectively.
○ Use one-hot encoding. y ∈ {(1, 0, 0), (0, 1, 0), (0, 0, 1)}.
■ y would be a three-dimensional vector, with (1, 0, 0) corresponding to “circle”,
(0, 1, 0) to “square”, and (0, 0, 1) to “triangle”.
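A quick sketch of the one-hot encoding in numpy (the label indices are illustrative):

```python
import numpy as np

labels = np.array([0, 2, 1])   # 0 = circle, 1 = square, 2 = triangle
one_hot = np.eye(3)[labels]    # rows: (1,0,0), (0,0,1), (0,1,0)
```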
Network Architecture
● A model with multiple outputs, one per class. Each output will
correspond to its own affine function.
○ 4 features and 3 possible output categories
Network Architecture
○ 12 scalars to represent the weights and 3 scalars to represent the biases.
○ Compute three logits, o1, o2, and o3, for each input.
○ The weights form a 3×4 matrix and the biases a 1×3 vector.
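In matrix form (a reconstruction of the slide's missing equation):

```latex
\mathbf{o} = \mathbf{W}\mathbf{x} + \mathbf{b},
\qquad
\mathbf{W} \in \mathbb{R}^{3 \times 4},\;
\mathbf{x} \in \mathbb{R}^{4},\;
\mathbf{b} \in \mathbb{R}^{3}
```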
Softmax Operation
● Interpret the outputs of our model as probabilities.
○ Any output ŷ_j is interpreted as the probability that a given item belongs to class j. We
then choose the class with the largest output value as our prediction, argmax_j ŷ_j.
○ If ŷ1, ŷ2, and ŷ3 are 0.1, 0.8, and 0.1, respectively, then we predict category 2.
○ To interpret the outputs as probabilities, we must guarantee that they are
nonnegative and sum to 1.
● The softmax function transforms the outputs such that they become
nonnegative and sum to 1, while requiring that the model remains
differentiable.
○ first exponentiate each logit (ensuring non-negativity) and then divide by their sum
(ensuring that they sum to 1)
● Softmax is a nonlinear function.
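A short sketch of the operation the bullets describe (the max-subtraction is a standard numerical-stability trick, not part of the slide):

```python
import numpy as np

def softmax(o):
    """softmax(o)_j = exp(o_j) / sum_k exp(o_k)."""
    o = o - o.max()    # subtract the max logit for numerical stability
    e = np.exp(o)      # exponentiate: ensures non-negativity
    return e / e.sum() # normalize: ensures the outputs sum to 1

print(softmax(np.array([1.0, 3.0, 1.0])))  # ~[0.106, 0.787, 0.106]
```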
Log-Likelihood Loss Function / Cross-Entropy loss
● The softmax function gives us a vector ŷ, which we can interpret as the
estimated conditional probabilities of each class given any input x.
○ In the running example, ŷ1 = P(y = circle | x).
● Compare the estimates with reality by checking how probable the
actual classes are according to our model, given the features:
● Maximizing P(Y | X) is equivalent to minimizing the negative log-likelihood.
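The loss the slide refers to, reconstructed in standard form (for q classes and n examples):

```latex
l(\mathbf{y}, \hat{\mathbf{y}}) = -\sum_{j=1}^{q} y_j \log \hat{y}_j,
\qquad
-\log P(\mathbf{Y} \mid \mathbf{X}) = \sum_{i=1}^{n} -\log P\!\left(\mathbf{y}^{(i)} \mid \mathbf{x}^{(i)}\right)
```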
Multilayer Perceptrons (MLP)
Multilayer Perceptron
● With deep neural networks, we use the data to jointly learn both a
representation via hidden layers and a linear predictor that acts upon
that representation.
● Add many hidden layers by stacking many fully-connected layers on
top of each other. Each layer feeds into the layer above it, until we
generate outputs.
● The first (L−1) layers learn the representation and the final layer is
the linear predictor. This architecture is commonly called a multilayer
perceptron (MLP).
MLP Architecture
○ MLP has 4 inputs, 3 outputs, and its hidden layer contains 5 hidden units.
○ Number of layers in this MLP is 2.
○ The layers are both fully connected. Every input influences every neuron in the
hidden layer, and each of these in turn influences every neuron in the output layer.
○ The outputs of the hidden layer are called hidden representations, hidden-layer
variables, or hidden variables.
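In matrix notation (a reconstruction consistent with the source text; σ is the activation function covered below):

```latex
\mathbf{H} = \sigma\!\left(\mathbf{X}\mathbf{W}^{(1)} + \mathbf{b}^{(1)}\right),
\qquad
\mathbf{O} = \mathbf{H}\mathbf{W}^{(2)} + \mathbf{b}^{(2)}
```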
Nonlinearity in MLP
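The slide body is an image; the key point it illustrates is that without a nonlinearity between them, two affine layers collapse into a single affine layer, so the model is no more expressive than a linear model:

```latex
\mathbf{O} = \left(\mathbf{X}\mathbf{W}^{(1)} + \mathbf{b}^{(1)}\right)\mathbf{W}^{(2)} + \mathbf{b}^{(2)}
= \mathbf{X}\,\mathbf{W}^{(1)}\mathbf{W}^{(2)} + \mathbf{b}^{(1)}\mathbf{W}^{(2)} + \mathbf{b}^{(2)}
```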
Activation Functions
Activation function
● The activation function of a neuron defines the output of that neuron given
an input or set of inputs.
● Activation functions decide whether a neuron should be activated or not,
based on the weighted sum of its inputs plus the bias.
● They are differentiable operators that transform input signals into outputs,
and most of them add non-linearity.
● Since artificial neural networks are designed as universal function
approximators, they must be able to represent and learn nonlinear functions.
1. Step Function
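The slide body is an image; the standard (Heaviside) step function it refers to is:

```latex
f(x) = \begin{cases} 1 & \text{if } x \geq 0 \\ 0 & \text{if } x < 0 \end{cases}
```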
2. Sigmoid (Logistic) Activation Function
The sigmoid is known as a squashing function: it squashes any input in the range
(−∞, ∞) to a value in the range (0, 1).
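In formula form:

```latex
\operatorname{sigmoid}(x) = \frac{1}{1 + e^{-x}}
```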
3. Tanh (hyperbolic tangent) Activation Function
● More efficient than sigmoid because its wider output range, (−1, 1), allows faster learning.
● The tanh activation usually works better than sigmoid activation function for
hidden units because the mean of its output is closer to zero, and so it centers the
data better for the next layer.
● Issues with tanh:
○ computationally expensive
○ can lead to vanishing gradients
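In formula form:

```latex
\tanh(x) = \frac{1 - e^{-2x}}{1 + e^{-2x}}
```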
4. ReLU Activation Function
● If we combine two ReLU units, we can recover a piecewise linear approximation of the Sigmoid function.
● Some ReLU variants: Softplus (Smooth ReLU), Noisy ReLU, Leaky ReLU, Parametric ReLU and Exponential ReLU
(ELU).
● Advantages
○ Fast Learning and Efficient computation
○ Fewer vanishing gradient problems
○ Sparse activation
○ Scale invariant (max operation)
● Disadvantages
○ Unbounded activations can lead to exploding gradients.
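In formula form:

```latex
\operatorname{ReLU}(x) = \max(0, x)
```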
Comparing Activation Functions
Training MLP
Two Layer Neural Network
Compute the Activations
Vectoring Forward Propagation
Neural Network Training – Forward Pass
Forward Propagation Algorithm
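The algorithm itself is shown as an image; below is a minimal numpy sketch for the two-layer network above (the names and the sigmoid hidden activation are assumptions):

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """Forward pass for a two-layer (one hidden layer) network."""
    z1 = W1 @ x + b1                 # pre-activation of the hidden layer
    a1 = 1.0 / (1.0 + np.exp(-z1))   # hidden activations (sigmoid assumed)
    z2 = W2 @ a1 + b2                # output-layer pre-activation (the logits)
    return z1, a1, z2
```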
Computation Graph for Forward Pass
Cost Function
Neural Network Training – Backward Pass
Backpropagation Algorithm to compute Gradients
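As above, the algorithm is shown as an image; here is a sketch of the gradients for the two-layer network, assuming sigmoid hidden units, a linear output layer, and squared-error loss:

```python
import numpy as np

def backward(x, y, z1, a1, z2, W2):
    """Backward pass: the chain rule applied layer by layer."""
    delta2 = z2 - y                            # dL/dz2 for squared-error loss
    dW2 = np.outer(delta2, a1)                 # dL/dW2
    db2 = delta2                               # dL/db2
    delta1 = (W2.T @ delta2) * a1 * (1 - a1)   # through sigmoid: s' = s(1 - s)
    dW1 = np.outer(delta1, x)                  # dL/dW1
    db1 = delta1                               # dL/db1
    return dW1, db1, dW2, db2
```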
Computation Graph for BackProp
Neural Network Training – Update Parameters
Parameter Update
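The update itself (the slide's content is an image) is the usual gradient step, with learning rate η applied layer by layer:

```latex
\mathbf{W}^{(k)} \leftarrow \mathbf{W}^{(k)} - \eta\, \frac{\partial L}{\partial \mathbf{W}^{(k)}},
\qquad
\mathbf{b}^{(k)} \leftarrow \mathbf{b}^{(k)} - \eta\, \frac{\partial L}{\partial \mathbf{b}^{(k)}}
```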
Training of MultiLayer Perceptron (MLP)
Requires
1. Forward Pass through each layer to compute the output.
2. Compute the deviation or error between the desired output and the output
computed in the forward pass (the first step). This deviation becomes the
objective function, since we want to minimize it.
3. The deviation has to be sent back through each layer to compute the delta,
or change, in the parameter values. This is achieved using the
backpropagation algorithm.
4. Update the parameters.
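Putting the four steps together (a sketch reusing the hypothetical forward and backward helpers from the earlier slides; it assumes the parameters W1, b1, W2, b2, the learning rate lr, and the training data are already initialized):

```python
for epoch in range(num_epochs):
    for x, y in training_examples:
        z1, a1, z2 = forward(x, W1, b1, W2, b2)              # 1. forward pass
        loss = 0.5 * np.sum((z2 - y) ** 2)                   # 2. deviation / objective
        dW1, db1, dW2, db2 = backward(x, y, z1, a1, z2, W2)  # 3. backpropagation
        W1 -= lr * dW1; b1 -= lr * db1                       # 4. update parameters
        W2 -= lr * dW2; b2 -= lr * db2
```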
Scaling up for L layers in MLP
Forward Propagation Algorithm
Backward Propagation Algorithm
Update the Parameters
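These L-layer algorithms also appear as images; a minimal numpy sketch of how the two-layer code generalizes (sigma, sigma_prime, the linear output layer, and the squared-error loss are assumptions):

```python
import numpy as np

def forward_L(x, Ws, bs, sigma):
    """Forward pass through L fully-connected layers."""
    zs, activations = [], [x]
    a = x
    for l in range(len(Ws)):
        z = Ws[l] @ a + bs[l]
        a = z if l == len(Ws) - 1 else sigma(z)  # linear output layer assumed
        zs.append(z)
        activations.append(a)
    return zs, activations

def backward_L(y, zs, activations, Ws, sigma_prime):
    """Backward pass: propagate delta from the output layer down to layer 1."""
    grads = []
    delta = activations[-1] - y                  # squared-error loss, linear output
    for l in reversed(range(len(Ws))):
        grads.append((np.outer(delta, activations[l]), delta))  # (dW_l, db_l)
        if l > 0:
            delta = (Ws[l].T @ delta) * sigma_prime(zs[l - 1])
    return grads[::-1]
```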
Ref:
Chapters 3 and 4 of T1
Next Session:
Power of MLP
Editor's Notes

  1. Ch 3 of T1
  2. The cardinality |B| represents the number of examples in each minibatch (the batch size) and η denotes the learning rate. We emphasize that the values of the batch size and learning rate are manually pre-specified and not typically learned through model training. These parameters, which are tunable but not updated in the training loop, are called hyperparameters. Hyperparameter tuning is the process by which hyperparameters are chosen, and typically requires that we adjust them based on the results of the training loop as assessed on a separate validation dataset (or validation set).
  3. Ch 3 of T1
  4. A one-hot encoding is a vector with as many components as we have categories. The component corresponding to a particular instance's category is set to 1 and all other components are set to 0.
  5. Although softmax is a nonlinear function, the outputs of softmax regression are still determined by an affine transformation of input features; thus, softmax regression is a linear model.
  6. Chapter 4 of T1
  7. We will learn computation graphs in Module 3.
  8. We will learn computation graphs in Module 3.