DEEP LEARNING
MULTILAYERED / HIERARCHICAL NEURAL NETWORK BASED INFORMATION PROCESSING
ROBUST, GENERALIZABLE, AND SCALABLE
AGENDA
• WHAT DOES IT MEAN FOR A MACHINE TO LEARN?
• ML VS DL
• APPLICATIONS OF DL
• DL CONCEPTS
• PROMINENT DL ARCHITECTURES
• DL BASED SAMPLE PROJECTS
HOW DOES MACHINE LEARNING WORK?
• Consider an equation Ax = b, where b is the actual right-hand-side (RHS) value.
• We make a machine predict the value of the RHS as b’.
• Machine learning is a set of algorithmic techniques that minimize the error (b − b’) in this equation through optimization.
• This is done by changing the values of the weights in the x column vector (the parameter vector) until we find a set of values that gives outcomes closest to the actual values of b.
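Below is a minimal NumPy sketch of this idea, assuming a synthetic data matrix A and targets b (all names and values are illustrative): the parameter vector x is adjusted by gradient descent until the predictions b’ = Ax are close to b.

```python
# A minimal sketch of "learning" as error minimization: find the parameter
# vector x that makes Ax as close as possible to b (names follow the slide).
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 3))                   # 100 observations, 3 features
x_true = np.array([1.5, -2.0, 0.5])             # hypothetical "true" parameters
b = A @ x_true + 0.01 * rng.normal(size=100)    # actual RHS values

x = np.zeros(3)                                 # start from an arbitrary parameter vector
learning_rate = 0.01
for _ in range(500):
    b_pred = A @ x                              # the machine's prediction b'
    error = b_pred - b                          # (b' - b)
    grad = 2 * A.T @ error / len(b)             # gradient of the mean squared error w.r.t. x
    x -= learning_rate * grad                   # adjust the weights to reduce the error

print(x)  # close to x_true once the error has been driven down
```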
DIFFERENCE BETWEEN ML AND DL
IMPORTANT APPLICATIONS OF ML
PROMINENT APPLICATIONS OF DL
MORE APPLICATIONS OF DL
• Real-Time Image Recognition
• Sentiment Analysis
• Search Ranking
• Personalization
• Speaker Identification
• Text Prediction
• Handwriting Recognition
• Machine Translation
• Face Detection
• Music Tagging
• Entity Recognition
• Style Transfer
• Image Captioning
• Emotion Detection
• Text Summarization
DL CONCEPTS
COMPUTATIONAL GRAPH
BIOLOGICAL NEURON & NETWORK
DEEP ARTIFICIAL NEURAL NETWORK
[Figure: a single artificial neuron. The inputs (1.4, -2.5, -0.06) are multiplied by the weights W1, W2, W3 (2.7, -8.6, 0.002), summed, and passed through an activation f(x); e.g. x = -0.06×2.7 + 2.5×8.6 + 1.4×0.002 = 21.34.]
A dataset
Fields            class
1.4  2.7  1.9     0
3.8  3.4  3.2     0
6.4  2.8  1.7     1
4.1  0.1  0.2     0
etc …
Training the neural network
• Initialise with random weights.
• Present a training pattern, e.g. (1.4, 2.7, 1.9).
• Feed it through to get an output, e.g. 0.8.
• Compare with the target output (class 0): error 0.8.
• Adjust the weights based on the error.
• Present the next training pattern, e.g. (6.4, 2.8, 1.7): output 0.9, target 1, error -0.1. Adjust the weights again.
• And so on …
Repeat this thousands, maybe millions of times – each time taking a random training instance, and making
slight weight adjustments.
Algorithms for weight adjustment are designed to make changes that will reduce the error
A SAMPLE NEURAL NETWORK AND
CORRESPONDING COMPUTATION
The Forward Propagation
THE ENTIRE PROCESS BEHIND
OPTIMIZATION
• In each iteration:
• Select a network architecture, i.e. the number of hidden layers, the number of
neurons in each layer, and the activation function
• Initialize the weights randomly
• Use forward propagation to determine the output
• Find the error of the model using the known labels
• Back-propagate the error into the network, determine the error at each node,
and update the weights so as to reduce the error
THE PSEUDOCODE FOR CALCULATING
OUTPUT OF FORWARD-PROPAGATING
NEURAL NETWORK
•# node[] := array of topologically sorted nodes;
an edge from a to b means a is to the left of b
•# If the neural network has R inputs and S
outputs, then the first R nodes are input nodes and
the last S nodes are output nodes.
•# incoming[x] := nodes connected to node x
•# weights[x] := weights of the incoming edges to x
• For each neuron x, from left to right −
• if x <= R: continue # it’s an input node
• inputs[x] = [output[i] for i in incoming[x]]
• weighted_sum = dot_product(weights[x], inputs[x])
• output[x] = activation_function(weighted_sum)
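A runnable rendering of this pseudocode might look like the sketch below, assuming a tiny hand-built graph with two input nodes (R = 2), one hidden node, and one output node; the graph layout, weights, sigmoid activation, and 0-based indexing are illustrative choices.

```python
# A runnable version of the forward-propagation pseudocode for a tiny graph.
import math

R = 2                                    # number of input nodes
nodes = [0, 1, 2, 3]                     # topologically sorted (inputs first)
incoming = {2: [0, 1], 3: [2]}           # incoming[x] := nodes feeding node x
weights  = {2: [0.5, -0.3], 3: [1.2]}    # weights of the incoming edges of x

def activation(z):
    return 1.0 / (1.0 + math.exp(-z))    # sigmoid

def dot_product(ws, xs):
    return sum(w * v for w, v in zip(ws, xs))

output = {0: 1.4, 1: 2.7}                # outputs of the input nodes = the inputs
for x in nodes:                          # left to right
    if x < R:
        continue                         # input node: nothing to compute
    inputs_x = [output[i] for i in incoming[x]]
    output[x] = activation(dot_product(weights[x], inputs_x))

print(output[3])                         # value of the single output node
```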
To train a neural network, we use the iterative gradient descent method. After
random initialization, we make predictions on some subset of the data with the
forward-propagation process, compute the corresponding cost function C,
and update each weight w by an amount proportional to dC/dw, i.e., the
derivative of the cost function w.r.t. the weight. The proportionality
constant is known as the learning rate.
We calculate the gradients backwards, i.e., first the gradients of
the output layer, then the top-most hidden layer, followed by the preceding
hidden layer, and so on, ending at the input layer.
GRADIENT DESCENT OPTIMIZATION
TECHNIQUE
TO FIND WHICH WEIGHT PRODUCES THE LEAST ERROR
• The ratio between the network error and each of the weights is a
derivative, dE/dw, which measures the extent to which a slight
change in a weight causes a slight change in the error.
• Use the chain rule of calculus to work back through the
network activations and outputs. This leads us to the weight in
question, and its relationship to the overall error.
• We can calculate how a change in a weight affects the
error by first calculating how a change in activation affects the
error, and how a change in the weight affects the activation.
Gradient Descent
• Objective/cost function: J(θ)
• Learning rate: α
• Update each element of θ:  θ_j(new) = θ_j(old) − α · ∂J(θ)/∂θ_j
• Matrix notation for all parameters:  θ(new) = θ(old) − α · ∇_θ J(θ)
• Recursively apply the chain rule through each node.
OVERFITTING
• Overfitting occurs when the network learns rare dependencies in the training data
but cannot produce the correct output for test data.
• Overfitting is combated by regularization.
• Regularization methods – dropout, early stopping, data
augmentation, transfer learning.
DROPOUT
• Dropout is a technique where, during each iteration of gradient
descent, we drop/ignore a set of randomly selected nodes.
• Each neuron is kept with a probability q and dropped randomly
with probability 1 − q. The value of q may be different for each layer in
the neural network. A dropout probability (1 − q) of 0.5 for the hidden layers and 0 for the
input layer works well on a wide range of tasks.
• During evaluation and prediction, no dropout is used. The output
of each neuron is multiplied by q so that the input to the next layer
has the same expected value.
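A minimal sketch of this behaviour, assuming NumPy and an arbitrary activation vector (the standard, non-inverted formulation described above):

```python
# Dropout as described above: keep each activation with probability q while
# training, and scale by q at evaluation time so the next layer sees the
# same expected input.
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, q, training):
    if training:
        mask = rng.random(activations.shape) < q    # keep with probability q
        return activations * mask
    return activations * q                          # evaluation: nothing dropped, rescale

h = np.array([0.2, 1.5, -0.7, 0.9])
print(dropout(h, q=0.5, training=True))             # roughly half the units are zeroed
print(dropout(h, q=0.5, training=False))            # all units kept, scaled by q
```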
EARLY STOPPING
• We stop training when the error starts to increase.
• Here, by error, we mean the error measured on validation data,
which is the part of the training data used for tuning hyper-parameters.
• In this case, the hyper-parameter is the stopping criterion.
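A sketch of the stopping rule, using a simulated validation-error curve as a stand-in for a real model (the error values and patience setting are illustrative assumptions):

```python
# Early stopping: keep training while the validation error falls, remember the
# best epoch, and stop once the error has risen for `patience` epochs in a row.
simulated_val_errors = [0.9, 0.6, 0.45, 0.40, 0.38, 0.39, 0.41, 0.44, 0.50]

best_error = float("inf")
best_epoch = None
patience, bad_epochs = 2, 0

for epoch, val_error in enumerate(simulated_val_errors):
    if val_error < best_error:                  # still improving on validation data
        best_error, best_epoch = val_error, epoch
        bad_epochs = 0
    else:                                       # validation error started to increase
        bad_epochs += 1
        if bad_epochs >= patience:
            break

print(f"stopped at epoch {epoch}, best epoch {best_epoch}, best error {best_error}")
```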
DATA AUGMENTATION
• We increase the amount of data we have, or augment it, by
taking existing data and applying transformations to it.
• For instance, in many computer vision tasks such as object
classification, an effective data augmentation technique is
adding new data points that are cropped or translated versions
of the original data.
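A sketch of two common image augmentations (random crop and horizontal flip), assuming NumPy arrays as images; the 32×32 input and 28×28 crop size are illustrative assumptions:

```python
# Simple data augmentation: crops and flips of an existing example produce
# new training points.
import numpy as np

rng = np.random.default_rng(0)

def random_crop(image, crop_h, crop_w):
    h, w = image.shape[:2]
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    return image[top:top + crop_h, left:left + crop_w]

def random_flip(image):
    return image[:, ::-1] if rng.random() < 0.5 else image   # horizontal flip

image = rng.random((32, 32, 3))                  # a stand-in 32x32 RGB image
augmented = random_flip(random_crop(image, 28, 28))
print(augmented.shape)                           # (28, 28, 3)
```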
TRANSFER LEARNING
• The process of taking a pre-trained model and “fine-tuning”
it with our own dataset is called transfer learning.
• The model is first pre-trained on a large dataset. Then, we
remove the last layer of the network and replace it with a new
layer with random weights.
• We then freeze the weights of all the other layers and train the
network normally. The pre-trained model acts as a feature
extractor, and only the last layer is trained on the current
task.
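A sketch of this recipe in tf.keras, assuming TensorFlow is available and an ImageNet-pretrained VGG16 serves as the frozen feature extractor; the input size and number of classes are illustrative assumptions:

```python
# Transfer learning: frozen pre-trained base + a new, randomly initialised
# last layer trained on the current task.
import tensorflow as tf

num_classes = 5                                        # hypothetical current task
base = tf.keras.applications.VGG16(weights="imagenet",
                                   include_top=False,  # drop the original last layers
                                   input_shape=(224, 224, 3))
base.trainable = False                                 # freeze the pre-trained weights

model = tf.keras.Sequential([
    base,                                              # acts as a feature extractor
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(num_classes, activation="softmax"),  # new layer, random weights
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=5)      # trains only the new layer
```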
CORE COMPONENTS OF A DEEP NET
• Parameters – the weights on the connections in the network
• Layers – the fundamental architectural unit in deep networks
• Activation functions – a nonlinear transform applied to the output of
the previous layer: Sigmoid, Tanh, Hard tanh, Rectified linear unit (ReLU)
• Loss functions – determine the penalty for an incorrect classification
of an input, e.g. squared loss, logistic loss, hinge loss, negative log likelihood
• Optimization methods – gradient descent, genetic algorithms, simulated
annealing, PSO, ACO
• Hyper-parameters – layer size, learning rate, regularization
DEEP LEARNING
MODELS/ARCHITECTURES/ALGORITHMS
• Unsupervised Pre-trained Networks (UPNs): Auto-encoders, Deep Belief Networks (DBNs), Generative Adversarial Networks (GANs)
• Convolutional Neural Networks (CNNs)
• Recurrent Neural Networks (RNNs)
• Recursive Neural Networks (ReNNs)
AUTO-ENCODERS
TO LEARN COMPRESSED REPRESENTATIONS OF DATASETS
Auto-encoders differ from the
multilayer perceptron:
• They use unlabeled data in
unsupervised learning.
• They build a compressed
representation of the input
data.
They use backpropagation to update
their weights. The main
difference between RBMs and
auto-encoders is in how they
calculate the gradients.
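A minimal tf.keras sketch of an auto-encoder (the 784 → 32 → 784 sizes are illustrative assumptions); note that the training target is the input itself, so no labels are needed:

```python
# Auto-encoder: compress the input to a small code and reconstruct it,
# trained on unlabeled data with backpropagation.
import tensorflow as tf

inputs = tf.keras.Input(shape=(784,))
code = tf.keras.layers.Dense(32, activation="relu")(inputs)        # encoder: compressed representation
outputs = tf.keras.layers.Dense(784, activation="sigmoid")(code)   # decoder: reconstruction

autoencoder = tf.keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")   # target = the input itself (no labels)
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=128)
```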
RESTRICTED BOLTZMANN MACHINES(RBM)
With RBMs, every visible unit is connected to
every hidden unit, yet no units from the same
layer are connected. Pre-training using RBMs
means teaching the network to reconstruct the original
data from a limited sample of that data.
Contrastive Divergence
RBMs calculate gradients using an
algorithm called contrastive divergence. It
minimizes the KL divergence (the delta
between the real distribution of the data and
the guess) by sampling k steps of a Markov
chain to compute a guess.
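A compact NumPy sketch of contrastive divergence with k = 1 (CD-1); the layer sizes, learning rate, and toy binary data are illustrative assumptions:

```python
# CD-1 for a binary RBM: one Markov-chain step v0 -> h0 -> v1 -> h1, then
# positive minus negative associations approximate the gradient.
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden, lr = 6, 3, 0.1
W = 0.01 * rng.normal(size=(n_visible, n_hidden))      # visible-to-hidden weights only
b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)     # layer biases

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

data = (rng.random((20, n_visible)) < 0.5).astype(float)   # toy binary "dataset"

for epoch in range(50):
    for v0 in data:
        p_h0 = sigmoid(v0 @ W + b_h)                    # hidden probabilities given the data
        h0 = (rng.random(n_hidden) < p_h0).astype(float)
        p_v1 = sigmoid(h0 @ W.T + b_v)                  # reconstruction of the visible units
        v1 = (rng.random(n_visible) < p_v1).astype(float)
        p_h1 = sigmoid(v1 @ W + b_h)
        # update toward the data statistics and away from the model's guess
        W += lr * (np.outer(v0, p_h0) - np.outer(v1, p_h1))
        b_v += lr * (v0 - v1)
        b_h += lr * (p_h0 - p_h1)
```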
RECONSTRUCTION IN RBMS
DEEP BELIEF NETWORKS
DBNS ARE COMPOSED OF LAYERS OF RESTRICTED BOLTZMANN MACHINES (RBMS) FOR THE PRE-TRAIN PHASE AND THEN A FEED-FORWARD NETWORK FOR THE FINE-TUNE PHASE.
GENERATIVE ADVERSARIAL
NETWORKS(GANS)
• GANs use unsupervised learning to train two adversarial models
in parallel.
• The generative network – generates data (or images) with a
special kind of layer called a de-convolutional layer.
• The discriminator network – the secondary network, generally
a CNN, for image classification tasks.
CONVOLUTIONAL NEURAL NETWORKS
(CNN) –
LEARN HIGHER-ORDER FEATURES IN THE DATA VIA
CONVOLUTIONS
• Convolutional layers transform the input data by using a patch
of locally connected neurons from the previous layer. The layer
computes a dot product between the region of neurons
in the input layer and the weights to which they are locally
connected in the output layer.
• A convolution is a mathematical operation
describing a rule for how to merge two sets of information. The
convolution operation is known as the feature detector of a
CNN.
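A sketch of the convolution operation itself in NumPy (stride 1, no padding); the 3 × 3 filter here is an arbitrary illustrative feature detector:

```python
# Slide a small filter over the input and take a dot product at each position.
import numpy as np

def convolve2d(image, kernel):
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]       # locally connected region
            out[i, j] = np.sum(patch * kernel)      # dot product with the filter weights
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
edge_filter = np.array([[1., 0., -1.],              # an illustrative 3x3 feature detector
                        [1., 0., -1.],
                        [1., 0., -1.]])
print(convolve2d(image, edge_filter).shape)         # (3, 3) output feature map
```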
CONVOLUTIONAL LAYER HYPER-
PARAMETERS
• Filter (or kernel) size (field size)
• Output depth
• Stride
• Zero-padding
POOLING LAYERS
• They reduce the data representation progressively over the
network and help control overfitting. The pooling layer
operates independently on every depth slice of the input.
• The pooling layer uses the max() operation to resize the input
data spatially (width, height). This operation is referred to as
max pooling. With a 2 × 2 filter size, the max() operation is
taking the largest of four numbers in the filter area. This
operation does not affect the depth dimension.
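A sketch of 2 × 2 max pooling with stride 2 in NumPy, operating independently on each depth slice (the input volume here is a random illustrative array):

```python
# Downsample each depth slice by taking the largest of every four numbers.
import numpy as np

def max_pool_2x2(volume):                       # volume shape: (height, width, depth)
    h, w, d = volume.shape
    pooled = np.zeros((h // 2, w // 2, d))
    for i in range(h // 2):
        for j in range(w // 2):
            window = volume[2 * i:2 * i + 2, 2 * j:2 * j + 2, :]
            pooled[i, j, :] = window.max(axis=(0, 1))   # max over the 2x2 area per slice
    return pooled

volume = np.random.default_rng(0).random((4, 4, 3))
print(max_pool_2x2(volume).shape)               # (2, 2, 3): the depth is unchanged
```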
FULLY CONNECTED LAYERS
• Fully connected layers compute the class scores that we use as the output of
the network (e.g., the output layer at the end of the
network). The dimensions of the output volume are [ 1
× 1 × N ], where N is the number of output classes
we’re evaluating.
• Fully connected layers perform transformations on the
input data volume that are a function of the
activations in the input volume and the parameters (the weights and biases of the neurons).
RECURRENT NEURAL NETWORKS
• Recurrent Neural Networks take each vector from a sequence of
input vectors and model them one at a time. This allows the
network to retain state while modeling each input vector across
the window of input vectors. Modeling the time dimension is
done by Recurrent Neural Networks.
• A Recurrent Neural Network includes a feedback loop
to learn from sequences of varying lengths.
• Has an extra parameter matrix for the connections
between time-steps to capture the temporal
relationships in the data.
• RNNs are trained to generate sequences, in which the
output at each time-step is based on both the current
input and the input at all previous time steps.
• Normal Recurrent Neural Networks compute a
gradient with an algorithm called backpropagation
through time (BPTT).
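A sketch of a vanilla RNN unrolled over a short sequence in NumPy; the sizes and random inputs are illustrative assumptions:

```python
# The same weight matrices are reused at every time-step; the hidden state
# carries information forward from all previous time-steps.
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size, seq_len = 4, 3, 5
W_xh = 0.1 * rng.normal(size=(hidden_size, input_size))    # input -> hidden
W_hh = 0.1 * rng.normal(size=(hidden_size, hidden_size))   # extra matrix between time-steps
b_h = np.zeros(hidden_size)

xs = rng.normal(size=(seq_len, input_size))     # a sequence of input vectors
h = np.zeros(hidden_size)                       # initial state
for t in range(seq_len):
    h = np.tanh(W_xh @ xs[t] + W_hh @ h + b_h)  # state depends on current input and past state
    print(t, np.round(h, 3))
```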
VANISHING GRADIENT PROBLEM
• Recurrent Neural Networks are known to have issues with
the “vanishing gradient problem.”
• This issue occurs when the gradients become too small (or, in the
related exploding-gradient case, too large), making it difficult to model long-range
dependencies (10 time-steps or more) in the structure of
the input dataset.
• The most effective way to get around this issue is to use
the LSTM variant of Recurrent Neural Networks.
LSTM NETWORKS
• The critical component of the LSTM is the memory cell and the gates
(the forget gate, the input gate). The contents of the memory cell are
modulated by the input gates and forget gates.
• Assuming that both of these gates are closed, the contents of the
memory cell will remain unmodified between one time-step and the
next.
• The gating structure allows information to be retained across many
time-steps, and consequently also allows gradients to flow across
many time-steps.
• This allows the LSTM model to overcome the vanishing gradient
problem that occurs with most Recurrent Neural Network models.
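A NumPy sketch of a single LSTM step showing how the forget and input gates modulate the memory cell (the output gate, part of the standard LSTM formulation, is included for completeness); all sizes and weights are illustrative assumptions:

```python
# One LSTM step: the forget gate decides what to keep in the cell, the input
# gate decides what new information to write.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 4, 3

def init(shape):
    return 0.1 * rng.normal(size=shape)

Wf, Wi, Wc, Wo = (init((n_hid, n_in + n_hid)) for _ in range(4))
bf = bi = bc = bo = np.zeros(n_hid)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev):
    z = np.concatenate([x, h_prev])
    f = sigmoid(Wf @ z + bf)          # forget gate: what to erase from the cell
    i = sigmoid(Wi @ z + bi)          # input gate: what new information to write
    c_tilde = np.tanh(Wc @ z + bc)    # candidate cell contents
    c = f * c_prev + i * c_tilde      # with f = 1, i = 0 the cell passes through unchanged
    o = sigmoid(Wo @ z + bo)          # output gate
    h = o * np.tanh(c)
    return h, c

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):  # a short input sequence
    h, c = lstm_step(x, h, c)
print(np.round(h, 3), np.round(c, 3))
```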
USE CASES OF LSTMS
• Generating sentences (e.g.,
character-level language
models)
• Classifying time-series
• Speech recognition
• Handwriting recognition
• Polyphonic music modeling
RECURSIVE NEURAL NETWORKS
• Recursive Neural Networks, like Recurrent Neural Networks, can deal
with variable-length input and can model hierarchical structures in the
training dataset.
• Applications - Deconstructing scenes to not only identify the objects
in the scene, but also how the objects relate to form the scene, scene
and sentence parsing.
• Architecture - A shared-weight matrix and a binary tree structure
that allows the recursive network to learn varying sequences of words
or parts of an image.
• Recursive Neural Networks use a variation of backpropagation called
backpropagation through structure (BPTS). The feed-forward pass
happens bottom-up, and backpropagation is top-down.
PROMINENT DEEP LEARNING LIBRARIES IN PYTHON
• TENSORFLOW
• THEANO (Keras, PyLearn2, Lasagne, Blocks)
• CAFFE
• NOLEARN
• DEEPNET
• DEEPPY
• DEEPLEARNING
CHOOSING A DEEP NET FOR YOUR
RESEARCH
• To extract patterns from a set of unlabelled data, we use an RBM or an auto-encoder.
• For text processing, sentiment analysis, parsing and named entity
recognition, we use an RNN or a Recursive Neural Tensor Network (RNTN).
• For any language model that operates at the character level, we use an RNN.
• For image recognition, we use a deep belief network (DBN) or a convolutional
network (CNN).
• For object recognition, we use an RNTN or a CNN.
• For speech recognition, we use an RNN.
• In general, DBNs and MLPs with ReLU are both good choices for classification.
• For time series analysis, it is always recommended to use an RNN.
WHEN TO USE DEEP LEARNING
• Simpler models (logistic regression) don’t achieve the accuracy
level your use case needs
• You have complex pattern matching in images, NLP, or audio to
deal with
• You have high-dimensional data
• You have the dimension of time in your vectors (sequences)
WHEN TO STICK WITH TRADITIONAL MACHINE
LEARNING
• You have high-quality, low-dimensional data; for example,
columnar data from a database export
• You’re not trying to find complex patterns in image data
• You’ll achieve poor results from both methods when the data is
incomplete and/or of poor quality.
DEEP LEARNING FOR DETECTING CANCER
• https://youtu.be/9Mz84cwVmS0
• Examples
READING LIST
• http://deeplearning.net/reading-list/
• http://ufldl.stanford.edu/wiki/index.php/
• http://www.cs.toronto.edu/~hinton/
• http://deeplearning.net/tutorial/
• http://ufldl.stanford.edu/wiki/index.php/UFLDL_Tutorial
• http://ufldl.stanford.edu/wiki/index.php/Neural_Networks
THANK YOU
Editor's Notes
• Adaptive Gradient Algorithm (AdaGrad) maintains a per-parameter learning rate, which improves performance on problems with sparse gradients (e.g. natural language and computer vision problems).
• Root Mean Square Propagation (RMSProp) also maintains per-parameter learning rates, adapted based on the average of recent magnitudes of the gradients for each weight (e.g. how quickly it is changing). This means the algorithm does well on online and non-stationary problems (e.g. noisy ones).
• Adam realizes the benefits of both AdaGrad and RMSProp. Instead of adapting the parameter learning rates based on the average first moment (the mean) as in RMSProp, Adam also makes use of the average of the second moments of the gradients (the uncentered variance). Specifically, the algorithm calculates an exponential moving average of the gradient and of the squared gradient, and the parameters beta1 and beta2 control the decay rates of these moving averages. Because the moving averages are initialised at zero and beta1/beta2 are close to 1.0 (as recommended), the moment estimates are biased towards zero. This bias is overcome by first calculating the biased estimates and then computing bias-corrected estimates.
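A NumPy sketch of the Adam update described in this note, applied to the toy objective J(θ) = θ²; the hyper-parameter values are the commonly recommended defaults, and the learning rate here is an illustrative choice:

```python
# Adam: exponential moving averages of the gradient (first moment) and squared
# gradient (second moment), bias-corrected and used to scale a per-parameter step.
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad            # moving average of the gradient
    v = beta2 * v + (1 - beta2) * grad ** 2       # moving average of the squared gradient
    m_hat = m / (1 - beta1 ** t)                  # bias-corrected estimates
    v_hat = v / (1 - beta2 ** t)                  # (the moving averages start at zero)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# minimise J(theta) = theta^2 as a toy example
theta, m, v = np.array([5.0]), np.zeros(1), np.zeros(1)
for t in range(1, 2001):
    grad = 2 * theta                              # dJ/dtheta
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.1)
print(theta)                                      # approaches 0
```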