Deep learning

Introduction to Deep Learning
RATNAKAR PANDEY

Is Artificial
Intelligence, Machine
Learning and Deep Learning
the same thing? What about
Data Science?

Source: https://www.linkedin.com/pulse/artificial-intelligence-machine-learning-deep-same-thing-pandey/

Artificial Intelligence
• AI is any technique, code or algorithm that enables machines to develop,
demonstrate and mimic human cognitive behavior or intelligence and hence the
name “Artificial Intelligence”
• AI doesn’t mean that everything machines will be doing, rather AI can be better
represented as “Augmented Intelligence”, i.e. Man+Machine to solve business
problems better and faster
• AI won’t replace managers, but managers who use AI will replace those who
don’t.
• Some of the most successful applications of AI around us can be seen in
Robotics, Computer Vision, Virtual Reality, Speech Recognition, Automation,
Gaming and so on…

Machine Learning
• Machine learning is the sub field of AI,
which gives machines the ability to
improve its performance over time
without explicit intervention or help
from the human being
• In this approach machines are shown
thousands or millions of examples and
trained how to correctly solve a
problem.
• Most of the current applications of
the machine learning leverage
supervised learning
• Other usage of ML can be broadly
classified between unsupervised
learning and reinforced learning.
Source: https://hbr.org/cover-story/2017/07/the-business-of-artificial-intelligence

Data Science
• Data Science is a field which intersects AI, Machine
Learning and Deep Learning and enables statistically
driven decision making.
• Data science is the Art and Science of drawing
actionable insights from the data.
• Data Science + Business Knowledge = Impact/Value
Creation for the Business.
• Generally speaking, Data Scientists and Analytics
Professionals try to answer following questions via
their analysis-
• Descriptive Analytics ( What has happened?)
• Diagnostic Analytics ( Why it has happened?)
• Predictive Analytics ( What may happen in future?)
• Prescriptive Analytics ( What plan of action we should
follow?)

Deep Learning
• Deep learning is a sub field of
Machine Learning that very closely
tries to mimic human brain's
working using neurons.
• These techniques focus on building
Artificial Neural Networks (ANN)
using several hidden layers.
• There are variety of deep learning
networks such as Multilayer
Perceptron ( MLP), Autoencoders
(AE), Convolution Neural Network
(CNN), Recurrent Neural Network
(RNN)
Source: https://www.quora.com/What-are-the-types-of-deep-neural-networks-and-how-can-one-categorize-them-and-their-related-algorithms-as-
either-shallow-or-deep/answer/Ratnakar-Pandey-RP

Why Deep Learning is Growing
• Processing power needed for Deep
learning is readily becoming
available using GPUs, Distributed
Computing and powerful CPUs
• Moreover, as the data amount
grows, Deep Learning models seem
to outperform Machine Learning
models
• Explosion of features and datasets
• Focus on customization and real
time decisioning

Why Deep Learning is Growing
• Uncover hard to detect patterns
(using traditional techniques) when
the incidence rate is low
• Find latent features (super variables)
without significant manual feature
engineering
• Real time fraud detection and self
learning models using streaming data
(KAFKA, MapR)
• Ensure consistent customer
experience and regulatory compliance
• Higher operational efficiency
10,000 +
Features
Unstructured
Transactional
Social
Device
&
IP
Third Parties
Bureau

Challenges with Deep Learning
• Works better with large amount of
data
• Some models are very hard to train,
may take weeks or months
• Overfitting
• Black box and hence may have
regulatory challenges, particularly
for BFSI

Source : http://www.npr.org/sections/thesalt/2016/03/11/470084215/canine-or-cuisine-this-photo-meme-is-fetching

Multilayer Perceptron (MLP)
• These are the most basic networks
and feed forward the inputs to
create output. They consist of an
input layer and an output layer and
many interconnected hidden layers
and neurons between the input and
the output layers.
• They generally use some non linear
activation function such as Relu or
Tanh and compute the losses ( the
difference between the true output
and computed output) such as
Mean Square Error ( MSE), Logloss.
• This loss is backward propagated to
adjust the weights and training to
minimize the losses or make the
models more accurate.
w1
w2
wn
A
c
t
i
v
a
t
i
o
n
Activation Function
Inputs Weights Bias

Key Components and Hyperparameters
• Number of layers- Input layer, output layer and hidden layers. More the number of
layers, deeper the network.
• Number of Neurons- how many neurons in each layer. Input layer neurons depend of
the number of features, output layer neurons on number of outputs and hidden layer
neurons need to be optimized
• Weights- importance given to each factor in computing the output. Typically chosen
randomly in the first run and optimized using backward propagation.
• Activation Function- Function used to generate outputs by matrix multiplication of
inputs and weights along with bias
• Forward Propagation- Weights for each input are initialized to make predictions and
compute error. Output from each layer is fed forward to the next layer.
• Loss Function- To compute error between actual and prediction values and measure
models performance. Hyperparameters are fine tuned to minimize the loss function.
Some common loss functions are- Mean Square Error, Log loss, Cross entropy,

Popular Activation Functions
Most of the activation functions are non-linear as most of the real world problems are non linear
Source: https://en.wikipedia.org/wiki/Activation_function

Key Components and Hyperparameters
• Backpropagation- Back propagate the error (starting from the output layer) to the
previous layer and update weights
• Gradient Descent and Optimization Algorithms- Used for optimize weights based on
the error signal backward propagated and chain rules
• Epochs- One complete set of feedforward and back propagation to train the entire
network.
• Batch Size- No of input observation which are processed in one epoch.
• Dropout- x% of nodes are dropped out to ensure weight regularization and
overfitting and leverage community effects of neuron, rather than dependence on
few players
• Optimizer and Learning Rate- Optimizer are used to optimize learning rates by
Stochastic Gradient Descent (SGD) and find the best solution. If network learns very
fast, it may find suboptimal solutions If it learns very slow, it will take very long to
train a network. Common optimizers are Adam, SGD, RMSprop etc.

Autoencoders
• Autoencoders follow “Representation
Learning”
• The concept of the AE is quite simple-
here input vectors are used to compute
the output vectors, but output vectors
are same as the input vectors.
• The reconstruction error is computed
and data points with the higher
reconstruction error are supposed to be
outliers
• AE are used for unsupervised learning,
feature reduction, speech and image
recognition.
w1
w2
wn

Convolution Neural Network (CNN)
• Convolution Neural Networks (CNN) significantly
enhances the capabilities of the feed forward
network such as MLP by inserting convolution
layers.
• They are particularly suitable for spatial data, object
recognition and image analysis using
multidimensional neurons structures.
• CNNs use convolutions ( a linear operation) rather
than matrix multiplication as in MLP
• Typically a CNN will have three stages- convolution
stage, detector layer ( non linear activator) and
pooling layer
w1
w2
wn

Convolution Neural Network (CNN)
• Convolution Layer- The most important component
in the CNN. The layer has Kernels ( learnable filters)
and the input x and y dimensions are convoluted (
dot product) to generate feature map
• Detector Layer- The feature maps are passed to this
stage using a not linear activation function such as
ReLU activation function to accentuate the non
linear components of the feature maps
• Pooling Layer- A pooling layer such as “max
pooling” summarizes (sub-sampling) the responses
from several inputs from the previous layer and
serves to reduce the size of the spatial
representation. Allowing the next layer to look at
bigger region
w1
w2
wn
Source : MIT Deeplearningbook

Recurrent Neural Network(RNN)
• RNNs are also a feed forward network, however
with recurrent memory loops which take the input
from the previous and/or same layers or states.
• This gives them a unique capability to model along
the time dimension and arbitrary sequence of
events and inputs.
• RNNs are used for sequenced data analysis such as
time-series, sentiment analysis, NLP, language
translation, speech recognition, image captioning,
and script recognition among other things.
• These are also called networks with the memory, as
the previous inputs or states may persist (stored) in
the model to do a sequential analysis. These
memories become an input as well
w1
w2
wn

Recurrent Neural Network(RNN)
• Long Short Term Memory is one of the most
frequently ( LSTM) used RNN model
• These sort of models help us overcome the NLP
challenges which can’t be solved by “Bag of
Words” analysis -
“ The flight was good, not bad at all”
vs
“ The flight was bad, not good at all”
w1
w2
wn

Deep learning

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Deep learning

Similar to Deep learning (20)

More from Ratnakar Pandey

More from Ratnakar Pandey (8)

Recently uploaded

Recently uploaded (20)

Deep learning