Deep Learning
Content
• Image enhancement
• Spatial domain enhancement
• Point operations, Spatial operations
• Digital Negative
• Contrast Stretching
• Thresholding
• Gray level slicing
• Bit plane slicing
• Dynamic range compression
• Histogram modeling
• Adaptive Histogram Equalization
• Frequency domain enhancement
• Fourier Transforms / Spectral (Frequency) Analysis
• Color Image
• Color Image Enhancement
AI Vs Machine Learning Vs Deep Learning
Artificial Neural Networks — Mapping the Human Brain
• The human brain consists of neurons or nerve cells which transmit and process
the information received from our senses.
• Many such nerve cells are arranged together in our brain to form a network of
nerves. These nerves pass electrical impulses, i.e. the excitation, from one
neuron to the other.
• The dendrites receive the impulse from the terminal button or synapse of an
adjoining neuron.
• Dendrites carry the impulse to the nucleus of the nerve cell (soma).
• Here, the electrical impulse is processed and then passed on to the axon.
• The axon is the longest branch among the dendrites; it carries the impulse from
the soma to the synapse.
• The synapse then passes the impulse to the dendrites of the next neuron.
• Thus, a complex network of neurons is created in the human brain.
Artificial Neuron Construction
• Like above, the neurons are created artificially on a computer. Connecting
many such artificial neurons creates an artificial neural network.
• An artificial neuron works similarly to a neuron present in the human brain.
• The data in the network flows through each neuron by a connection.
• Every connection has a specific weight by which the flow of data is
regulated.
Construction of an Artificial Neuron
• The same concept of the network of neurons is used in machine learning
algorithms.
• In this case, the neurons are created artificially on a computer. Connecting many
such artificial neurons creates an artificial neural network.
• The working of an artificial neuron is similar to that of a neuron present in our
brain.
• The data in the network flows through each neuron by a connection.
• Every connection has a specific weight by which the flow of data is regulated.
Neural networks Applications
✓Translate text
✓Identify faces
✓Recognize speech
✓Read handwritten text
✓Control robots
Artificial Neuron Working Principle
• x1 and x2 are the two inputs. They could be an integer, float etc. Assume them as 1
and 0 respectively.
• When these inputs pass through the connections ( through W1 and W2 ), they are
adjusted depending on the connection weights.
• Let’s assume W1 = 0.5, W2 = 0.6 and W3 = 0.2. The weighted sum is then
z = x1 * W1 + x2 * W2 = 1 * 0.5 + 0 * 0.6 = 0.5
• Hence, 0.5 is the value obtained after weighting the inputs by the connections.
• The connections are assumed to be the dendrites in an artificial neuron.
• To process the information, we need an activation function (the soma).
• Here, use the simple sigmoid function: f(z) = 1/(1 + e^(-z))
• f(0.5) = 0.6224
• The above value is the output of the neuron (the axon).
• This value is then multiplied by W3:
0.6224 * W3 = 0.6224 * 0.2 = 0.12448
• Now, again apply the activation function to this value:
f(0.12448) = 0.5310
• Hence, y (the final prediction) = 0.5310
• In this example the weights were randomly generated.
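As a sanity check, here is a minimal Python sketch of this worked example; the inputs and the weights W1, W2, W3 are the assumed values from the slide above.

```python
import math

def sigmoid(z):
    # Sigmoid activation: f(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

# Inputs and connection weights from the worked example above
x1, x2 = 1.0, 0.0
W1, W2, W3 = 0.5, 0.6, 0.2

z = x1 * W1 + x2 * W2            # weighted sum at the neuron: 0.5
hidden_out = sigmoid(z)          # activation (the "soma"): ~0.6224
y = sigmoid(hidden_out * W3)     # weight the output by W3 and activate again: ~0.5311

print(round(hidden_out, 4), round(y, 4))
```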
Working of Neural Network
➢ A neural network contains different layers.
➢ Input layer - The first layer; it picks up the input signals and passes
them to the next layer.
➢ Hidden layer - The next layer does all kinds of calculations and feature
extractions.
➢ There is often more than one hidden layer.
➢ Output layer - Provides the final result.
Classification of Neural Networks
1. Shallow neural network: The Shallow neural network has only one
hidden layer between the input and output.
2. Deep neural network: Deep neural networks have more than one hidden
layer. For instance, the GoogLeNet model for image recognition has 22
layers.
Working of Neural Network
The complete training process of a neural network involves two steps.
1. Forward Propagation - Receive input data, process the information, and generate output
• Images are fed into the input layer in the form of numbers. These numerical values denote the intensity of
pixels in the image.
• The neurons in the hidden layers apply a few mathematical operations on these values to learn the features.
• To perform these mathematical operations, there are certain parameter values that are randomly initialized.
• After these mathematical operations at the hidden layers, the result is sent to the output layer, which generates
the final prediction.
2. Backward Propagation - Calculate error and update the parameters of the network
• Once the output is generated, the next step is to compare the output with the actual (Ground truth/label) value
y.
• Based on the final output, and how close or far (difference) this is from the actual value (error), the values of
the parameters are updated.
• The forward propagation process is repeated using the updated parameter values and new outputs are
generated.
Activation function
• An activation function defines the output of a node for a given input or set of inputs.
• It activates or deactivates neurons to obtain the desired output.
• It also performs a nonlinear transformation on the input to get better results in a complex neural network.
• The activation function also helps to normalize the output of any input into a range such as -1 to 1.
• The activation function must be efficient and should reduce the computation time, because a neural network
is sometimes trained on millions of data points.
• In any neural network, the activation function decides whether the given input or received information is relevant or
irrelevant.
Types of Activation Functions
1. Binary step
2. Linear activation function
Non Linear Activation Functions:
3. ReLU
4. LeakyReLU
5. Sigmoid
6. Tanh
7. Softmax
1. Binary (Unit) Step Activation Function
• The binary step activation function is a threshold-based classifier: if the input value is above the
threshold, the neuron is activated and sends exactly the same signal to the next layer; otherwise it is not activated.
• f(x) = 1 if x ≥ 0
          0 if x < 0
• Here, the threshold value is 0.
• It is very simple and useful for
binary classification.
• It does not allow multi-value outputs
i.e. cannot support classifying the
inputs into one of several categories.
2. Linear Activation Function
• It is a simple straight-line activation function where the output is directly proportional to the weighted sum of the
inputs. Linear activation functions are better at giving a wide range of activations, and a line of positive slope may
increase the firing rate as the input rate increases. Range is –infinity to +infinity.
f(x) = X
• Not possible to use backpropagation (gradient descent) to train the model—the derivative of the function with respect
to X is a constant, and has no relation to the input, X. So it’s not possible to go back and understand which weights in the
input neurons can provide a better prediction.
• All layers of the neural network collapse into single layer with linear activation functions, no matter how many layers
in the neural network, the last layer will be a linear function of the first layer (because a linear combination of linear
functions is still a linear function). So a linear activation function turns the neural network into just one layer.
• A neural network with a linear activation function is simply a linear regression model. It has limited power and limited
ability to handle complex, varying input data.
3. ReLU( Rectified Linear unit) Activation function
• The rectified linear unit (ReLU), f(x) = max(0, x), is the most widely used activation function. Range is from 0 to infinity.
• It converges quickly, so computation is fast.
• Disadvantage:
• Dying ReLU’ problem: when inputs approach zero, or are negative, the gradient of the function becomes zero, the
network cannot perform backpropagation and cannot learn.
• To avoid this, use the Leaky ReLU function instead of ReLU; in Leaky ReLU the range is expanded, which
enhances performance.
4. Leaky ReLU Activation Function
• The Leaky ReLU activation function solves the ‘Dying ReLU’ problem; it is a variation of ReLU.
• It has a small positive slope in the negative area, so it does enable backpropagation, even for negative
input values.
• Disadvantage: Leaky ReLU does not provide consistent predictions for negative input values.
5. Sigmoid Activation Function
• The sigmoid activation function is a probabilistic approach towards decision making; its output range is
between 0 and 1, i.e. it normalizes the output of each neuron.
• Since the outputs lie in this bounded range, the prediction (decision) is easier to interpret.
• The equation for the sigmoid function is f(x) = 1/(1 + e^(-x))
• Disadvantage: The vanishing gradient problem occurs because it squashes large inputs into the range 0 to 1,
so the derivatives become very small and do not give satisfactory updates.
• To solve this problem, ReLU is used, which does not have this small-derivative problem.
6. Hyperbolic Tangent Activation Function(Tanh)
• This activation function works like the sigmoid function, and usually performs slightly better.
• Zero centered— Making it easier to model inputs that have strongly negative, neutral, and strongly positive
values.
• Range is in between -1 to 1.
7. Softmax Activation Function
• Softmax is used at the last layer i.e. output to classify inputs into multiple classes. Range is 0 to 1.
• Softmax scales the input values relative to each other so that they sum to one, i.e. the sum of all the
output probabilities equals one.
• For a multi-class classification problem, use softmax together with the cross-entropy loss.
• Negative inputs are converted into nonnegative values using the exponential function.
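A minimal NumPy sketch of the activation functions described above; the Leaky ReLU slope alpha = 0.01 is an assumed, commonly used default.

```python
import numpy as np

def binary_step(x):
    return np.where(x >= 0, 1, 0)            # threshold at 0

def linear(x):
    return x                                  # f(x) = x

def relu(x):
    return np.maximum(0, x)                   # 0 for negatives, x otherwise

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)      # small positive slope for negatives

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))           # squashes into (0, 1)

def tanh(x):
    return np.tanh(x)                         # zero-centred, range (-1, 1)

def softmax(x):
    e = np.exp(x - np.max(x))                 # subtract max for numerical stability
    return e / e.sum()                        # outputs sum to 1

x = np.array([-2.0, 0.0, 3.0])
print(relu(x), sigmoid(x), softmax(x))
```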
Parameter
Learning Rate
• The learning rate (step size) is a hyper-parameter, typically in the range 0.0 to 1.0.
• It controls the amount by which the weights are updated during training.
• The learning rate controls how quickly the model is adapted to the problem.
• Smaller learning rates require more training epochs given the smaller changes made to the weights each update,
whereas larger learning rates result in rapid changes and require fewer training epochs.
Bias
• The bias is a node that is always 'on'. i.e. its value is set to 1.
• Bias is like the intercept in a linear equation. It is an additional parameter in the Neural Network which is used
to adjust the output along with the weighted sum of the inputs to the neuron.
Output = sum (W * X) + bias
• Therefore Bias is a constant which helps the model in a way that it can fit best for the given data.
• Weights decide how fast the activation function will trigger, whereas the bias
is used to delay (shift) the triggering of the activation function.
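A tiny sketch of the weighted-sum-plus-bias computation; the weights, inputs, and bias here are made-up illustrative values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W = np.array([0.4, -0.2, 0.7])   # example weights (assumed)
X = np.array([1.0, 2.0, 0.5])    # example inputs (assumed)
bias = -0.5                      # shifts the weighted sum before activation

z = np.dot(W, X) + bias          # Output = sum(W * X) + bias
print(sigmoid(z))                # the bias shifts where the activation "triggers"
```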
Bias, Variance, underfit, overfit
➢ Bias: Error of the training data
➢ Variance: error of the test data
➢ Overfitting: low bias and high variance
➢ Under-fitting: high bias and high variance
➢ Goal always should be low bias and low variance.
➢ Example: Regression problem (take a linear regression model to create a best-fit line).
L2 = weight decay
• Regularization term that penalizes big weights, added to the objective function
• Weight decay value determines how dominant regularization is during gradient computation
• Big weight decay coefficient → big penalty for big weights
Regularization
Dropout
• Randomly drop units (neurons along with their connections) during training
• Each unit retained with fixed probability p, independent of other units
• Hyper-parameter p to be chosen (tuned)
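A minimal sketch of dropout applied to a layer's activations, assuming the common "inverted dropout" formulation (p_keep is the retention probability p above; kept units are rescaled so the expected activation is unchanged).

```python
import numpy as np

def dropout(activations, p_keep=0.8):
    # Each unit is retained with probability p_keep, independently of the others.
    mask = np.random.rand(*activations.shape) < p_keep
    # Scale the survivors by 1/p_keep ("inverted dropout"); applied only during training.
    return activations * mask / p_keep

a = np.array([0.2, 0.9, 0.5, 0.7])
print(dropout(a, p_keep=0.8))
```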
J_reg(θ) = J(θ) + λ Σ_k θ_k²   (the objective with the L2 weight-decay penalty added)
Early-stopping
• Use validation error to decide when to stop training
• Stop when monitored quantity has not improved after n subsequent epochs
• n is called patience
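A minimal sketch of the early-stopping logic; the validate function here is a stand-in that simulates a validation error which eventually stops improving.

```python
import random

def validate(epoch):
    # Stand-in for a real validation pass: error decreases, then plateaus with noise.
    return max(0.2, 1.0 - 0.1 * epoch) + random.uniform(0.0, 0.05)

patience = 5                 # n: stop after n epochs without improvement
best_val_error = float("inf")
wait = 0

for epoch in range(100):
    val_error = validate(epoch)
    if val_error < best_val_error:
        best_val_error, wait = val_error, 0      # improvement: reset the counter
    else:
        wait += 1
        if wait >= patience:                     # no improvement for n epochs
            print(f"Stopping at epoch {epoch}, best validation error {best_val_error:.3f}")
            break
```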
Tuning hyper-parameters
• “Grid and random search of nine trials for optimizing a function f(x, y) = g(x) + h(y) ≈ g(x).
• With grid search, the nine trials test g(x) in only three distinct places.
• With random search, all nine trials explore distinct values of g.”
• Both grid and random search try configurations blindly.
• The next trial is independent of all the trials done before.
• Bayesian optimization for hyper-parameter tuning:
• Make smarter choice for the next trial, minimize the number of trials
• Collect the performance at several configurations
• Make inference and decide what configuration to try next.
Figure: f(x, y) = g(x) + h(y) ≈ g(x); g(x) is shown in green and h(y) in yellow.
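A minimal sketch contrasting grid and random search of nine trials on a function of the form f(x, y) = g(x) + h(y) ≈ g(x); the functions g and h here are made up for illustration.

```python
import itertools
import random

def g(x):               # the important dimension
    return -(x - 0.3) ** 2

def h(y):               # the nearly irrelevant dimension
    return 0.01 * y

def f(x, y):            # f(x, y) = g(x) + h(y) ≈ g(x)
    return g(x) + h(y)

# Grid search: nine trials, but only three distinct values of x are ever tested.
grid_trials = list(itertools.product([0.0, 0.5, 1.0], repeat=2))
best_grid = max(grid_trials, key=lambda cfg: f(*cfg))

# Random search: nine trials, each with a distinct value of x.
random_trials = [(random.random(), random.random()) for _ in range(9)]
best_random = max(random_trials, key=lambda cfg: f(*cfg))

print("grid best:", best_grid, "random best:", best_random)
```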
• A neural network is structured like the human brain and consists of artificial neurons, also known as nodes. These
nodes are stacked next to each other in three layers:
❖ input layer
❖ hidden layer(s)
❖ output layer
• Data provides each node with information in the form of inputs. The node multiplies the inputs by (initially
random) weights, sums the results, and adds a bias. Finally, nonlinear functions, also known as activation functions, are
applied to determine which neuron to fire.
Working of Neural Network (Recall……….)
Deep Learning
• Deep learning has an inbuilt automatic multi stage feature learning process that learns rich hierarchical
representations (i.e. features).
• Deep Neural Network – Neural Network that contains more than one hidden layer between input and output.
• The series of hidden layers between input & output perform feature identification and processing in a series of
stages like our brain.
• In deep learning, many algorithms learn the weights of networks with many hidden layers.
Figure: Input Layer → Hidden Layer 1 → Hidden Layer 2 → Hidden Layer 3 → Hidden Layer 4 → Output Layer.
Why is DL needed?
Figure: Performance vs. sample size – deep learning compared with traditional ML algorithms (x-axis: size of data, y-axis: performance).
o Manually designed features in ML are often over-specified, incomplete and take a long time to design and validate.
o Learned Features are easy to adapt, fast to learn.
o Deep learning provides a very flexible, universal, learnable framework for representing world, visual and linguistic
information.
o Can learn both unsupervised and supervised
o Effective end-to-end joint system learning
o Utilize large amounts of training data
Multi-Layer Neural Network training
Train this layer first, then this layer, then this layer, then this layer, and finally this layer.
EACH of the (non-output) layers is trained to learn good features that describe what
comes from the previous layer.
Deep learning Process
• A deep neural network learns automatically, without predefined knowledge explicitly coded by the
programmers.
• Each layer represents a deeper level of knowledge, i.e., the hierarchy of knowledge.
• The learning occurs in two phases.
• The first phase (Forward Propagation) consists of applying a nonlinear transformation to the
input and creating a statistical model as output.
• The second phase (Backpropagation) aims at improving (optimizing) the model with a mathematical
method known as the derivative.
• The neural network repeats these two phases hundreds to thousands of times until it reaches a
tolerable (converged) level of accuracy. Each repetition of these two phases is called an iteration.
Deep learning Process
• Deep learning has an inbuilt automatic multi stage feature learning process that learns
rich hierarchical representations (i.e. features).
Low-Level Features → Mid-Level Features → High-Level Features → Trainable Classifier → Output (e.g. outdoor, indoor)
E.g.
• Image: Pixel → Edge → Texture → Motif → Part → Object
• Text: Character → Word → Word-group → Clause → Sentence → Story
Deep Learning Frameworks
1. TensorFlow
2. Keras
3. PyTorch
4. Theano
5. DL4J
6. Caffe
7. Chainer
8. Microsoft CNTK
Deep Learning Applications
• Social Network Filtering,
• Fraud Detection,
• Image and Speech Recognition,
• Audio Recognition,
• Computer Vision,
• Medical Image Processing,
• Bioinformatics,
• Customer Relationship Management,
• Driverless car
• Mobile phone
• Google Search Engine
• TV, etc.
Types of Deep Learning Algorithms
1. Convolutional Neural Networks (CNNs)
2. Recurrent Neural Networks (RNNs)
3. Long Short Term Memory Networks (LSTMs)
4. Generative Adversarial Networks (GANs)
5. Radial Basis Function Networks (RBFNs)
6. Multilayer Perceptrons (MLPs)
7. Self Organizing Maps (SOMs)
8. Deep Belief Networks (DBNs)
9. Restricted Boltzmann Machines( RBMs)
10. Autoencoders
Convolutional Neural Networks (CNNs)
• CNNs (ConvNets) consist of multiple layers.
• Yann LeCun developed the first CNN, called LeNet, in 1988. It was used for recognizing characters like ZIP
codes and digits.
• CNNs are widely used for image processing tasks such as identifying satellite images, processing medical images, and
forecasting time series, and for object detection tasks such as detecting anomalies.
• Output: Binary, Multinomial, Continuous, Count
• Input: fixed size, can use padding to make all images same size.
• Architecture: User’s Choice (depends on application) which requires experimentation.
• Optimization: Backward propagation
• Hyper-parameters for a very deep model can be estimated properly only if you have billions of images.
• Otherwise, use an architecture and trained hyper-parameters already experimented with by researchers (ImageNet
models, Microsoft/Google APIs, etc.).
Convolutional Neural Networks (CNNs)
• Build a Convolutional Neural Network
• Initialize the weights and bias randomly.
• Fix the input and output.
• Forward Propagation: Receive input data, process the information, and generate output
• Backward Propagation: Calculate error, gradient and update the parameters of the network
Architecture:
• Build a Convolutional Neural Network with hidden layers.
• Each hidden layer may have a separate activation function like ReLU or sigmoid.
• Final layer will have Softmax.
• Error is calculated using cross-entropy.
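A minimal Keras sketch of such an architecture; the input size, layer widths, filter counts, and the 10-class output are assumed for illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Weights, biases and filters are initialized randomly by Keras; input/output sizes are fixed.
model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),               # fixed-size input (e.g. 28x28 grayscale)
    layers.Conv2D(32, (3, 3), activation="relu"),  # convolution layer + ReLU
    layers.MaxPooling2D((2, 2)),                   # pooling (down-sampling)
    layers.Flatten(),                              # flatten to a 1D vector
    layers.Dense(64, activation="relu"),           # fully connected hidden layer
    layers.Dense(10, activation="softmax"),        # final layer: softmax over the classes
])

# Error is calculated using cross-entropy; backward propagation updates the parameters.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```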
CNN's have multiple layers that process the input and extract features from data:
1. Convolution Layer
• CNN has a convolution layer that has several filters to perform the convolution operation.
2. Rectified Linear Unit (ReLU)
• CNNs have a ReLU layer that performs an element-wise operation. The output is a rectified feature map.
3. Pooling Layer
• The rectified feature map next feeds into a pooling layer. Pooling is a down-sampling operation that reduces the
dimensions of the feature map.
• The pooling layer then converts the resulting two-dimensional arrays from the pooled feature map into a single, long,
continuous, linear vector by flattening it.
4. Fully Connected Layer
• A fully connected layer forms when the flattened matrix from the pooling layer is fed as an input, which classifies and
identifies the images.
Convolutional Neural Networks (CNNs) - Working Principle
Parameters Computation
• Figure: a network with inputs x1…x4, one hidden layer a1(1)…a4(1), and a softmax output unit a1(2) producing Y.
• Number of parameters: 4*4 + 4 + 1 = 21.
Parameters Computation for Image
• Figure: a 400 x 400 x 3 image gives 480,000 inputs (x1…x480000); a fully connected hidden layer of the same size a1(1)…a480000(1) feeds an output unit a1(2).
• Number of parameters: 480000*480000 + 480000 + 1 ≈ 230 billion!
• This is why convolutional layers, which share a small set of filter weights, are used instead of fully connected layers for images.
Convolutional Layer – Feature Extraction
• Filter: a 3 x 3 (Laplacian) kernel
 0  1  0
 1 -4  1
 0  1  0
Figure: the input image (a 20 x 20 grid of normalized pixel intensities) and the convoluted image produced by applying this filter.
▪ Inspired by the neurophysiological experiments conducted by Hubel and Wiesel (1962).
• What is Convolution?
Input Image (4 x 4):      Filter (2 x 2):      Convolved Image (Feature Map):
a b c d                   w1 w2                h1 h2 …
e f g h                   w3 w4
i j k l
m n o p

h1 = f(a*w1 + b*w2 + e*w3 + f*w4)
h2 = f(b*w1 + c*w2 + f*w3 + g*w4)
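A minimal NumPy sketch of this sliding-window ("valid") convolution; the image and kernel values are made up, and the activation f is left as the identity for simplicity.

```python
import numpy as np

def convolve2d(image, kernel, f=lambda z: z):
    # Slide the filter over the image, take weighted sums, then apply the activation f.
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = f(np.sum(patch * kernel))
    return out

image = np.arange(16, dtype=float).reshape(4, 4)   # stands in for a..p
kernel = np.array([[0.1, 0.2],
                   [0.3, 0.4]])                    # stands in for w1..w4
print(convolve2d(image, kernel))                   # 3 x 3 feature map
```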
Convolutional Layer Process
http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution
Figure: an input matrix being convolved with a 3x3 convolutional filter.
• Example:
▪ In convolutional neural networks, hidden units are only connected to a local receptive field.
Figure: the input image is convolved with Filter 1 (weights w1–w4) to produce the Layer 1 feature map, which is in turn convolved with Filter 2 (weights w5–w8) to produce the Layer 2 feature map.
Pooling
• Max pooling: Provides the maximum output within a rectangular neighborhood.
• Average pooling: Provides the average output of a rectangular neighborhood.
• Stride – the number of steps (columns when moving left to right, rows when moving top to bottom) the pooling
window moves each time.
Example: MaxPool with a 2 x 2 filter and a stride of 2
Input Matrix:        Output Matrix:
1 3 5 3              4 5
4 2 3 1              3 4
3 1 1 3
0 1 0 4
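A minimal NumPy sketch of 2 x 2 max pooling with stride 2, reproducing the example above.

```python
import numpy as np

def max_pool(x, size=2, stride=2):
    # Take the maximum within each size x size window, moving by `stride` each step.
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = window.max()
    return out

x = np.array([[1, 3, 5, 3],
              [4, 2, 3, 1],
              [3, 1, 1, 3],
              [0, 1, 0, 4]])
print(max_pool(x))   # [[4. 5.], [3. 4.]]
```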
Fully Connected Layer
• So far, the convolution layer has extracted some valuable features from the data. These features are sent to the fully
connected layer that generates the final results.
• The fully connected layer in a CNN is just the traditional neural network.
• The output from the convolution layer is a 2D matrix. Ideally, we would want each row to represent a single input
image.
• Actually, the fully connected layer can only work with 1D data. Hence, the values generated from the previous
operation are first converted into a 1D format.
• Once the data is converted into a 1D array, it is sent to the fully connected layer. All of these individual values are
treated as separate features that represent the image.
• The fully connected layer performs two operations on the incoming data – a linear transformation (a weighted sum
plus bias) and a non-linear transformation (the activation function, which captures complex relationships).
CNN – Backward Propagation
• During the forward propagation process, we randomly initialized the weights, biases and filters.
• These values are treated as parameters from the convolutional neural network algorithm.
• In the backward propagation process, the model tries to update the parameters such that the overall predictions are more
accurate.
• For updating these parameters, use the gradient descent technique.
Gradient Descent to minimize (optimize) the loss (error):
• Consider the following curve for a loss function that depends on a single parameter a:
• During the random initialization of the parameter, we get the value of a as a2.
• It is clear from the picture that the minimum value of loss is at a1 and not a2.
• The gradient descent technique tries to find this value of parameter (a) at which the loss is minimum.
• From the figure, We understand that we need to update the value a2 and bring it closer to a1.
• To decide the direction of movement, i.e. whether to increase or decrease the value of the parameter, we calculate the
gradient or slope at the current point.
• Based on the value of the gradient, we can determine the updated parameter values.
• When the slope is negative, the value of the parameter will be increased.
• When the slope is positive, the value of the parameter should be decreased by a small amount.
Generic equation for updating the parameter values:
• new_parameter = old_parameter - (learning_rate * gradient_of_parameter)
• The learning rate is a constant that controls the amount of change being made to the old value. The slope or the gradient determines the direction of the new
values, that is, whether the values should be increased or decreased. So, we need to find the gradients, that is, the change in error with respect to the parameters, in order to update them.
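A minimal sketch of this update rule on a one-dimensional loss curve; the loss function, its minimum at a1 = 1.0, and the starting point a2 = 3.0 are made up for illustration.

```python
def loss(a):
    return (a - 1.0) ** 2            # minimum of the loss is at a1 = 1.0

def gradient(a):
    return 2.0 * (a - 1.0)           # slope of the loss at the current point

a = 3.0                              # random initialization (a2)
learning_rate = 0.1

for _ in range(50):
    # new_parameter = old_parameter - (learning_rate * gradient_of_parameter)
    a = a - learning_rate * gradient(a)

print(round(a, 4))                   # close to 1.0, where the loss is minimum
```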
CNN – Backward Propagation
• CNN contains three parameters – weights, biases and filters. Let calculate the gradients for these parameters one by one.
Backward Propagation at Fully Connected Layer
• Fully connected layer has two parameters – weight matrix and bias matrix.
• Let us start by calculating the change in error with respect to weights – ∂E/∂W.
• Since the error is not directly dependent on the weight matrix, we will use the concept of chain rule to find this value.
• The computation graph shown below will help us define ∂E/∂W:
• ∂E/∂W = ∂E/∂O · ∂O/∂Z2 · ∂Z2/∂W
Find these derivatives separately:
1. Change in error with respect to output
• Suppose the actual values for the data are denoted as y’ and
• the predicted output is represented as O.
• Then the error is: E = (y' - O)^2 / 2
• If we differentiate the error with respect to the output: ∂E/∂O = -(y'-O)
2. Change in output with respect to Z2 (linear transformation output)
• To find the derivative of the output O with respect to Z2, we must first define O in terms of Z2. The output is the sigmoid of Z2.
Thus, ∂O/∂Z2 is effectively the derivative of the sigmoid function f(x) = 1/(1 + e^(-x)).
• The derivative of this function: f'(x) = (1 + e^(-x))^(-1) [1 - (1 + e^(-x))^(-1)]
f'(x) = sigmoid(x)(1 - sigmoid(x))
∂O/∂Z2 = (O)(1 - O)
CNN – Backward Propagation
3. Change in Z2 with respect to W (Weights)
• The value Z2 is the result of the linear transformation process. Here is the equation of Z2 in terms of weights:
Z2 = W^T · A1 + b
• On differentiating Z2 with respect to W, we will get the value A1 itself:
∂Z2/∂W = A1
• Now that we have the individual derivations, we can use the chain rule to find the change in error with respect to weights:
∂E/∂W = ∂E/∂O · ∂O/∂Z2 · ∂Z2/∂W
∂E/∂W = -(y' - O) · sigmoid'(Z2) · A1
• The shape of ∂E/∂W will be the same as the weight matrix W.
• We can update the values in the weight matrix:
W_new = W_old - lr*∂E/∂W
4. Change in Z2 with respect to b (bias)
• Differentiating Z2 = W^T · A1 + b with respect to b gives 1, so ∂E/∂b = ∂E/∂O · ∂O/∂Z2.
• We can then update the bias values:
b_new = b_old - lr*∂E/∂b
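A minimal NumPy sketch of these fully connected layer updates for a single training example, using the squared-error loss E = (y' - O)^2 / 2 defined above; the layer sizes and input activations A1 are assumed for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

A1 = np.array([[0.2], [0.7], [0.1]])   # activations from the previous layer (assumed)
W = np.random.randn(3, 1) * 0.1        # weight matrix, randomly initialized
b = np.zeros((1, 1))                   # bias
y_true = np.array([[1.0]])             # actual value y'
lr = 0.1

# Forward pass: Z2 = W^T . A1 + b, O = sigmoid(Z2)
Z2 = W.T @ A1 + b
O = sigmoid(Z2)

# Backward pass: chain rule  dE/dW = dE/dO . dO/dZ2 . dZ2/dW
dE_dO = -(y_true - O)                  # from E = (y' - O)^2 / 2
dO_dZ2 = O * (1 - O)                   # derivative of the sigmoid
dE_dZ2 = dE_dO * dO_dZ2
dE_dW = A1 @ dE_dZ2                    # same shape as W
dE_db = dE_dZ2                         # dZ2/db = 1

W = W - lr * dE_dW                     # W_new = W_old - lr * dE/dW
b = b - lr * dE_db                     # b_new = b_old - lr * dE/db
```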
CNN – Backward Propagation
Backward Propagation at Convolution Layer
• For the convolution layer, we had the filter matrix as a parameter. During the forward propagation process, we randomly
initialized the filter matrix.
• Update these values using:
new_parameter = old_parameter - (learning_rate * gradient_of_parameter)
• To update the filter matrix, find the gradient of the parameter ∂E/∂f.
• Figure shows the computation for backward propagation.
• The derivative ∂E/∂f is:
∂E/∂f = ∂E/∂O · ∂O/∂Z2 · ∂Z2/∂A1 · ∂A1/∂Z1 · ∂Z1/∂f
• We have already determined the values for ∂E/∂O and ∂O/∂Z2.
• Let us find the values for the remaining derivatives.
1. Change in Z2 with respect to A1
• To find the value for ∂Z2/∂A1 , we need to have the equation for Z2 in terms of A1:
Z2 = W^T · A1 + b
• On differentiating the above equation with respect to A1, we get W^T as the result: ∂Z2/∂A1 = W^T
2. Change in A1 with respect to Z1
• The next value that we need to determine is ∂A1/∂Z1. Have a look at the equation of A1. A1 = sigmoid(Z1)
• This is simply the Sigmoid function. The derivative of Sigmoid would be: ∂A1/∂Z1 = (A1)(1-A1)
CNN – Backward Propagation
Backward Propagation at Convolution Layer
3. Change in Z1 with respect to filter f
• Finally, we need the value for ∂Z1/∂f. Here’s the equation for Z1
Z1 = X * f
• Differentiating Z1 with respect to f simply gives us X:
∂Z1/∂f = X
• Now that we have all the required values,
• let’s find the overall change in error with respect to the filter:
∂E/∂f = ∂E/∂O · ∂O/∂Z2 · ∂Z2/∂A1 · ∂A1/∂Z1 * ∂Z1/∂f
• Notice that in the equation above, the value (∂E/∂O · ∂O/∂Z2 · ∂Z2/∂A1 · ∂A1/∂Z1) is convolved with ∂Z1/∂f instead of using a
simple dot product.
• Why? The main reason is that during forward propagation, we perform a convolution operation for the images and filters.
• This is repeated in the backward propagation process.
• Once we have the value for ∂E/∂f, we will use this value to update the original filter value:
f = f - lr*(∂E/∂f)
• This completes the backpropagation for convolutional neural networks.
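A minimal NumPy sketch of this filter-gradient computation for a single 2-D channel; the input X, filter size, and the upstream gradient delta = ∂E/∂Z1 (assumed to be already computed) are made up for illustration.

```python
import numpy as np

def conv2d_valid(image, kernel):
    # "Valid" 2-D convolution as used in the forward pass: Z1 = X * f.
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

X = np.random.rand(5, 5)           # input patch
f = np.random.rand(3, 3)           # filter, randomly initialized
lr = 0.01

Z1 = conv2d_valid(X, f)            # forward pass: Z1 = X * f
delta = np.ones_like(Z1)           # assumed upstream gradient dE/dZ1

# dE/df is itself a convolution: slide delta over the input X.
dE_df = conv2d_valid(X, delta)     # same shape (3 x 3) as the filter
f = f - lr * dE_df                 # f = f - lr * (dE/df)
```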
Recall - Deep learning Process
• Deep learning has an inbuilt automatic multi stage feature learning process that learns
rich hierarchical representations (i.e. features).
Low-Level Features → Mid-Level Features → High-Level Features → Trainable Classifier → Output (e.g. outdoor, indoor)
E.g.
• Image: Pixel → Edge → Texture → Motif → Part → Object
• Text: Character → Word → Word-group → Clause → Sentence → Story
Top 7 Applications of Convolutional Neural
Networks
1.Decoding Facial Recognition
Facial recognition is broken down by a convolutional neural network into the
following major components -
• Identifying every face in the picture
• Focusing on each face despite external factors, such as light, angle, pose,
etc.
• Identifying unique features
• Comparing all the collected data with already existing data in the database
to match a face with a name.
A similar process is followed for scene labeling as well.
2.Analyzing Documents
Convolutional neural networks can also be used for document analysis. This is not just useful for
handwriting analysis, but also has a major stake in recognizers. For a machine to be able to scan an
individual's writing, and then compare that to the wide database it has, it must execute almost a
million commands a minute. It is said that with the use of CNNs and newer models and algorithms, the
error rate has been brought down to a minimum of 0.4% at the character level, though its complete
testing is yet to be widely seen.
3.Historic and Environmental Collections
CNNs are also used for more complex purposes such as natural history collections. These
collections act as key players in documenting major parts of history such as biodiversity, evolution,
habitat loss, biological invasion, and climate change.
4.Understanding Climate
CNNs can be used to play a major role in the fight against climate change, especially in
understanding the reasons why we see such drastic changes and how we could experiment in
curbing the effect. It is said that the data in such natural history collections can also provide greater
social and scientific insights, but this would require skilled human resources such as researchers
who can physically visit these types of repositories. There is a need for more manpower to carry
out deeper experiments in this field.
5.Grey Areas
Introducing the grey area into CNNs is poised to provide a much more realistic picture of the
real world. Currently, CNNs largely function exactly like a machine, seeing a true and false
value for every question. However, as humans, we understand that the real world plays out in a
thousand shades of grey. Allowing the machine to understand and process fuzzier logic will help
it understand the grey area we humans live in and strive to work against. This will help CNNs get
a more holistic view of what a human sees.
6.Advertising
CNNs have already brought in a world of difference to advertising with the introduction of
programmatic buying and data-driven personalized advertising.
7. Other Interesting Fields
CNNs are poised to be the future with their introduction into driverless cars, robots that can mimic human
behavior, aides to human genome mapping projects, predicting earthquakes and natural disasters, and maybe
even self-diagnoses of medical problems.
So, you wouldn't even have to drive down to a clinic or schedule an appointment with a doctor to ensure your
sneezing attack or high fever is just the simple flu and not symptoms of some rare disease.
One problem that researchers are working on with CNNs is brain cancer detection.
The earlier detection of brain cancer can prove to be a big step in saving more lives affected by this illness.

More Related Content

What's hot

Introduction to CNN
Introduction to CNNIntroduction to CNN
Introduction to CNNShuai Zhang
 
Deep Learning - Convolutional Neural Networks
Deep Learning - Convolutional Neural NetworksDeep Learning - Convolutional Neural Networks
Deep Learning - Convolutional Neural NetworksChristian Perone
 
Training Neural Networks
Training Neural NetworksTraining Neural Networks
Training Neural NetworksDatabricks
 
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...Simplilearn
 
Feed forward ,back propagation,gradient descent
Feed forward ,back propagation,gradient descentFeed forward ,back propagation,gradient descent
Feed forward ,back propagation,gradient descentMuhammad Rasel
 
Feedforward neural network
Feedforward neural networkFeedforward neural network
Feedforward neural networkSopheaktra YONG
 
Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep LearningOswald Campesato
 
Perceptron (neural network)
Perceptron (neural network)Perceptron (neural network)
Perceptron (neural network)EdutechLearners
 
Convolutional Neural Network (CNN)
Convolutional Neural Network (CNN)Convolutional Neural Network (CNN)
Convolutional Neural Network (CNN)Muhammad Haroon
 
Artificial Neural Networks Lect3: Neural Network Learning rules
Artificial Neural Networks Lect3: Neural Network Learning rulesArtificial Neural Networks Lect3: Neural Network Learning rules
Artificial Neural Networks Lect3: Neural Network Learning rulesMohammed Bennamoun
 
Machine Learning: Introduction to Neural Networks
Machine Learning: Introduction to Neural NetworksMachine Learning: Introduction to Neural Networks
Machine Learning: Introduction to Neural NetworksFrancesco Collova'
 
Neuro-fuzzy systems
Neuro-fuzzy systemsNeuro-fuzzy systems
Neuro-fuzzy systemsSagar Ahire
 
Deep Learning With Neural Networks
Deep Learning With Neural NetworksDeep Learning With Neural Networks
Deep Learning With Neural NetworksAniket Maurya
 
Deep Belief Networks
Deep Belief NetworksDeep Belief Networks
Deep Belief NetworksHasan H Topcu
 
Introduction to Neural Networks
Introduction to Neural NetworksIntroduction to Neural Networks
Introduction to Neural NetworksDatabricks
 

What's hot (20)

Introduction to CNN
Introduction to CNNIntroduction to CNN
Introduction to CNN
 
Deep Learning - Convolutional Neural Networks
Deep Learning - Convolutional Neural NetworksDeep Learning - Convolutional Neural Networks
Deep Learning - Convolutional Neural Networks
 
Mc culloch pitts neuron
Mc culloch pitts neuronMc culloch pitts neuron
Mc culloch pitts neuron
 
Training Neural Networks
Training Neural NetworksTraining Neural Networks
Training Neural Networks
 
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
 
Feed forward ,back propagation,gradient descent
Feed forward ,back propagation,gradient descentFeed forward ,back propagation,gradient descent
Feed forward ,back propagation,gradient descent
 
Feedforward neural network
Feedforward neural networkFeedforward neural network
Feedforward neural network
 
Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep Learning
 
Perceptron (neural network)
Perceptron (neural network)Perceptron (neural network)
Perceptron (neural network)
 
Convolutional Neural Network (CNN)
Convolutional Neural Network (CNN)Convolutional Neural Network (CNN)
Convolutional Neural Network (CNN)
 
Cnn
CnnCnn
Cnn
 
Artificial Neural Networks Lect3: Neural Network Learning rules
Artificial Neural Networks Lect3: Neural Network Learning rulesArtificial Neural Networks Lect3: Neural Network Learning rules
Artificial Neural Networks Lect3: Neural Network Learning rules
 
Machine Learning: Introduction to Neural Networks
Machine Learning: Introduction to Neural NetworksMachine Learning: Introduction to Neural Networks
Machine Learning: Introduction to Neural Networks
 
Neuro-fuzzy systems
Neuro-fuzzy systemsNeuro-fuzzy systems
Neuro-fuzzy systems
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural network
 
Deep Learning With Neural Networks
Deep Learning With Neural NetworksDeep Learning With Neural Networks
Deep Learning With Neural Networks
 
Recurrent neural network
Recurrent neural networkRecurrent neural network
Recurrent neural network
 
Activation function
Activation functionActivation function
Activation function
 
Deep Belief Networks
Deep Belief NetworksDeep Belief Networks
Deep Belief Networks
 
Introduction to Neural Networks
Introduction to Neural NetworksIntroduction to Neural Networks
Introduction to Neural Networks
 

Similar to Deep learning

Neural Network_basic_Reza_Lecture_3.pptx
Neural Network_basic_Reza_Lecture_3.pptxNeural Network_basic_Reza_Lecture_3.pptx
Neural Network_basic_Reza_Lecture_3.pptxshamimreza94
 
Activation function
Activation functionActivation function
Activation functionAstha Jain
 
Introduction to Perceptron and Neural Network.pptx
Introduction to Perceptron and Neural Network.pptxIntroduction to Perceptron and Neural Network.pptx
Introduction to Perceptron and Neural Network.pptxPoonam60376
 
Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ...
Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ...Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ...
Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ...Simplilearn
 
Multilayer Perceptron Neural Network MLP
Multilayer Perceptron Neural Network MLPMultilayer Perceptron Neural Network MLP
Multilayer Perceptron Neural Network MLPAbdullah al Mamun
 
Neural-Networks.ppt
Neural-Networks.pptNeural-Networks.ppt
Neural-Networks.pptRINUSATHYAN
 
20200428135045cfbc718e2c.pdf
20200428135045cfbc718e2c.pdf20200428135045cfbc718e2c.pdf
20200428135045cfbc718e2c.pdfTitleTube
 
2011 0480.neural-networks
2011 0480.neural-networks2011 0480.neural-networks
2011 0480.neural-networksParneet Kaur
 
neuralnetwork.pptx
neuralnetwork.pptxneuralnetwork.pptx
neuralnetwork.pptxSherinRappai
 
Activation functions and Training Algorithms for Deep Neural network
Activation functions and Training Algorithms for Deep Neural networkActivation functions and Training Algorithms for Deep Neural network
Activation functions and Training Algorithms for Deep Neural networkGayatri Khanvilkar
 
33.-Multi-Layer-Perceptron.pdf
33.-Multi-Layer-Perceptron.pdf33.-Multi-Layer-Perceptron.pdf
33.-Multi-Layer-Perceptron.pdfgnans Kgnanshek
 

Similar to Deep learning (20)

UNIT 5-ANN.ppt
UNIT 5-ANN.pptUNIT 5-ANN.ppt
UNIT 5-ANN.ppt
 
Neural Network_basic_Reza_Lecture_3.pptx
Neural Network_basic_Reza_Lecture_3.pptxNeural Network_basic_Reza_Lecture_3.pptx
Neural Network_basic_Reza_Lecture_3.pptx
 
14_cnn complete.pptx
14_cnn complete.pptx14_cnn complete.pptx
14_cnn complete.pptx
 
Activation function
Activation functionActivation function
Activation function
 
Introduction to Perceptron and Neural Network.pptx
Introduction to Perceptron and Neural Network.pptxIntroduction to Perceptron and Neural Network.pptx
Introduction to Perceptron and Neural Network.pptx
 
Unit 2 ml.pptx
Unit 2 ml.pptxUnit 2 ml.pptx
Unit 2 ml.pptx
 
Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ...
Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ...Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ...
Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ...
 
Multilayer Perceptron Neural Network MLP
Multilayer Perceptron Neural Network MLPMultilayer Perceptron Neural Network MLP
Multilayer Perceptron Neural Network MLP
 
UNIT-3 .PPTX
UNIT-3 .PPTXUNIT-3 .PPTX
UNIT-3 .PPTX
 
Neural-Networks.ppt
Neural-Networks.pptNeural-Networks.ppt
Neural-Networks.ppt
 
ANN - UNIT 2.pptx
ANN - UNIT 2.pptxANN - UNIT 2.pptx
ANN - UNIT 2.pptx
 
20200428135045cfbc718e2c.pdf
20200428135045cfbc718e2c.pdf20200428135045cfbc718e2c.pdf
20200428135045cfbc718e2c.pdf
 
2011 0480.neural-networks
2011 0480.neural-networks2011 0480.neural-networks
2011 0480.neural-networks
 
neuralnetwork.pptx
neuralnetwork.pptxneuralnetwork.pptx
neuralnetwork.pptx
 
neuralnetwork.pptx
neuralnetwork.pptxneuralnetwork.pptx
neuralnetwork.pptx
 
Activation functions and Training Algorithms for Deep Neural network
Activation functions and Training Algorithms for Deep Neural networkActivation functions and Training Algorithms for Deep Neural network
Activation functions and Training Algorithms for Deep Neural network
 
33.-Multi-Layer-Perceptron.pdf
33.-Multi-Layer-Perceptron.pdf33.-Multi-Layer-Perceptron.pdf
33.-Multi-Layer-Perceptron.pdf
 
Perceptron
Perceptron Perceptron
Perceptron
 
Lec 6-bp
Lec 6-bpLec 6-bp
Lec 6-bp
 
Unit 6: Application of AI
Unit 6: Application of AIUnit 6: Application of AI
Unit 6: Application of AI
 

More from Kuppusamy P

Recurrent neural networks rnn
Recurrent neural networks   rnnRecurrent neural networks   rnn
Recurrent neural networks rnnKuppusamy P
 
Image segmentation
Image segmentationImage segmentation
Image segmentationKuppusamy P
 
Image enhancement
Image enhancementImage enhancement
Image enhancementKuppusamy P
 
Feature detection and matching
Feature detection and matchingFeature detection and matching
Feature detection and matchingKuppusamy P
 
Image processing, Noise, Noise Removal filters
Image processing, Noise, Noise Removal filtersImage processing, Noise, Noise Removal filters
Image processing, Noise, Noise Removal filtersKuppusamy P
 
Flowchart design for algorithms
Flowchart design for algorithmsFlowchart design for algorithms
Flowchart design for algorithmsKuppusamy P
 
Algorithm basics
Algorithm basicsAlgorithm basics
Algorithm basicsKuppusamy P
 
Problem solving using Programming
Problem solving using ProgrammingProblem solving using Programming
Problem solving using ProgrammingKuppusamy P
 
Parts of Computer, Hardware and Software
Parts of Computer, Hardware and Software Parts of Computer, Hardware and Software
Parts of Computer, Hardware and Software Kuppusamy P
 
Java methods or Subroutines or Functions
Java methods or Subroutines or FunctionsJava methods or Subroutines or Functions
Java methods or Subroutines or FunctionsKuppusamy P
 
Java iterative statements
Java iterative statementsJava iterative statements
Java iterative statementsKuppusamy P
 
Java conditional statements
Java conditional statementsJava conditional statements
Java conditional statementsKuppusamy P
 
Java introduction
Java introductionJava introduction
Java introductionKuppusamy P
 
Logistic regression in Machine Learning
Logistic regression in Machine LearningLogistic regression in Machine Learning
Logistic regression in Machine LearningKuppusamy P
 
Anomaly detection (Unsupervised Learning) in Machine Learning
Anomaly detection (Unsupervised Learning) in Machine LearningAnomaly detection (Unsupervised Learning) in Machine Learning
Anomaly detection (Unsupervised Learning) in Machine LearningKuppusamy P
 
Machine Learning Performance metrics for classification
Machine Learning Performance metrics for classificationMachine Learning Performance metrics for classification
Machine Learning Performance metrics for classificationKuppusamy P
 
Machine learning Introduction
Machine learning IntroductionMachine learning Introduction
Machine learning IntroductionKuppusamy P
 

More from Kuppusamy P (20)

Recurrent neural networks rnn
Recurrent neural networks   rnnRecurrent neural networks   rnn
Recurrent neural networks rnn
 
Image segmentation
Image segmentationImage segmentation
Image segmentation
 
Image enhancement
Image enhancementImage enhancement
Image enhancement
 
Feature detection and matching
Feature detection and matchingFeature detection and matching
Feature detection and matching
 
Image processing, Noise, Noise Removal filters
Image processing, Noise, Noise Removal filtersImage processing, Noise, Noise Removal filters
Image processing, Noise, Noise Removal filters
 
Flowchart design for algorithms
Flowchart design for algorithmsFlowchart design for algorithms
Flowchart design for algorithms
 
Algorithm basics
Algorithm basicsAlgorithm basics
Algorithm basics
 
Problem solving using Programming
Problem solving using ProgrammingProblem solving using Programming
Problem solving using Programming
 
Parts of Computer, Hardware and Software
Parts of Computer, Hardware and Software Parts of Computer, Hardware and Software
Parts of Computer, Hardware and Software
 
Strings in java
Strings in javaStrings in java
Strings in java
 
Java methods or Subroutines or Functions
Java methods or Subroutines or FunctionsJava methods or Subroutines or Functions
Java methods or Subroutines or Functions
 
Java arrays
Java arraysJava arrays
Java arrays
 
Java iterative statements
Java iterative statementsJava iterative statements
Java iterative statements
 
Java conditional statements
Java conditional statementsJava conditional statements
Java conditional statements
 
Java data types
Java data typesJava data types
Java data types
 
Java introduction
Java introductionJava introduction
Java introduction
 
Logistic regression in Machine Learning
Logistic regression in Machine LearningLogistic regression in Machine Learning
Logistic regression in Machine Learning
 
Anomaly detection (Unsupervised Learning) in Machine Learning
Anomaly detection (Unsupervised Learning) in Machine LearningAnomaly detection (Unsupervised Learning) in Machine Learning
Anomaly detection (Unsupervised Learning) in Machine Learning
 
Machine Learning Performance metrics for classification
Machine Learning Performance metrics for classificationMachine Learning Performance metrics for classification
Machine Learning Performance metrics for classification
 
Machine learning Introduction
Machine learning IntroductionMachine learning Introduction
Machine learning Introduction
 

Recently uploaded

ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitolTechU
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementmkooblal
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...jaredbarbolino94
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.arsicmarija21
 

Recently uploaded (20)

ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptx
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of management
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.
 

Deep learning

Working of a Neural Network
➢ A neural network contains different layers.
➢ Input layer - the first layer; it picks up the input signals and passes them to the next layer.
➢ Hidden layer(s) - the next layer(s) perform the calculations and feature extraction.
➢ There may be more than one hidden layer.
➢ Output layer - provides the final result.
Classification of Neural Networks
1. Shallow neural network: has only one hidden layer between the input and output.
2. Deep neural network: has more than one hidden layer. For instance, Google's GoogLeNet model for image recognition has 22 layers.
Working of a Neural Network
The complete training process of a neural network involves two steps.
1. Forward propagation - receive input data, process the information, and generate output.
• Images are fed into the input layer in the form of numbers. These numerical values denote the intensity of the pixels in the image.
• The neurons in the hidden layers apply mathematical operations on these values to learn the features.
• To perform these operations, certain parameter values are randomly initialized.
• After these operations in the hidden layers, the result is sent to the output layer, which generates the final prediction.
2. Backward propagation - calculate the error and update the parameters of the network.
• Once the output is generated, it is compared with the actual (ground truth/label) value y.
• Based on how close or far the output is from the actual value (the error), the values of the parameters are updated.
• Forward propagation is then repeated using the updated parameter values, and new outputs are generated.
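As a concrete illustration of these two phases, here is a minimal NumPy sketch of repeated forward and backward passes for a tiny one-hidden-layer network; the data, layer sizes, and learning rate are made-up values chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 4 samples with 3 features each, and binary labels (illustrative values)
X = rng.random((4, 3))
y = np.array([[0.0], [1.0], [1.0], [0.0]])

# Randomly initialised parameters, as described above
W1, b1 = rng.standard_normal((3, 5)), np.zeros((1, 5))
W2, b2 = rng.standard_normal((5, 1)), np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.1
for epoch in range(1000):
    # Forward propagation: input -> hidden -> output
    Z1 = X @ W1 + b1
    A1 = sigmoid(Z1)
    Z2 = A1 @ W2 + b2
    O = sigmoid(Z2)                       # prediction

    # Backward propagation: compare with the ground truth and push the error back
    E = 0.5 * np.sum((y - O) ** 2)        # E = (y' - O)^2 / 2, summed over samples
    dZ2 = -(y - O) * O * (1 - O)          # dE/dZ2
    dW2 = A1.T @ dZ2
    db2 = dZ2.sum(axis=0, keepdims=True)
    dZ1 = (dZ2 @ W2.T) * A1 * (1 - A1)    # chain rule through the hidden layer
    dW1 = X.T @ dZ1
    db1 = dZ1.sum(axis=0, keepdims=True)

    # Update the parameters and repeat with the new values
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1
```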
Activation Function
• An activation function defines the output of a node for a given input or set of inputs.
• It activates or deactivates neurons to obtain the desired output.
• It also performs a nonlinear transformation on the input, which gives better results on complex problems.
• An activation function can also normalize the output of a neuron, for example into the range -1 to 1.
• It must be computationally efficient, because a neural network is sometimes trained on millions of data points.
• In effect, the activation function decides whether the information a neuron receives is relevant or should be ignored.
Types of Activation Functions
1. Binary step
2. Linear activation function
Non-linear activation functions:
3. ReLU
4. Leaky ReLU
5. Sigmoid
6. Tanh
7. Softmax
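The functions listed above can each be written in a few lines; the following NumPy sketch is one possible set of definitions (the leaky-ReLU slope of 0.01 is a commonly used default, assumed here, not a value from the slides).

```python
import numpy as np

def binary_step(x):               # 1 if x > 0, else 0
    return (x > 0).astype(float)

def linear(x):                    # f(x) = x
    return x

def relu(x):                      # max(0, x)
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):    # small positive slope alpha for x < 0
    return np.where(x > 0, x, alpha * x)

def sigmoid(x):                   # 1 / (1 + e^-x), range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):                      # range (-1, 1), zero-centred
    return np.tanh(x)

def softmax(x):                   # class probabilities that sum to 1
    e = np.exp(x - np.max(x))     # shift for numerical stability
    return e / e.sum()

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z), sigmoid(z), softmax(z))
```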
1. Binary (Unit) Step Activation Function
• The binary step activation function is a threshold-based classifier: if the input value is above a certain threshold, the neuron is activated and sends the signal to the next layer; otherwise it is not.
• f(x) = 1 if x > 0, and f(x) = 0 if x ≤ 0
• Here the threshold value is 0.
• It is very simple and useful for binary classification problems.
• It does not allow multi-valued outputs, i.e. it cannot classify inputs into one of several categories.
2. Linear Activation Function
• A simple straight-line activation function where the output is directly proportional to the weighted sum of the inputs. A line with a positive slope increases the firing rate as the input increases. Range: -infinity to +infinity. f(x) = x
• Backpropagation (gradient descent) cannot be used effectively to train the model: the derivative of the function with respect to x is a constant and has no relation to the input x, so it is not possible to work out which input weights would give a better prediction.
• All layers of the network collapse into one: no matter how many layers there are, the last layer is a linear function of the first (a linear combination of linear functions is still a linear function). So a linear activation function turns the neural network into a single-layer model.
• A neural network with only linear activation functions is simply a linear regression model, with limited power to handle complex, varying input data.
3. ReLU (Rectified Linear Unit) Activation Function
• The rectified linear unit, or ReLU, is the most widely used activation function. Range: 0 to infinity.
• It converges quickly and is fast to compute.
• Disadvantage - the 'dying ReLU' problem: when inputs approach zero or are negative, the gradient of the function becomes zero, so the network cannot perform backpropagation and cannot learn.
• To avoid this, the Leaky ReLU function is used instead of ReLU; its extended range improves performance for negative inputs.
4. Leaky ReLU Activation Function
• Leaky ReLU is a variation of ReLU designed to solve the 'dying ReLU' problem.
• It has a small positive slope in the negative region, so it enables backpropagation even for negative input values.
• Disadvantage: Leaky ReLU does not provide consistent predictions for negative input values.
5. Sigmoid Activation Function
• The sigmoid activation function takes a probabilistic approach to decision making: the output range is 0 to 1, i.e. each neuron's output is normalized.
• Because outputs are squashed into this small range, they can be read as probabilities, which makes decisions easier to interpret.
• The equation for the sigmoid function is f(x) = 1 / (1 + e^(-x))
• Disadvantage: the vanishing gradient problem. Large inputs are compressed into the range 0 to 1, so the derivatives become very small and training no longer gives satisfactory results.
• To avoid this problem, ReLU is used, which does not suffer from this small-derivative problem.
6. Hyperbolic Tangent Activation Function (Tanh)
• Tanh is similar in shape to the sigmoid function, but usually works slightly better.
• It is zero-centred, making it easier to model inputs that are strongly negative, neutral, or strongly positive.
• Range: -1 to 1.
7. Softmax Activation Function
• Softmax is used at the last (output) layer to classify inputs into multiple classes. Range: 0 to 1.
• Softmax assigns a value to each input according to its weight; the outputs are proportional to each other and sum to one, i.e. the probabilities over all classes add up to 1.
• For multi-class classification problems, softmax is used together with the cross-entropy loss.
• Negative inputs are converted into non-negative values by the exponential function.
Parameters
Learning rate
• The learning rate (step size) is a hyper-parameter, typically in the range 0.0 to 1.0.
• It is the amount by which the weights are updated during training.
• The learning rate controls how quickly the model adapts to the problem.
• Smaller learning rates require more training epochs, since each update makes smaller changes to the weights; larger learning rates produce rapid changes and require fewer training epochs.
Bias
• The bias is a node that is always 'on', i.e. its value is set to 1.
• Bias is like the intercept in a linear equation. It is an additional parameter in the neural network that adjusts the output along with the weighted sum of the inputs: Output = sum(W * X) + bias
• Bias is therefore a constant that helps the model fit the given data better.
• Weights decide how strongly the activation function is triggered, whereas the bias is used to shift (delay) the triggering of the activation function.
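A small sketch of both ideas, with illustrative numbers: the neuron output is the weighted sum of the inputs plus the bias, and the learning rate scales how far the weights move in a single update (the gradient used here is an assumed placeholder).

```python
import numpy as np

# Output of one neuron: weighted sum of the inputs plus the bias (illustrative values)
x = np.array([1.0, 0.0])
W = np.array([0.5, 0.6])
bias = 1.0
output = np.dot(W, x) + bias           # sum(W * X) + bias = 1.5

# One gradient-descent step: the learning rate scales how far the weights move
gradient = np.array([0.2, -0.1])       # assumed gradient of the error w.r.t. W
for lr in (0.01, 0.1, 1.0):
    print(lr, W - lr * gradient)       # a larger learning rate makes a larger change
```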
Bias, Variance, Underfitting, Overfitting
➢ Bias: error on the training data.
➢ Variance: error on the test data.
➢ Overfitting: low bias and high variance.
➢ Underfitting: high bias and high variance.
➢ The goal should always be low bias and low variance.
➢ Example: a regression problem, e.g. fitting a linear regression model to create a best-fit line.
Regularization
L2 = weight decay
• A regularization term that penalizes big weights is added to the objective function:
  J_reg(θ) = J(θ) + λ Σ_k θ_k²
• The weight-decay value determines how dominant regularization is during gradient computation.
• A big weight-decay coefficient → a big penalty for big weights.
Dropout
• Randomly drop units (neurons along with their connections) during training.
• Each unit is retained with a fixed probability p, independent of the other units.
• The hyper-parameter p has to be chosen (tuned).
Early stopping
• Use the validation error to decide when to stop training.
• Stop when the monitored quantity has not improved after n subsequent epochs.
• n is called the patience.
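The sketch below shows, with assumed toy values, how the L2 penalty λ Σ_k θ_k² enters the objective and its gradient, and one common ("inverted dropout") way of retaining each unit with probability p.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((5, 3))          # a weight matrix theta (toy values)

# L2 / weight decay: lambda * sum(theta_k^2) is added to the objective J(theta),
# contributing 2 * lambda * W to its gradient
lam = 1e-3                               # weight-decay coefficient (assumed)
data_loss = 0.42                         # stand-in for J(theta)
total_loss = data_loss + lam * np.sum(W ** 2)
reg_grad = 2 * lam * W                   # added to the data gradient during updates

# Dropout: keep each unit with probability p during training, drop the rest
p = 0.8
activations = rng.random((4, 5))         # outputs of some hidden layer (toy values)
mask = (rng.random(activations.shape) < p) / p   # 'inverted' dropout keeps the expected scale
dropped = activations * mask
print(total_loss, dropped.shape)
```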
Tuning Hyper-parameters
• Grid search and random search, nine trials each, for optimizing a function f(x, y) = g(x) + h(y) ≈ g(x) (g(x) shown in green, h(y) in yellow):
• With grid search, the nine trials test g(x) at only three distinct values.
• With random search, all nine trials explore distinct values of g.
• Both grid and random search try configurations randomly and blindly: the next trial is independent of all the trials done before.
• Bayesian optimization for hyper-parameter tuning:
• Makes a smarter choice for the next trial, to minimize the number of trials.
• Collects the performance at several configurations.
• Makes an inference and decides which configuration to try next.
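A toy sketch of random search: each of nine trials draws an independent configuration, and the one with the lowest validation error is kept. The validation_error function here is only a stand-in; in practice it would train the network with the sampled settings and measure its error.

```python
import random

random.seed(0)

def validation_error(lr, hidden_units):
    # Stand-in for "train the network with these settings and measure validation error".
    return (lr - 0.01) ** 2 + 0.001 * abs(hidden_units - 64)

# Random search: each trial draws an independent configuration, blindly
trials = [(10 ** random.uniform(-4, -1), random.choice([16, 32, 64, 128]))
          for _ in range(9)]
best = min(trials, key=lambda cfg: validation_error(*cfg))
print("best (learning rate, hidden units):", best)
```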
Working of a Neural Network (Recall)
• A neural network is structured like the human brain and consists of artificial neurons, also known as nodes. These nodes are stacked next to each other in three kinds of layers:
❖ input layer
❖ hidden layer(s)
❖ output layer
• Data provides each node with information in the form of inputs. The node multiplies the inputs with (initially random) weights, sums them, and adds a bias. Finally, nonlinear functions, known as activation functions, are applied to determine which neurons fire.
Deep Learning
• Deep learning has a built-in, automatic, multi-stage feature-learning process that learns rich hierarchical representations (i.e. features).
• A deep neural network is a neural network that contains more than one hidden layer between input and output.
• The series of hidden layers between input and output performs feature identification and processing in a series of stages, much like our brain.
• Many algorithms exist for learning the weights of networks with several hidden layers.
Input Layer → Hidden Layer 1 → Hidden Layer 2 → Hidden Layer 3 → Hidden Layer 4 → Output Layer
Why is Deep Learning Needed?
(Figure: performance vs. size of data - deep learning keeps improving as the amount of data grows, while traditional ML algorithms plateau.)
o Manually designed features in ML are often over-specified or incomplete, and take a long time to design and validate.
o Learned features are easy to adapt and fast to learn.
o Deep learning provides a very flexible, universal, learnable framework for representing world, visual, and linguistic information.
o It can learn both unsupervised and supervised.
o It enables effective end-to-end joint system learning.
o It can utilize large amounts of training data.
Multi-Layer Neural Network Training (layer-wise)
• Train the first layer, then the next layer, then the next, and so on, finishing with the final layer.
• Each of the (non-output) layers is trained to learn good features that describe what comes from the previous layer.
Deep Learning Process
• A deep neural network learns automatically, without predefined knowledge explicitly coded by the programmers.
• Each layer represents a deeper level of knowledge, i.e. a hierarchy of knowledge.
• Learning occurs in two phases.
• The first phase (forward propagation) applies a nonlinear transformation to the input and creates a statistical model as output.
• The second phase (backpropagation) improves (optimizes) the model using a mathematical method based on derivatives.
• The neural network repeats these two phases hundreds to thousands of times until it reaches (converges to) a tolerable level of accuracy. Each repetition of the two phases is called an iteration.
Deep Learning Process
• Deep learning has a built-in, automatic, multi-stage feature-learning process that learns rich hierarchical representations (i.e. features):
Low-level features → mid-level features → high-level features → trainable classifier → output (e.g. outdoor, indoor)
E.g.
• Image: pixel → edge → texture → motif → part → object
• Text: character → word → word group → clause → sentence → story
Deep Learning Frameworks
1. TensorFlow
2. Keras
3. PyTorch
4. Theano
5. DL4J
6. Caffe
7. Chainer
8. Microsoft CNTK
Deep Learning Applications
• Social network filtering
• Fraud detection
• Image and speech recognition
• Audio recognition
• Computer vision
• Medical image processing
• Bioinformatics
• Customer relationship management
• Driverless cars
• Mobile phones
• Google search engine
• TV, etc.
Types of Deep Learning Algorithms
1. Convolutional Neural Networks (CNNs)
2. Recurrent Neural Networks (RNNs)
3. Long Short-Term Memory Networks (LSTMs)
4. Generative Adversarial Networks (GANs)
5. Radial Basis Function Networks (RBFNs)
6. Multilayer Perceptrons (MLPs)
7. Self-Organizing Maps (SOMs)
8. Deep Belief Networks (DBNs)
9. Restricted Boltzmann Machines (RBMs)
10. Autoencoders
Convolutional Neural Networks (CNNs)
• CNNs (ConvNets) consist of multiple layers.
• Yann LeCun developed the first CNN, called LeNet, in 1988. It was used for recognizing characters such as ZIP codes and digits.
• CNNs are widely used for image processing, such as identifying satellite images, processing medical images, forecasting time series, and object detection tasks such as detecting anomalies.
• Output: binary, multinomial, continuous, or count.
• Input: fixed size; padding can be used to make all images the same size.
• Architecture: the user's choice (depends on the application), which requires experimentation.
• Optimization: backward propagation.
• The hyper-parameters of a very deep model can be estimated properly only with very large amounts of images.
• In practice, architectures and trained parameters already established by researchers (ImageNet models, Microsoft/Google APIs, etc.) are often reused.
Convolutional Neural Networks (CNNs)
Building a convolutional neural network:
• Initialize the weights and biases randomly.
• Fix the input and output.
• Forward propagation: receive input data, process the information, and generate output.
• Backward propagation: calculate the error and gradients, and update the parameters of the network.
Architecture:
• Build a convolutional neural network with hidden layers.
• Each hidden layer may have its own activation function, such as ReLU or sigmoid.
• The final layer uses softmax.
• The error is calculated using cross-entropy.
Convolutional Neural Networks (CNNs) - Working Principle
CNNs have multiple layers that process the input and extract features from the data:
1. Convolution layer
• The convolution layer has several filters that perform the convolution operation.
2. Rectified Linear Unit (ReLU)
• A ReLU layer performs an element-wise operation on the convolution output. The result is a rectified feature map.
3. Pooling layer
• The rectified feature map is then fed into a pooling layer. Pooling is a down-sampling operation that reduces the dimensions of the feature map.
• The resulting two-dimensional arrays from the pooled feature map are then flattened into a single long, continuous, linear vector.
4. Fully connected layer
• The flattened matrix from the pooling layer is fed as input to a fully connected layer, which classifies and identifies the images.
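One possible way to stack these four kinds of layers is sketched below using Keras (one of the frameworks listed earlier); the input size, filter counts, and number of classes are illustrative assumptions, not values from the slides.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Convolution -> ReLU -> pooling -> flatten -> fully connected -> softmax
model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),                       # e.g. a 28x28 grayscale image (assumed)
    layers.Conv2D(16, kernel_size=3, activation="relu"),   # convolution layer + ReLU
    layers.MaxPooling2D(pool_size=2),                      # down-sampling
    layers.Flatten(),                                      # 2D feature maps -> one long 1D vector
    layers.Dense(64, activation="relu"),                   # fully connected layer
    layers.Dense(10, activation="softmax"),                # class probabilities (10 classes assumed)
])
model.compile(optimizer="adam", loss="categorical_crossentropy")  # cross-entropy, as above
model.summary()
```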
Parameter Computation
(Figure: a fully connected network with 4 inputs x1..x4, one hidden layer a1(1)..a4(1), and a softmax output a1(2) producing Y.)
• Number of parameters: 4*4 + 4 + 1 = 21.
Parameter Computation for an Image
(Figure: a 400 x 400 x 3 image flattened into 480,000 inputs x1..x480000, a hidden layer of 480,000 units a1(1)..a480000(1), and a softmax output Y.)
• Number of parameters: 480000*480000 + 480000 + 1 ≈ 230 billion!
• This is why fully connected layers alone do not scale to images, and convolutional layers are used instead.
Convolutional Layer - Feature Extraction
(Figure: an input image, given as a matrix of normalized pixel intensities, is convolved with the 3x3 filter
 0  1  0
 1 -4  1
 0  1  0
to produce the convolved image.)
▪ Inspired by the neurophysiological experiments conducted by Hubel and Wiesel (1962).
Convolutional Layer Process
• What is convolution? A filter slides over the input image to produce a convolved image (feature map).
• Example: a 4x4 input matrix with entries a..p and a 2x2 filter with weights w1..w4 give feature-map entries
  h1 = f(a*w1 + b*w2 + e*w3 + f*w4)
  h2 = f(b*w1 + c*w2 + f*w3 + g*w4)
  where f is the activation function.
(Figure: an input matrix convolved with a 3x3 filter.)
http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution
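A minimal NumPy sketch of this sliding-window operation (valid convolution, stride 1, before the activation f is applied); the image and filter values are arbitrary examples, not the a..p and w1..w4 of the slide.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide the filter over the image (no padding, stride 1) and take the
    weighted sum at each position, as in h1 and h2 above (activation not included)."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)    # stands in for the 4x4 matrix a..p
kernel = np.array([[1.0, 0.0],
                   [0.0, 1.0]])                     # a 2x2 filter w1..w4 (assumed values)
print(convolve2d_valid(image, kernel))              # 3x3 feature map
```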
Convolutional Layer Process
▪ In convolutional neural networks, hidden units are only connected to a local receptive field.
(Figure: filter 1 with weights w1..w4 maps the input image to the layer-1 feature map; filter 2 with weights w5..w8 maps the layer-1 feature map to the layer-2 feature map.)
Pooling
• Max pooling: outputs the maximum value within a rectangular neighbourhood.
• Average pooling: outputs the average value of a rectangular neighbourhood.
• Stride: the number of steps (columns when moving left to right, rows when moving top to bottom) the pooling window moves each time.
• Example: max pooling with a 2x2 filter and a stride of 2.
  Input matrix:        Output matrix:
  1 3 5 3              4 5
  4 2 3 1              3 4
  3 1 1 3
  0 1 0 4
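A short NumPy sketch of max pooling with a 2x2 window and a stride of 2, reproducing the example matrices above.

```python
import numpy as np

def max_pool(x, size=2, stride=2):
    """Take the maximum of each size x size neighbourhood, moving `stride` steps at a time."""
    rows = (x.shape[0] - size) // stride + 1
    cols = (x.shape[1] - size) // stride + 1
    out = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            out[i, j] = x[i * stride:i * stride + size, j * stride:j * stride + size].max()
    return out

x = np.array([[1, 3, 5, 3],
              [4, 2, 3, 1],
              [3, 1, 1, 3],
              [0, 1, 0, 4]], dtype=float)
print(max_pool(x))   # [[4. 5.]
                     #  [3. 4.]]
```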
Fully Connected Layer
• So far, the convolution layers have extracted valuable features from the data. These features are sent to the fully connected layer, which generates the final result.
• The fully connected layer in a CNN is just a traditional neural network.
• The output from the convolution layer is a 2D matrix. Ideally, each row would represent a single input image.
• However, the fully connected layer can only work with 1D data, so the values from the previous operation are first converted into a 1D format (flattened).
• Once the data is a 1D array, it is sent to the fully connected layer. All of these individual values are treated as separate features that represent the image.
• The fully connected layer performs two operations on the incoming data: a linear transformation (the weighted sum plus bias) and a non-linear transformation (the activation function), which lets it capture complex relationships.
CNN - Backward Propagation
• During forward propagation, we randomly initialized the weights, biases, and filters.
• These values are treated as the parameters of the convolutional neural network.
• In the backward propagation process, the model tries to update the parameters so that the overall predictions become more accurate.
• To update these parameters, we use the gradient descent technique.
Gradient descent to minimize (optimize) the loss (error):
• Consider the curve of a loss function over a parameter a.
• Suppose random initialization gives the parameter the value a2, while the minimum of the loss is at a1, not a2.
• Gradient descent tries to find the value of the parameter a at which the loss is minimum, so we need to update a2 and bring it closer to a1.
• To decide the direction of movement, i.e. whether to increase or decrease the value of the parameter, we calculate the gradient (slope) at the current point.
• Based on the value of the gradient, we determine the updated parameter values: when the slope is negative, the value of the parameter is increased; when the slope is positive, the value of the parameter is decreased by a small amount.
Generic equation for updating the parameter values:
• new_parameter = old_parameter - (learning_rate * gradient_of_parameter)
• The learning rate is a constant that controls the amount of change made to the old value. The slope (gradient) determines the direction of the new value, i.e. whether it should be increased or decreased. So we need to find the gradients, i.e. the change in error with respect to the parameters, in order to update them.
CNN - Backward Propagation
• A CNN contains three kinds of parameters: weights, biases, and filters. Let us calculate the gradients for these parameters one by one.
Backward propagation at the fully connected layer
• The fully connected layer has two parameters: the weight matrix and the bias matrix.
• Start by calculating the change in error with respect to the weights, ∂E/∂W.
• Since the error does not depend directly on the weight matrix, we use the chain rule. The computation graph gives:
  ∂E/∂W = ∂E/∂O . ∂O/∂Z2 . ∂Z2/∂W
Find these derivatives separately:
1. Change in error with respect to the output
• Suppose the actual value for the data is denoted y' and the predicted output is O.
• Then the error is: E = (y' - O)² / 2
• Differentiating the error with respect to the output: ∂E/∂O = -(y' - O)
2. Change in output with respect to Z2 (the linear-transformation output)
• To find the derivative of the output O with respect to Z2, we first express O in terms of Z2. The output is the sigmoid of Z2, so ∂O/∂Z2 is the derivative of the sigmoid function f(x) = 1 / (1 + e^(-x)):
  f'(x) = (1 + e^(-x))^(-1) [1 - (1 + e^(-x))^(-1)] = sigmoid(x) (1 - sigmoid(x))
  ∂O/∂Z2 = O (1 - O)
CNN - Backward Propagation
3. Change in Z2 with respect to W (weights)
• Z2 is the result of the linear transformation. In terms of the weights: Z2 = W^T.A1 + b
• Differentiating Z2 with respect to W gives A1 itself: ∂Z2/∂W = A1
• Now that we have the individual derivatives, we use the chain rule to find the change in error with respect to the weights:
  ∂E/∂W = ∂E/∂O . ∂O/∂Z2 . ∂Z2/∂W = -(y' - O) . O(1 - O) . A1
• The shape of ∂E/∂W is the same as that of the weight matrix W.
• We can then update the values in the weight matrix: W_new = W_old - lr * ∂E/∂W
4. Change in error with respect to b (bias)
• The bias is updated analogously: b_new = b_old - lr * ∂E/∂b
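Putting the fully connected layer's derivatives together, here is a minimal NumPy sketch of one backward step; the shapes, input values, and learning rate are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
A1 = rng.random((4, 1))            # output of the previous layer (illustrative shape)
W = rng.standard_normal((4, 1))    # weights of the fully connected layer
b = 0.0
y_true = 1.0                       # ground-truth value y'

# Forward: Z2 = W^T . A1 + b, O = sigmoid(Z2)
Z2 = W.T @ A1 + b
O = sigmoid(Z2)

# Backward, following the chain rule derived above
dE_dO = -(y_true - O)              # ∂E/∂O
dO_dZ2 = O * (1 - O)               # ∂O/∂Z2 (sigmoid derivative)
dE_dZ2 = dE_dO * dO_dZ2
dE_dW = A1 @ dE_dZ2                # ∂E/∂W, same shape as W
dE_db = dE_dZ2                     # ∂E/∂b

lr = 0.01
W_new = W - lr * dE_dW             # W_new = W_old - lr * ∂E/∂W
b_new = b - lr * dE_db.item()      # b_new = b_old - lr * ∂E/∂b
```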
CNN - Backward Propagation
Backward propagation at the convolution layer
• For the convolution layer, the filter matrix is the parameter. During forward propagation we randomly initialized the filter matrix.
• We update these values using: new_parameter = old_parameter - (learning_rate * gradient_of_parameter)
• To update the filter matrix, we need the gradient of the error with respect to the filter, ∂E/∂f.
• From the computation graph for backward propagation:
  ∂E/∂f = ∂E/∂O . ∂O/∂Z2 . ∂Z2/∂A1 . ∂A1/∂Z1 . ∂Z1/∂f
• We have already determined the values of ∂E/∂O and ∂O/∂Z2; the remaining derivatives are found below.
1. Change in Z2 with respect to A1
• To find ∂Z2/∂A1, write Z2 in terms of A1: Z2 = W^T.A1 + b
• Differentiating this equation with respect to A1 gives W^T: ∂Z2/∂A1 = W^T
2. Change in A1 with respect to Z1
• The next value we need is ∂A1/∂Z1, where A1 = sigmoid(Z1).
• This is simply the sigmoid function again, so its derivative is: ∂A1/∂Z1 = (A1)(1 - A1)
CNN - Backward Propagation
Backward propagation at the convolution layer (continued)
3. Change in Z1 with respect to the filter f
• Finally, we need ∂Z1/∂f. The equation for Z1 is: Z1 = X * f
• Differentiating Z1 with respect to f simply gives us X: ∂Z1/∂f = X
• Now that we have all the required values, the overall change in error with respect to the filter is:
  ∂E/∂f = ∂E/∂O . ∂O/∂Z2 . ∂Z2/∂A1 . ∂A1/∂Z1 * ∂Z1/∂f
• Notice that in the equation above, the term (∂E/∂O . ∂O/∂Z2 . ∂Z2/∂A1 . ∂A1/∂Z1) is convolved with ∂Z1/∂f instead of using a simple dot product.
• Why? Because during forward propagation we perform a convolution operation between the images and the filters, and this operation is mirrored in the backward propagation process.
• Once we have the value of ∂E/∂f, we use it to update the original filter value: f = f - lr * (∂E/∂f)
• This completes backpropagation for convolutional neural networks.
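A minimal NumPy sketch of this filter update, reusing the same valid-convolution helper as in the earlier convolution sketch; the input, filter, and upstream gradient ∂E/∂Z1 are assumed toy values rather than quantities from the slides.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
X = rng.random((4, 4))                    # input image patch (illustrative)
f = rng.standard_normal((2, 2))           # filter to be learned

Z1 = convolve2d_valid(X, f)               # forward: Z1 = X * f
dE_dZ1 = rng.standard_normal(Z1.shape)    # upstream gradient, assumed already computed

# ∂E/∂f: the input is convolved with the upstream gradient (same sliding-window operation)
dE_df = convolve2d_valid(X, dE_dZ1)       # same shape as the filter f

lr = 0.01
f = f - lr * dE_df                        # f = f - lr * (∂E/∂f)
```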
Recall - Deep Learning Process
• Deep learning has a built-in, automatic, multi-stage feature-learning process that learns rich hierarchical representations (i.e. features):
Low-level features → mid-level features → high-level features → trainable classifier → output (e.g. outdoor, indoor)
E.g.
• Image: pixel → edge → texture → motif → part → object
• Text: character → word → word group → clause → sentence → story
Top 7 Applications of Convolutional Neural Networks
1. Decoding facial recognition
Facial recognition is broken down by a convolutional neural network into the following major components:
• identifying every face in the picture;
• focusing on each face despite external factors such as light, angle, and pose;
• identifying unique features;
• comparing all the collected data with existing data in the database to match a face with a name.
A similar process is followed for scene labeling as well.
2. Analyzing documents
Convolutional neural networks can also be used for document analysis. This is not just useful for handwriting analysis; it also plays a major role in recognizers. For a machine to scan an individual's writing and compare it to its wide database, it must execute almost a million commands a minute. With CNNs and newer models and algorithms, the error rate has reportedly been brought down to as low as 0.4% at the character level, though complete testing of this is yet to be widely seen.
3. Historic and environmental collections
CNNs are also used for more complex purposes such as natural history collections. These collections act as key players in documenting major parts of history such as biodiversity, evolution, habitat loss, biological invasion, and climate change.
4. Understanding climate
CNNs can play a major role in the fight against climate change, especially in understanding why we see such drastic changes and how we could experiment with curbing their effects. The data in such natural history collections can also provide greater social and scientific insights, but this requires skilled human resources, such as researchers who can physically visit these repositories; more manpower is needed to carry out deeper experiments in this field.
5. Grey areas
Introducing grey areas into CNNs is expected to provide a much more realistic picture of the real world. Currently, CNNs largely function like a machine, seeing a true or false value for every question. However, as humans, we understand that the real world plays out in a thousand shades of grey. Allowing the machine to understand and process fuzzier logic will help it understand the grey area we humans live in, giving CNNs a more holistic view of what humans see.
6. Advertising
CNNs have already made a world of difference to advertising with the introduction of programmatic buying and data-driven personalized advertising.
7. Other interesting fields
CNNs are poised to be the future with their introduction into driverless cars, robots that can mimic human behavior, aides to human genome mapping projects, predicting earthquakes and natural disasters, and maybe even self-diagnosis of medical problems. You might not even have to drive to a clinic or schedule an appointment with a doctor to confirm that your sneezing attack or high fever is just the simple flu and not a symptom of some rare disease. One problem researchers are working on with CNNs is brain cancer detection; earlier detection of brain cancer could prove to be a big step in saving more lives affected by this illness.
References
• Richard Szeliski, Computer Vision: Algorithms and Applications, Springer, 2010
• Ian Goodfellow, Yoshua Bengio, Aaron Courville, Deep Learning, MIT Press, 2016