2. What do Neural Networks Learn?
Every input to a neural network is a feature.
The neural network outputs whether the data point corresponding to the input features belongs to class A or class B.
In simple terms, a neural network builds a mapping from a nonlinear combination of the inputs (features) to the outputs (class labels).
In the case of image recognition, the input is the image itself. Usually every pixel is a separate input to the neural network.
The MLP learns a nonlinear combination of pixel values to predict a class label.
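As a minimal sketch (names, shapes, and the single sigmoid hidden layer are illustrative assumptions, not from the slides), an MLP for classification treats each pixel as one input feature:

```python
import math

def mlp_forward(pixels, w_hidden, b_hidden, w_out, b_out):
    """One hidden layer of sigmoid units; returns one score per class.
    Every entry of `pixels` is a separate input feature."""
    sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
    hidden = [sigmoid(sum(p * w for p, w in zip(pixels, ws)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    return [sum(h * w for h, w in zip(hidden, ws)) + b
            for ws, b in zip(w_out, b_out)]
```

The class with the highest output score is the predicted label.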
Amit Praseed Classification November 5, 2019 2 / 31
3.–5. Neural Networks and Image Recognition (figure slides)
6. Neural Networks and Image Recognition
MLPs perform poorly at recognizing images.
MLPs cannot learn spatial correlations between pixels within an image.
They are prone to overfitting due to the extremely large number of inputs and weights.
There is no concept of "features" beyond raw pixel values, because each input is a single pixel.
How do we specify features in an image?
7.–10. Neural Networks and Image Recognition (figure slides)
11. Neural Networks and Image Recognition
It would be marvelous if the network could learn "features" by itself...
which is exactly what a CNN does.
12. Basic Idea behind CNN (figure slide)
13. Filters and Convolutions
An image is basically a matrix of pixel values.
So it makes sense to define all operations on images in terms of operations on matrices as well.
Operations such as smoothing, sharpening, blurring, edge detection, etc. can all be defined in terms of matrix operations.
For this, an operation is represented by a smaller matrix called a filter.
The filter is moved over the image from left to right and top to bottom; at each position, the corresponding elements of the image and the filter are multiplied and added, and the resulting value becomes the pixel value of the modified image.
This operation is called convolution.
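The sliding multiply-and-add can be sketched in plain Python (a minimal illustration, not from the slides; the names `image` and `kernel` and the no-padding "valid" output size are assumptions):

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (left to right, top to bottom),
    multiply corresponding elements, and sum them to get each output pixel."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1   # "valid" output size, no padding
    out = [[0] * ow for _ in range(oh)]
    for r in range(oh):
        for c in range(ow):
            out[r][c] = sum(image[r + i][c + j] * kernel[i][j]
                            for i in range(kh) for j in range(kw))
    return out
```

For example, convolving with a vertical-edge kernel such as [[1, 0, -1], [1, 0, -1], [1, 0, -1]] gives large-magnitude outputs wherever pixel values change sharply from left to right.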
16.–18. Edge Detection using Convolution (figure slides)
19. Learning Low Level Features in CNN
The first step in a CNN is to learn low-level features, such as edges, from the input image.
This is done using a Convolutional Layer.
For this, a filter is moved over the entire image, as in convolution.
Are the filter weights static?
NO.
The filter weights are randomly initialized and then updated using backpropagation until they stabilize.
This means we have no idea in advance which feature (horizontal edge, vertical edge, slanted line...) a particular filter will learn to recognize.
A CNN uses a number of filters, and each filter learns to recognize a particular feature.
Each filter is said to output a Feature Map.
20. Peculiarities of the Convolutional Layer
Local Receptive Fields: not fully connected, unlike an MLP
Shared Weights: a filter learns to recognize a feature irrespective of its absolute location; far fewer parameters, hence less prone to overfitting
ReLU activation function
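The saving from shared weights can be made concrete with a back-of-the-envelope count (the 28×28 input, 30 hidden neurons, and 20 filters of size 5×5 are assumed figures for illustration, not from the slides):

```python
# Fully connected hidden layer on a 28x28 image: every one of the 784 pixels
# connects to every one of the 30 hidden neurons, plus one bias per neuron.
mlp_params = 28 * 28 * 30 + 30      # 23,550 parameters

# Convolutional layer: 20 filters, each a shared 5x5 weight matrix + 1 bias,
# reused at every image position.
cnn_params = 20 * (5 * 5 + 1)       # 520 parameters
```

The shared filter is applied at every position, so the parameter count is independent of the image size.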
24. Pooling
The convolutional layers recognize the presence of features in the image.
However, the output of these layers also contains positional information, i.e. where those features were found.
Exact positional information is usually a burden in classification: we want the relative positions of features, not the absolute position of each feature.
The Pooling Layer removes positional information from the output of the Convolutional Layers.
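Max pooling, one common way a Pooling Layer discards positional information, can be sketched as follows (a minimal illustration; the function name and the 2×2 window are assumptions):

```python
def max_pool(fmap, size=2):
    """2x2 max pooling: keep only the strongest activation in each window,
    discarding its exact position inside that window."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]
```

Each 2×2 window collapses to a single value, so the feature map shrinks by a factor of two in each dimension while the strongest responses survive.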
26. That's It!!!
A CNN essentially consists of multiple convolutional and pooling layers, one after the other.
Each successive layer recognizes more sophisticated features built from the low-level features detected by the previous layers.
28. A Note on the Output Layer
While all the other layers are only partially connected, the output layer is fully connected.
The number of nodes in the output layer is usually equal to the number of classes in the classification problem. For example, if you want to classify cats, dogs, wolves and foxes, the output layer will have four nodes.
The nodes in the output layer have a special activation function, called the Softmax Activation Function.
$$a^L_j = \frac{e^{x^L_j}}{\sum_k e^{x^L_k}}$$
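The softmax formula can be sketched directly (a minimal illustration; subtracting the maximum before exponentiating is a standard numerical-stability trick, not something shown on the slide):

```python
import math

def softmax(x):
    """a_j = exp(x_j) / sum_k exp(x_k).
    Shifting by max(x) leaves the result unchanged but avoids overflow."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]
```

The outputs are all positive and sum to 1, which is what lets them be read as class probabilities.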
29. Softmax Activation
Softmax Activation forms a probability distribution: $a^L_j$ gives the probability that the given input belongs to class j.
Along with a new log-likelihood cost function given by
$$C = -\ln a^L_j$$
the network can counter learning slowdown as well.
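The pairing of softmax with the log-likelihood cost can be sketched as follows (illustrative names; the closing remark about the gradient is the standard textbook result, not derived on the slide):

```python
import math

def softmax(x):
    """a_j = exp(x_j) / sum_k exp(x_k), shifted by max(x) for stability."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def log_likelihood_cost(x, j):
    """C = -ln a_j, where j is the index of the true class."""
    return -math.log(softmax(x)[j])
```

With this pairing, the output-layer error simplifies to a_k − y_k, with no activation-derivative factor that could shrink toward zero, which is why learning does not slow down even when the output is badly wrong.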
30. A Note on Overfitting
Even though a CNN uses far fewer weights than an MLP, it can still suffer from overfitting.
Techniques to counter overfitting, such as regularization, validation, and acquiring new data, can still be used here.
Another technique commonly used to reduce the effects of overfitting is the use of Ensemble Classifiers.
As with Random Forests, we can take a number of neural networks (CNNs or MLPs), train them separately, and use majority voting to decide the class during testing.
However, MLPs and CNNs need far more time to train, so maintaining multiple models is infeasible.
Instead, there is a technique that uses only one physical model but trains multiple virtual models within it.
31. Dropout
The idea behind dropout is to randomly disable or drop 50% of the neurons during different stages of training.
This is done so that the neural network as a whole becomes more robust.
Virtually, we are training multiple neural networks for the same input, which can help in reducing overfitting.
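Dropout on one layer's activations can be sketched as follows (the "inverted dropout" formulation, which scales survivors by 1/(1−p) at training time so that no rescaling is needed at test time, is a common convention assumed here, not stated on the slide):

```python
import random

def dropout(activations, p=0.5, training=True):
    """During training, zero each activation with probability p and scale
    the survivors by 1/(1-p); at test time, pass activations through."""
    if not training:
        return list(activations)
    return [0.0 if random.random() < p else a / (1.0 - p)
            for a in activations]
```

Each training step samples a fresh mask, so each step effectively trains a different "virtual" sub-network of the one physical model.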
32. How does a CNN overcome the difficulties in training Deep Networks?
Learning Slowdown → Softmax activation function in the output layer + log-likelihood cost
Vanishing Gradient → ReLU activation function in the convolutional layers
Overfitting → Shared weights and biases, regularization, dropout