This presentation provides an introduction to the deeper aspects of deep learning, starting from the basics. The first part begins with the definition of an artificial neuron and goes up to a description of the back-propagation algorithm. The second part deals with autoencoders, convolutional neural networks and allied concepts. In the third part, the deeper aspects of the deep learning methodology are presented by discussing the landmark architectures LeNet, AlexNet, GoogLeNet and FaceNet.
Deep into Deep Learning
Dr V N Krishnachandran
23 January 2024
Outline I
1 Neurons
Activation functions: Examples
Artificial neuron: Examples
2 Multi-layer neural networks
General problem
3 Back-propagation algorithm
Outline of the algorithm
Implementation in R
4 Autoencoders
Under-complete autoencoder
Sparse autoencoder
Denoising autoencoder
Outline II
Contractive autoencoder
5 Convolutional neural networks (CNN’s)
Convolution in mathematics
Convolution in neural networks
Pooling
Building blocks of CNN
6 Popular CNN’s
LeNet
AlexNet
GoogLeNet
FaceNet
7 Conclusion
Neurons
Biological neuron
Artificial neuron
Activation functions: Examples
Activation function: Threshold function
ϕ(x) = 0 if x < 0
ϕ(x) = 1 if x ≥ 0
Threshold function (Unit step function)
Activation function: Sigmoid (or) Logistic Function
Sigmoid (or) logistic function
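For reference, the curve shown is the logistic function
ϕ(x) = 1 / (1 + e^(−x)),
which squashes any real input into the interval (0, 1).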
Activation function: ReLU (Rectified linear unit)
ReLU (Rectified linear unit)
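The curve shown is
ϕ(x) = max(0, x),
which passes positive inputs through unchanged and maps negative inputs to 0.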
Activation function: tanh (Hyperbolic tangent function)
tanh (Hyperbolic tangent function)
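The curve shown is
tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x)),
which squashes inputs into the interval (−1, 1). All four activation functions take only a few lines of R (a minimal sketch; tanh is already built into R):
threshold <- function(x) ifelse(x >= 0, 1, 0)   # unit step
sigmoid <- function(x) 1 / (1 + exp(-x))        # logistic
relu <- function(x) pmax(0, x)                  # rectified linear unit
# tanh(x) is a built-in R function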
Artificial neuron: Perceptron
If, in an artificial neuron, the activation function is the threshold
function, the neuron is called a perceptron. The underlying threshold-neuron
model was proposed in 1943 by Warren McCulloch and Walter Pitts; the
perceptron itself was introduced by Frank Rosenblatt in 1958.
Artificial neuron: Examples
Artificial neuron: Logical OR
x1 x2 x1 OR x2
0 0 0
0 1 1
1 0 1
1 1 1
Verification for the OR neuron, with weights w0 = −0.5, w1 = w2 = 1 and threshold activation f:
x1 x2 Output expression z = w0 + w1x1 + w2x2 Output value y = f(z)
0 0 z = −0.5 + 1 × 0 + 1 × 0 = −0.5 0
0 1 z = −0.5 + 1 × 0 + 1 × 1 = 0.5 1
1 0 z = −0.5 + 1 × 1 + 1 × 0 = 0.5 1
1 1 z = −0.5 + 1 × 1 + 1 × 1 = 1.5 1
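The same check can be run in R. This is a minimal sketch: the weights w0 = −0.5, w1 = w2 = 1 are the ones from the table above, and the threshold function plays the role of f.
f <- function(z) ifelse(z >= 0, 1, 0)          # threshold activation
or_neuron <- function(x1, x2) f(-0.5 + 1*x1 + 1*x2)
or_neuron(c(0, 0, 1, 1), c(0, 1, 0, 1))        # returns 0 1 1 1
Changing only the bias to w0 = −1.5 turns the same neuron into the AND gate of the next example.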
Artificial neuron: Logical AND
x1 x2 x1 AND x2
0 0 0
0 1 0
1 0 0
1 1 1
Verification for the AND neuron, with weights w0 = −1.5, w1 = w2 = 1 and threshold activation f:
x1 x2 Output expression z = w0 + w1x1 + w2x2 Output value y = f(z)
0 0 z = −1.5 + 1 × 0 + 1 × 0 = −1.5 0
0 1 z = −1.5 + 1 × 0 + 1 × 1 = −0.5 0
1 0 z = −1.5 + 1 × 1 + 1 × 0 = −0.5 0
1 1 z = −1.5 + 1 × 1 + 1 × 1 = 0.5 1
Artificial neuron: Logical NOT
x NOT x
0 1
1 0
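One possible choice of weights (an illustration, not given on the slide): w0 = 0.5 and w1 = −1 with the threshold activation gives z = 0.5 − x, so y = f(z) equals 1 for x = 0 and 0 for x = 1.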
Artificial neuron: Logical XOR
x1 x2 x1 XOR x2
0 0 0
0 1 1
1 0 1
1 1 0
This function cannot be represented by a single neuron, because XOR is
not linearly separable. However, it can be represented by a multi-layer
neural network, shown on the next slide.
Logical XOR: Multi-layer neural network representation
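Such a network can be sketched in R with hand-picked weights (one possible choice, not the only one): an OR neuron and a NAND neuron in the hidden layer, combined by an AND neuron.
f <- function(z) ifelse(z >= 0, 1, 0)          # threshold activation
xor_net <- function(x1, x2) {
  h1 <- f(-0.5 + x1 + x2)                      # OR neuron
  h2 <- f( 1.5 - x1 - x2)                      # NAND neuron
  f(-1.5 + h1 + h2)                            # AND of h1 and h2
}
xor_net(c(0, 0, 1, 1), c(0, 1, 0, 1))          # returns 0 1 1 0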
Multi-layer neural networks
Multi-layer neural network with two hidden layers.
First hidden layer has 4 nodes, second hidden layer has 3 nodes.
General problem
Given the following data, find a neural network that outputs the
given output values for the given input values:
Input variables            Output variables
x1  x2  · · ·  xn          y1  y2  · · ·  ym
x11 x21 · · · xn1          y11 y21 · · · ym1
x12 x22 · · · xn2          y12 y22 · · · ym2
x13 x23 · · · xn3          y13 y23 · · · ym3
· · ·                      · · ·
x1N x2N · · · xnN          y1N y2N · · · ymN
Global parameters
Number of hidden layers
Number of nodes in each of the hidden layers
Choice of the activation function
Loss function/Error estimate/Cost function
Let ŷ_ij be the estimated value of the output variable y_ij. Then
Error estimate = Σ_{i=1}^{m} Σ_{j=1}^{N} (ŷ_ij − y_ij)².
General problem: Example
Find the weights wij such that the neural network produces the given
output values for the given input values.
Back-propagation algorithm
In the back-propagation algorithm, the error, which is the difference
between the current output of the neural network and the desired output,
is used first to adjust the weights in the output layer and then to
adjust the weights in the hidden layers, always working back through
the network towards the inputs.
Outline of the algorithm
Outline of the algorithm: Gradient descent
Gradient descent
The back-propagation algorithm uses the direction of steepest descent,
the negative of the gradient of the error surface, to adjust the weights
in the various layers.
A simplified model of the error surface, showing the direction of the gradient.
Outline of the algorithm - I
Initially the weights are assigned at random.
Then the algorithm iterates through many cycles of two
processes until a stopping criterion is reached. Each cycle is
known as an epoch. Each epoch includes:
Outline of the algorithm - II
1 Forward phase
A forward phase in which the neurons are activated in
sequence from the input layer to the output layer, applying
each neuron’s weights and activation function along the way.
Upon reaching the final layer, an output signal is produced.
2 Backward phase
A backward phase in which the network’s output signal
resulting from the forward phase is compared to the true
target value in the training data. The difference is an error
that is propagated backwards in the network to modify the
connection weights between neurons and reduce future errors.
Outline of the algorithm - III
The technique used to determine how much a weight should be changed is
known as the gradient descent method. At every stage of the computation,
the error is a function of the weights. If we plot the error against the
weights, we get a higher-dimensional analogue of a curve or surface. At
any point on this surface, the gradient indicates how steeply the error
will decrease or increase for a change in the weights. The algorithm
changes the weights in the direction that gives the greatest reduction
in error.
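In symbols, the standard gradient descent update nudges each weight w a small step against the gradient of the error E:
w ← w − η ∂E/∂w,
where the learning rate η controls the step size.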
Implementation in R
Implementation in R: Step 1
Download R from:
https://cran.r-project.org/bin/windows/base/
Install R.
Start RGui or RStudio.
Implementation in R: Step 2
Press Ctrl + L to clear the console window.
Implementation in R: Step 3
Install neuralnet package by typing the following command
(after the prompt “>”)
install.packages(‘neuralnet’)
and pressing the Enter key. Wait for the package to be
installed.
Load the neuralnet package by issuing the following command:
library(neuralnet)
Implementing neural network in R: Example
Problem
Construct a neural network with a single hidden layer having 2
nodes to represent the following data:
x1 x2 y
0 0 0
0 1 1
1 0 1
1 1 0
Issue the following commands:
> x1 <- c(0, 0, 1, 1)                    # input variable x1
> x2 <- c(0, 1, 0, 1)                    # input variable x2
> y <- c(0, 1, 1, 0)                     # target output (XOR of x1 and x2)
> data <- data.frame(x1, x2, y)          # assemble the training data
> net <- neuralnet(y ~ x1 + x2, data, hidden = 2)   # one hidden layer, 2 nodes
> plot(net)                              # draw the fitted network
The output is shown in the next slide.
Implementation in R: Step 4
Autoencoders
Autoencoder: Idea
An autoencoder is a type of artificial neural network used to learn
efficient codings of unlabeled data (unsupervised learning). An
autoencoder learns two functions: an encoding function that
transforms the input data, and a decoding function that recreates
the input data from the encoded representation. The autoencoder
learns an efficient representation (encoding) for a set of data,
typically for dimensionality reduction.
Autoencoders are applied to many problems, including facial recognition,
feature detection, anomaly detection and learning the meaning of words.
Autoencoders can also be used as generative models, which can randomly
generate new data similar to the input (training) data.
Autoencoder: Architecture
Autoencoder vs Principal Component Analysis (PCA)
Both autoencoders and PCA can be used for dimensionality reduction.
An autoencoder can model both linear and non-linear structure in the
data, whereas PCA captures only linear structure.
PCA is faster to compute than an autoencoder.
Autoencoders are more prone to overfitting than PCA.
Autoencoders: Different types
Under-complete autoencoders
An under-complete autoencoder is one in which the number of nodes in the
hidden layer is much smaller than the number of nodes in the input layer
(and hence the output layer).
Sparse autoencoders
Sparse autoencoders are designed to be sensitive to specific types
of high-level features in the data, while being insensitive to most
other features. This is achieved by imposing a sparsity constraint
on the hidden units during training.
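One common way to impose the constraint (an assumption here; the slide does not specify the penalty) is to push the average activation ρ̂_j of each hidden unit j towards a small target value ρ by adding a Kullback–Leibler penalty to the loss:
Σ_j KL(ρ ∥ ρ̂_j) = Σ_j [ ρ log(ρ/ρ̂_j) + (1 − ρ) log((1 − ρ)/(1 − ρ̂_j)) ].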
A sparse autoencoder with a single hidden layer. The hidden nodes
in bright yellow are activated, while the light yellow ones are
inactive. The activation depends on the input.
Denoising autoencoders
A denoising autoencoder is a modification of the autoencoder designed to
prevent the network from learning the identity function. If the
autoencoder has too much capacity, it can simply copy the data, so that
the output equals the input and no useful representation learning or
dimensionality reduction takes place. Denoising autoencoders solve this
problem by corrupting the input data on purpose, adding noise or masking
some of the input values; the network is trained to reconstruct the
clean input.
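A sketch of the two corruption schemes in R (here x is one input vector; the training target remains the clean x):
x <- runif(8)                                   # a clean input vector
x_noisy <- x + rnorm(length(x), sd = 0.1)       # additive Gaussian noise
x_masked <- x * rbinom(length(x), 1, 0.7)       # mask roughly 30% of values
# train the autoencoder with x_noisy (or x_masked) as input and x as target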
Contractive autoencoders
The idea of the contractive autoencoder is that if the input data are
very similar, then the encoded outputs of all those inputs must also be
very similar. This is achieved by imposing the condition that the
variation of the activations in the hidden layer with respect to the
input data should be small.
The contractive autoencoder adds an extra penalty term to the loss
function of the autoencoder: the squared Frobenius norm of the Jacobian
of the hidden activations h with respect to the input X,
∥J_h(X)∥²_F = Σ_{ij} ( ∂h_j(X) / ∂X_i )².
Convolutional neural networks (CNN's)
Convolution in mathematics
Idea of convolution in mathematics: 1D case
The convolution of the sequences {a_n}_{−∞}^{∞} and {b_n}_{−∞}^{∞} is the sequence {c_n}_{−∞}^{∞} defined by
c_n = Σ_{k=−∞}^{∞} a_{n−k} b_k.
The convolution of two functions f(x) and g(x) is the function h(x) defined by
h(x) = ∫_{−∞}^{∞} f(x − t) g(t) dt.
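A direct R implementation of the discrete sum (a sketch for finite sequences; R's built-in convolve(a, rev(b), type = "open") computes the same thing):
conv1d <- function(a, b) {
  out <- numeric(length(a) + length(b) - 1)
  for (i in seq_along(a))
    for (j in seq_along(b))
      out[i + j - 1] <- out[i + j - 1] + a[i] * b[j]   # a_i contributes to c_{i+j-1}
  out
}
conv1d(c(1, 2, 3), c(0, 1, 0.5))   # 0.0 1.0 2.5 4.0 1.5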
Idea of convolution in mathematics: 2D case
The convolution of the sequences {a_{m,n}}_{−∞}^{∞} and {b_{m,n}}_{−∞}^{∞} is the sequence {c_{m,n}}_{−∞}^{∞} defined by
c_{m,n} = Σ_{h=−∞}^{∞} Σ_{k=−∞}^{∞} a_{m−h,n−k} b_{h,k}.
The convolution of two functions f(x, y) and g(x, y) is the function h(x, y) defined by
h(x, y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x − t, y − u) g(t, u) dt du.
Convolution in neural networks
Idea of convolution in neural networks
Input image (4 × 4) as a vector:
[a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p]
Kernel (2 × 2) as a vector:
[w, x, y, z]
Output (3 × 3) as a vector:
[aw + bx + ey + fz, bw + cx + fy + gz, . . . , kw + lx + oy + pz]
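A minimal R sketch of this sliding dot product for a 4 × 4 input and a 2 × 2 kernel (as is usual in CNN practice, the kernel is not flipped):
conv2d <- function(img, ker) {
  out <- matrix(0, nrow(img) - nrow(ker) + 1, ncol(img) - ncol(ker) + 1)
  for (i in seq_len(nrow(out)))
    for (j in seq_len(ncol(out)))
      out[i, j] <- sum(img[i:(i + nrow(ker) - 1), j:(j + ncol(ker) - 1)] * ker)
  out
}
img <- matrix(1:16, nrow = 4, byrow = TRUE)      # the 4 x 4 input a..p
ker <- matrix(c(1, 0, 0, -1), nrow = 2, byrow = TRUE)  # an arbitrary 2 x 2 kernel
conv2d(img, ker)                                 # a 3 x 3 output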
Convolution example
Convolution with padding
Pooling in neural networks
Max pooling
Average pooling
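The two pooling operations differ only in the summary function applied to each window. A sketch in R for 2 × 2 windows with stride 2 (assuming the input dimensions are even):
pool2x2 <- function(img, FUN = max) {
  out <- matrix(0, nrow(img) / 2, ncol(img) / 2)
  for (i in seq_len(nrow(out)))
    for (j in seq_len(ncol(out)))
      out[i, j] <- FUN(img[(2*i - 1):(2*i), (2*j - 1):(2*j)])  # one window
  out
}
img <- matrix(1:16, nrow = 4, byrow = TRUE)
pool2x2(img, FUN = max)    # max pooling
pool2x2(img, FUN = mean)   # average pooling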
Building blocks of CNN
Building blocks of CNN architecture
Convolution layer
Nonlinear activation function
Pooling layer
Fully connected layer
Last layer activation function
Loss function (or, error function)
CNN architecture
Popular CNN’s
Popular CNN: LeNet
LeNet is a convolutional neural network introduced by Yann LeCun
et al. in 1989.
It was not popular at the time of its introduction due to a lack of
suitable hardware.
It could read digits correctly and was successfully applied to
identifying handwritten ZIP code numbers provided by the US Postal
Service.
LeNet: First CNN success story
The research on LeNet achieved great success and aroused the interest of
scholars in the study of neural networks. While the architectures of the
best-performing neural networks today are not the same as that of LeNet,
the network was the starting point for a large number of neural network
architectures and brought inspiration to the field.
AlexNet: ILSVRC 2012 winner
Popular CNN's: AlexNet
AlexNet is a convolutional neural network designed by Alex Krizhevsky
and his collaborators.
AlexNet competed in ILSVRC 2012 and achieved a top-5 error of 15.3%,
more than 10.8 percentage points lower than that of the runner-up.
ImageNet
Sample images from ImageNet
ILSVRC
The goal of ILSVRC was to estimate the content of photographs for the
purpose of retrieval and automatic annotation, using a subset of the
ImageNet dataset (containing more than 10,000,000 labelled images
depicting 10,000+ object categories) as training data. Test images were
presented with no labels, and algorithms had to produce labellings
specifying what objects were present in the images. The general goal
was to identify the main objects present in images.
Summary of AlexNet architecture
AlexNet was the first large-scale CNN.
ReLU is the activation function used in all hidden layers; the output
layer uses Softmax.
It introduced and implemented the concept of "local response
normalisation" to address the "gradient explosion problem".
AlexNet popularized the CNN architecture.
The CNN architecture had 10 hidden layers.
The depths of the various layers in AlexNet sum to 11,176
compared with 258 for LeNet.
AlexNet contains around 650,000 neurons compared with
6,508 for LeNet, while the number of trainable parameters is
some 60 million compared with 60,000 for LeNet.
AlexNet takes a color image of size 224 × 224, whereas LeNet
could only manage a bi-level 32 × 32 input image. So overall,
AlexNet is larger than LeNet by a factor between 100 and
1000, depending on which factors should be regarded as the
most relevant.
For more details:
“ImageNet Classification with Deep Convolutional Neural
Networks” (2012)
https://proceedings.neurips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
GoogLeNet: ILSVRC 2014 winner
GoogLeNet is one of the most successful models of the early years of
convolutional neural networks. Szegedy et al. from Google Inc. published
the model in their paper named Going Deeper with Convolutions and won
ILSVRC 2014 by a large margin.
Features of GoogLeNet
It is a 22-layer-deep network.
1 × 1 convolutions
Global average pooling
Inception module: this combines the outputs of filters of different
sizes.
Auxiliary classifiers for training: a method for tackling the
vanishing gradient problem.
GoogLeNet: Architecture
Architecture of inception module
Details of architecture
For more details:
“Going Deeper with Convolutions” (Sep 2014)
https://arxiv.org/abs/1409.4842
FaceNet
FaceNet is a facial recognition system developed by Florian Schroff,
Dmitry Kalenichenko and James Philbin, a group of researchers affiliated
with Google. The system was first presented at the IEEE Conference on
Computer Vision and Pattern Recognition in 2015.
FaceNet learns a mapping from face images to points in
128-dimensional Euclidean space.
The similarity between two face images is assessed using the square
of the Euclidean distance between the corresponding normalized
vectors in that 128-dimensional space.
The system uses the triplet loss function as its cost function and
introduced a new online triplet mining method.
The system achieved an accuracy of 99.63%, at the time the highest
score on the Labelled Faces in the Wild dataset.
The models are initialized randomly and trained on a CPU cluster
for 1,000 to 2,000 hours, that is, 40 to 80 days!
The Triplet Loss minimizes the distance between an anchor and a
positive, both of which have the same identity, and maximizes the
distance between the anchor and a negative of a different identity.
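In the notation of the FaceNet paper, with f the embedding, a_i the anchor, p_i the positive, n_i the negative and α the enforced margin, the loss summed over all triplets is
L = Σ_i max( ∥f(a_i) − f(p_i)∥² − ∥f(a_i) − f(n_i)∥² + α, 0 ).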
For more details:
“FaceNet: A Unified Embedding for Face Recognition and
Clustering” (March 2015)
https://arxiv.org/abs/1503.03832
Thank you.