In this thesis, three bio-inspired algorithms, viz. the genetic algorithm (GA), the particle swarm optimizer (PSO), and the grey wolf optimizer (GWO), are used to optimally determine the architecture of a convolutional neural network (CNN) that classifies handwritten digits. The CNN is a class of deep feed-forward network that has seen major success in visual image analysis. During training, a good CNN architecture is capable of extracting complex features from the given training data; however, at present there is no standard way to determine the architecture of a CNN. Domain knowledge and human expertise are required to design one, and architectures are typically created by experimenting with and modifying a few existing networks.
The bio-inspired algorithms determine the exact architecture of a CNN by evolving the various hyperparameters of the architecture for a given application. The proposed method was tested on the MNIST dataset, a large database of handwritten digits commonly used to benchmark machine-learning models. The experiment was carried out on an Amazon Web Services (AWS) GPU instance, which reduced the experiment time. The performance of all three algorithms was studied comparatively. The results show that the bio-inspired algorithms are capable of generating successful CNN architectures, and the proposed method performs the entire architecture-generation process without any human intervention.
Bio-inspired Algorithms for Evolving the Architecture of Convolutional Neural Networks
1. Master's Thesis Defense:
Bio-inspired Algorithms for
Evolving the Architecture of
Convolutional Neural Networks
By Ashray Bhandare
Thesis Advisor: Dr. Devinder Kaur
2. Agenda
Introduction
Convolutional Neural Network
– How ConvNet Works
ConvNet Layers
– Convolutional Layer
– Pooling Layer
– Normalization Layer (ReLU)
– Fully-Connected Layer
Hyperparameters
Genetic Algorithm (GA)
– Workings of GA
– Selection
– Crossover
– Mutation
Mapping GA Chromosome
GA Tuner Evaluation & Results
Particle Swarm Optimization (PSO)
– Workings of PSO
– PSO Simulation
Mapping PSO Particle
PSO Tuner Evaluation & Results
Grey Wolf Optimization (GWO)
– Workings of GWO
Mapping GWO Candidate Solution
GWO Tuner Evaluation & Results
Conclusion
6. Introduction
With traditional machine learning algorithms, a programmer has to tell the computer what kinds of things it should be looking for (feature extraction).
Because of this, the success of the algorithm depends on the programmer and their understanding of the data.
Deep networks solve this problem, as they are capable of finding the right features on their own, requiring very little assistance from the programmer.
The convolutional neural network (CNN) is one such type of deep network.
7. Introduction contd.
Many researchers are exploring the use of CNNs in machine-learning problems such as image recognition, video analysis, and natural language processing.
A CNN architecture consists of various layers, and each layer has many hyperparameters.
The vast number of architectures that can be generated from the choices of hyperparameters makes an exhaustive manual search impossible.
8. Problem Statement
In this thesis, three bio-inspired algorithms, viz. the genetic algorithm, particle swarm optimizer (PSO), and grey wolf optimizer (GWO), are used to optimally determine the architecture of a convolutional neural network (CNN) that is used to classify handwritten numbers.
Currently, there is no standard way to automatically determine the architecture of a CNN. Domain knowledge and human expertise are required to design a CNN architecture. Typically, architectures are created by experimenting with and modifying a few existing networks.
The bio-inspired algorithms determine the exact architecture of a CNN by evolving the various hyperparameters of the architecture for a given application.
9. MNIST Dataset
The MNIST dataset consists of scanned images of handwritten digits; the associated labels describe which digit, 0–9, is contained in each image.
This classification problem is one of the benchmark problems and is widely used in deep-learning research. It is a popular dataset because it allows researchers to study their proposed methods in a controlled environment.
11. Convolutional Neural Network
A convolutional neural network (or ConvNet) is a type of feed-forward artificial neural network.
The architecture of a ConvNet is designed to take advantage of the 2D structure of an input image.
A ConvNet is comprised of one or more convolutional layers (often with a pooling step), followed by one or more fully connected layers, as in a standard multilayer neural network.
12. Motivation behind ConvNets
Consider an image of size 200x200x3 (200 wide, 200 high, 3 color channels):
– a single fully-connected neuron in the first hidden layer of a regular neural network would have 200*200*3 = 120,000 weights;
– with several such neurons, this full connectivity is wasteful, and the huge number of parameters would quickly lead to overfitting.
In a ConvNet, however, the neurons in a layer are connected only to a small region of the layer before it, instead of to all of its neurons in a fully-connected manner.
– The final output layer has dimensions 1x1xN, because by the end of the ConvNet architecture the full image is reduced to a single vector of class scores (for N classes), arranged along the depth dimension.
13. MLP vs ConvNet
A regular 3-layer neural network.
A ConvNet arranges its neurons in three dimensions (width, height, depth), as visualized in one of the layers.
14. How ConvNet Works
For example, a ConvNet takes as input an image that is to be classified as ‘X’ or ‘O’.
In a simple case, ‘X’ would look like:
[Figure: a two-dimensional array of pixels fed into the CNN, which outputs ‘X’ or ‘O’.]
15. How ConvNet Works
What about trickier cases?
[Figure: trickier ‘X’ and ‘O’ images fed to the CNN, which should still classify them correctly.]
18. How ConvNet Works – What Computer Sees
-1 -1 -1 -1 -1 -1 -1 -1 -1
-1 X -1 -1 -1 -1 X X -1
-1 X X -1 -1 X X -1 -1
-1 -1 X 1 -1 1 -1 -1 -1
-1 -1 -1 -1 1 -1 -1 -1 -1
-1 -1 -1 1 -1 1 X -1 -1
-1 -1 X X -1 -1 X X -1
-1 X X -1 -1 -1 -1 X -1
-1 -1 -1 -1 -1 -1 -1 -1 -1
(The image as a grid of pixel values; entries marked X indicate pixels that do not match the stored ‘X’ pattern.)
Since the pattern does not match exactly, the computer will not be able to classify this as ‘X’.
20. ConvNet Layers (At a Glance)
CONV layer: computes the output of neurons that are connected to local regions in the input, each computing a dot product between its weights and the small region it is connected to in the input volume.
RELU layer: applies an elementwise activation function, such as the max(0, x) thresholding at zero. This leaves the size of the volume unchanged.
POOL layer: performs a downsampling operation along the spatial dimensions (width, height).
FC (i.e. fully-connected) layer: computes the class scores, resulting in a volume of size [1x1xN], where each of the N numbers corresponds to the score of one of the N categories.
21. Recall – What Computer Sees
Since the pattern does not match exactly, the computer will not be able to classify this as ‘X’.
What got changed?
-1 -1 -1 -1 -1 -1 -1 -1 -1
-1 X -1 -1 -1 -1 X X -1
-1 X X -1 -1 X X -1 -1
-1 -1 X 1 -1 1 -1 -1 -1
-1 -1 -1 -1 1 -1 -1 -1 -1
-1 -1 -1 1 -1 1 X -1 -1
-1 -1 X X -1 -1 X X -1
-1 X X -1 -1 -1 -1 X -1
-1 -1 -1 -1 -1 -1 -1 -1 -1
22. Convolutional Layer
The convolution layer works to identify patterns (features) instead of individual pixels.
[Figure: local patches of the image matching individual features.]
23. Convolutional Layer – Filters
The CONV layer’s parameters consist of a set of learnable filters.
Every filter is small spatially (along width and height), but extends through the full depth of the input volume.
During the forward pass, we slide (more precisely, convolve) each filter across the width and height of the input volume and compute dot products between the entries of the filter and the input at every position.
Example 3x3 filters (diagonal, anti-diagonal, and X-pattern):
 1 -1 -1    -1 -1  1     1 -1  1
-1  1 -1    -1  1 -1    -1  1 -1
-1 -1  1     1 -1 -1     1 -1  1
24. Convolutional Layer – Filters
Sliding the filter over the width and height of the input gives a 2-dimensional activation map that responds to that filter at every spatial position.
49. Convolutional Layer – Strides
The distance the filter is moved across the input from the previous layer between activations is referred to as the stride.
[Figure: the same filter applied with stride 1 and with stride 2.]
50. Convolutional Layer – Padding
Sometimes it is convenient to pad the input volume with zeros around the border.
Zero padding allows us to preserve the spatial size of the output volumes.
[Figure: inputs with padding 1 and with padding 2.]
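Filter size, stride, and padding together determine the spatial size of a convolution’s output via the standard formula (W − F + 2P)/S + 1. A minimal sketch with illustrative values:

```python
def conv_output_size(input_size, filter_size, stride, padding):
    """Spatial output size of a convolution: (W - F + 2P) / S + 1."""
    return (input_size - filter_size + 2 * padding) // stride + 1

# A 28x28 MNIST image with a 5x5 filter and stride 1 shrinks to 24x24...
print(conv_output_size(28, 5, 1, 0))  # 24
# ...while zero padding of 2 preserves the 28x28 spatial size.
print(conv_output_size(28, 5, 1, 2))  # 28
```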
59. Pooling Layer
The pooling layers down-sample the previous layer’s feature maps.
Their function is to progressively reduce the spatial size of the representation, reducing the number of parameters and the amount of computation in the network.
The pooling layer often uses the max operation to perform the downsampling process.
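A minimal sketch of 2x2 max pooling with stride 2 (illustrative NumPy code, not the thesis implementation; assumes even spatial dimensions, which is what a Keras MaxPooling2D layer does internally):

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Downsample a 2D feature map with a 2x2 window and stride 2."""
    h, w = feature_map.shape  # assumed even for simplicity
    pooled = np.zeros((h // 2, w // 2))
    for i in range(0, h - 1, 2):
        for j in range(0, w - 1, 2):
            # keep only the strongest activation in each 2x2 window
            pooled[i // 2, j // 2] = feature_map[i:i + 2, j:j + 2].max()
    return pooled

fmap = np.array([[1.0, 0.2, 0.5, 0.1],
                 [0.3, 0.9, 0.4, 0.7],
                 [0.6, 0.1, 1.0, 0.2],
                 [0.8, 0.5, 0.3, 0.55]])
print(max_pool_2x2(fmap))  # [[1.0, 0.7], [0.8, 1.0]]
```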
69. Fully Connected Layer
Fully connected layers are the normal flat feed-forward neural network layers.
These layers may have a non-linear activation function or a softmax activation in order to predict classes.
To compute our output, we simply re-arrange the output matrices as a 1-D array.
[Figure: the pooled feature maps (values such as 1.00 and 0.55) flattened into a single 1-D vector.]
70. Fully Connected Layer
A weighted sum of the inputs at each output node determines the final prediction.
[Figure: the flattened feature vector connected to the ‘X’ and ‘O’ output nodes.]
72. Hyperparameters
Convolution
– Filter Size
– Number of Filters
– Padding
– Stride
Pooling
– Window Size
– Stride
Fully Connected
– Number of neurons
74. Genetic Algorithm (GA)
The genetic algorithm (GA) is inspired by the natural process of evolution.
It is based on two foundations:
– Foundation I: Darwin’s theory of natural selection
– Foundation II: Mendel’s theory of genetics
75. Genetic Algorithm (GA)
[Figure omitted.]
76. Selection
Selection operators give preference to better solutions (chromosomes), allowing them to pass on their ‘genes’ to the next generation of the algorithm.
The best solutions are determined using some form of objective function (also known as a ‘fitness function’ in genetic algorithms) before being passed to the crossover operator.
77. Tournament Selection
In tournament selection, K individuals are selected at random from the population, and the best of these is chosen to become a parent. K is known as the tournament selection size.
[Figure omitted: in the example shown, K = 3.]
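A minimal sketch of tournament selection (illustrative, not the thesis code; `fitness` is assumed to map a chromosome to its classification accuracy):

```python
import random

def tournament_select(population, fitness, k=3):
    """Pick k individuals at random; the fittest of them becomes a parent."""
    contestants = random.sample(population, k)
    return max(contestants, key=fitness)
```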
78. Crossover
Crossover is the process of taking more than one parent solution (chromosome) and producing a child solution from them.
By recombining portions of good solutions, the genetic algorithm is more likely to create a better solution.
A single-point crossover calls for a single pivot point (crossover point) to be selected on the parent chromosomes. All data beyond this pivot point is swapped between the two parent chromosomes, resulting in two offspring chromosomes.
[Figure: chromosomes X and Y split at the pivot point to form offspring A and B.]
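A minimal sketch of single-point crossover on bit-string chromosomes (illustrative):

```python
import random

def single_point_crossover(parent_x, parent_y):
    """Swap all genes beyond a randomly chosen pivot point."""
    pivot = random.randint(1, len(parent_x) - 1)  # pivot strictly inside
    offspring_a = parent_x[:pivot] + parent_y[pivot:]
    offspring_b = parent_y[:pivot] + parent_x[pivot:]
    return offspring_a, offspring_b
```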
79. Mutation
The purpose of the mutation operator is to encourage genetic diversity amongst the chromosomes.
If the chromosomes are too similar to each other, the genetic algorithm converges to a local minimum. The mutation operator prevents this from happening.
The mutation operator flips a randomly selected gene in a chromosome.
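A minimal sketch of bit-flip mutation (illustrative):

```python
import random

def mutate(chromosome):
    """Flip one randomly selected bit in a bit-string chromosome."""
    i = random.randrange(len(chromosome))
    mutated = list(chromosome)
    mutated[i] = 1 - mutated[i]  # flip the selected gene
    return mutated
```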
81. Hyperparameters in CNN
Convolution
– Filter Size
– Number of Filters
– Padding
– Stride
Pooling
– Window Size
– Stride
Fully Connected
– Number of neurons
82. Hyperparameters in CNN
Hyperparameter                                       Range
No. of Epochs                                        0 – 127
Batch Size                                           0 – 256
No. of Convolution Layers                            0 – 8
No. of Filters at each Convo Layer                   0 – 64
Convo Filter Size at each Convo Layer                0 – 8
Activations used at each Convo Layer                 sigmoid, tanh, relu, linear
Maxpool Layer after each Convo Layer                 true, false
Maxpool Pool Size for each Maxpool Layer             0 – 8
No. of Feed-Forward Hidden Layers                    0 – 8
No. of Feed-Forward Hidden Neurons at each Layer     0 – 64
Activations used at each Feed-Forward Layer          sigmoid, tanh, softmax, relu
Optimizer                                            Adagrad, Adadelta, RMS, SGD
105. Mapping of GA Chromosome to CNN Hyperparameters
1 1 0 0 1 0 0 No. of Epochs: 100
0 1 0 0 0 0 0 0 Batch Size: 64
0 1 0 No. of Convolutions: 2
0 0 1 0 1 0 No. of Filters at 1st Convolution : 10
1 0 1 Filter Size at 1st Convolution : 5
0 1 Activations used at 1st Convolution : Tanh
1 Maxpool layer after 1st Convolution layer : True
1 0 1 Maxpool Pool Size for 1st Maxpool : 5
0 0 1 1 1 1 No. of Filters at 2nd Convolution : 15
0 1 1 Filter Size at 2nd Convolution layer : 3
0 0 Activations used at 2nd Convolution: Sigmoid
1 Maxpool layer after 2nd Convolution layer : True
1 0 1 Maxpool Pool Size for 2nd Maxpool : 5
0 1 1 No. of Feed-Forward Hidden Layers : 3
1 0 0 0 0 0 No. of Feed-Forward Hidden Neurons at 1st layer: 32
0 0 Activations used at 1st Feed-Forward layer : Sigmoid
1 1 0 0 1 0 No. of Feed-Forward Hidden Neurons at 2nd layer: 50
1 1 Activations used at 2nd Feed-Forward layer : Linear
0 0 1 0 1 0 No. of Feed-Forward Hidden Neurons at 3rd layer: 10
1 0 Activations used at 3rd Feed-Forward Layer: Softmax
0 0 Optimizer: Adagrad
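Each field of the chromosome is a fixed-width binary number, as in the example above. A minimal decoding sketch (illustrative; the field widths follow the example):

```python
def bits_to_int(bits):
    """Interpret a list of bits, most significant first, as an integer."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value

print(bits_to_int([1, 1, 0, 0, 1, 0, 0]))     # 7-bit epoch field -> 100
print(bits_to_int([0, 1, 0, 0, 0, 0, 0, 0]))  # 8-bit batch-size field -> 64
```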
106. Fitness Function
The fitness function used in this study is the classification accuracy, which measures the number of correctly classified patterns.
This classification accuracy (ranging from 0 to 1) is the fitness value of a particular CNN architecture.
For the evaluation of the CNN, Keras – a high-level neural networks API written in Python – is used to train the convolutional neural networks. It is a deep-learning library that allows easy and fast prototyping. It supports all the layers of a CNN and can train the network using various optimization algorithms.
Keras reports a classification accuracy once a CNN architecture is fully trained.
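A minimal sketch of such a fitness evaluation in Keras (illustrative only; `params` is an assumed dictionary of decoded hyperparameters, and the actual tuner builds a variable number of layers from the encoding above):

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

def fitness(params, x_train, y_train, x_test, y_test):
    """Train a CNN described by decoded hyperparameters and return
    its test classification accuracy as the fitness value."""
    model = Sequential()
    model.add(Conv2D(params['n_filters'], params['filter_size'],
                     activation=params['conv_activation'],
                     input_shape=(28, 28, 1)))
    model.add(MaxPooling2D(pool_size=params['pool_size']))
    model.add(Flatten())
    model.add(Dense(params['n_hidden'], activation=params['dense_activation']))
    model.add(Dense(10, activation='softmax'))  # 10 digit classes
    model.compile(optimizer=params['optimizer'],
                  loss='categorical_crossentropy', metrics=['accuracy'])
    model.fit(x_train, y_train, epochs=params['epochs'],
              batch_size=params['batch_size'], verbose=0)
    _, accuracy = model.evaluate(x_test, y_test, verbose=0)
    return accuracy
```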
108. Evaluation
The genetic algorithm tuner was run on the MNIST dataset, with 50,000 images as its training set and another 10,000 images as its testing set.
The genetic algorithm, with a population of 10 randomly generated chromosomes, was executed 10 times, each time starting from a fresh random population.
109. Results – GA Tuning
Experiment No.   Highest Fitness Value
1                0.987799989104
2                0.978100001216
3                0.947200008678
4                0.954100004768
5                0.961800005841
6                0.985799998164
7                0.991900001359
8                0.98910000065
9                0.986600002062
10               0.990600002396
[Figure: GA Tuner – classification accuracy vs. generation, showing the convergence process of GA tuning.]
110. Generated Output after GA Tuning
[Figure omitted.]
111. Final CNN Architecture after GA Tuning
[Figure omitted.]
113. Particle Swarm Optimization Algorithm (PSO)
PSO is inspired by the social behavior and dynamic, communicative movements of insects, birds, and fish.
It uses a number of agents (particles) that constitute a swarm moving around the search space looking for the best solution.
Each particle dynamically adjusts its velocity according to its own flying experience and that of its companions.
114. Particle Swarm Optimization Algorithm (PSO)
[Figure omitted.]
115. Position Update Rule
The position of particle i is given by $x_i$, an L-dimensional vector in $\mathbb{R}^L$.
The change of position of a particle is denoted by $\Delta x_i$, a vector added to the position coordinates in order to move the particle from one iteration, t, to the next, t + 1:
$$x_i(t+1) = x_i(t) + \Delta x_i(t+1)$$
The vector $\Delta x_i$ is commonly referred to as the velocity $v_i$ of the particle.
Velovity Update Rule
The particle swarm algorithm samples the search-space by modifying the velocity
of each particle.
Velocity term Δxi(t + 1) at iteration t + 1 is influenced by the current velocity
Δxi(t), the location of the particle’s best success so far Pi and the best position
found by any member of the swarm Pg
Here ϕ1 and ϕ2 represent positive random vectors composed of numbers
drawn from uniform distributions.
EECS6960 Research and ThesisEECS6960 Research and Thesis
Δxi t + 1
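A minimal sketch of these two update rules in NumPy (illustrative; `phi_max` and the dense-vector representation are assumptions, and the tuner ultimately maps positions back to discrete hyperparameter fields):

```python
import numpy as np

def pso_step(x, v, p_best, g_best, phi_max=2.0):
    """One PSO iteration: update each particle's velocity from its
    personal best and the swarm's global best, then move it."""
    phi1 = np.random.uniform(0, phi_max, size=x.shape)  # random vectors
    phi2 = np.random.uniform(0, phi_max, size=x.shape)
    v = v + phi1 * (p_best - x) + phi2 * (g_best - x)   # velocity update
    x = x + v                                           # position update
    return x, v
```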
117. PSO – Simulation
[Figure omitted.]
120. Mapping of PSO Particle to CNN Hyperparameters
(The encoding and its decoding to hyperparameters are identical to the GA chromosome mapping shown on slide 105.)
122. Evaluation
The PSO tuner was run on the MNIST dataset, with 50,000 images as its training set and another 10,000 images as its testing set.
The particle swarm optimizer, with a swarm of 10 randomly generated particles, was executed 10 times, each time starting from a fresh random swarm.
123. Results – PSO Tuning
Experiment No.   Highest Fitness Value
1                0.984499992943
2                0.973899998105
3                0.988800008184
4                0.993600005358
5                0.947799991965
6                0.949000005102
7                0.983099997652
8                0.979799999475
9                0.956399999567
10               0.992350000068
[Figure: PSO Tuner – classification accuracy vs. generation, showing the convergence process of PSO tuning.]
124. Generated Output after PSO Tuning
[Figure omitted.]
125. Final Architecture after PSO Tuning
[Figure omitted.]
127. Grey Wolf Optimization Algorithm (GWO)
The GWO algorithm, proposed by Mirjalili et al. in 2014, mimics the leadership hierarchy and hunting mechanism of grey wolves in nature.
Four types of grey wolves – alpha (α), beta (β), delta (δ), and omega (ω) – are employed to simulate the leadership hierarchy.
[Figure: the dominance hierarchy as a pyramid, with α at the top, followed by β, δ, and ω.]
128. Grey Wolf Optimization Algorithm (GWO)
In addition to the social hierarchy of wolves, group hunting is another interesting social behavior of grey wolves. The main phases of grey wolf hunting are as follows:
• Tracking, chasing, and approaching the prey
• Pursuing, encircling, and harassing the prey until it stops moving
• Attacking the prey
[Figure: hunting behavior of grey wolves – (A) chasing, approaching, and tracking prey; (B–D) pursuing, harassing, and encircling; (E) stationary situation and attack.]
129. Grey Wolf Optimizer – Encircling the Prey
Encircling is mathematically modelled as follows:
$$\vec{D} = |\vec{C} \cdot \vec{X}_p(t) - \vec{X}(t)|$$
$$\vec{X}(t+1) = \vec{X}_p(t) - \vec{A} \cdot \vec{D}$$
where t indicates the current iteration, $\vec{A}$ and $\vec{C}$ are coefficient vectors, $\vec{X}_p$ is the position vector of the prey, and $\vec{X}$ indicates the position vector of a grey wolf. $\vec{A}$ and $\vec{C}$ are given by
$$\vec{A} = 2\vec{a} \cdot \vec{r}_1 - \vec{a}, \qquad \vec{C} = 2\vec{r}_2$$
where the components of $\vec{a}$ are linearly decreased from 2 to 0 over the course of the iterations, and $\vec{r}_1$, $\vec{r}_2$ are random vectors in the interval [0, 1].
130. Grey Wolf Optimizer – Attacking the Prey
Grey wolves have the ability to recognize the location of prey and encircle it. The hunt is usually guided by the alpha; the beta and delta might also participate occasionally.
A new beta and delta can emerge in each iteration as all the other wolves update their positions.
We assume that the alpha (the best candidate solution), beta, and delta have better knowledge about the potential location of the prey.
The three best solutions obtained so far are saved (α, β, and δ), and the positions of the other search agents (the omegas) are updated according to the positions of these best search agents.
131. Grey Wolf Optimizer – Attacking the Prey
Attacking is mathematically modelled with the following equations:
$$\vec{D}_\alpha = |\vec{C}_1 \cdot \vec{X}_\alpha - \vec{X}|, \qquad \vec{D}_\beta = |\vec{C}_2 \cdot \vec{X}_\beta - \vec{X}|, \qquad \vec{D}_\delta = |\vec{C}_3 \cdot \vec{X}_\delta - \vec{X}|$$
$$\vec{X}_1 = \vec{X}_\alpha - \vec{A}_1 \cdot \vec{D}_\alpha, \qquad \vec{X}_2 = \vec{X}_\beta - \vec{A}_2 \cdot \vec{D}_\beta, \qquad \vec{X}_3 = \vec{X}_\delta - \vec{A}_3 \cdot \vec{D}_\delta$$
$$\vec{X}(t+1) = \frac{\vec{X}_1 + \vec{X}_2 + \vec{X}_3}{3}$$
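A minimal sketch of this position update in NumPy (illustrative; the array shapes and the handling of the linearly decreasing coefficient `a` are assumptions):

```python
import numpy as np

def gwo_step(positions, x_alpha, x_beta, x_delta, a):
    """One GWO iteration: move each wolf to the mean of the three
    positions suggested by the alpha, beta, and delta wolves."""
    new_positions = np.empty_like(positions)
    for i, x in enumerate(positions):
        candidates = []
        for leader in (x_alpha, x_beta, x_delta):
            r1 = np.random.rand(*leader.shape)
            r2 = np.random.rand(*leader.shape)
            A = 2 * a * r1 - a           # A = 2a . r1 - a
            C = 2 * r2                   # C = 2 . r2
            D = np.abs(C * leader - x)   # D = |C . X_leader - X|
            candidates.append(leader - A * D)
        new_positions[i] = sum(candidates) / 3.0  # X(t+1) = (X1+X2+X3)/3
    return new_positions
```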
132. Grey Wolf Optimization Algorithm (GWO)
[Figure omitted.]
135. Mapping of GWO Solution to CNN Hyperparameters
(The encoding and its decoding to hyperparameters are identical to the GA chromosome mapping shown on slide 105.)
137. Evaluation
The GWO tuner was run on the MNIST dataset, with 50,000 images as its training set and another 10,000 images as its testing set.
The grey wolf optimization algorithm, with 10 randomly generated candidate solutions, was executed 10 times, each time starting from a fresh random population.
138. Results – GWO Tuning
Experiment No.   Highest Fitness Value
1                0.946400008178
2                0.948899995995
3                0.994200000004
4                0.97359999752
5                0.961999999666
6                0.877199997282
7                0.985900000003
8                0.899900003791
9                0.959000001717
10               0.932900003999
[Figure: GWO Tuner – classification accuracy vs. generation, showing the convergence process of GWO tuning.]
139. Generated Output after GWO Tuning
[Figure omitted.]
140. Final CNN Architecture after GWO Tuning
[Figure omitted.]
142. Conclusion
In this thesis, three bio-inspired algorithms, viz. GA, PSO, and GWO, were used to generate fully trained CNN architectures for the MNIST dataset.
It has been demonstrated that the proposed method is capable of choosing relevant hyperparameters, thus forming an optimal CNN architecture. The architectures were generated automatically, without any human intervention.
All experiments carried out using the GA and PSO algorithms yielded classification accuracies of more than 90%, with the highest accuracies being 99.2% and 99.36%, respectively. The GWO experiments yielded classification accuracies of more than 85%, with the highest accuracy being 99.42%.
143. Conclusion contd.
In the future, this work can be extended to other bio-inspired algorithms. It can also be applied to other datasets; these may consist of colored images and may be larger in size, provided there is access to greater processing power.
Algorithm                               Approx. Processing Time (hours)   Best Run   Worst Run
Genetic Algorithm                       4–5                               0.9919     0.9472
Particle Swarm Optimization Algorithm   4–5                               0.9936     0.9478
Grey Wolf Optimization Algorithm        5–6                               0.9942     0.8772
(Results are classification accuracies.)
144. References
Karpathy, A. (n.d.). CS231n Convolutional Neural Networks for Visual Recognition. Retrieved from http://cs231n.github.io/convolutional-networks/#overview
Rohrer, B. (n.d.). How do Convolutional Neural Networks work? Retrieved from http://brohrer.github.io/how_convolutional_neural_networks_work.html
Brownlee, J. (n.d.). Crash Course in Convolutional Neural Networks for Machine Learning. Retrieved from http://machinelearningmastery.com/crash-course-convolutional-neural-networks/
Lidinwise (n.d.). The revolution of depth. Retrieved from https://medium.com/@Lidinwise/the-revolution-of-depth-facf174924f5#.8or5c77ss
Nervana (n.d.). Tutorial: Convolutional neural networks. Retrieved from https://www.nervanasys.com/convolutional-neural-networks/
de Castro, L. N. (2006). Fundamentals of Natural Computing: Basic Concepts, Algorithms, and Applications. Chapman and Hall/CRC.
Mirjalili, S., Mirjalili, S. M., & Lewis, A. (2014). Grey Wolf Optimizer. Advances in Engineering Software, 69, 46–61.
Bhandare, A., & Kaur, D. (2017). Comparative Analysis of Swarm Intelligence Techniques. In International Conference on Artificial Intelligence.