Comparing Performance of Different CNNs using
SparkNet on Large Image Datasets
Saptashwa Mitra and Sitakanta Mishra
Abstract— Image recognition is one of the hottest topics in the field of computer vision today. Because of the diverse nature of real-world images, a model that is to classify arbitrary images efficiently must be able to learn from a large-scale dataset. Convolutional Neural Networks are such a class of models: their capacity can be controlled by varying their depth and breadth, and, unlike standard feed-forward neural networks, they have fewer connections and parameters, which makes them more feasible and less time-consuming to train. However, the time required to train these models on a single machine is still considerable, which is why many CNN implementations make use of multiple GPUs on one machine to speed up training. In this term project, we trained a Convolutional Neural Net on a large image dataset using Apache Spark, a distributed, in-memory cluster computing framework, and investigated whether there is any improvement in the time needed to train the model up to a decent accuracy.
I. INTRODUCTION
Large-scale object recognition and image classification rely heavily on machine learning algorithms. Because of the diverse nature of real-world images and the diversity of object categories, any model that is to classify an image with reasonable accuracy needs to be trained on a large number of examples. In other words, building a powerful model requires feeding it a large image dataset for training.
Given the size of such a training set, these models need a lot of training time, often on the order of days. Cluster computing could be a solution to this long training time: distributing the training of batches across multiple nodes in a cluster could give better performance in terms of the time required to train. However, popular batch-processing frameworks like Hadoop and Spark are not well suited for communication-intensive workloads such as this one, because the model parameters would need to be communicated to the master frequently between iterations. For that reason, we have used SparkNet in our project. SparkNet is a framework built on top of Spark specifically for training deep networks.
The goal of our project is to see whether there is any speedup in training time on a large image dataset when using SparkNet. We plan on using multiple Convolutional Neural Network architectures and comparing their performance in terms of training time and accuracy obtained.
II. DATA SETS
Before going into more detail about the methods used, we introduce the two datasets that we worked with for this project.
A. CIFAR-10
CIFAR-10 is a dataset of 60,000 tiny images, each of size 32x32. There are 10 classes that the images can be categorized into, with 6,000 images per class. The dataset comes in two parts: a training set of 50,000 images and a test set of 10,000 images. The test set is created by taking 1,000 random images from each class out of the 60,000 total images; the remaining images go into the training set.
Fig. 1. A sample of CIFAR-10 images
B. IMAGENET
ImageNet is a dataset of more than 1.2 million images intended for use in object recognition research. It has been the subject of the ILSVRC challenge since 2010, where the goal is to classify these images with as low an error rate as possible. The training set contains about 1.2 million images, the validation set 50,000 images, and the test set 100,000 images. Both the training and validation sets come with a text file mapping each image filename to its category; the test images are provided unlabeled. The images in this dataset are not of any fixed size and can be quite large. We used the ImageNet dataset from 2012, which has a total size of over 150 GB.
Fig. 2. A sample of ImageNet images
III. METHODS
Convolutional Neural Networks have emerged as the forerunners when it comes to classifying large image datasets; over the years, all the winners of the ILSVRC competitions have used some variation of a CNN model. Although simple learning tasks on smaller datasets can easily be done without CNNs, classifying thousands or millions of images requires a model with a large learning capacity.
Moreover, the immense complexity of the object recognition task means that the problem cannot be fully specified even by a dataset this large, so our model should also have lots of prior knowledge to compensate for all the data we don't have, as explained below.
A. Data Preprocessing
Our software requires the images to be input at a fixed dimension. CIFAR-10's images are all of the same dimensions but, since the input images of ImageNet vary in resolution, we used the method from the AlexNet paper [1] to get around this problem.
We down-sampled the images to a fixed resolution of 256 x 256: given a rectangular image, we first rescaled it so that its shorter side had length 256, and then cropped out the central 256 x 256 patch of the resulting image. These images are the input to our model. During training, the model additionally takes a random 227 x 227 crop of each input image to reduce overfitting.
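The following is a minimal sketch of this preprocessing step in Python using Pillow and NumPy (our pipeline used Caffe's data tooling; the library choice and function names here are just for illustration):

import random
import numpy as np
from PIL import Image

def center_crop_256(path, size=256):
    # Rescale so that the shorter side has length `size`, then cut out the
    # central size x size patch, as described above.
    img = Image.open(path).convert("RGB")
    w, h = img.size
    scale = size / min(w, h)
    img = img.resize((int(round(w * scale)), int(round(h * scale))), Image.BILINEAR)
    w, h = img.size
    left, top = (w - size) // 2, (h - size) // 2
    return img.crop((left, top, left + size, top + size))

def random_training_crop(img, crop=227):
    # Random 227 x 227 crop taken at training time.
    arr = np.asarray(img)
    top = random.randint(0, arr.shape[0] - crop)
    left = random.randint(0, arr.shape[1] - crop)
    return arr[top:top + crop, left:left + crop, :]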
B. Convolutional Neural Networks
Convolutional Neural Networks (CNNs) are a specialized type of neural network designed for training on large amounts of data with a grid-like topology; image data fits this description perfectly.
A convolutional network uses convolution in place of general matrix multiplication in at least one of its layers. CNN architectures make the explicit assumption that the inputs are images, which allows certain properties to be encoded into the architecture. These properties make the forward function more efficient to implement and vastly reduce the number of parameters in the network.
In a regular neural network, each neuron in one layer is fully connected to every neuron in the previous layer, so the problem at hand would become very complicated and time-consuming if regular neural networks were used, simply because of the huge number of parameters involved in the training process.
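To make the parameter count concrete, here is a quick back-of-the-envelope comparison (the layer shapes are taken from the AlexNet-style network listed later; the calculation itself is only illustrative):

# Fully connecting a 227x227x3 input volume to a single layer of 4096 neurons:
fc_weights = 227 * 227 * 3 * 4096            # roughly 633 million weights

# The first convolutional layer of the AlexNet-style network instead learns
# 96 filters of size 11x11x3 (plus one bias per filter):
conv_params = 96 * (11 * 11 * 3 + 1)         # 34,944 parameters

print(fc_weights, conv_params)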
A convolutional network consists of a sequence of layers. Each layer takes one volume of input activations and converts it into a volume of output activations. There are three main types of layers used to build ConvNet architectures: the Convolutional layer (CONV), the Pooling layer (POOL), and the Fully-Connected layer (FC, exactly as seen in regular neural networks). These layers are stacked in different combinations to form the different ConvNet architectures that we see today.
Convolutional Layer: The CONV layer's parameters consist of a set of learnable filters. Every filter is small spatially (along width and height) but extends through the full depth of the input volume. During the forward pass, we slide (convolve) each filter across the width and height of the input volume and compute dot products between the entries of the filter and the input at each position. As we slide the filter over the width and height of the input volume, we produce a 2-dimensional activation map that gives the responses of that filter at every spatial position. Fig. 3 shows how the activation maps produced by multiple filters are stacked together to form the input to the next stage of the CNN. Each convolution layer is also followed by an elementwise activation function (ReLU).
Fig. 3. An example of a convolutional layer

Mathematically, we can say a convolution layer:
• Accepts a volume of size W1 x H1 x D1
• Requires four hyperparameters:
  – Number of filters (K)
  – Spatial extent of the filters (F)
  – Stride (S)
  – Size of zero padding (P)
• Produces an output volume of size W2 x H2 x D2, where:
  – W2 = (W1 - F + 2P)/S + 1
  – H2 = (H1 - F + 2P)/S + 1
  – D2 = K
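As a quick check of these formulas, the small Python helper below (purely illustrative) computes the output size of a convolution layer; for the AlexNet-style CONV1 listed later (227 x 227 x 3 input, 96 filters of size 11, stride 4, no padding) it gives 55 x 55 x 96:

def conv_output_size(w1, h1, d1, k, f, s, p):
    # Output volume of a CONV layer with K filters of spatial extent F,
    # stride S and zero padding P, applied to a W1 x H1 x D1 input.
    # (The input depth D1 affects the filter size, not the output dimensions.)
    w2 = (w1 - f + 2 * p) // s + 1
    h2 = (h1 - f + 2 * p) // s + 1
    return w2, h2, k

print(conv_output_size(227, 227, 3, k=96, f=11, s=4, p=0))   # (55, 55, 96)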
Pooling Layer: Periodically, between successive convolution layers, we insert pooling layers. The purpose of pooling layers is to down-sample the volume spatially.
A pooling function replaces the output of the net at a certain location with a summary statistic of its nearby outputs. We have used max-pooling in our experiments: the max-pooling operation reports the maximum output within a rectangular neighborhood. Pooling helps us achieve invariance to small transformations of the input as well as invariance to inputs of varying size.
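A minimal NumPy sketch of max-pooling over non-overlapping windows (for illustration only; the networks below use 3x3 windows at stride 2, i.e. overlapping pooling, which Caffe handles for us):

import numpy as np

def max_pool(x, size=2):
    # Non-overlapping max-pooling over an H x W feature map.
    h, w = x.shape
    h, w = h - h % size, w - w % size            # drop any ragged border
    blocks = x[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

fmap = np.arange(16).reshape(4, 4)
print(max_pool(fmap))    # [[ 5  7]
                         #  [13 15]]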
Fully Connected Layer: Neurons in a fully
connected layer have full connections to all ac-
tivations in the previous layer, as seen in regular
Neural Networks. Their activations can hence be
computed with a matrix multiplication followed by
a bias offset. The final layer of a CNN must always
be fully connected.
Fig. 4. An example of max pooling
So, essentially, a CNN is just a sequence of layers following the pattern:

INPUT -> [[CONV -> RELU]*N -> POOL?]*M -> [FC -> RELU]*K -> FC
C. SparkNet
As mentioned before, for our project we worked with SparkNet, a framework for training deep networks in Spark. It includes a convenient interface for reading data from Spark RDDs, a Scala interface to the Caffe deep learning framework, and a lightweight multi-dimensional tensor library. It builds on Apache Spark and the Caffe deep learning library.
In the implementation, the Net class wraps Caffe and exposes a simple API. The NetParams type specifies a network architecture, and the weightCollection type is a map from layer names to lists of weights; together they allow manipulating network components and storing the weights and outputs of individual layers. The NDArray class is a lightweight multi-dimensional tensor library that facilitates manipulating data and weights without copying memory to and from Caffe. A Spark cluster consists of a single master node and a number of worker nodes, and the data is split among the workers. In every iteration, the Spark master broadcasts the model parameters to each worker. Each worker then runs SGD on the model with its subset of the data for a fixed number of iterations or for a fixed length of time, after which the resulting model parameters on each worker are sent to the master and averaged to form the new model parameters [3].
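The parameter-averaging scheme described above can be illustrated with the following self-contained Python simulation (a toy linear-regression model stands in for the CNN, and the "workers" are simply data shards; this is not the actual SparkNet Scala API):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X @ np.array([1.0, -2.0, 0.5, 3.0, 0.0]) + 0.1 * rng.normal(size=1000)
shards = list(zip(np.array_split(X, 4), np.array_split(y, 4)))   # four "workers"

def local_sgd(w, Xs, ys, iters=50, lr=0.01, batch=32):
    # Each worker runs SGD on its own shard for a fixed number of iterations.
    for _ in range(iters):
        idx = rng.integers(0, len(Xs), size=batch)
        grad = 2 * Xs[idx].T @ (Xs[idx] @ w - ys[idx]) / batch
        w = w - lr * grad
    return w

w = np.zeros(5)
for _ in range(20):                              # one round = broadcast + average
    worker_weights = [local_sgd(w.copy(), Xs, ys) for Xs, ys in shards]
    w = np.mean(worker_weights, axis=0)          # master averages worker results

print(np.round(w, 2))   # approaches [ 1. -2. 0.5 3. 0. ]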
Caffe provides a specific format for specifying a CNN architecture to the program. Using this format, we specified the different CNN architectures we used; Fig. 5 shows a sample layer specification.

Fig. 5. Layer specification in Caffe
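In our project these specifications were written directly as Caffe .prototxt files. As a rough illustration of what such a layer definition amounts to, the sketch below builds a small CONV-ReLU-POOL-FC fragment with Caffe's Python NetSpec interface and prints the corresponding prototxt (assumes pycaffe is installed; the data source path and parameter values are placeholders, not our exact settings):

import caffe
from caffe import layers as L, params as P

n = caffe.NetSpec()
# Data layer reading LMDB batches (source path and batch size are placeholders).
n.data, n.label = L.Data(source="cifar10_train_lmdb", backend=P.Data.LMDB,
                         batch_size=100, ntop=2)
# 32 filters of size 5x5 at stride 1, pad 2 -- the same shape as our CONV layers.
n.conv1 = L.Convolution(n.data, num_output=32, kernel_size=5, stride=1, pad=2,
                        weight_filler=dict(type="gaussian", std=0.01))
n.relu1 = L.ReLU(n.conv1, in_place=True)
# 3x3 max pooling at stride 2.
n.pool1 = L.Pooling(n.conv1, pool=P.Pooling.MAX, kernel_size=3, stride=2)
# Fully connected layer producing the 10 class scores, followed by the loss.
n.fc1 = L.InnerProduct(n.pool1, num_output=10,
                       weight_filler=dict(type="gaussian", std=0.01))
n.loss = L.SoftmaxWithLoss(n.fc1, n.label)

print(n.to_proto())      # emits the .prototxt text for this fragment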
D. Different CNN architectures
Some popular CNN architectures that we attempted to implement were:
For the ImageNet dataset:
AlexNet:
[227x227x3] INPUT
[55x55x96] CONV1: 96 11x11 filters at stride 4, pad 0
[27x27x96] MAXPOOL1: 3x3 filters at stride 2
[27x27x96] NORM1: normalization layer
[27x27x256] CONV2: 256 5x5 filters at stride 1, pad 2
[13x13x256] MAXPOOL2: 3x3 filters at stride 2
[13x13x256] NORM2: normalization layer
[13x13x384] CONV3: 384 3x3 filters at stride 1, pad 1
[13x13x384] CONV4: 384 3x3 filters at stride 1, pad 1
[13x13x256] CONV5: 256 3x3 filters at stride 1, pad 1
[6x6x256] MAXPOOL3: 3x3 filters at stride 2
[4096] FC6: 4096 neurons
[4096] FC7: 4096 neurons
[1000] FC8: 1000 neurons (class scores)
For CIFAR-10:
We were not able to implement AlexNet for reasons mentioned later. As a result, we decided to train a set of different CNN architectures on the CIFAR-10 dataset. We tested two CNNs on CIFAR-10 to see which one trained faster and which one gave a better accuracy after a certain number of iterations. The following are the architectures we tried on CIFAR-10:
Trial #1:
32x32x3 INPUT
CONV1: 5x5x3 filters at stride 1, pad 2
MAXPOOL1: 3x3 filters at stride 2
CONV2: 5x5x3 filters at stride 1, pad 2
ReLU
AVGPOOL2: 3x3 filters at stride 2
CONV3: 5x5x3 filters at stride 1, pad 2
ReLU
AVGPOOL3: 3x3 filters at stride 2
FC (SoftMax)
FC
FC (Softmax with loss) [10 neurons]

Trial #2:
32x32x3 INPUT
CONV1: 5x5x3 filters at stride 1, pad 2
MAXPOOL1: 3x3 filters at stride 2
CONV2: 5x5x3 filters at stride 1, pad 2
ReLU
AVGPOOL2: 3x3 filters at stride 2
CONV3: 5x5x3 filters at stride 1, pad 2
ReLU
AVGPOOL3: 3x3 filters at stride 2
CONV4: 5x5x3 filters at stride 1, pad 2
ReLU
AVGPOOL4: 3x3 filters at stride 2
FC (SoftMax)
FC
FC (Softmax with loss) [10 neurons]
IV. RESULTS AND DISCUSSION
A. Working with ImageNet
We encountered a roadblock while training on the ImageNet dataset with a Spark cluster we created on the CS120 lab machines. We had 23 nodes in our cluster, deployed Spark on them, and installed SparkNet on top. However, the job we submitted failed after running for a few hours.
We received an RPC timeout exception after submitting our jobs. We believe this is related to the master process not being able to collect results from worker tasks on Spark: we noticed that, while running their respective tasks, some of the worker nodes got disconnected midway, so the master stopped receiving heartbeats from those worker processes, which led to the failure.
Specifically, workers handling large amounts of data were sometimes going offline for some reason. We believe this problem lies with the CS120 machines and would not occur if we ran Spark on EC2 machines instead.
B. Working with Cifar-10 data
We had better luck training on the CIFAR-10 dataset with the two CNN architectures mentioned above. We logged the training time, the iterations required, and the accuracy achieved on the test data after each iteration for both implementations. The following are the results we obtained:
Fig. 6. Accuracy vs. time
Fig. 7. Accuracy vs. training iterations
We found that with the extra CONV-ReLU-POOL block as layer 4 in Trial #2, we get better accuracy over time. Training also progresses faster for Trial #2 than for Trial #1, and for a fixed number of iterations, the accuracy of the CNN with the extra layer is better than that of the other.
V. CONCLUSION
Although CIFAR-10 is a far smaller dataset than ImageNet, we believe the conclusions we draw from the CIFAR-10 experiments could apply to the ImageNet data as well. The TensorFlow site mentions that, on CIFAR-10, they achieved 86% accuracy after almost 300k iterations; we observe our model achieving nearly 80% accuracy on the training set after about an hour, with only 4k iterations. So we believe that training a CNN on a large dataset could actually benefit from SparkNet.
VI. FUTURE WORK
Our future work would include getting SparkNet to train the ImageNet dataset on the CS120 machines. If the problems persist, EC2 clusters could be a good solution; the reason we did not use EC2 for this class project was the cost involved, due to the charges for data transfer to and within the clusters.
Once SparkNet is able to train on the ImageNet dataset, the next step would be to test different CNN implementations, starting from AlexNet, ZFNet, etc., on the same cluster and compare their performance based on the time it takes to train the models and the accuracy obtained over iterations.
A study of the accuracy of the different models could also be done based on top-5 scores on the test dataset; using that data, we could determine which of these models takes less time while giving a sufficiently low error rate on classifying the test dataset.
VII. CONTRIBUTION
Installation of Spark and SparkNet was done by Sitakanta. Data pre-processing was done by both Saptashwa and Sitakanta. Training of the dataset was done by Saptashwa, and testing and validation by Sitakanta. The final project report was written mostly by Saptashwa, with some editing by Sitakanta. The poster presentation was prepared by Sitakanta. Issues and bugs were debugged and fixed with equal contribution from both teammates.
REFERENCES
[1] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet Classification with Deep Convolutional Neural Networks".
[2] http://cs231n.github.io/convolutional-networks/
[3] Philipp Moritz, Robert Nishihara, Ion Stoica, and Michael I. Jordan. "SparkNet: Training Deep Networks in Spark".
[4] https://arxiv.org/pdf/1311.2901v3.pdf
[5] http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7753615
[6] Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. "Spark: Cluster Computing with Working Sets".
[7] https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf
[8] http://image-net.org/
