Optimizing Convolutional Neural
Networks with OpenMP and MPI
Bikramjit Chowdhury
M.Tech in HPCS
Vel Tech University
VTP1561
Guided By
Aditya Kumar Sinha
Joint Director, CDAC-Pune
Dr. K. Meena
Associate Professor, Vel Tech University
Co-Guided by
Ruchika Vyas
Project Engineer, CDAC-Pune
Index
• Abstract
• Image Analysis
• Neural Network
• Convolutional Neural Network
• Region-wise selection & Pooling
• Literature survey
• Project objective
• Proposed Architecture
• Workflow
• Implementation: Input and Output
• Conclusion
• References
Abstract
• The neural network is one of the most widely used methods for image
processing.
• Convolutional neural networks are mainly used to overcome the
overfitting problem of fully connected neural networks.
• OpenMP is used for the shared-memory approach.
• MPI is used for the distributed-memory approach.
• This report gives a brief summary of neural networks and of the
pixel-wise convolutional neural network.
• It also focuses on the need to benchmark the OpenMP and hybrid
(MPI+OpenMP) implementations of the pixel-wise convolutional neural
network application.
Image Analysis
• Image analysis is the extraction of meaningful information from
images
• Application of Image Analysis
– Machine vision
– Face recognition
– Space applications
– Medical image analysis
– Autonomous vehicles
– Defence
Steps of Image Analysis
• Manipulate an image to extract information that helps solve a
problem.
– Preprocessing - remove unnecessary information
– Data reduction - transform the image into a usable form
– Feature analysis - make inferences about the image
Neural Network
Parameters of Neural Network
• Weight: learnable parameter
• Bias: constant offset added to the weighted sum
A basic neural network
Disadvantages of Neural Networks
• Overfitting
• Large amount of computation power required
• Total parameters of a hidden layer in a fully connected neural network:
– (input size) x (number of features/units in the hidden layer)
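A small illustrative calculation of this count in C (the layer sizes are hypothetical, and the per-unit bias term is included even though the formula above counts only weights):

```c
#include <stdio.h>

/* Hypothetical sizes: a 28x28 grayscale image flattened to 784 inputs,
 * feeding one fully connected hidden layer of 100 units. */
int main(void) {
    long input_size   = 28 * 28;  /* number of input values                  */
    long hidden_units = 100;      /* number of features in the hidden layer  */

    long weights = input_size * hidden_units;  /* one weight per connection  */
    long biases  = hidden_units;               /* one bias per hidden unit   */

    printf("weights = %ld, biases = %ld, total = %ld\n",
           weights, biases, weights + biases);
    return 0;
}
```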
Convolutional Neural Networks
• 1) Convolutional layer
• 2) Pooling layer
• 3) Fully connected layer
LeNet [12]
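As an illustration only (not the project's code), a single-channel convolutional layer with valid padding and a ReLU activation could be written as the following C sketch; the row-major array layout and square kernel are assumptions:

```c
/* One convolutional layer: input H x W, kernel K x K, output (H-K+1) x (W-K+1). */
void convolve2d(const float *in, int H, int W,
                const float *kernel, int K, float bias, float *out)
{
    int oh = H - K + 1, ow = W - K + 1;
    for (int y = 0; y < oh; y++)
        for (int x = 0; x < ow; x++) {
            float sum = bias;
            for (int ky = 0; ky < K; ky++)        /* slide the kernel over the window */
                for (int kx = 0; kx < K; kx++)
                    sum += in[(y + ky) * W + (x + kx)] * kernel[ky * K + kx];
            out[y * ow + x] = (sum > 0.0f) ? sum : 0.0f;   /* ReLU activation */
        }
}
```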
Region-wise selection & Pooling
Region-wise selection
2x2 pooling
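A minimal 2x2 max-pooling sketch in C, assuming even image dimensions and stride 2 (illustrative only; the region-wise selection scheme above is the project's own idea and is not reproduced here):

```c
/* 2x2 max pooling with stride 2: input H x W, output (H/2) x (W/2). */
void maxpool2x2(const float *in, int H, int W, float *out)
{
    int ow = W / 2;
    for (int y = 0; y < H / 2; y++)
        for (int x = 0; x < ow; x++) {
            float m = in[(2 * y) * W + 2 * x];            /* top-left of the 2x2 window */
            if (in[(2 * y) * W + 2 * x + 1]     > m) m = in[(2 * y) * W + 2 * x + 1];
            if (in[(2 * y + 1) * W + 2 * x]     > m) m = in[(2 * y + 1) * W + 2 * x];
            if (in[(2 * y + 1) * W + 2 * x + 1] > m) m = in[(2 * y + 1) * W + 2 * x + 1];
            out[y * ow + x] = m;                           /* keep the maximum value     */
        }
}
```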
OpenMP
• API for shared-memory parallel applications.
• Fork and join model
• Why is OpenMP used in a convolutional neural network?
• How is OpenMP used in our model?
Fork-join model
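A minimal fork-join sketch (not the project's exact code): the parallel-for directive forks a thread team, each thread computes a share of the independent output rows, and the team joins at the end of the loop.

```c
#include <omp.h>

/* Convolution with the output-row loop shared among OpenMP threads.
 * Rows are independent, so no synchronization is needed inside the loop. */
void convolve2d_omp(const float *in, int H, int W,
                    const float *kernel, int K, float bias, float *out)
{
    int oh = H - K + 1, ow = W - K + 1;
    #pragma omp parallel for schedule(static)   /* fork: threads split the rows */
    for (int y = 0; y < oh; y++)
        for (int x = 0; x < ow; x++) {
            float sum = bias;
            for (int ky = 0; ky < K; ky++)
                for (int kx = 0; kx < K; kx++)
                    sum += in[(y + ky) * W + (x + kx)] * kernel[ky * K + kx];
            out[y * ow + x] = sum;
        }                                        /* join: implicit barrier here */
}
```

Such a loop is typically compiled with -fopenmp, and the thread count is controlled through OMP_NUM_THREADS.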
MPI
• Standard message-passing library interface for distributed-memory applications.
MPI model
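A minimal message-passing sketch under assumed sizes (the weight-vector length and the per-rank work are placeholders, not the project's code): rank 0 broadcasts the network weights, every rank processes its own share of the input, and the slowest per-rank time is reduced back to rank 0.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    float weights[128] = {0};      /* assumed weight-vector size */
    if (rank == 0) {
        /* ... load or initialise the weights on rank 0 ... */
    }
    MPI_Bcast(weights, 128, MPI_FLOAT, 0, MPI_COMM_WORLD);

    double t0 = MPI_Wtime();
    /* ... each rank classifies its own subset of images here ... */
    double elapsed = MPI_Wtime() - t0, slowest = 0.0;

    MPI_Reduce(&elapsed, &slowest, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("total execution time = %f s\n", slowest);

    MPI_Finalize();
    return 0;
}
```

Run with, for example, mpirun -np 4 ./cnn_mpi (binary name hypothetical).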
Literature survey
[1] Efficient Multi-Training Framework of Image Deep Learning on GPU Cluster
– Authors/Publication: Chun-Fu (Richard) Chen, Gwo Giun (Chris) Lee, Yinglong Xia, W. Sabrina Lin, Toyotaro Suzumura, Ching-Yung Lin; IEEE, 2015
– Explanation: A deep learning framework built around a pipelining scheme for images on a GPU cluster.
– Conclusion: The framework saves time when training multiple models on a large dataset with a complicated network.
[13] A MapReduce Computing Framework Based on GPU Cluster
– Authors/Publication: Heng Gao, Jie Tang, Gangshan Wu; IEEE, 2013
– Explanation: A parallel GPU programming framework based on MapReduce; a distributed file system (GlusterFS) is used to store the data in a distributed manner.
– Conclusion: Dynamic load balancing is taken into consideration in particular.
[14] Theano-MPI
– Authors/Publication: He Ma, Fei Mao, Graham W. Taylor; arXiv:1605.08325v1 [cs.LG], 26 May 2016
– Explanation: A training framework that can utilize GPUs across the nodes of a cluster.
– Conclusion: It accelerates the training of deep learning models through data parallelism; parameter exchange among GPUs is based on CUDA-aware MPI.
[15] CNNLab
– Authors/Publication: Chao Wang, Yuan Xie; arXiv:1606.06234v1 [cs.LG], 2016
– Explanation: The framework defines an API-like library for CNN operations using GPU and OpenCL.
– Conclusion: Based on this framework, tasks can be distributed to either GPU- or FPGA-based accelerators.
[20] SegNet
– Authors/Publication: Vijay Badrinarayanan, Alex Kendall, Roberto Cipolla (Senior Member, IEEE)
– Explanation: A reduced version of FCN.
– Conclusion: The algorithm is shown to achieve roughly 10x higher speed-up than a common FCN.
[21] Efficient Convolutional Neural Network for Pixel-wise Classification on Heterogeneous Hardware Systems
– Authors/Publication: Fabian Tschopp, Julien N. P. Martel, Srinivas C. Turaga, Matthew Cook, Jan Funke
– Explanation: It reduces time complexity by replacing the sliding window with strided kernels.
– Conclusion: Performance is shown to improve by a factor of 52; it improves further in the parallel version on a GPU cluster.
[22] Learning Deconvolution Network for Semantic Segmentation
– Authors/Publication: Hyeonwoo Noh, Seunghoon Hong, Bohyung Han
– Explanation: It uses an encoder-decoder neural network similar to FCN.
– Conclusion: It builds on FCN with the addition of feature learning in the deconvolution layers.
Project objective
• Implement a serial algorithm of the convolutional neural network
for a large (satellite) data set.
• Calculate the performance of the serial program.
• Parallelize the code using the OpenMP & MPI libraries.
• Apply the parallel algorithm to the satellite data set.
• Calculate the performance of the parallel program.
Region-wise selection
Complexity of Region-wise selection
Total weights =
Total column-wise weights =
Total row-wise weights =
• Region 1 complexity =
• Region 2 complexity =
• Region 3 complexity =
• Region 4 complexity =
• Region 5 complexity =
• Region 6 complexity =
• Region 7 complexity =
• Region 8 complexity =
• Region 9 complexity =
Complexity of Region-wise selection
• Total complexity of this algorithm =
or
or
• Total complexity of LeNet =
or
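As general context (a standard estimate, not the slide's own derivation), the multiply-accumulate cost of a single convolutional layer is commonly written as:

```latex
\text{cost} \approx H_{\text{out}} \times W_{\text{out}} \times K_h \times K_w \times C_{\text{in}} \times C_{\text{out}}
```

where H_out x W_out is the output feature-map size, K_h x K_w the kernel size, and C_in, C_out the input and output channel counts; the total for a network such as LeNet is the sum of this cost over its layers.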
Proposed Architecture
• Serial computation: the input satellite data set is processed by the pixel-wise
convolutional neural network, producing output in the form of scores and total
execution time.
• Parallel computation: the same pixel-wise convolutional neural network is run
in parallel on the input satellite data set, again producing scores and total
execution time.
• Parallel computation with optimized algorithm: the optimized pixel-wise
convolutional neural network is run in parallel on the input satellite data set,
producing scores and total execution time.
• Performance comparison: the results of the three variants are compared.
Workflow
Implementation
Serial code output
MPI-OpenMP implementation
MPI-OpenMP implementation with the image split into parts
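A minimal hybrid sketch under stated assumptions (the image size, its divisibility by the rank count, and the per-pixel operation are placeholders, not the project's implementation): MPI splits the image into row blocks across ranks, and within each rank an OpenMP thread team shares the rows of that block.

```c
#include <mpi.h>
#include <omp.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int H = 2048, W = 2048;            /* assumed image size, divisible by size */
    int rows = H / size;

    float *image = NULL;
    if (rank == 0)
        image = calloc((size_t)H * W, sizeof(float));   /* full image lives on rank 0 */

    float *block = malloc((size_t)rows * W * sizeof(float));
    MPI_Scatter(image, rows * W, MPI_FLOAT,             /* split image into row blocks */
                block, rows * W, MPI_FLOAT, 0, MPI_COMM_WORLD);

    #pragma omp parallel for schedule(static)           /* threads share this rank's rows */
    for (int y = 0; y < rows; y++)
        for (int x = 0; x < W; x++)
            block[y * W + x] = block[y * W + x] * 2.0f + 1.0f;   /* placeholder pixel op */

    MPI_Gather(block, rows * W, MPI_FLOAT,              /* collect results on rank 0 */
               image, rows * W, MPI_FLOAT, 0, MPI_COMM_WORLD);

    free(block);
    if (rank == 0) free(image);
    MPI_Finalize();
    return 0;
}
```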
Implementation: Input and Output
Input:
• Dstl Satellite data set
– Three-band images (12 GB)
– RGB (11-bit, 3 samples)
– Sixteen-band images (7.30 GB)
– Panchromatic (11-bit, 1 sample, 0.31 m)
– Multispectral (11-bit, 8 samples, 1.24 m)
– SWIR (11-bit, 8 samples, 7.5 m)
– Grid sizes (7.17 KB)
– Training labels (11.08 MB)
Output:
• Optimized result
• Output score values
Memory usage of the MPI-OpenMP code with respect to the serial code
I/O waiting time of the serial and parallel code
Memory usage with respect to the number of threads (fixed-size image)
Conclusion
• Two popular algorithms were studied.
• The most popular one is the convolutional neural network.
• A serial algorithm of the convolutional neural network for the large
satellite dataset was implemented.
• A parallel algorithm for the satellite dataset was implemented.
• The performance of both programs was calculated.
References
[1] Chun-Fu (Richard) Chen, Gwo Giun (Chris) Lee, Yinglong Xia, W. Sabrina Lin, Toyotaro Suzumura, Ching-Yung Lin, "Efficient Multi-Training Framework of Image Deep Learning on GPU Cluster", IEEE, 2015.
[2] Ming Chen, Lu Zhang, Jan P. Allebach, "Learning Deep Features for Image Emotion Classification", IEEE, 2015.
[3] Zhilu Chen, Jing Wang, Haibo He, Xinming Huang, "A Fast Deep Learning System Using GPU", IEEE, 2014.
[4] Bonaventura Del Monte, Radu Prodan, "A Scalable GPU-enabled Framework for Training Deep Neural Networks", IEEE, 2016.
[5] Teng Li, Yong Dou, Lingfei Liang, Yueqing Wang, Qi Lv, "Optimized Deep Belief Networks on CUDA GPUs", IEEE, 2015.
[6] Salima Hassairi, Ridha Ejbali, Mourad Zaied, "Supervised Image Classification using Deep Convolutional Wavelets Network", IEEE, 2015.
[7] Zhan Wu, Min Peng, Tong Chen, "Thermal Face Recognition Using Convolutional Neural Network", IEEE, 2016.
[8] http://cdac.in/index.aspx?id=ev_hpc_hypack13_about_downloads
[9] Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks".
[10] Tianyi Liu, Shuangsang Fang, Yuehui Zhao, Peng Wang, Jun Zhang, "Implementation of Training Convolutional Neural Networks".
[11] Abu Asaduzzaman, Angel Martinez, Aras Sepehri, "A Time-Efficient Image Processing Algorithm for Multicore/Manycore Parallel Computing", IEEE, 2015.
[12] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, "Gradient-Based Learning Applied to Document Recognition", Proceedings of the IEEE, 1998.
[13] Heng Gao, Jie Tang, Gangshan Wu, "A MapReduce Computing Framework Based on GPU Cluster", 2013 IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing.
[14] He Ma, Fei Mao, Graham W. Taylor, "Theano-MPI: A Theano-based Distributed Training Framework", arXiv:1605.08325v1 [cs.LG], 26 May 2016.
[15] http://cs231n.github.io
[16] A. Krizhevsky, I. Sutskever, G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks", in NIPS, 2012.
[17] https://en.wikipedia.org/wiki/Image_analysis
[18] http://www.ida.liu.se/~746A27/Literature/Image%20Processing%20and%20Analysis.pdf
[19] Hui Wu, Hui Zhang, Jinfang Zhang, Fanjiang Xu, "Fast Aircraft Detection in Satellite Images Based on Convolutional Neural Networks", IEEE, 2015.
[20] Vijay Badrinarayanan, Alex Kendall, Roberto Cipolla, "SegNet", arXiv:1511.00561v3 [cs.CV], 10 Oct 2016.
[21] Fabian Tschopp, Julien N. P. Martel, Srinivas C. Turaga, Matthew Cook, Jan Funke, "Efficient Convolutional Neural Network for Pixel-wise Classification on Heterogeneous Hardware Systems", arXiv:1509.03371v1 [cs.CV], 11 Sep 2015.
[22] Hyeonwoo Noh, Seunghoon Hong, Bohyung Han, "Learning Deconvolution Network for Semantic Segmentation", arXiv:1505.04366v1 [cs.CV], 17 May 2015.
Thank you
