Optimizing Convolutional Neural
Networks with OpenMP and MPI
Bikramjit Chowdhury
M.Tech in HPCS
Vel Tech University
VTP1561
Guided By
Aditya Kumar Sinha
Joint Director, CDAC-Pune
Dr. K. Meena
Associate Professor, Vel Tech University
Co-Guided by
Ruchika Vyas
Project Engineer, CDAC-Pune
Index
• Abstract
• Image Analysis
• Neural Network
• Convolutional Neural Network
• Region-wise selection & Pooling
• Literature survey
• Project objective
• Proposed Architecture
• Workflow
• Implementation: Input and Output
• Conclusion
• References
Abstract
• The neural network is one of the most widely used methods for image
processing.
• Convolutional neural networks are mainly used to overcome the
overfitting problem of fully connected neural networks.
• OpenMP is used for the shared-memory approach.
• MPI is used for the distributed-memory approach.
• This report gives a brief summary of neural networks and of the
pixel-wise convolutional neural network.
• It also focuses on the need to benchmark the OpenMP and hybrid
(MPI+OpenMP) implementations of the pixel-wise convolutional neural
network application.
Image Analysis
• Image analysis is the extraction of meaningful information from
images
• Application of Image Analysis
– Machine vision
– Face recognition
– Space applications
– Medical image analysis
– Autonomous vehicles
– Defence
Steps of Image Analysis
• Manipulate an image to extract information that helps solve a
problem.
– Preprocessing - remove unnecessary information
– Data reduction - transform the image into a usable form
– Feature analysis - make inferences about the image
Neural Network
Parameters of Neural Network
• Weight: learnable parameter
• Bias: constant offset added to the weighted sum
A basic neural network
Disadvantages of Neural Networks
• Overfitting
• Large amount of computation power required
• Total parameters of a hidden layer in a fully connected neural network:
– (input size) x (number of features/units in the hidden layer)
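A small illustrative calculation of this count in C (the layer sizes are hypothetical, and the per-unit bias term is included even though the formula above counts only weights):

```c
#include <stdio.h>

/* Hypothetical sizes: a 28x28 grayscale image flattened to 784 inputs,
 * feeding one fully connected hidden layer of 100 units. */
int main(void) {
    long input_size   = 28 * 28;  /* number of input values                  */
    long hidden_units = 100;      /* number of features in the hidden layer  */

    long weights = input_size * hidden_units;  /* one weight per connection  */
    long biases  = hidden_units;               /* one bias per hidden unit   */

    printf("weights = %ld, biases = %ld, total = %ld\n",
           weights, biases, weights + biases);
    return 0;
}
```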
Convolutional Neural Networks
• 1) Convolutional layer
• 2) Pooling layer
• 3) Fully connected layer
LeNet [12]
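As an illustration only (not the project's code), a single-channel convolutional layer with valid padding and a ReLU activation could be written as the following C sketch; the row-major array layout and square kernel are assumptions:

```c
/* One convolutional layer: input H x W, kernel K x K, output (H-K+1) x (W-K+1). */
void convolve2d(const float *in, int H, int W,
                const float *kernel, int K, float bias, float *out)
{
    int oh = H - K + 1, ow = W - K + 1;
    for (int y = 0; y < oh; y++)
        for (int x = 0; x < ow; x++) {
            float sum = bias;
            for (int ky = 0; ky < K; ky++)        /* slide the kernel over the window */
                for (int kx = 0; kx < K; kx++)
                    sum += in[(y + ky) * W + (x + kx)] * kernel[ky * K + kx];
            out[y * ow + x] = (sum > 0.0f) ? sum : 0.0f;   /* ReLU activation */
        }
}
```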
Region-wise selection & Pooling
Region-wise selection
2x2 pooling
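A minimal 2x2 max-pooling sketch in C, assuming even image dimensions and stride 2 (illustrative only; the region-wise selection scheme above is the project's own idea and is not reproduced here):

```c
/* 2x2 max pooling with stride 2: input H x W, output (H/2) x (W/2). */
void maxpool2x2(const float *in, int H, int W, float *out)
{
    int ow = W / 2;
    for (int y = 0; y < H / 2; y++)
        for (int x = 0; x < ow; x++) {
            float m = in[(2 * y) * W + 2 * x];            /* top-left of the 2x2 window */
            if (in[(2 * y) * W + 2 * x + 1]     > m) m = in[(2 * y) * W + 2 * x + 1];
            if (in[(2 * y + 1) * W + 2 * x]     > m) m = in[(2 * y + 1) * W + 2 * x];
            if (in[(2 * y + 1) * W + 2 * x + 1] > m) m = in[(2 * y + 1) * W + 2 * x + 1];
            out[y * ow + x] = m;                           /* keep the maximum value     */
        }
}
```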
OpenMP
• API for shared-memory parallel applications.
• Fork and join model
• Why is OpenMP used in a convolutional neural network?
• How is OpenMP used in our model?
Fork-join model
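A minimal fork-join sketch (not the project's exact code): the parallel-for directive forks a thread team, each thread computes a share of the independent output rows, and the team joins at the end of the loop.

```c
#include <omp.h>

/* Convolution with the output-row loop shared among OpenMP threads.
 * Rows are independent, so no synchronization is needed inside the loop. */
void convolve2d_omp(const float *in, int H, int W,
                    const float *kernel, int K, float bias, float *out)
{
    int oh = H - K + 1, ow = W - K + 1;
    #pragma omp parallel for schedule(static)   /* fork: threads split the rows */
    for (int y = 0; y < oh; y++)
        for (int x = 0; x < ow; x++) {
            float sum = bias;
            for (int ky = 0; ky < K; ky++)
                for (int kx = 0; kx < K; kx++)
                    sum += in[(y + ky) * W + (x + kx)] * kernel[ky * K + kx];
            out[y * ow + x] = sum;
        }                                        /* join: implicit barrier here */
}
```

Such a loop is typically compiled with -fopenmp, and the thread count is controlled through OMP_NUM_THREADS.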
MPI
• Standard message-passing library interface for distributed-memory applications.
MPI model
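A minimal message-passing sketch under assumed sizes (the weight-vector length and the per-rank work are placeholders, not the project's code): rank 0 broadcasts the network weights, every rank processes its own share of the input, and the slowest per-rank time is reduced back to rank 0.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    float weights[128] = {0};      /* assumed weight-vector size */
    if (rank == 0) {
        /* ... load or initialise the weights on rank 0 ... */
    }
    MPI_Bcast(weights, 128, MPI_FLOAT, 0, MPI_COMM_WORLD);

    double t0 = MPI_Wtime();
    /* ... each rank classifies its own subset of images here ... */
    double elapsed = MPI_Wtime() - t0, slowest = 0.0;

    MPI_Reduce(&elapsed, &slowest, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("total execution time = %f s\n", slowest);

    MPI_Finalize();
    return 0;
}
```

Run with, for example, mpirun -np 4 ./cnn_mpi (binary name hypothetical).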
Literature survey
[1] Efficient Multi-Training Framework of Image Deep Learning on GPU Cluster
– Authors/Publication: Chun-Fu (Richard) Chen, Gwo Giun (Chris) Lee, Yinglong Xia, W. Sabrina Lin, Toyotaro Suzumura, Ching-Yung Lin; IEEE, 2015
– Explanation: A deep learning framework built around a pipelining scheme for images on a GPU cluster.
– Conclusion: The framework saves time when training multiple models on a large dataset with a complicated network.
[13] A MapReduce Computing Framework Based on GPU Cluster
– Authors/Publication: Heng Gao, Jie Tang, Gangshan Wu; IEEE, 2013
– Explanation: A parallel GPU programming framework based on MapReduce; a distributed file system (GlusterFS) is used to store the data in a distributed manner.
– Conclusion: Dynamic load balancing is taken into consideration in particular.
[14] Theano-MPI
– Authors/Publication: He Ma, Fei Mao, Graham W. Taylor; arXiv:1605.08325v1 [cs.LG], 26 May 2016
– Explanation: A training framework that can utilize GPUs across the nodes of a cluster.
– Conclusion: It accelerates the training of deep learning models through data parallelism; parameter exchange among GPUs is based on CUDA-aware MPI.
[15] CNNLab
– Authors/Publication: Chao Wang, Yuan Xie; arXiv:1606.06234v1 [cs.LG], 2016
– Explanation: The framework defines an API-like library for CNN operations using GPU and OpenCL.
– Conclusion: Based on this framework, tasks can be distributed to either GPU- or FPGA-based accelerators.
[20] SegNet
– Authors/Publication: Vijay Badrinarayanan, Alex Kendall, Roberto Cipolla (Senior Member, IEEE)
– Explanation: A reduced version of FCN.
– Conclusion: The algorithm is shown to achieve roughly 10x higher speed-up than a common FCN.
[21] Efficient Convolutional Neural Network for Pixel-wise Classification on Heterogeneous Hardware Systems
– Authors/Publication: Fabian Tschopp, Julien N. P. Martel, Srinivas C. Turaga, Matthew Cook, Jan Funke
– Explanation: It reduces time complexity by replacing the sliding window with strided kernels.
– Conclusion: Performance is shown to improve by a factor of 52; it improves further in the parallel version on a GPU cluster.
[22] Learning Deconvolution Network for Semantic Segmentation
– Authors/Publication: Hyeonwoo Noh, Seunghoon Hong, Bohyung Han
– Explanation: It uses an encoder-decoder neural network similar to FCN.
– Conclusion: It builds on FCN with the addition of feature learning in the deconvolution layers.
Project objective
• Implement a serial algorithm of the convolutional neural network
for a large (satellite) data set.
• Calculate the performance of the serial program.
• Parallelize the code using the OpenMP & MPI libraries.
• Apply the parallel algorithm to the satellite data set.
• Calculate the performance of the parallel program.
Region-wise selection
Complexity of Region-wise selection
Total weights =
Total column-wise weights =
Total row-wise weights =
• Region 1 complexity =
• Region 2 complexity =
• Region 3 complexity =
• Region 4 complexity =
• Region 5 complexity =
• Region 6 complexity =
• Region 7 complexity =
• Region 8 complexity =
• Region 9 complexity =
Complexity of Region-wise selection
• Total complexity of this algorithm =
or
or
• Total complexity of LeNet =
or
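As general context (a standard estimate, not the slide's own derivation), the multiply-accumulate cost of a single convolutional layer is commonly written as:

```latex
\text{cost} \approx H_{\text{out}} \times W_{\text{out}} \times K_h \times K_w \times C_{\text{in}} \times C_{\text{out}}
```

where H_out x W_out is the output feature-map size, K_h x K_w the kernel size, and C_in, C_out the input and output channel counts; the total for a network such as LeNet is the sum of this cost over its layers.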
Proposed Architecture
• Serial computation: the input satellite data set is processed by the pixel-wise
convolutional neural network, producing output in the form of scores and total
execution time.
• Parallel computation: the same pixel-wise convolutional neural network is run
in parallel on the input satellite data set, again producing scores and total
execution time.
• Parallel computation with optimized algorithm: the optimized pixel-wise
convolutional neural network is run in parallel on the input satellite data set,
producing scores and total execution time.
• Performance comparison: the results of the three variants are compared.
Workflow
Implementation
Serial code output
MPI-OpenMP implementation
MPI-OpenMP implementation with the image split into parts
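A minimal hybrid sketch under stated assumptions (the image size, its divisibility by the rank count, and the per-pixel operation are placeholders, not the project's implementation): MPI splits the image into row blocks across ranks, and within each rank an OpenMP thread team shares the rows of that block.

```c
#include <mpi.h>
#include <omp.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int H = 2048, W = 2048;            /* assumed image size, divisible by size */
    int rows = H / size;

    float *image = NULL;
    if (rank == 0)
        image = calloc((size_t)H * W, sizeof(float));   /* full image lives on rank 0 */

    float *block = malloc((size_t)rows * W * sizeof(float));
    MPI_Scatter(image, rows * W, MPI_FLOAT,             /* split image into row blocks */
                block, rows * W, MPI_FLOAT, 0, MPI_COMM_WORLD);

    #pragma omp parallel for schedule(static)           /* threads share this rank's rows */
    for (int y = 0; y < rows; y++)
        for (int x = 0; x < W; x++)
            block[y * W + x] = block[y * W + x] * 2.0f + 1.0f;   /* placeholder pixel op */

    MPI_Gather(block, rows * W, MPI_FLOAT,              /* collect results on rank 0 */
               image, rows * W, MPI_FLOAT, 0, MPI_COMM_WORLD);

    free(block);
    if (rank == 0) free(image);
    MPI_Finalize();
    return 0;
}
```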
Implementation: Input and Output
Input:
• Dstl Satellite data set
– Three-band images (12 GB)
– RGB (11-bit, 3 samples)
– Sixteen-band images (7.30 GB)
– Panchromatic (11-bit, 1 sample, 0.31 m)
– Multispectral (11-bit, 8 samples, 1.24 m)
– SWIR (11-bit, 8 samples, 7.5 m)
– Grid sizes (7.17 KB)
– Training labels (11.08 MB)
Output:
• Optimized result
• Output score values
Memory usage of the MPI-OpenMP code with respect to the serial code
I/O waiting time of the serial and parallel code
Memory usage with respect to the number of threads (fixed-size image)
Conclusion
• Two popular algorithms were studied.
• The most popular one is the convolutional neural network.
• A serial algorithm of the convolutional neural network for the large
satellite dataset was implemented.
• A parallel algorithm for the satellite dataset was implemented.
• The performance of both programs was calculated.
References
[1] Chun-Fu (Richard) Chen, Gwo Giun (Chris) Lee, Yinglong Xia, W. Sabrina Lin, Toyotaro Suzumura, Ching-Yung Lin, "Efficient Multi-Training Framework of Image Deep Learning on GPU Cluster", IEEE, 2015.
[2] Ming Chen, Lu Zhang, Jan P. Allebach, "Learning Deep Features for Image Emotion Classification", IEEE, 2015.
[3] Zhilu Chen, Jing Wang, Haibo He, Xinming Huang, "A Fast Deep Learning System Using GPU", IEEE, 2014.
[4] Bonaventura Del Monte, Radu Prodan, "A Scalable GPU-enabled Framework for Training Deep Neural Networks", IEEE, 2016.
[5] Teng Li, Yong Dou, Lingfei Liang, Yueqing Wang, Qi Lv, "Optimized Deep Belief Networks on CUDA GPUs", IEEE, 2015.
[6] Salima Hassairi, Ridha Ejbali, Mourad Zaied, "Supervised Image Classification using Deep Convolutional Wavelets Network", IEEE, 2015.
[7] Zhan Wu, Min Peng, Tong Chen, "Thermal Face Recognition Using Convolutional Neural Network", IEEE, 2016.
[8] http://cdac.in/index.aspx?id=ev_hpc_hypack13_about_downloads
[9] Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks".
[10] Tianyi Liu, Shuangsang Fang, Yuehui Zhao, Peng Wang, Jun Zhang, "Implementation of Training Convolutional Neural Networks".
[11] Abu Asaduzzaman, Angel Martinez, Aras Sepehri, "A Time-Efficient Image Processing Algorithm for Multicore/Manycore Parallel Computing", IEEE, 2015.
[12] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, "Gradient-Based Learning Applied to Document Recognition", Proceedings of the IEEE, 1998.
[13] Heng Gao, Jie Tang, Gangshan Wu, "A MapReduce Computing Framework Based on GPU Cluster", 2013 IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing.
[14] He Ma, Fei Mao, Graham W. Taylor, "Theano-MPI: A Theano-based Distributed Training Framework", arXiv:1605.08325v1 [cs.LG], 26 May 2016.
[15] http://cs231n.github.io
[16] A. Krizhevsky, I. Sutskever, G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks", in NIPS, 2012.
[17] https://en.wikipedia.org/wiki/Image_analysis
[18] http://www.ida.liu.se/~746A27/Literature/Image%20Processing%20and%20Analysis.pdf
[19] Hui Wu, Hui Zhang, Jinfang Zhang, Fanjiang Xu, "Fast Aircraft Detection in Satellite Images Based on Convolutional Neural Networks", IEEE, 2015.
[20] Vijay Badrinarayanan, Alex Kendall, Roberto Cipolla, "SegNet", arXiv:1511.00561v3 [cs.CV], 10 Oct 2016.
[21] Fabian Tschopp, Julien N. P. Martel, Srinivas C. Turaga, Matthew Cook, Jan Funke, "Efficient Convolutional Neural Network for Pixel-wise Classification on Heterogeneous Hardware Systems", arXiv:1509.03371v1 [cs.CV], 11 Sep 2015.
[22] Hyeonwoo Noh, Seunghoon Hong, Bohyung Han, "Learning Deconvolution Network for Semantic Segmentation", arXiv:1505.04366v1 [cs.CV], 17 May 2015.
Thank you
