SlideShare a Scribd company logo
1 of 18
Parallel
Convolutional Neural
Network
Abdullah Khan Zehady
azehady@purdue.edu
Objective
● Parallelizing Convolutional Neural Network
○ OpenMP - multithreaded programming
○ MPI - Distributed memory
● Lenet-5: Handwriting digit recognition with MNIST Data
● Image segmentation (Image Net)
● Object recognition (Hand position recognition)
Example of a Neural Net
Convolutional Neural Net (CNN)
● A special kind of neural networks optimized for 2D pattern
recognition.
● 2D convolution on an input matrix f, kernel matrix g with output
o is defined as
● Subsampling layer contains multiple kernels.
● Each kernel is shifted over valid but disjoint regions of the input
image
● Convolution of the image block and the kernel is computed at
each location.
Convolutional Neural Net (CNN)
● Not fully connected. Follows biological neural connection model for feature
extraction via deep network.
● Convolution + Subsampling via pooling (max, avg, stochastic) + Fully Connected
● Backward propagation through cross-entropy loss optimization
Lenet-5 CNN Layers ● Convolutional Layer (C1)
○ 5 x 5 filter with stride 1
○ kernel amount: 6
○ Image size 32 x 32
○ Feature Map 6x28x28
● Pooling (S2)
○ Smple based
discretization
○ Stride 2
○ Feature Map 6x14x14
● Convolutional Layer (C3)
○ More dense feature
○ Feature Map
16x10x10
● Pooling Layer (S4)
○ Feature Map 16x5x5
● Convolutional Layer (C5)
○ Feature Map 120x1
● Fully Connected Layer (F6)
● Output Label
○ 10 different digits
Convolutional Layer Summary
● Accepts a volume of size W1×H1×D1
● Requires four hyperparameters:
○ Number of filters K,
○ their spatial extent F,
○ the stride S,
○ the amount of zero padding P.
● Produces a volume of size W2×H2×D2 where:
○ W2=(W1−F+2P)/S+1
○ H2=(H1−F+2P)/S+1 (i.e. width and height are computed equally by
symmetry)
○ D2=K
Conv Layers are the BottleNeck
● With parameter sharing, it introduces F⋅F⋅D1 weights per filter, for a total of
(F⋅F⋅D1)⋅K weights and K biases.
● In the output volume, the d-th depth slice (of size W2×H2) is the result of
performing a valid convolution of the d-th filter over the input volume with a
stride of S, and then offset by d-th bias.
● Calculating convolutions takes more than 95% of the total training time
Conv Layers are the BottleNeck
● C1
○ ~4704 iterations
○ 38% time
● S2
○ ~1176 iterations
○ 3% time only
● C3
○ ~1600 iterations
○ 43% time
● S4
○ ~400 iterations
○ 1% time only
● C5
○ ~120 iterations
○ 12% time
● F6
○ 10 iterations
○ 1% time
Parallelizing Convolution
● Mapping Output Pixels to Threads
○ Output in the convolution layer can be computed independently
○ Map each pixel to a thread of a block.
○ Split each image into smaller regions
■ Each region mapped to a separate block.
○ This will not give us much performance gain -- it can even be slower than
the single block solution
■ False sharing.
Parallelizing Convolution
● Shared Memory
○ The ratio between number of arithmetic operations and number of
memory accesses is significantly high.
■ Load image to the shared memory instead of directly accessing
device memory.
○ Workload balancing among threads
■ Number of threads per block K < total number of pixels per image N
● Each thread will load N/K consecutive pixels
Naive Parallel - Batch Processing
● Instead of processing images one by one, group the images into batches of
256 images
● Each image mapped to one block
● With 5 kernels in the convolution layer, each batch convolution with one
kernel is independent from that with another kernel.
○ CPU can immediately continue to process on those output data, thus hiding part of the
memory latency.
● Using GPU can be even faster
Result
# Blk/Img Batch Shared Mem. Time (sec)
Time/Iter.
(sec)
Speedup v.
Seq. Accuracy
4 No No 205.14 3.2 1.3 56.62%
1 No No 148.63 2.1 2.4 47.24%
1 No Yes 123.46 1.75 3.1 55.90%
1 Yes Yes 157.07 0.21 20.4 56.22%
Further Approach: Rehash + Fuse
● Combine the
convolution
layer and
subsampling
layer into one
to further
speed up the
process.
Further Approach: Fused Parallel
● Tapping into the
producer
consumer
relationship
across layers in
Fused Parallel
Implementation.
Distributed Memory - MPI
● Broadcast output
before running next
layer
○ Between S2-C3, S4-C5
● Selective send based
on connectivity
● Couldn’t implement yet-
-next??
Graph using fused (OpenMP)
End

More Related Content

What's hot

Machine Learning: Introduction to Neural Networks
Machine Learning: Introduction to Neural NetworksMachine Learning: Introduction to Neural Networks
Machine Learning: Introduction to Neural NetworksFrancesco Collova'
 
Convolutional neural network
Convolutional neural network Convolutional neural network
Convolutional neural network Yan Xu
 
Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...
Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...
Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...Sujit Pal
 
Incremental Machine Learning.pptx
Incremental Machine Learning.pptxIncremental Machine Learning.pptx
Incremental Machine Learning.pptxSHAILIPATEL19
 
Object classification using CNN & VGG16 Model (Keras and Tensorflow)
Object classification using CNN & VGG16 Model (Keras and Tensorflow) Object classification using CNN & VGG16 Model (Keras and Tensorflow)
Object classification using CNN & VGG16 Model (Keras and Tensorflow) Lalit Jain
 
Autoencoders
AutoencodersAutoencoders
AutoencodersCloudxLab
 
Deep Learning - CNN and RNN
Deep Learning - CNN and RNNDeep Learning - CNN and RNN
Deep Learning - CNN and RNNAshray Bhandare
 
Anomaly Detection using Deep Auto-Encoders
Anomaly Detection using Deep Auto-EncodersAnomaly Detection using Deep Auto-Encoders
Anomaly Detection using Deep Auto-EncodersGianmario Spacagna
 
Introduction to Convolutional Neural Networks
Introduction to Convolutional Neural NetworksIntroduction to Convolutional Neural Networks
Introduction to Convolutional Neural NetworksHannes Hapke
 
Distributed OS - An Introduction
Distributed OS - An IntroductionDistributed OS - An Introduction
Distributed OS - An IntroductionSuhit Kulkarni
 
Convolution Neural Network (CNN)
Convolution Neural Network (CNN)Convolution Neural Network (CNN)
Convolution Neural Network (CNN)Basit Rafiq
 
Mobilenetv1 v2 slide
Mobilenetv1 v2 slideMobilenetv1 v2 slide
Mobilenetv1 v2 slide威智 黃
 
Life-long / Incremental Learning (DLAI D6L1 2017 UPC Deep Learning for Artifi...
Life-long / Incremental Learning (DLAI D6L1 2017 UPC Deep Learning for Artifi...Life-long / Incremental Learning (DLAI D6L1 2017 UPC Deep Learning for Artifi...
Life-long / Incremental Learning (DLAI D6L1 2017 UPC Deep Learning for Artifi...Universitat Politècnica de Catalunya
 
Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep LearningOswald Campesato
 
End to-end semi-supervised object detection with soft teacher ver.1.0
End to-end semi-supervised object detection with soft teacher ver.1.0End to-end semi-supervised object detection with soft teacher ver.1.0
End to-end semi-supervised object detection with soft teacher ver.1.0taeseon ryu
 

What's hot (20)

Machine Learning: Introduction to Neural Networks
Machine Learning: Introduction to Neural NetworksMachine Learning: Introduction to Neural Networks
Machine Learning: Introduction to Neural Networks
 
Deep learning
Deep learning Deep learning
Deep learning
 
Convolutional neural network
Convolutional neural network Convolutional neural network
Convolutional neural network
 
Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...
Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...
Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...
 
Bayesian networks
Bayesian networksBayesian networks
Bayesian networks
 
Incremental Machine Learning.pptx
Incremental Machine Learning.pptxIncremental Machine Learning.pptx
Incremental Machine Learning.pptx
 
CNN Tutorial
CNN TutorialCNN Tutorial
CNN Tutorial
 
Dive Deep Into Deep Learning
Dive Deep Into Deep LearningDive Deep Into Deep Learning
Dive Deep Into Deep Learning
 
Object classification using CNN & VGG16 Model (Keras and Tensorflow)
Object classification using CNN & VGG16 Model (Keras and Tensorflow) Object classification using CNN & VGG16 Model (Keras and Tensorflow)
Object classification using CNN & VGG16 Model (Keras and Tensorflow)
 
Autoencoders
AutoencodersAutoencoders
Autoencoders
 
Deep Learning - CNN and RNN
Deep Learning - CNN and RNNDeep Learning - CNN and RNN
Deep Learning - CNN and RNN
 
Anomaly Detection using Deep Auto-Encoders
Anomaly Detection using Deep Auto-EncodersAnomaly Detection using Deep Auto-Encoders
Anomaly Detection using Deep Auto-Encoders
 
Introduction to Convolutional Neural Networks
Introduction to Convolutional Neural NetworksIntroduction to Convolutional Neural Networks
Introduction to Convolutional Neural Networks
 
Transfer Learning
Transfer LearningTransfer Learning
Transfer Learning
 
Distributed OS - An Introduction
Distributed OS - An IntroductionDistributed OS - An Introduction
Distributed OS - An Introduction
 
Convolution Neural Network (CNN)
Convolution Neural Network (CNN)Convolution Neural Network (CNN)
Convolution Neural Network (CNN)
 
Mobilenetv1 v2 slide
Mobilenetv1 v2 slideMobilenetv1 v2 slide
Mobilenetv1 v2 slide
 
Life-long / Incremental Learning (DLAI D6L1 2017 UPC Deep Learning for Artifi...
Life-long / Incremental Learning (DLAI D6L1 2017 UPC Deep Learning for Artifi...Life-long / Incremental Learning (DLAI D6L1 2017 UPC Deep Learning for Artifi...
Life-long / Incremental Learning (DLAI D6L1 2017 UPC Deep Learning for Artifi...
 
Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep Learning
 
End to-end semi-supervised object detection with soft teacher ver.1.0
End to-end semi-supervised object detection with soft teacher ver.1.0End to-end semi-supervised object detection with soft teacher ver.1.0
End to-end semi-supervised object detection with soft teacher ver.1.0
 

Similar to Parallel convolutional neural network

convolutional_neural_networks in deep learning
convolutional_neural_networks in deep learningconvolutional_neural_networks in deep learning
convolutional_neural_networks in deep learningssusere5ddd6
 
Efficient Variable Size Template Matching Using Fast Normalized Cross Correla...
Efficient Variable Size Template Matching Using Fast Normalized Cross Correla...Efficient Variable Size Template Matching Using Fast Normalized Cross Correla...
Efficient Variable Size Template Matching Using Fast Normalized Cross Correla...Gurbinder Gill
 
Deep Learning for Computer Vision: Memory usage and computational considerati...
Deep Learning for Computer Vision: Memory usage and computational considerati...Deep Learning for Computer Vision: Memory usage and computational considerati...
Deep Learning for Computer Vision: Memory usage and computational considerati...Universitat Politècnica de Catalunya
 
Accelerated Logistic Regression on GPU(s)
Accelerated Logistic Regression on GPU(s)Accelerated Logistic Regression on GPU(s)
Accelerated Logistic Regression on GPU(s)RAHUL BHOJWANI
 
Neural Network Approximation.pdf
Neural Network Approximation.pdfNeural Network Approximation.pdf
Neural Network Approximation.pdfbvhrs2
 
Introduction to Applied Machine Learning
Introduction to Applied Machine LearningIntroduction to Applied Machine Learning
Introduction to Applied Machine LearningSheilaJimenezMorejon
 
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)Universitat Politècnica de Catalunya
 
Fully Homomorphic Encryption (1).pptx
Fully Homomorphic Encryption (1).pptxFully Homomorphic Encryption (1).pptx
Fully Homomorphic Encryption (1).pptxssuser1716c81
 
Reading: "Pi in the sky: Calculating a record-breaking 31.4 trillion digits o...
Reading: "Pi in the sky: Calculating a record-breaking 31.4 trillion digits o...Reading: "Pi in the sky: Calculating a record-breaking 31.4 trillion digits o...
Reading: "Pi in the sky: Calculating a record-breaking 31.4 trillion digits o...Kento Aoyama
 
Netflix machine learning
Netflix machine learningNetflix machine learning
Netflix machine learningAmer Ather
 
Assignment 1-mtat
Assignment 1-mtatAssignment 1-mtat
Assignment 1-mtatzafargilani
 
Set Transfomer: A Framework for Attention-based Permutaion-Invariant Neural N...
Set Transfomer: A Framework for Attention-based Permutaion-Invariant Neural N...Set Transfomer: A Framework for Attention-based Permutaion-Invariant Neural N...
Set Transfomer: A Framework for Attention-based Permutaion-Invariant Neural N...Thien Q. Tran
 
[EMNLP2017読み会] Efficient Attention using a Fixed-Size Memory Representation
[EMNLP2017読み会] Efficient Attention using a Fixed-Size Memory Representation[EMNLP2017読み会] Efficient Attention using a Fixed-Size Memory Representation
[EMNLP2017読み会] Efficient Attention using a Fixed-Size Memory RepresentationHayahide Yamagishi
 
Performance Analysis of Lattice QCD on GPUs in APGAS Programming Model
Performance Analysis of Lattice QCD on GPUs in APGAS Programming ModelPerformance Analysis of Lattice QCD on GPUs in APGAS Programming Model
Performance Analysis of Lattice QCD on GPUs in APGAS Programming ModelKoichi Shirahata
 
DALL-E.pdf
DALL-E.pdfDALL-E.pdf
DALL-E.pdfdsfajkh
 
Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Gaurav Mittal
 

Similar to Parallel convolutional neural network (20)

convolutional_neural_networks in deep learning
convolutional_neural_networks in deep learningconvolutional_neural_networks in deep learning
convolutional_neural_networks in deep learning
 
Efficient Variable Size Template Matching Using Fast Normalized Cross Correla...
Efficient Variable Size Template Matching Using Fast Normalized Cross Correla...Efficient Variable Size Template Matching Using Fast Normalized Cross Correla...
Efficient Variable Size Template Matching Using Fast Normalized Cross Correla...
 
Deep Learning for Computer Vision: Memory usage and computational considerati...
Deep Learning for Computer Vision: Memory usage and computational considerati...Deep Learning for Computer Vision: Memory usage and computational considerati...
Deep Learning for Computer Vision: Memory usage and computational considerati...
 
WaveNet
WaveNetWaveNet
WaveNet
 
Accelerated Logistic Regression on GPU(s)
Accelerated Logistic Regression on GPU(s)Accelerated Logistic Regression on GPU(s)
Accelerated Logistic Regression on GPU(s)
 
Neural Network Approximation.pdf
Neural Network Approximation.pdfNeural Network Approximation.pdf
Neural Network Approximation.pdf
 
Introduction to Applied Machine Learning
Introduction to Applied Machine LearningIntroduction to Applied Machine Learning
Introduction to Applied Machine Learning
 
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
 
Fully Homomorphic Encryption (1).pptx
Fully Homomorphic Encryption (1).pptxFully Homomorphic Encryption (1).pptx
Fully Homomorphic Encryption (1).pptx
 
Reading: "Pi in the sky: Calculating a record-breaking 31.4 trillion digits o...
Reading: "Pi in the sky: Calculating a record-breaking 31.4 trillion digits o...Reading: "Pi in the sky: Calculating a record-breaking 31.4 trillion digits o...
Reading: "Pi in the sky: Calculating a record-breaking 31.4 trillion digits o...
 
Deep Learning
Deep LearningDeep Learning
Deep Learning
 
Deep learning
Deep learningDeep learning
Deep learning
 
Netflix machine learning
Netflix machine learningNetflix machine learning
Netflix machine learning
 
Assignment 1-mtat
Assignment 1-mtatAssignment 1-mtat
Assignment 1-mtat
 
Set Transfomer: A Framework for Attention-based Permutaion-Invariant Neural N...
Set Transfomer: A Framework for Attention-based Permutaion-Invariant Neural N...Set Transfomer: A Framework for Attention-based Permutaion-Invariant Neural N...
Set Transfomer: A Framework for Attention-based Permutaion-Invariant Neural N...
 
[EMNLP2017読み会] Efficient Attention using a Fixed-Size Memory Representation
[EMNLP2017読み会] Efficient Attention using a Fixed-Size Memory Representation[EMNLP2017読み会] Efficient Attention using a Fixed-Size Memory Representation
[EMNLP2017読み会] Efficient Attention using a Fixed-Size Memory Representation
 
An35225228
An35225228An35225228
An35225228
 
Performance Analysis of Lattice QCD on GPUs in APGAS Programming Model
Performance Analysis of Lattice QCD on GPUs in APGAS Programming ModelPerformance Analysis of Lattice QCD on GPUs in APGAS Programming Model
Performance Analysis of Lattice QCD on GPUs in APGAS Programming Model
 
DALL-E.pdf
DALL-E.pdfDALL-E.pdf
DALL-E.pdf
 
Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)
 

More from Abdullah Khan Zehady

Paleo environmental bio-diversity macro-evolutionary data mining and deep lea...
Paleo environmental bio-diversity macro-evolutionary data mining and deep lea...Paleo environmental bio-diversity macro-evolutionary data mining and deep lea...
Paleo environmental bio-diversity macro-evolutionary data mining and deep lea...Abdullah Khan Zehady
 
Data mining and_visualization_of_earth_history_datasets_to_find_cause_effect_...
Data mining and_visualization_of_earth_history_datasets_to_find_cause_effect_...Data mining and_visualization_of_earth_history_datasets_to_find_cause_effect_...
Data mining and_visualization_of_earth_history_datasets_to_find_cause_effect_...Abdullah Khan Zehady
 
Change of Dynasty correlated with Climate across the world
Change of Dynasty correlated with Climate across the worldChange of Dynasty correlated with Climate across the world
Change of Dynasty correlated with Climate across the worldAbdullah Khan Zehady
 
Distributed representation of sentences and documents
Distributed representation of sentences and documentsDistributed representation of sentences and documents
Distributed representation of sentences and documentsAbdullah Khan Zehady
 
How to Create AltCoin(Alternative Cryptocurrency)?
How to Create AltCoin(Alternative Cryptocurrency)?How to Create AltCoin(Alternative Cryptocurrency)?
How to Create AltCoin(Alternative Cryptocurrency)?Abdullah Khan Zehady
 
Applying word vectors sentiment analysis
Applying word vectors sentiment analysisApplying word vectors sentiment analysis
Applying word vectors sentiment analysisAbdullah Khan Zehady
 
Word representations in vector space
Word representations in vector spaceWord representations in vector space
Word representations in vector spaceAbdullah Khan Zehady
 
Masurca genome assembly with super reads
Masurca  genome assembly with super readsMasurca  genome assembly with super reads
Masurca genome assembly with super readsAbdullah Khan Zehady
 
Rudimentary bitcoin network analysis
Rudimentary bitcoin network analysisRudimentary bitcoin network analysis
Rudimentary bitcoin network analysisAbdullah Khan Zehady
 
Bitcoin tech talk @Purdue Bitcoin Club
Bitcoin tech talk @Purdue Bitcoin ClubBitcoin tech talk @Purdue Bitcoin Club
Bitcoin tech talk @Purdue Bitcoin ClubAbdullah Khan Zehady
 

More from Abdullah Khan Zehady (18)

Paleo environmental bio-diversity macro-evolutionary data mining and deep lea...
Paleo environmental bio-diversity macro-evolutionary data mining and deep lea...Paleo environmental bio-diversity macro-evolutionary data mining and deep lea...
Paleo environmental bio-diversity macro-evolutionary data mining and deep lea...
 
Data mining and_visualization_of_earth_history_datasets_to_find_cause_effect_...
Data mining and_visualization_of_earth_history_datasets_to_find_cause_effect_...Data mining and_visualization_of_earth_history_datasets_to_find_cause_effect_...
Data mining and_visualization_of_earth_history_datasets_to_find_cause_effect_...
 
Change of Dynasty correlated with Climate across the world
Change of Dynasty correlated with Climate across the worldChange of Dynasty correlated with Climate across the world
Change of Dynasty correlated with Climate across the world
 
Distributed representation of sentences and documents
Distributed representation of sentences and documentsDistributed representation of sentences and documents
Distributed representation of sentences and documents
 
Tribeflow on bitcoin data
Tribeflow on bitcoin dataTribeflow on bitcoin data
Tribeflow on bitcoin data
 
How to Create AltCoin(Alternative Cryptocurrency)?
How to Create AltCoin(Alternative Cryptocurrency)?How to Create AltCoin(Alternative Cryptocurrency)?
How to Create AltCoin(Alternative Cryptocurrency)?
 
Applying word vectors sentiment analysis
Applying word vectors sentiment analysisApplying word vectors sentiment analysis
Applying word vectors sentiment analysis
 
Word representations in vector space
Word representations in vector spaceWord representations in vector space
Word representations in vector space
 
Masurca genome assembly with super reads
Masurca  genome assembly with super readsMasurca  genome assembly with super reads
Masurca genome assembly with super reads
 
Bitcoin Multisig Transaction
Bitcoin Multisig TransactionBitcoin Multisig Transaction
Bitcoin Multisig Transaction
 
Bitcoin ideas
Bitcoin ideasBitcoin ideas
Bitcoin ideas
 
Bitcoin investments
Bitcoin investmentsBitcoin investments
Bitcoin investments
 
Rudimentary bitcoin network analysis
Rudimentary bitcoin network analysisRudimentary bitcoin network analysis
Rudimentary bitcoin network analysis
 
Rich gets richer-Bitcoin Network
Rich gets richer-Bitcoin NetworkRich gets richer-Bitcoin Network
Rich gets richer-Bitcoin Network
 
Bitcoin tech talk @Purdue Bitcoin Club
Bitcoin tech talk @Purdue Bitcoin ClubBitcoin tech talk @Purdue Bitcoin Club
Bitcoin tech talk @Purdue Bitcoin Club
 
Bitcoin Network Analysis
Bitcoin Network AnalysisBitcoin Network Analysis
Bitcoin Network Analysis
 
Bitcoin & Bitcoin Mining
Bitcoin & Bitcoin MiningBitcoin & Bitcoin Mining
Bitcoin & Bitcoin Mining
 
The true measure of success
The true measure of successThe true measure of success
The true measure of success
 

Recently uploaded

High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Christo Ananth
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college projectTonystark477637
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 

Recently uploaded (20)

High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 

Parallel convolutional neural network

  • 2. Objective ● Parallelizing Convolutional Neural Network ○ OpenMP - multithreaded programming ○ MPI - Distributed memory ● Lenet-5: Handwriting digit recognition with MNIST Data ● Image segmentation (Image Net) ● Object recognition (Hand position recognition)
  • 3. Example of a Neural Net
  • 4. Convolutional Neural Net (CNN) ● A special kind of neural networks optimized for 2D pattern recognition. ● 2D convolution on an input matrix f, kernel matrix g with output o is defined as ● Subsampling layer contains multiple kernels. ● Each kernel is shifted over valid but disjoint regions of the input image ● Convolution of the image block and the kernel is computed at each location.
  • 5. Convolutional Neural Net (CNN) ● Not fully connected. Follows biological neural connection model for feature extraction via deep network. ● Convolution + Subsampling via pooling (max, avg, stochastic) + Fully Connected ● Backward propagation through cross-entropy loss optimization
  • 6. Lenet-5 CNN Layers ● Convolutional Layer (C1) ○ 5 x 5 filter with stride 1 ○ kernel amount: 6 ○ Image size 32 x 32 ○ Feature Map 6x28x28 ● Pooling (S2) ○ Smple based discretization ○ Stride 2 ○ Feature Map 6x14x14 ● Convolutional Layer (C3) ○ More dense feature ○ Feature Map 16x10x10 ● Pooling Layer (S4) ○ Feature Map 16x5x5 ● Convolutional Layer (C5) ○ Feature Map 120x1 ● Fully Connected Layer (F6) ● Output Label ○ 10 different digits
  • 7. Convolutional Layer Summary ● Accepts a volume of size W1×H1×D1 ● Requires four hyperparameters: ○ Number of filters K, ○ their spatial extent F, ○ the stride S, ○ the amount of zero padding P. ● Produces a volume of size W2×H2×D2 where: ○ W2=(W1−F+2P)/S+1 ○ H2=(H1−F+2P)/S+1 (i.e. width and height are computed equally by symmetry) ○ D2=K
  • 8. Conv Layers are the BottleNeck ● With parameter sharing, it introduces F⋅F⋅D1 weights per filter, for a total of (F⋅F⋅D1)⋅K weights and K biases. ● In the output volume, the d-th depth slice (of size W2×H2) is the result of performing a valid convolution of the d-th filter over the input volume with a stride of S, and then offset by d-th bias. ● Calculating convolutions takes more than 95% of the total training time
  • 9. Conv Layers are the BottleNeck ● C1 ○ ~4704 iterations ○ 38% time ● S2 ○ ~1176 iterations ○ 3% time only ● C3 ○ ~1600 iterations ○ 43% time ● S4 ○ ~400 iterations ○ 1% time only ● C5 ○ ~120 iterations ○ 12% time ● F6 ○ 10 iterations ○ 1% time
  • 10. Parallelizing Convolution ● Mapping Output Pixels to Threads ○ Output in the convolution layer can be computed independently ○ Map each pixel to a thread of a block. ○ Split each image into smaller regions ■ Each region mapped to a separate block. ○ This will not give us much performance gain -- it can even be slower than the single block solution ■ False sharing.
  • 11. Parallelizing Convolution ● Shared Memory ○ The ratio between number of arithmetic operations and number of memory accesses is significantly high. ■ Load image to the shared memory instead of directly accessing device memory. ○ Workload balancing among threads ■ Number of threads per block K < total number of pixels per image N ● Each thread will load N/K consecutive pixels
  • 12. Naive Parallel - Batch Processing ● Instead of processing images one by one, group the images into batches of 256 images ● Each image mapped to one block ● With 5 kernels in the convolution layer, each batch convolution with one kernel is independent from that with another kernel. ○ CPU can immediately continue to process on those output data, thus hiding part of the memory latency. ● Using GPU can be even faster
  • 13. Result # Blk/Img Batch Shared Mem. Time (sec) Time/Iter. (sec) Speedup v. Seq. Accuracy 4 No No 205.14 3.2 1.3 56.62% 1 No No 148.63 2.1 2.4 47.24% 1 No Yes 123.46 1.75 3.1 55.90% 1 Yes Yes 157.07 0.21 20.4 56.22%
  • 14. Further Approach: Rehash + Fuse ● Combine the convolution layer and subsampling layer into one to further speed up the process.
  • 15. Further Approach: Fused Parallel ● Tapping into the producer consumer relationship across layers in Fused Parallel Implementation.
  • 16. Distributed Memory - MPI ● Broadcast output before running next layer ○ Between S2-C3, S4-C5 ● Selective send based on connectivity ● Couldn’t implement yet- -next??
  • 17. Graph using fused (OpenMP)
  • 18. End