SlideShare a Scribd company logo
1 of 23
Politecnico di Milano
Dipartimento di Elettronica, Informazione e Bioingegneria (DEIB)
marco.bacis@mail.polimi.it
Marco Bacis, Giuseppe Natale, Emanuele Del Sozzo,
Marco Domenico Santambrogio
CNN Dataflow implementation on
FPGAs
Oracle HQ
Wednesday, 7th June 2017
Introduction 2
Issues
3
Challenges
Huge set of weights and data
Memory bounded computation
Need to have a scalable design in terms of
memory and resources
without losing in performance
+
=
● Exploitation of the dataflow pattern of CNN operations
● Independent modules with parametric level of parallelism
● Streaming + Dataflow computational paradigm with
efficient memory access
Our Solution 4
Methodology for CNN acceleration on FPGA with
5
Iterative Stencil Loops
Spatial dependencies
Memory bound
Enable efficient solutions in term of
performance and power
6
● Independent modules communicating over FIFOs
● Concurrent memory access and optimal full buffering
● Scalable without increasing external memory use
Streaming StencilTimestep
7
Proposed Approach
Implementation 8
1. Convolution Module Structure
2. Fully Connected Module Structure
3. Network Design
Convolution Module Structure 9
SST
Convolution Module - Parameters 10
• Input/Output Height
• Input/OutputWidth
• Number of Input Feature Maps
• Number of Output Feature Maps
Convolution Module - Parameters 11
• Kernel Height
• KernelWidth
• Number of Input Ports
• Number of Output Ports
# Input FMs received per cycle
# Output FMs sent per cycle
Implementation 12
1. Convolution Module Structure
2. Fully Connected Module Structure
3. Network Design
Fully Connected Module Structure 13
● Treated as a 1x1 convolution
● “Compressed” streaming approac
● 1 input port, 1 output port
● Low latency Floating point accumulatio
● Issue for pipelining
● Multiple accumulators + Loop Unrolling
Implementation 14
1. Convolution Module Structure
2. Fully Connected Module Structure
3. Network Design
Network Design 15
● Convolutional Module
● Memory structure based on I/O ports
● Single vs Multi channel memory cores
● Pooling Module
● Independent from channel
● One module for each previous output port
● Fully-Connected Module -> single pipelined core
Experimental Evaluation 16
● Two evaluation designs
● CIFAR-10 network
Conv -> Pool -> Conv -> Pool -> Lin ->Lin
● USPS network
Conv -> Pool -> Conv -> Lin
● Different design choices as a proof-of-concept of the
methodology
● Tested on a XilinxVC707 board
CIFAR-10 Network 17
5 x 5
3 in FMs
12 out FMs
32 x 32
Conv 1
2 x 2
12 in FMs
12 out FMs
28 x 28
Pool 1
5 x 5
12 in FMs
36 out FMs
14 x 14
Conv 2
2 x 2
36 in FMs
36 out FMs
10 x 10
Pool 2
900 in
36 out
Lin 1
36 in
10 out
Lin 2
USPS Network 18
5 x 5
1 in FMs
6 out FMs
16 x 16
Conv 1
2 x 2
6 in FMs
6 out FMs
12 x 12
Pool 1
5 x 5
6 in FMs
16 out FMs
6 x 6
Conv 2
64 in
10 out
Lin 1
Experimental Results 19
Performance improvements with increased batch size
Experimental Results 20
Dataset GFLOPS GFLOPS/W Images/s
Test Case 1 USPS 5.2 0.25 172414
Test Case 2 CIFAR-10 28.4 1.19 7809
MSR Work [1] CIFAR-10 - - 2318
Flips Flops LUTs BRAM DSP Slices
Test Case 1 41.10% 50.86% 3.50% 55.04%
Test Case 2 61.77% 71.24% 22.82% 74.32%
Performances and Power Efficiency Results
FPGA Resources Usage
[1] K. Ovtcharov et al., “Accelerating deep convolutional neural network using specialized hardware”, Microsoft Research
Whitepaper, 2015
Conclusions 21
● Modular and scalar methodology to accelerate CNNs on
FPGAs using a dataflow approach
● Performance improvement over large batches
● High level pipeline between layers
● Improved memory bandwidth utilization
● High scalability given limited resources
FutureWorks 22
Multi-FPGA / Split layers approach
Automatic DSE / CADTool
Different precision / data type
23
Questions?
Marco Bacis
M. Bacis, G. Natale, E. Del Sozzo, and M. D. Santambrogio
“A Pipelined and Scalable Dataflow Implementation of Convolutional Neural Networks on FPGA”
IPDPS Workshops (RAW), May 2017
M. Bacis, G. Natale, and M. D. Santambrogio
“On how to design dataflow FPGA-based accelerators for Convolutional Neural Networks”
ISVLSI Conference, July 2017 – To Appear
References
marco.bacis@mail.polimi.it

More Related Content

What's hot

Memory efficient implementation of dense nets
Memory efficient implementation of dense netsMemory efficient implementation of dense nets
Memory efficient implementation of dense netsHyungjoo Cho
 
Exploiting rateless codes in cloud storage systems
Exploiting rateless codes in cloud storage systemsExploiting rateless codes in cloud storage systems
Exploiting rateless codes in cloud storage systemsieeepondy
 
クラウド時代の半導体メモリー技術
クラウド時代の半導体メモリー技術クラウド時代の半導体メモリー技術
クラウド時代の半導体メモリー技術Ryousei Takano
 
past-research-on-pc-router
past-research-on-pc-routerpast-research-on-pc-router
past-research-on-pc-routeryasu1976
 
Compositional Analysis for the Multi-Resource Server
Compositional Analysis for the Multi-Resource ServerCompositional Analysis for the Multi-Resource Server
Compositional Analysis for the Multi-Resource ServerEricsson
 

What's hot (6)

Memory efficient implementation of dense nets
Memory efficient implementation of dense netsMemory efficient implementation of dense nets
Memory efficient implementation of dense nets
 
Exploiting rateless codes in cloud storage systems
Exploiting rateless codes in cloud storage systemsExploiting rateless codes in cloud storage systems
Exploiting rateless codes in cloud storage systems
 
クラウド時代の半導体メモリー技術
クラウド時代の半導体メモリー技術クラウド時代の半導体メモリー技術
クラウド時代の半導体メモリー技術
 
past-research-on-pc-router
past-research-on-pc-routerpast-research-on-pc-router
past-research-on-pc-router
 
mTCP使ってみた
mTCP使ってみたmTCP使ってみた
mTCP使ってみた
 
Compositional Analysis for the Multi-Resource Server
Compositional Analysis for the Multi-Resource ServerCompositional Analysis for the Multi-Resource Server
Compositional Analysis for the Multi-Resource Server
 

Similar to CNN Dataflow Implementation on FPGAs

Accelerating Deep Learning Inference 
on Mobile Systems
Accelerating Deep Learning Inference 
on Mobile SystemsAccelerating Deep Learning Inference 
on Mobile Systems
Accelerating Deep Learning Inference 
on Mobile SystemsDarian Frajberg
 
Accumulo and the Convergence of Machine Learning, Big Data, and Supercomputing
Accumulo and the Convergence of Machine Learning, Big Data, and SupercomputingAccumulo and the Convergence of Machine Learning, Big Data, and Supercomputing
Accumulo and the Convergence of Machine Learning, Big Data, and SupercomputingAccumulo Summit
 
Network-aware Data Management for Large Scale Distributed Applications, IBM R...
Network-aware Data Management for Large Scale Distributed Applications, IBM R...Network-aware Data Management for Large Scale Distributed Applications, IBM R...
Network-aware Data Management for Large Scale Distributed Applications, IBM R...balmanme
 
Performance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence SimulationsPerformance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence SimulationsIgor Sfiligoi
 
Analytical Modeling of End-to-End Delay in OpenFlow Based Networks
Analytical Modeling of End-to-End Delay in OpenFlow Based NetworksAnalytical Modeling of End-to-End Delay in OpenFlow Based Networks
Analytical Modeling of End-to-End Delay in OpenFlow Based NetworksAzeem Iqbal
 
Harnessing OpenCL in Modern Coprocessors
Harnessing OpenCL in Modern CoprocessorsHarnessing OpenCL in Modern Coprocessors
Harnessing OpenCL in Modern CoprocessorsUnai Lopez-Novoa
 
Lightweight DNN Processor Design (based on NVDLA)
Lightweight DNN Processor Design (based on NVDLA)Lightweight DNN Processor Design (based on NVDLA)
Lightweight DNN Processor Design (based on NVDLA)Shien-Chun Luo
 
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facilityinside-BigData.com
 
Evaluating UCIe based multi-die SoC to meet timing and power
Evaluating UCIe based multi-die SoC to meet timing and power Evaluating UCIe based multi-die SoC to meet timing and power
Evaluating UCIe based multi-die SoC to meet timing and power Deepak Shankar
 
fccm2015-jain-presentation
fccm2015-jain-presentationfccm2015-jain-presentation
fccm2015-jain-presentationAbhishek Jain
 
High Speed Design Closure Techniques-Balachander Krishnamurthy
High Speed Design Closure Techniques-Balachander KrishnamurthyHigh Speed Design Closure Techniques-Balachander Krishnamurthy
High Speed Design Closure Techniques-Balachander KrishnamurthyMassimo Talia
 
Michael Gschwind, Chip Multiprocessing and the Cell Broadband Engine
Michael Gschwind, Chip Multiprocessing and the Cell Broadband EngineMichael Gschwind, Chip Multiprocessing and the Cell Broadband Engine
Michael Gschwind, Chip Multiprocessing and the Cell Broadband EngineMichael Gschwind
 
Network-aware Data Management for High Throughput Flows Akamai, Cambridge, ...
Network-aware Data Management for High Throughput Flows   Akamai, Cambridge, ...Network-aware Data Management for High Throughput Flows   Akamai, Cambridge, ...
Network-aware Data Management for High Throughput Flows Akamai, Cambridge, ...balmanme
 
数据中心网络研究:机遇与挑战
数据中心网络研究:机遇与挑战数据中心网络研究:机遇与挑战
数据中心网络研究:机遇与挑战Weiwei Fang
 
CFD acceleration with FPGA (byteLAKE's presentation from PPAM 2019)
CFD acceleration with FPGA (byteLAKE's presentation from PPAM 2019)CFD acceleration with FPGA (byteLAKE's presentation from PPAM 2019)
CFD acceleration with FPGA (byteLAKE's presentation from PPAM 2019)byteLAKE
 
Model-driven Network Management
Model-driven Network ManagementModel-driven Network Management
Model-driven Network ManagementAnees Shaikh
 
CNNECST: an FPGA-based approach for the hardware acceleration of Convolutiona...
CNNECST: an FPGA-based approach for the hardware acceleration of Convolutiona...CNNECST: an FPGA-based approach for the hardware acceleration of Convolutiona...
CNNECST: an FPGA-based approach for the hardware acceleration of Convolutiona...NECST Lab @ Politecnico di Milano
 
RT15 Berkeley | ARTEMiS-SSN Features for Micro-grid / Renewable Energy Sourc...
RT15 Berkeley |  ARTEMiS-SSN Features for Micro-grid / Renewable Energy Sourc...RT15 Berkeley |  ARTEMiS-SSN Features for Micro-grid / Renewable Energy Sourc...
RT15 Berkeley | ARTEMiS-SSN Features for Micro-grid / Renewable Energy Sourc...OPAL-RT TECHNOLOGIES
 

Similar to CNN Dataflow Implementation on FPGAs (20)

CNN Dataflow Implementation on FPGAs
CNN Dataflow Implementation on FPGAsCNN Dataflow Implementation on FPGAs
CNN Dataflow Implementation on FPGAs
 
Deep Learning Initiative @ NECSTLab
Deep Learning Initiative @ NECSTLabDeep Learning Initiative @ NECSTLab
Deep Learning Initiative @ NECSTLab
 
Accelerating Deep Learning Inference 
on Mobile Systems
Accelerating Deep Learning Inference 
on Mobile SystemsAccelerating Deep Learning Inference 
on Mobile Systems
Accelerating Deep Learning Inference 
on Mobile Systems
 
Accumulo and the Convergence of Machine Learning, Big Data, and Supercomputing
Accumulo and the Convergence of Machine Learning, Big Data, and SupercomputingAccumulo and the Convergence of Machine Learning, Big Data, and Supercomputing
Accumulo and the Convergence of Machine Learning, Big Data, and Supercomputing
 
Network-aware Data Management for Large Scale Distributed Applications, IBM R...
Network-aware Data Management for Large Scale Distributed Applications, IBM R...Network-aware Data Management for Large Scale Distributed Applications, IBM R...
Network-aware Data Management for Large Scale Distributed Applications, IBM R...
 
Performance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence SimulationsPerformance Optimization of CGYRO for Multiscale Turbulence Simulations
Performance Optimization of CGYRO for Multiscale Turbulence Simulations
 
Analytical Modeling of End-to-End Delay in OpenFlow Based Networks
Analytical Modeling of End-to-End Delay in OpenFlow Based NetworksAnalytical Modeling of End-to-End Delay in OpenFlow Based Networks
Analytical Modeling of End-to-End Delay in OpenFlow Based Networks
 
Harnessing OpenCL in Modern Coprocessors
Harnessing OpenCL in Modern CoprocessorsHarnessing OpenCL in Modern Coprocessors
Harnessing OpenCL in Modern Coprocessors
 
Lightweight DNN Processor Design (based on NVDLA)
Lightweight DNN Processor Design (based on NVDLA)Lightweight DNN Processor Design (based on NVDLA)
Lightweight DNN Processor Design (based on NVDLA)
 
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
 
Evaluating UCIe based multi-die SoC to meet timing and power
Evaluating UCIe based multi-die SoC to meet timing and power Evaluating UCIe based multi-die SoC to meet timing and power
Evaluating UCIe based multi-die SoC to meet timing and power
 
fccm2015-jain-presentation
fccm2015-jain-presentationfccm2015-jain-presentation
fccm2015-jain-presentation
 
High Speed Design Closure Techniques-Balachander Krishnamurthy
High Speed Design Closure Techniques-Balachander KrishnamurthyHigh Speed Design Closure Techniques-Balachander Krishnamurthy
High Speed Design Closure Techniques-Balachander Krishnamurthy
 
Michael Gschwind, Chip Multiprocessing and the Cell Broadband Engine
Michael Gschwind, Chip Multiprocessing and the Cell Broadband EngineMichael Gschwind, Chip Multiprocessing and the Cell Broadband Engine
Michael Gschwind, Chip Multiprocessing and the Cell Broadband Engine
 
Network-aware Data Management for High Throughput Flows Akamai, Cambridge, ...
Network-aware Data Management for High Throughput Flows   Akamai, Cambridge, ...Network-aware Data Management for High Throughput Flows   Akamai, Cambridge, ...
Network-aware Data Management for High Throughput Flows Akamai, Cambridge, ...
 
数据中心网络研究:机遇与挑战
数据中心网络研究:机遇与挑战数据中心网络研究:机遇与挑战
数据中心网络研究:机遇与挑战
 
CFD acceleration with FPGA (byteLAKE's presentation from PPAM 2019)
CFD acceleration with FPGA (byteLAKE's presentation from PPAM 2019)CFD acceleration with FPGA (byteLAKE's presentation from PPAM 2019)
CFD acceleration with FPGA (byteLAKE's presentation from PPAM 2019)
 
Model-driven Network Management
Model-driven Network ManagementModel-driven Network Management
Model-driven Network Management
 
CNNECST: an FPGA-based approach for the hardware acceleration of Convolutiona...
CNNECST: an FPGA-based approach for the hardware acceleration of Convolutiona...CNNECST: an FPGA-based approach for the hardware acceleration of Convolutiona...
CNNECST: an FPGA-based approach for the hardware acceleration of Convolutiona...
 
RT15 Berkeley | ARTEMiS-SSN Features for Micro-grid / Renewable Energy Sourc...
RT15 Berkeley |  ARTEMiS-SSN Features for Micro-grid / Renewable Energy Sourc...RT15 Berkeley |  ARTEMiS-SSN Features for Micro-grid / Renewable Energy Sourc...
RT15 Berkeley | ARTEMiS-SSN Features for Micro-grid / Renewable Energy Sourc...
 

More from NECST Lab @ Politecnico di Milano

Embedding based knowledge graph link prediction for drug repurposing
Embedding based knowledge graph link prediction for drug repurposingEmbedding based knowledge graph link prediction for drug repurposing
Embedding based knowledge graph link prediction for drug repurposingNECST Lab @ Politecnico di Milano
 
PLASTER - PYNQ-based abandoned object detection using a map-reduce approach o...
PLASTER - PYNQ-based abandoned object detection using a map-reduce approach o...PLASTER - PYNQ-based abandoned object detection using a map-reduce approach o...
PLASTER - PYNQ-based abandoned object detection using a map-reduce approach o...NECST Lab @ Politecnico di Milano
 
EMPhASIS - An EMbedded Public Attention Stress Identification System
 EMPhASIS - An EMbedded Public Attention Stress Identification System EMPhASIS - An EMbedded Public Attention Stress Identification System
EMPhASIS - An EMbedded Public Attention Stress Identification SystemNECST Lab @ Politecnico di Milano
 
Maeve - Fast genome analysis leveraging exact string matching
Maeve - Fast genome analysis leveraging exact string matchingMaeve - Fast genome analysis leveraging exact string matching
Maeve - Fast genome analysis leveraging exact string matchingNECST Lab @ Politecnico di Milano
 

More from NECST Lab @ Politecnico di Milano (20)

Mesticheria Team - WiiReflex
Mesticheria Team - WiiReflexMesticheria Team - WiiReflex
Mesticheria Team - WiiReflex
 
Punto e virgola Team - Stressometro
Punto e virgola Team - StressometroPunto e virgola Team - Stressometro
Punto e virgola Team - Stressometro
 
BitIt Team - Stay.straight
BitIt Team - Stay.straight BitIt Team - Stay.straight
BitIt Team - Stay.straight
 
BabYodini Team - Talking Gloves
BabYodini Team - Talking GlovesBabYodini Team - Talking Gloves
BabYodini Team - Talking Gloves
 
printf("Nome Squadra"); Team - NeoTon
printf("Nome Squadra"); Team - NeoTonprintf("Nome Squadra"); Team - NeoTon
printf("Nome Squadra"); Team - NeoTon
 
BlackBoard Team - Motion Tracking Platform
BlackBoard Team - Motion Tracking PlatformBlackBoard Team - Motion Tracking Platform
BlackBoard Team - Motion Tracking Platform
 
#include<brain.h> Team - HomeBeatHome
#include<brain.h> Team - HomeBeatHome#include<brain.h> Team - HomeBeatHome
#include<brain.h> Team - HomeBeatHome
 
Flipflops Team - Wave U
Flipflops Team - Wave UFlipflops Team - Wave U
Flipflops Team - Wave U
 
Bug(atta) Team - Little Brother
Bug(atta) Team - Little BrotherBug(atta) Team - Little Brother
Bug(atta) Team - Little Brother
 
#NECSTCamp: come partecipare
#NECSTCamp: come partecipare#NECSTCamp: come partecipare
#NECSTCamp: come partecipare
 
NECSTCamp101@2020.10.1
NECSTCamp101@2020.10.1NECSTCamp101@2020.10.1
NECSTCamp101@2020.10.1
 
NECSTLab101 2020.2021
NECSTLab101 2020.2021NECSTLab101 2020.2021
NECSTLab101 2020.2021
 
TreeHouse, nourish your community
TreeHouse, nourish your communityTreeHouse, nourish your community
TreeHouse, nourish your community
 
TiReX: Tiled Regular eXpressionsmatching architecture
TiReX: Tiled Regular eXpressionsmatching architectureTiReX: Tiled Regular eXpressionsmatching architecture
TiReX: Tiled Regular eXpressionsmatching architecture
 
Embedding based knowledge graph link prediction for drug repurposing
Embedding based knowledge graph link prediction for drug repurposingEmbedding based knowledge graph link prediction for drug repurposing
Embedding based knowledge graph link prediction for drug repurposing
 
PLASTER - PYNQ-based abandoned object detection using a map-reduce approach o...
PLASTER - PYNQ-based abandoned object detection using a map-reduce approach o...PLASTER - PYNQ-based abandoned object detection using a map-reduce approach o...
PLASTER - PYNQ-based abandoned object detection using a map-reduce approach o...
 
EMPhASIS - An EMbedded Public Attention Stress Identification System
 EMPhASIS - An EMbedded Public Attention Stress Identification System EMPhASIS - An EMbedded Public Attention Stress Identification System
EMPhASIS - An EMbedded Public Attention Stress Identification System
 
Luns - Automatic lungs segmentation through neural network
Luns - Automatic lungs segmentation through neural networkLuns - Automatic lungs segmentation through neural network
Luns - Automatic lungs segmentation through neural network
 
BlastFunction: How to combine Serverless and FPGAs
BlastFunction: How to combine Serverless and FPGAsBlastFunction: How to combine Serverless and FPGAs
BlastFunction: How to combine Serverless and FPGAs
 
Maeve - Fast genome analysis leveraging exact string matching
Maeve - Fast genome analysis leveraging exact string matchingMaeve - Fast genome analysis leveraging exact string matching
Maeve - Fast genome analysis leveraging exact string matching
 

Recently uploaded

Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdfankushspencer015
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTINGMANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTINGSIVASHANKAR N
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college projectTonystark477637
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...Call Girls in Nagpur High Profile
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 

Recently uploaded (20)

Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTINGMANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 

CNN Dataflow Implementation on FPGAs

  • 1. Politecnico di Milano Dipartimento di Elettronica, Informazione e Bioingegneria (DEIB) marco.bacis@mail.polimi.it Marco Bacis, Giuseppe Natale, Emanuele Del Sozzo, Marco Domenico Santambrogio CNN Dataflow implementation on FPGAs Oracle HQ Wednesday, 7th June 2017
  • 3. Issues 3 Challenges Huge set of weights and data Memory bounded computation Need to have a scalable design in terms of memory and resources without losing in performance + =
  • 4. ● Exploitation of the dataflow pattern of CNN operations ● Independent modules with parametric level of parallelism ● Streaming + Dataflow computational paradigm with efficient memory access Our Solution 4 Methodology for CNN acceleration on FPGA with
  • 5. 5 Iterative Stencil Loops Spatial dependencies Memory bound Enable efficient solutions in term of performance and power
  • 6. 6 ● Independent modules communicating over FIFOs ● Concurrent memory access and optimal full buffering ● Scalable without increasing external memory use Streaming StencilTimestep
  • 8. Implementation 8 1. Convolution Module Structure 2. Fully Connected Module Structure 3. Network Design
  • 10. Convolution Module - Parameters 10 • Input/Output Height • Input/OutputWidth • Number of Input Feature Maps • Number of Output Feature Maps
  • 11. Convolution Module - Parameters 11 • Kernel Height • KernelWidth • Number of Input Ports • Number of Output Ports # Input FMs received per cycle # Output FMs sent per cycle
  • 12. Implementation 12 1. Convolution Module Structure 2. Fully Connected Module Structure 3. Network Design
  • 13. Fully Connected Module Structure 13 ● Treated as a 1x1 convolution ● “Compressed” streaming approac ● 1 input port, 1 output port ● Low latency Floating point accumulatio ● Issue for pipelining ● Multiple accumulators + Loop Unrolling
  • 14. Implementation 14 1. Convolution Module Structure 2. Fully Connected Module Structure 3. Network Design
  • 15. Network Design 15 ● Convolutional Module ● Memory structure based on I/O ports ● Single vs Multi channel memory cores ● Pooling Module ● Independent from channel ● One module for each previous output port ● Fully-Connected Module -> single pipelined core
  • 16. Experimental Evaluation 16 ● Two evaluation designs ● CIFAR-10 network Conv -> Pool -> Conv -> Pool -> Lin ->Lin ● USPS network Conv -> Pool -> Conv -> Lin ● Different design choices as a proof-of-concept of the methodology ● Tested on a XilinxVC707 board
  • 17. CIFAR-10 Network 17 5 x 5 3 in FMs 12 out FMs 32 x 32 Conv 1 2 x 2 12 in FMs 12 out FMs 28 x 28 Pool 1 5 x 5 12 in FMs 36 out FMs 14 x 14 Conv 2 2 x 2 36 in FMs 36 out FMs 10 x 10 Pool 2 900 in 36 out Lin 1 36 in 10 out Lin 2
  • 18. USPS Network 18 5 x 5 1 in FMs 6 out FMs 16 x 16 Conv 1 2 x 2 6 in FMs 6 out FMs 12 x 12 Pool 1 5 x 5 6 in FMs 16 out FMs 6 x 6 Conv 2 64 in 10 out Lin 1
  • 19. Experimental Results 19 Performance improvements with increased batch size
  • 20. Experimental Results 20 Dataset GFLOPS GFLOPS/W Images/s Test Case 1 USPS 5.2 0.25 172414 Test Case 2 CIFAR-10 28.4 1.19 7809 MSR Work [1] CIFAR-10 - - 2318 Flips Flops LUTs BRAM DSP Slices Test Case 1 41.10% 50.86% 3.50% 55.04% Test Case 2 61.77% 71.24% 22.82% 74.32% Performances and Power Efficiency Results FPGA Resources Usage [1] K. Ovtcharov et al., “Accelerating deep convolutional neural network using specialized hardware”, Microsoft Research Whitepaper, 2015
  • 21. Conclusions 21 ● Modular and scalar methodology to accelerate CNNs on FPGAs using a dataflow approach ● Performance improvement over large batches ● High level pipeline between layers ● Improved memory bandwidth utilization ● High scalability given limited resources
  • 22. FutureWorks 22 Multi-FPGA / Split layers approach Automatic DSE / CADTool Different precision / data type
  • 23. 23 Questions? Marco Bacis M. Bacis, G. Natale, E. Del Sozzo, and M. D. Santambrogio “A Pipelined and Scalable Dataflow Implementation of Convolutional Neural Networks on FPGA” IPDPS Workshops (RAW), May 2017 M. Bacis, G. Natale, and M. D. Santambrogio “On how to design dataflow FPGA-based accelerators for Convolutional Neural Networks” ISVLSI Conference, July 2017 – To Appear References marco.bacis@mail.polimi.it