1
DIPARTIMENTO DI ELETTRONICA,
INFORMAZIONE E BIOINGEGNERIA
Automated Design Space Exploration and Roofline
Analysis for FPGA-based HLS Applications
Marco Siracusa: marco.siracusa@mail.polimi.it
Marco Rabozzi: marco.rabozzi@polimi.it
Lorenzo Di Tucci: lorenzo.ditucci@polimi.it
Marco Santambrogio: marco.santambrogio@polimi.it
May 17-30th, 2019
NGCX, San Francisco (CA)
2
Context definition
Field-Programmable Gate Arrays (FPGAs) are an appealing, power-efficient
solution to meet the ever-increasing computing demand of HPC
applications in several fields.
Bioinformatics, Deep learning, Finance
3
Problem definition
However, the complex FPGA design flow and the difficulty of programming
FPGAs limit their widespread adoption as hardware accelerators.
4
Problem definition
HLS (High-Level Synthesis):
Code translation from C/C++ to an HDL
Hardware Synthesis:
Bitstream generation for target device
Test on FPGA:
Results validation on the target device
Flow: Source code (C/C++ source) → HLS (HDL code) → HW Synthesis (FPGA bitstream) → Test on FPGA → Performance met?
[Figure: functions optimization flow. Blocks: C/C++ function, roofline model generation, Design Space Exploration (HLS estimations, optimization directives insertion), manual code restructuring, HW description.]
5
Problem definition
HLS (High-Level Synthesis):
Code translation from C/C++ to an HDL (from minutes to hours)
Hardware Synthesis:
Bitstream generation for target device (from hours to a few days)
Test on FPGA:
Results validation on the target device (a few hours)
Flow: Source code (C/C++ source) → HLS (HDL code) → HW Synthesis (FPGA bitstream) → Test on FPGA → Performance met?
[Figure: functions optimization flow. Blocks: C/C++ function, roofline model generation, Design Space Exploration (HLS estimations, optimization directives insertion), manual code restructuring, HW description.]
6
Proposed approach
We propose a framework that iteratively leads the user toward the
optimal HLS code while considering:
• Memory transfer constraints, by means of roofline model analysis
• Computational bottlenecks, through automated Design Space Exploration (DSE)
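To make the idea of automated DSE concrete, here is a minimal Python sketch. It is not the framework's actual implementation. It enumerates hypothetical combinations of optimization directives (an unroll factor and a pipelining flag), scores each one with a toy cost model standing in for HLS estimations, and keeps the fastest configuration that fits a resource budget. Every name and number (`estimate`, `explore`, `DSP_BUDGET`, the latency and DSP formulas) is an assumption made for illustration.

```python
from itertools import product

# Hypothetical design space: every name and number here is an illustrative assumption.
UNROLL_FACTORS = [1, 2, 4, 8, 16]
PIPELINE = [False, True]
DSP_BUDGET = 64          # assumed resource budget of the target device
TRIP_COUNT = 1024        # assumed loop trip count
DSP_PER_LANE = 5         # assumed DSP cost of one unrolled loop body
BODY_LATENCY = 10        # assumed cycles for one loop body


def estimate(unroll, pipeline):
    """Toy stand-in for an HLS estimation: returns (latency_cycles, dsp_used)."""
    iterations = TRIP_COUNT // unroll
    if pipeline:
        # Initiation interval of 1: fill the pipeline once, then one iteration per cycle.
        latency = BODY_LATENCY + iterations - 1
    else:
        latency = BODY_LATENCY * iterations
    return latency, DSP_PER_LANE * unroll


def explore():
    """Exhaustively evaluate the design space and return the best feasible point."""
    best = None
    for unroll, pipeline in product(UNROLL_FACTORS, PIPELINE):
        latency, dsp = estimate(unroll, pipeline)
        if dsp > DSP_BUDGET:
            continue  # configuration does not fit in the assumed budget
        if best is None or latency < best[0]:
            best = (latency, unroll, pipeline, dsp)
    return best


best_latency, best_unroll, best_pipeline, best_dsp = explore()
print(best_latency, best_unroll, best_pipeline, best_dsp)
```

An exhaustive search is only practical for small spaces like this one; a real exploration would likely need pruning or heuristics, and its estimates would come from the HLS tool rather than a closed-form model.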
[Figure: the design flow (Source code, HLS, HW Synthesis, Test on FPGA) extended with an automatic roofline analysis and automatic DSE stage, which takes the C/C++ source and yields optimized HLS code. Inset: functions optimization flow (C/C++ function, roofline model generation, Design Space Exploration with HLS estimations and optimization directives selection and insertion, manual code restructuring, HW description).]
7
The N-Body test case
N-Body physics simulation:
• Compute-intensive application used in
several scientific domains (astrophysics,
molecular dynamics)
• Simulates the evolution of a system of N
physical bodies (such as astrophysical
objects) subject to a pairwise
force between them (e.g. gravity)
[Figure: three bodies and the pairwise forces between them: F1,2, F1,3, F2,1, F2,3, F3,1, F3,2]
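The pairwise interaction described on this slide can be sketched in a few lines of Python. This is a generic all-pairs gravitational step, not the authors' HLS kernel. The function name `accelerations`, the unit gravitational constant `g`, and the softening term `eps` are assumptions for illustration.

```python
import math


def accelerations(positions, masses, g=1.0, eps=1e-9):
    """Return the acceleration of each body caused by every other body.

    positions: list of (x, y, z) tuples; masses: list of floats.
    g and eps (softening, avoids division by zero) are illustrative assumptions.
    """
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Vector from body i to body j.
            d = [positions[j][k] - positions[i][k] for k in range(3)]
            dist2 = d[0] ** 2 + d[1] ** 2 + d[2] ** 2 + eps
            inv_dist3 = 1.0 / (dist2 * math.sqrt(dist2))
            for k in range(3):
                acc[i][k] += g * masses[j] * d[k] * inv_dist3
    return acc


# Two unit masses one unit apart pull on each other with opposite accelerations.
acc = accelerations([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [1.0, 1.0])
print(acc)
```

In this formulation each step visits N·(N−1) ordered pairs, so the amount of work grows quadratically with the number of bodies.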
8
Roofline model generation
[Figure: roofline plot. Legend: attainable performance, ceiling 2 DDR ports, ceiling 1 DDR port. Y axis: Performance [pairs/s], 10^6 to 10^10. X axis: Operational intensity [pairs/B], 10^-2 to 10^3.]
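The roofline bound is the minimum of the compute ceiling and the memory bandwidth multiplied by the operational intensity. The sketch below evaluates it for one and two DDR ports. The peak and bandwidth figures are placeholder assumptions, not values read off the plot, and the linear scaling of bandwidth with the number of ports is also an assumption.

```python
def attainable_performance(op_intensity, peak_compute, bandwidth):
    """Roofline bound: performance is capped by compute or by memory traffic.

    op_intensity: work per byte moved [pairs/B]
    peak_compute: compute ceiling [pairs/s]
    bandwidth:    memory bandwidth [B/s]
    """
    return min(peak_compute, bandwidth * op_intensity)


# Placeholder numbers, assumed for illustration only.
PEAK = 1e10          # pairs/s
DDR_PORT_BW = 16e9   # B/s per DDR port

for ports in (1, 2):
    for oi in (0.01, 0.1, 1.0, 10.0):
        perf = attainable_performance(oi, PEAK, ports * DDR_PORT_BW)
        print(f"{ports} port(s), OI={oi:g} pairs/B -> {perf:.3g} pairs/s")
```

Below the ridge point the design is memory bound, so raising operational intensity or adding a DDR port helps; above it, only more compute parallelism does.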
9
Baseline operational intensity
[Figure: roofline plot (Performance [pairs/s] vs. Operational intensity [pairs/B]) with ceilings for 1 and 2 DDR ports; this step adds the baseline operational intensity.]
10
Baseline performance estimation
[Figure: roofline plot (Performance [pairs/s] vs. Operational intensity [pairs/B]) with ceilings for 1 and 2 DDR ports and the baseline operational intensity; this step adds the baseline performance (estimation).]
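One plausible way to turn an HLS latency estimate into a point on the plot is to divide the number of processed pairs by the estimated execution time. The sketch below assumes the HLS report provides a latency in clock cycles and a clock frequency. The function name and the sample numbers are hypothetical.

```python
def estimated_performance(num_pairs, latency_cycles, clock_hz):
    """Estimated throughput in pairs/s from an HLS latency estimate."""
    execution_time_s = latency_cycles / clock_hz
    return num_pairs / execution_time_s


# Hypothetical example: 1,000,000 pairs in 50,000,000 cycles at 250 MHz.
perf = estimated_performance(1_000_000, 50_000_000, 250e6)
print(f"{perf:.3g} pairs/s")
```

Comparing this estimate with the roofline bound at the same operational intensity shows how much headroom remains before the memory or compute ceiling is reached.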
11
Cached version operational intensity
[Figure: roofline plot (Performance [pairs/s] vs. Operational intensity [pairs/B]) with ceilings for 1 and 2 DDR ports, the baseline operational intensity, and the baseline performance (estimation); this step adds the optimized operational intensity.]
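To see why caching moves the design to the right on the roofline plot, consider a back-of-the-envelope model of operational intensity. It assumes each body occupies 16 bytes (three 32-bit float coordinates and a mass), that the baseline fetches one body from DDR for every pair it evaluates, and that the cached version keeps a tile of bodies on chip and reuses it. None of these numbers come from the slides.

```python
BYTES_PER_BODY = 16  # assumed: x, y, z, mass as 32-bit floats


def baseline_intensity():
    """Every pair evaluation fetches one body from external memory."""
    pairs = 1
    bytes_moved = BYTES_PER_BODY
    return pairs / bytes_moved


def cached_intensity(tile_size):
    """A tile of `tile_size` bodies stays on chip; each streamed body is
    paired with every body in the tile before the next one is fetched.
    The one-time cost of loading the tile is ignored in this rough model."""
    pairs = tile_size
    bytes_moved = BYTES_PER_BODY
    return pairs / bytes_moved


print(baseline_intensity())    # 0.0625 pairs/B
print(cached_intensity(1024))  # 64.0 pairs/B
```

Under these assumptions the operational intensity grows linearly with the tile size, so the on-chip memory available for the tile limits how far right the design can move.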
12
Cached version DSE output
[Figure: roofline plot (Performance [pairs/s] vs. Operational intensity [pairs/B]) with ceilings for 1 and 2 DDR ports, the baseline and optimized operational intensities, and the baseline performance (estimation); this step adds the optimized performance (estimation).]
13
Cached version AWS testing
[Figure: roofline plot (Performance [pairs/s] vs. Operational intensity [pairs/B]) with ceilings for 1 and 2 DDR ports, the baseline and optimized operational intensities, and the estimated baseline and optimized performance; this step adds the optimized performance (real).]
14
Conclusions
We presented a framework that leads the designer toward the
optimal solution, relying on:
• roofline model analysis
• automated design space exploration
• fast yet accurate HLS estimations
15
Thank you!

Toll tax management system project report..pdf
 
Automobile Management System Project Report.pdf
Automobile Management System Project Report.pdfAutomobile Management System Project Report.pdf
Automobile Management System Project Report.pdf
 
Natalia Rutkowska - BIM School Course in Kraków
Natalia Rutkowska - BIM School Course in KrakówNatalia Rutkowska - BIM School Course in Kraków
Natalia Rutkowska - BIM School Course in Kraków
 
Introduction to Machine Learning Unit-4 Notes for II-II Mechanical Engineering
Introduction to Machine Learning Unit-4 Notes for II-II Mechanical EngineeringIntroduction to Machine Learning Unit-4 Notes for II-II Mechanical Engineering
Introduction to Machine Learning Unit-4 Notes for II-II Mechanical Engineering
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
 

Automated Design Space Exploration and Roofline Analysis for FPGA-based HLS Applications

  • 1. DIPARTIMENTO DI ELETTRONICA, INFORMAZIONE E BIOINGEGNERIA. Automated Design Space Exploration and Roofline Analysis for FPGA-based HLS Applications. Marco Siracusa: marco.siracusa@mail.polimi.it; Marco Rabozzi: marco.rabozzi@polimi.it; Lorenzo Di Tucci: lorenzo.ditucci@polimi.it; Marco Santambrogio: marco.santambrogio@polimi.it. May 17-30th, 2019, NGCX, San Francisco (CA)
  • 2. Context definition. Field-Programmable Gate Arrays (FPGAs) are an appealing, power-efficient way to meet the ever-increasing computing demand of HPC applications in several fields: bioinformatics, deep learning, and finance.
  • 3. Problem definition. FPGAs are an appealing, power-efficient way to meet the ever-increasing computing demand of HPC applications in several fields: bioinformatics, deep learning, and finance. However, the complex FPGA design flow and the difficulty of programming these devices limit the widespread adoption of FPGAs as hardware accelerators.
  • 4. Problem definition. The FPGA design flow has three steps:
      • HLS (High-Level Synthesis): code translation from C/C++ to an HDL
      • Hardware Synthesis: bitstream generation for the target device
      • Test on FPGA: results validation on the target device
    [Figure: design flow. C/C++ source → HLS → HDL code → HW Synthesis → FPGA bitstream → Test on FPGA → "Performance met?"]
  • 5. Problem definition. The same flow, annotated with the time each step takes:
      • HLS (High-Level Synthesis): code translation from C/C++ to an HDL (from minutes to hours)
      • Hardware Synthesis: bitstream generation for the target device (from hours to a few days)
      • Test on FPGA: results validation on the target device (a few hours)
    [Figure: design flow. C/C++ source → HLS → HDL code → HW Synthesis → FPGA bitstream → Test on FPGA → "Performance met?"]
  • 6. Proposed approach. We propose a framework that iteratively leads the user toward the optimal HLS code while considering:
      • Memory transfer constraints, by means of roofline model analysis
      • Computational bottlenecks, through automated DSE
    [Figure: the design flow (Source code, HLS, HW Synthesis, Test on FPGA) extended with an "Automatic roofline analysis & automatic DSE" stage that takes the C/C++ source and produces optimized HLS code. The stage comprises roofline model generation, design space exploration with HLS estimations, optimization directives selection and insertion, and manual code restructuring.]
  • 7. The N-Body test case. N-Body physics simulation:
      • A compute-intensive application used in several scientific domains (astrophysics, molecular dynamics)
      • Simulates the evolution of a system of N physical bodies (such as astrophysical objects) under a pairwise force between the bodies (e.g. gravity)
    [Figure: three bodies with the pairwise forces F1,2, F1,3, F2,1, F2,3, F3,1, F3,2.]
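
The pairwise force computation described on this slide can be sketched as a direct O(N²) loop over body pairs. This is only an illustrative Python sketch, not the HLS kernel from the talk: the function name `pairwise_forces`, the data layout, and the `softening` term (added to avoid division by zero for coincident bodies) are assumptions.

```python
import math

G = 6.674e-11  # gravitational constant [m^3 kg^-1 s^-2]

def pairwise_forces(positions, masses, softening=1e-9):
    """Return the net gravitational force on each body.

    positions: list of (x, y, z) tuples; masses: list of floats.
    softening is a hypothetical regularization term, not from the slides.
    """
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a body exerts no force on itself
            # Vector from body i to body j
            d = [positions[j][k] - positions[i][k] for k in range(3)]
            dist2 = sum(c * c for c in d) + softening
            # F = G * m_i * m_j * d / |d|^3
            scale = G * masses[i] * masses[j] / (dist2 * math.sqrt(dist2))
            for k in range(3):
                forces[i][k] += scale * d[k]
    return forces

if __name__ == "__main__":
    f = pairwise_forces([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [1.0, 1.0])
    print(f)  # the two forces are equal in magnitude and opposite in direction
```

Every one of the N·(N−1) ordered pairs is visited once per time step, which is why the roofline plots on the following slides measure performance in pairs per second.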
  • 8. Roofline model generation. [Figure: roofline plot. Y axis: performance [pairs/s], 10^6 to 10^10. X axis: operational intensity [pairs/B], 10^-2 to 10^3. Legend: attainable performance; ceiling, 2 DDR ports; ceiling, 1 DDR port.]
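
In the standard roofline model, the attainable performance at a given operational intensity is the minimum of the compute ceiling and the memory bandwidth multiplied by the operational intensity. The sketch below illustrates that bound in the slide's units (pairs/s and pairs/B). All numeric values are hypothetical and not taken from the slides, and treating two DDR ports as twice the bandwidth of one is an assumption.

```python
def attainable_performance(operational_intensity, peak_compute, bandwidth):
    """Roofline bound: min(compute ceiling, bandwidth * operational intensity).

    operational_intensity: [pairs/B], peak_compute: [pairs/s], bandwidth: [B/s].
    """
    return min(peak_compute, bandwidth * operational_intensity)

# Hypothetical platform numbers, chosen only for illustration.
PEAK_PAIRS_PER_S = 1e9           # compute ceiling [pairs/s]
BW_ONE_PORT = 1e10               # bandwidth of one DDR port [B/s]
BW_TWO_PORTS = 2 * BW_ONE_PORT   # assumption: two ports double the bandwidth

if __name__ == "__main__":
    for oi in (1e-2, 1e-1, 1.0, 10.0):
        one = attainable_performance(oi, PEAK_PAIRS_PER_S, BW_ONE_PORT)
        two = attainable_performance(oi, PEAK_PAIRS_PER_S, BW_TWO_PORTS)
        print(f"OI={oi:g} pairs/B: 1 port {one:.3g} pairs/s, 2 ports {two:.3g} pairs/s")
```

At low operational intensity the kernel is memory bound and the extra DDR port raises the ceiling; at high operational intensity both configurations hit the same compute ceiling.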
  • 9. Baseline operational intensity. [Figure: the roofline plot from slide 8 with the baseline operational intensity added.]
  • 10. Baseline performance estimation. [Figure: the same roofline plot with the baseline performance (estimation) added.]
  • 11. Cached version operational intensity. [Figure: the same roofline plot with the optimized operational intensity added.]
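
The slides do not say how the cached version is structured, so the following is only a sketch of why on-chip caching can raise operational intensity, under stated assumptions: each body occupies `bytes_per_body` bytes, the baseline fetches one body from DDR for every pair it evaluates, and the cached version keeps a tile of `tile_size` bodies on chip while streaming all N bodies past it once. The function names and the example sizes are hypothetical.

```python
def baseline_intensity(bytes_per_body):
    """Pairs per byte when every pair evaluation fetches one body from DDR."""
    return 1.0 / bytes_per_body

def cached_intensity(n_bodies, tile_size, bytes_per_body):
    """Pairs per byte when a tile of bodies is held on chip.

    Per tile: load tile_size bodies, stream n_bodies bodies,
    and evaluate n_bodies * tile_size pairs.
    """
    pairs = n_bodies * tile_size
    bytes_moved = (n_bodies + tile_size) * bytes_per_body
    return pairs / bytes_moved

if __name__ == "__main__":
    # Hypothetical sizes: 16 B per body (x, y, z, mass as 32-bit floats).
    print(baseline_intensity(16))
    print(cached_intensity(n_bodies=1024, tile_size=64, bytes_per_body=16))
```

Under these assumptions the operational intensity grows roughly in proportion to the tile size, which moves the design to the right on the roofline plot, toward the compute-bound region.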
  • 12. Cached version DSE output. [Figure: the same roofline plot with the optimized performance (estimation) added.]
  • 13. Cached version AWS testing. [Figure: the same roofline plot with the optimized performance (real) added.]
  • 14. Conclusions. We presented a framework that leads the designer toward the optimal solution, relying on:
      • roofline model analysis
      • automated design space exploration
      • fast yet accurate HLS estimations
  • 15. Thank you!