1 / 30
OXiGen
Dataflow acceleration from C for FPGA
Francesco Peverelli: francesco1.peverelli@mail.polimi.it
Marco Rabozzi: marco.rabozzi@mail.polimi.it
Emanuele Del Sozzo: emanuele.delsozzo@polimi.it
May 21 2018, Kitsilano C
JW Marriott Parq Vancouver
Vancouver, British Columbia CANADA
2 / 30
[Figure: performance vs. design time for x86, GPU, DSP, FPGA with HLS and FPGA with RTL, marking the first working version and the optimized version of each against the software project design time limit]
4 / 30
DATAFLOW BACKGROUND
[Figure courtesy of: Maxeler Technology]
5 / 30
CONTRIBUTIONS
• DIRECT TRANSLATION FROM A WIDELY USED HIGH LEVEL LANGUAGE
• AUTOMATIC DESIGN SPACE EXPLORATION AND OPTIMAL SYSTEM IMPLEMENTATION
• IDENTIFICATION AND EXTRACTION OF DATAFLOW COMPUTATIONS
6 / 30
OXIGEN TOOL OVERVIEW
HLL source code, Target function → FRONTEND → LLVM IR
LLVM IR, Target board specifications → OXIGEN → BACKEND-SPECIFIC OPTIMIZED CODE
BACKEND-SPECIFIC OPTIMIZED CODE → BACKEND → FPGA bitstream
7 / 30
OXIGEN TOOL OVERVIEW
FRONTEND
HLL source code
Target function
FRONTEND | OXIGEN | BACKEND
8 / 30
OXIGEN TOOL OVERVIEW
HLL source code
Target function
LLVM IR
FRONTEND | OXIGEN | BACKEND
9 / 30
OXIGEN TOOL OVERVIEW
LLVM IR
Target board specifications
RESOURCES AND PERFORMANCE OPTIMIZATION
IR PREPROCESSING
FUNCTION ANALYSIS
DFG CONSTRUCTION
OPTIMIZATION CONFIGURATION
INSTRUCTION COUNT REPORT
BACKEND TRANSLATION
BACKEND-SPECIFIC OPTIMIZED CODE
OXIGEN
FRONTEND | OXIGEN | BACKEND
10 / 30
OXIGEN TOOL OVERVIEW
LLVM IR
Target board specifications
RESOURCES AND PERFORMANCE OPTIMIZATION
IR PREPROCESSING
FUNCTION ANALYSIS
DFG CONSTRUCTION
OPTIMIZATION CONFIGURATION
INSTRUCTION COUNT REPORT
BACKEND TRANSLATION
Translation flow
Optimization flow
BACKEND-SPECIFIC OPTIMIZED CODE
FRONTEND | OXIGEN | BACKEND
11 / 30
OXIGEN TOOL OVERVIEW
FRONTEND | OXIGEN | BACKEND
BACKEND-SPECIFIC OPTIMIZED CODE
FPGA bitstream
COMMERCIAL SYNTHESIS TOOL
BACKEND
12 / 30
OXIGEN TOOL OVERVIEW
FRONTEND | OXIGEN | BACKEND
BACKEND-SPECIFIC OPTIMIZED CODE
FPGA bitstream
HDL GENERATION
IMPLEMENTATION AND BITSTREAM GENERATION
13 / 30
DATAFLOW INTERMEDIATE REPRESENTATION
NESTED
LOOP
NODES
LOOP CARRIED
DEPENDENCY
ARC
OUTER LOOPS
void foo(type_1* in_1, type_2* in_2,
         type_1* out_1, scalar_type_1* v_1){
    for(int i = offs; i < I_SIZE - offs_2; i++){
        S1: …statements…
        for(int j = …; j < 15; j++){
            S2: …statements…
        }
        S3: …statements…
        for(int j = … ){
            S4: …statements…
        }
    }
    for(int i = offs_3; … ){
        S5: …statements…
    }
}
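A concrete function with the same shape as the skeleton above may help. The sketch below is a hypothetical example, not taken from the slides: the name `weighted_window`, the sizes `I_SIZE` and `TAPS`, and the computation itself are all assumptions chosen only to show an outer loop over a stream with a fixed-bound nested loop inside it.

```c
#include <assert.h>
#include <stddef.h>

#define I_SIZE 16   /* hypothetical stream length */
#define TAPS   3    /* hypothetical fixed bound of the nested loop */

/* Hypothetical target function: a 3-tap weighted window over a stream.
 * The outer loop walks the stream; the nested loop has a compile-time
 * bound, like the j < 15 loop in the skeleton. */
void weighted_window(const float *in, const float *coeff, float *out)
{
    const int offs = TAPS / 2;  /* skip elements without a full window */
    assert(in != NULL && coeff != NULL && out != NULL);

    for (int i = offs; i < I_SIZE - offs; i++) {
        float acc = 0.0f;                        /* plays the role of S1 */
        for (int j = 0; j < TAPS; j++) {
            acc += coeff[j] * in[i + j - offs];  /* S2: acc is carried across j */
        }
        out[i] = acc;                            /* plays the role of S3 */
    }
}
```

The accumulator `acc` is read and written on every iteration of the inner loop, which is the kind of relation the slide labels as a loop carried dependency.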
14 / 30
OPTIMIZATION OPTIONS
[Figure: throughput vs. required resources, showing the unoptimized design, the resources driven design and the throughput driven design]
VECTORIZATION
REROLLING
15 / 30
REROLLING
queue
computational element
input
output
Replicated compute elements from unrolled nested loop body
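The effect of rerolling can be imitated in software. The sketch below assumes that rerolling by a factor f keeps N / f of the N replicated compute elements and reuses them over f ticks, with partial results held between ticks; this reading fits the slide labels, but the function, the struct and `N_ITER` are hypothetical names, not part of the tool.

```c
#include <assert.h>

#define N_ITER 30  /* hypothetical original trip count of the nested loop */

/* Outcome of evaluating one nested-loop body under a rerolling factor. */
typedef struct {
    int sum;       /* value computed by the loop body */
    int elements;  /* replicated compute elements instantiated */
    int ticks;     /* ticks needed to produce one output */
} reroll_result;

/* Accumulate x[0..N_ITER) using N_ITER / factor compute elements.
 * On each tick every element handles one iteration, and the partial
 * result waits in `sum` for the next tick. factor must divide N_ITER. */
reroll_result rerolled_sum(const int *x, int factor)
{
    assert(factor > 0 && N_ITER % factor == 0);
    reroll_result r = { 0, N_ITER / factor, 0 };

    for (int tick = 0; tick < factor; tick++) {   /* reuse over time */
        for (int e = 0; e < r.elements; e++) {    /* side by side in hardware */
            r.sum += x[tick * r.elements + e];
        }
        r.ticks++;
    }
    return r;
}
```

With a factor of 1 the sketch uses 30 elements and 1 tick per output; with a factor of 5 it uses 6 elements and 5 ticks, and the computed value is the same in both cases.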
16 / 30
VECTORIZATION
computational elements
input
output
vectorized input
replicate the computation across elements of vectorized input / output
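Vectorization can be imitated in the same way. The sketch below assumes that a vectorization factor of `lanes` means `lanes` copies of the computation, each handling one element of a vectorized input per tick; `vectorized_scale` and its parameters are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Apply the same scalar computation to `lanes` consecutive elements per
 * tick and return the number of ticks used. n must be a multiple of lanes. */
size_t vectorized_scale(const float *in, float *out, size_t n,
                        size_t lanes, float gain)
{
    assert(lanes > 0 && n % lanes == 0);
    size_t ticks = 0;
    for (size_t base = 0; base < n; base += lanes) {   /* one tick per group */
        for (size_t lane = 0; lane < lanes; lane++) {  /* replicated elements */
            out[base + lane] = gain * in[base + lane];
        }
        ticks++;
    }
    return ticks;
}
```

For 8 input elements the sketch needs 8 ticks with one lane and 2 ticks with four lanes, at the cost of four times as many compute elements.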
17 / 30
DESIGN SPACE EXPLORATION
RESOURCE ESTIMATION MODEL
PERFORMANCE MODEL
18 / 30
RESOURCE ESTIMATION MODEL
$r_t = \sum_{n \in N} c_{n,t,\theta} \cdot q(f_0, \dots, f_i)$
$T = \{DSP, BRAM, FF, LUT\}$
$r_t$: overall use of resources of type $t \in T$
$N$: nodes
$c_{n,t,\theta}$: resource consumption of the node given its implementation
$q(f_0, \dots, f_i)$: number of nodes of type $n$
$f_0, \dots, f_i$: optimization parameters
$\theta$: target specific configuration (e.g. DSP / LUT balance ...)
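The sum in the resource estimation model can be written as a short function. This is a minimal sketch under stated assumptions: the per-node costs are taken as already fixed for one configuration θ, the node counts are taken as already evaluated for the chosen factors, and the `node_type` struct and `resource_use` function are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

enum { DSP, BRAM, FF, LUT, NUM_RES };  /* the resource types in T */

/* One node type n: per-instance cost for each resource type (for a fixed
 * configuration theta) and instance count q (for the chosen factors). */
typedef struct {
    double cost[NUM_RES];
    double count;
} node_type;

/* r_t: sum over node types of per-instance cost times instance count. */
double resource_use(const node_type *nodes, size_t n_types, int t)
{
    assert(t >= 0 && t < NUM_RES);
    double r = 0.0;
    for (size_t n = 0; n < n_types; n++) {
        r += nodes[n].cost[t] * nodes[n].count;
    }
    return r;
}
```

Calling `resource_use` once per entry of the enum gives the four values that would later be checked against the limits of the board.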
19 / 30
OPERATION COUNT (REROLLING)
Number of operations of type n
Occurrences of operations n outside nested loops
Loops that can be rerolled
Iterations of l after rerolling
Original iterations of nested loop l
Rerolling factor
Occurrences of operations n in nested loop l
20 / 30
OPERATION COUNT (VECTORIZATION)
Number of operations of type n
Vectorization factor
Overall occurrences of n
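Only the labels of the two operation-count formulas survive in this transcript. The sketch below is one reading of those labels, not the authors' exact expressions: it assumes the iterations of a loop after rerolling are the original iterations divided by the rerolling factor, and that vectorization multiplies every occurrence by the vectorization factor. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double iterations;   /* original iterations of nested loop l */
    double occurrences;  /* occurrences of operation n in nested loop l */
    double reroll;       /* rerolling factor chosen for loop l */
} nested_loop;

/* Operations of type n after rerolling: occurrences outside nested loops
 * plus, for each rerollable loop, (iterations after rerolling) times
 * (occurrences in the loop body). */
double ops_after_rerolling(double outside, const nested_loop *loops,
                           size_t n_loops)
{
    double q = outside;
    for (size_t l = 0; l < n_loops; l++) {
        assert(loops[l].reroll >= 1.0);
        q += (loops[l].iterations / loops[l].reroll) * loops[l].occurrences;
    }
    return q;
}

/* Operations of type n after vectorization: each occurrence is replicated
 * once per element of the vectorized input. */
double ops_after_vectorization(double overall, double vec_factor)
{
    assert(vec_factor >= 1.0);
    return vec_factor * overall;
}
```

Under these assumptions, a loop of 30 iterations with 2 multiplications per iteration, rerolled by 5, contributes 12 multiplications instead of 60.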
21 / 30
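PERFORMANCE MODEL
Rerolling performance: $p(f_0) = \min\left(\frac{R_{out}}{f_0},\; B_{out},\; \frac{R_{out}}{R_{in}} \cdot B_{in}\right)$
Vectorization performance: $p(f_0) = \min\left(f_0 \cdot R_{out},\; B_{out},\; \frac{R_{out}}{R_{in}} \cdot B_{in}\right)$
$f_0$: rerolling factor (first formula), vectorization factor (second formula)
$R_{out}$: output production rate
$R_{in}$: input consume rate
$B_{out}$: maximum output bandwidth
$B_{in}$: maximum input bandwidth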
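The two performance formulas translate directly into code. The sketch below evaluates both; the function names are hypothetical, and the rates and bandwidths are plain numbers in whatever common unit the caller chooses.

```c
#include <assert.h>

static double min3(double a, double b, double c)
{
    double m = a < b ? a : b;
    return m < c ? m : c;
}

/* Rerolling: the output production rate is divided by the factor f0. */
double perf_rerolling(double f0, double r_out, double r_in,
                      double b_out, double b_in)
{
    assert(f0 > 0.0 && r_in > 0.0);
    return min3(r_out / f0, b_out, (r_out / r_in) * b_in);
}

/* Vectorization: the output production rate is multiplied by the factor f0. */
double perf_vectorization(double f0, double r_out, double r_in,
                          double b_out, double b_in)
{
    assert(f0 > 0.0 && r_in > 0.0);
    return min3(f0 * r_out, b_out, (r_out / r_in) * b_in);
}
```

With made-up values such as `r_out = 1`, `r_in = 1`, `b_out = 4` and `b_in = 8`, `perf_vectorization` grows with the factor until it reaches the output bandwidth and then stays flat, because the second term of the minimum takes over.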
22 / 30
DESIGN SPACE EXPLORATION
Find the configuration of parameters that maximizes performance
OPTIMIZATION FUNCTION: $\operatorname{argmax}_{f_0, \dots, f_i, \theta} \; p(f_0, \dots, f_i)$
RESOURCES CONSTRAINTS: $r_t \le M_t \quad \forall t \in T$
$M_t$: maximum available resources of type $t$
$r_t$: resources of type $t$ used
$T$: resource types available on the board
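The slides do not say how the search over configurations is carried out. As an illustration only, the sketch below solves a one-parameter version of the problem by exhaustive search, assuming a resource model that scales linearly with a single vectorization factor and a performance model capped by a bandwidth. `explore`, `fits` and `perf` are hypothetical helpers, not part of OXiGen.

```c
#include <assert.h>

#define NUM_RES 4  /* DSP, BRAM, FF, LUT */

/* Hypothetical linear resource model: usage at factor 1 scaled by f.
 * Returns 1 when every resource type stays within its maximum. */
static int fits(const double base[NUM_RES], const double max[NUM_RES], int f)
{
    for (int t = 0; t < NUM_RES; t++) {
        if (base[t] * f > max[t]) return 0;  /* constraint r_t <= M_t violated */
    }
    return 1;
}

/* Hypothetical performance model: rate grows with f up to a bandwidth cap. */
static double perf(int f, double rate, double bandwidth)
{
    double p = f * rate;
    return p < bandwidth ? p : bandwidth;
}

/* Exhaustive search over f in [1, f_max]. Returns the smallest feasible
 * factor that reaches the best performance, or 0 if no factor fits. */
int explore(const double base[NUM_RES], const double max[NUM_RES],
            int f_max, double rate, double bandwidth)
{
    assert(f_max >= 1);
    int best_f = 0;
    double best_p = -1.0;
    for (int f = 1; f <= f_max; f++) {
        if (!fits(base, max, f)) continue;
        double p = perf(f, rate, bandwidth);
        if (p > best_p) { best_p = p; best_f = f; }
    }
    return best_f;
}
```

Because ties are broken in favour of the smaller factor, the search stops growing the design once extra resources no longer buy performance. A real exploration over several factors and θ would need a larger search space than this single loop.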
23 / 30
CASE STUDIES
APPLICATIONS
• Sharpen image filter
• Asian option pricing
CHARACTERISTICS
• A compute intensive and a data intensive design
• Test diverse optimization options
• Test the use of built-in library functions in the target language
24 / 30
EXPERIMENTAL SETTING
EVALUATED METRICS
• Speedup over software
• Hardware resources utilization
ENVIRONMENT
• MaxCompiler
• Galava MAX4 board
• Stratix V FPGA
25 / 30
SHARPENER IMAGE FILTER
[Images: BEFORE and AFTER]
26 / 30
SHARPENER IMAGE FILTER
                          Total Resources
Vector. Factor  Speedup   DSP     BRAM    FF      LUT
1               4.58      5.86    9.81    4.85    7.28
2               8.86      11.72   13.57   7.07    10.75
4               15.37     23.44   21.14   11.52   17.55
8               15.85     46.88   36.72   20.43   31.17
16              15.62     93.75   67.29   38.16   58.51
Compared to an Intel® Core™ i7-6700 single-threaded implementation compiled with gcc 4.4.7 using -O3
27 / 30
ASIAN OPTION PRICING
28 / 30
ASIAN OPTION PRICING
                                           Total Resources
Rerol. Factor  Speedup  DSP/LUT Balance    DSP     BRAM    FF      LUT
30             15.83    1.0                39.06   37.65   21.86   37.15
15             31.22    1.0                55.08   42.09   24.47   40.71
10             45.95    0.1                29.30   41.50   31.55   52.32
8              56.91    1.0                87.11   54.30   29.60   47.73
6              74.35    0.1                41.02   59.13   39.51   64.31
5              88.10    0.1                46.88   63.96   43.31   70.09
About a day of work vs several weeks*!
*A. M. Nestorov, E. Reggiani, H. Palikareva, P. Burovskiy, T. Becker, and M. D. Santambrogio, “A scalable dataflow implementation of Curran’s approximation algorithm”
29 / 30
CONCLUSION AND FUTURE WORK
CONTRIBUTIONS
• Translation of high-level language functions into dataflow kernels implemented on FPGA
• Design space exploration and automatic design optimization
• Validation of the proposed approach on two different C designs using MaxCompiler as a backend
FUTURE WORK
• Expansion of the translation capabilities of the tool
• Verification on a broader set of case studies
• Integration of more optimization options and design space parameters
30 / 30
Thank you!
https://www.slideshare.net/necstlab
/groups/ReconfigurableArchitecturesWorkshop/
https://necst.it/

GDSC ASEB Gen AI study jams presentation
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptxthe ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
the ladakh protest in leh ladakh 2024 sonam wangchuk.pptx
 

OXiGen Dataflow Acceleration from C for FPGA

  • 1. OXiGen: Dataflow acceleration from C for FPGA. Francesco Peverelli: francesco1.peverelli@mail.polimi.it; Marco Rabozzi: marco.rabozzi@mail.polimi.it; Emanuele Del Sozzo: emanuele.delsozzo@polimi.it. May 21 2018, Kitsilano C, JW Marriott Parq Vancouver, Vancouver, British Columbia, Canada
  • 2. [Figure: performance versus design time for x86, GPU, DSP, FPGA with HLS, and FPGA with RTL, comparing the first working version with the optimized version against the software project design time limit]
  • 3. [Figure: same performance versus design time chart as slide 2]
  • 4. DATAFLOW BACKGROUND. Courtesy of: Maxeler Technology
  • 5. CONTRIBUTIONS: DIRECT TRANSLATION FROM A WIDELY USED HIGH LEVEL LANGUAGE; IDENTIFICATION AND EXTRACTION OF DATAFLOW COMPUTATIONS; AUTOMATIC DESIGN SPACE EXPLORATION AND OPTIMAL SYSTEM IMPLEMENTATION
  • 6. OXIGEN TOOL OVERVIEW. Inputs: HLL source code, target function, target board specifications. Flow: FRONTEND → LLVM IR → OXIGEN → BACKEND-SPECIFIC OPTIMIZED CODE → BACKEND → FPGA bitstream
  • 7. OXIGEN TOOL OVERVIEW (stages: FRONTEND, OXIGEN, BACKEND). FRONTEND: HLL source code, target function
  • 8. OXIGEN TOOL OVERVIEW (stages: FRONTEND, OXIGEN, BACKEND). HLL source code, target function, LLVM IR
  • 9. OXIGEN TOOL OVERVIEW (stages: FRONTEND, OXIGEN, BACKEND). OXIGEN inputs: LLVM IR, target board specifications. Blocks: RESOURCES AND PERFORMANCE OPTIMIZATION, IR PREPROCESSING, FUNCTION ANALYSIS, DFG CONSTRUCTION, OPTIMIZATION CONFIGURATION, INSTRUCTION COUNT REPORT, BACKEND TRANSLATION. Output: BACKEND-SPECIFIC OPTIMIZED CODE
  • 10. OXIGEN TOOL OVERVIEW (stages: FRONTEND, OXIGEN, BACKEND). OXIGEN inputs: LLVM IR, target board specifications. Blocks: RESOURCES AND PERFORMANCE OPTIMIZATION, IR PREPROCESSING, FUNCTION ANALYSIS, DFG CONSTRUCTION, OPTIMIZATION CONFIGURATION, INSTRUCTION COUNT REPORT, BACKEND TRANSLATION. Legend: translation flow, optimization flow. Output: BACKEND-SPECIFIC OPTIMIZED CODE
  • 11. OXIGEN TOOL OVERVIEW (stages: FRONTEND, OXIGEN, BACKEND). BACKEND: BACKEND-SPECIFIC OPTIMIZED CODE, COMMERCIAL SYNTHESIS TOOL, FPGA bitstream
  • 12. OXIGEN TOOL OVERVIEW (stages: FRONTEND, OXIGEN, BACKEND). BACKEND: BACKEND-SPECIFIC OPTIMIZED CODE, HDL GENERATION, IMPLEMENTATION AND BITSTREAM GENERATION, FPGA bitstream
  • 13. DATAFLOW INTERMEDIATE REPRESENTATION. Diagram labels: NESTED LOOP NODES, LOOP CARRIED DEPENDENCY ARC, OUTER LOOPS. Example function:
      void foo(type_1* in_1, type_2* in_2, type_1* out_1, scalar_type_1* v_1){
        for(int i = offs; i < I_SIZE - offs_2; i++){
          S1: …statements…
          for(int j = …; j < 15; j++){
            S2: …statements…
          }
          S3: …statements…
          for(int j = … ){
            S4: …statements…
          }
        }
        for(int i = offs_3; … ){
          S5: …statements…
        }
      }
  • 14. OPTIMIZATION OPTIONS: VECTORIZATION, REROLLING. [Figure: throughput versus required resources, showing the unoptimized design, a resources driven design, and a throughput driven design]
  • 15. REROLLING. Replicated compute elements from unrolled nested loop body. [Figure: input, queue, computational element, output]
  • 16. VECTORIZATION. Replicate the computation across elements of vectorized input / output. [Figure: vectorized input, computational elements, output]
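As a rough software analogy for the vectorization slide, the sketch below applies the same kernel body either to one element per step or to `LANES` elements per step. The kernel body, the name `LANES`, and the step counters are illustrative assumptions and do not come from the OXiGen tool; in hardware the lanes would be replicated compute elements working in parallel rather than an inner loop.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4 /* hypothetical vectorization factor */

/* Hypothetical kernel body applied to every stream element. */
static float kernel(float x) { return 2.0f * x + 1.0f; }

/* Scalar design: one element consumed and produced per step.
 * Returns the number of steps taken. */
static size_t run_scalar(const float *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = kernel(in[i]);
    return n;
}

/* Vectorized design: the computation is replicated across LANES
 * elements of the input, so each step handles LANES elements.
 * Assumes n is a multiple of LANES. Returns the number of steps. */
static size_t run_vectorized(const float *in, float *out, size_t n)
{
    assert(n % LANES == 0);
    size_t steps = 0;
    for (size_t i = 0; i < n; i += LANES) {
        for (size_t lane = 0; lane < LANES; lane++)
            out[i + lane] = kernel(in[i + lane]);
        steps++;
    }
    return steps;
}
```

Both functions produce the same output; the vectorized one needs `n / LANES` steps, at the cost of `LANES` copies of the kernel body.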
  • 17. DESIGN SPACE EXPLORATION: RESOURCE ESTIMATION MODEL, PERFORMANCE MODEL
  • 18. RESOURCE ESTIMATION MODEL
      $r_t = \sum_{n \in N} c_{n,t,\theta} \cdot q(f_0, \dots, f_i)$
      $T = \{DSP, BRAM, FF, LUT\}$: resource types
      $N$: nodes
      $r_t$: overall use of resources of type $t \in T$
      $c_{n,t,\theta}$: resource consumption of the node given its implementation
      $q(f_0, \dots, f_i)$: number of nodes of type $n$
      $f_0, \dots, f_i$: optimization parameters
      $\theta$: target specific configuration (e.g. DSP / LUT balance ...)
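The resource estimate is a weighted sum, which can be sketched in C as follows. The struct layout and all names are assumptions made for illustration: the per-node costs stand for $c_{n,t,\theta}$ with $\theta$ already fixed, and the node counts stand for the value of $q$ once the optimization factors have been chosen.

```c
#include <assert.h>
#include <stddef.h>

/* Resource types T = {DSP, BRAM, FF, LUT}. */
enum { RES_DSP, RES_BRAM, RES_FF, RES_LUT, RES_COUNT };

/* One node type n (hypothetical layout):
 * cost[t] plays the role of c_{n,t,theta} for a fixed theta,
 * count is the number of nodes of this type for the chosen factors. */
typedef struct {
    double cost[RES_COUNT];
    double count;
} node_type;

/* r_t = sum over node types n of cost[n][t] * count[n]. */
static double estimate_resource(const node_type *nodes, size_t n_nodes, int t)
{
    assert(t >= 0 && t < RES_COUNT);
    double r = 0.0;
    for (size_t n = 0; n < n_nodes; n++)
        r += nodes[n].cost[t] * nodes[n].count;
    return r;
}
```

Calling `estimate_resource` once per resource type gives the vector of $r_t$ values that the design space exploration compares against the board limits.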
  • 19. OPERATION COUNT (REROLLING). Terms of the formula: number of operations of type n; occurrences of operations n outside nested loops; loops that can be rerolled; occurrences of operations n in nested loop l; original iterations of nested loop l; iterations of l after rerolling; rerolling factor
  • 20. OPERATION COUNT (VECTORIZATION). Terms of the formula: number of operations of type n; vectorization factor; overall occurrences of n
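  • 21. PERFORMANCE MODEL
      Rerolling performance: $p(f_0) = \min\left(\frac{R_{out}}{f_0},\ B_{out},\ \frac{R_{out}}{R_{in}} \cdot B_{in}\right)$, with $f_0$ the rerolling factor
      Vectorization performance: $p(f_0) = \min\left(f_0 \cdot R_{out},\ B_{out},\ \frac{R_{out}}{R_{in}} \cdot B_{in}\right)$, with $f_0$ the vectorization factor
      $R_{out}$: output production rate; $R_{in}$: input consume rate; $B_{out}$: maximum output bandwidth; $B_{in}$: maximum input bandwidth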
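Both performance formulas take the minimum of a compute-bound term and two bandwidth-bound terms. A minimal C sketch, assuming the reading that rerolling divides the output rate by the factor while vectorization multiplies it; the struct and function names are hypothetical:

```c
#include <assert.h>

/* Rates and bandwidth limits of one kernel (hypothetical container). */
typedef struct {
    double r_out; /* output production rate R_out */
    double r_in;  /* input consume rate R_in */
    double b_out; /* maximum output bandwidth B_out */
    double b_in;  /* maximum input bandwidth B_in */
} kernel_rates;

static double min3(double a, double b, double c)
{
    double m = a < b ? a : b;
    return m < c ? m : c;
}

/* Rerolling: p(f0) = min(R_out / f0, B_out, R_out / R_in * B_in). */
static double perf_rerolled(const kernel_rates *k, double f0)
{
    assert(f0 >= 1.0 && k->r_in > 0.0);
    return min3(k->r_out / f0, k->b_out, k->r_out / k->r_in * k->b_in);
}

/* Vectorization: p(f0) = min(f0 * R_out, B_out, R_out / R_in * B_in). */
static double perf_vectorized(const kernel_rates *k, double f0)
{
    assert(f0 >= 1.0 && k->r_in > 0.0);
    return min3(f0 * k->r_out, k->b_out, k->r_out / k->r_in * k->b_in);
}
```

Under this model, raising the vectorization factor stops paying off once either bandwidth term becomes the minimum.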
  • 22. DESIGN SPACE EXPLORATION. Find the configuration of parameters that maximizes performance
      OPTIMIZATION FUNCTION: $\operatorname{argmax}_{f_0, \dots, f_i, \theta}\ p(f_0, \dots, f_i)$
      RESOURCES CONSTRAINTS: $r_t \le M_t \quad \forall t \in T$
      $M_t$: maximum available resources of type $t$; $r_t$: resources of type $t$ used; $T$: resource types available on the board
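The constrained maximization can be illustrated with a brute-force search over a single vectorization factor. This is only a sketch under simplifying assumptions, not the exploration strategy used by the tool: resource use is assumed to grow linearly with the factor, candidates are powers of two, and the two bandwidth terms are folded into one precomputed cap. All names are hypothetical.

```c
#include <assert.h>

enum { RES_COUNT = 4 }; /* DSP, BRAM, FF, LUT */

typedef struct {
    double per_lane[RES_COUNT];  /* assumed cost of one replicated lane */
    double max_avail[RES_COUNT]; /* M_t for each resource type */
    double r_out;                /* output rate of a single lane */
    double bw_cap;               /* min(B_out, R_out / R_in * B_in) */
} dse_problem;

/* Constraint check: r_t <= M_t for every resource type t. */
static int fits(const dse_problem *p, int f)
{
    for (int t = 0; t < RES_COUNT; t++)
        if (p->per_lane[t] * (double)f > p->max_avail[t])
            return 0;
    return 1;
}

/* Vectorization performance: min(f * R_out, bandwidth cap). */
static double perf(const dse_problem *p, int f)
{
    double compute = (double)f * p->r_out;
    return compute < p->bw_cap ? compute : p->bw_cap;
}

/* Try f = 1, 2, 4, ... up to max_f and keep the feasible factor with
 * the highest performance. Ties keep the smaller factor, which uses
 * fewer resources. Returns 0 when no candidate fits. */
static int explore(const dse_problem *p, int max_f)
{
    int best = 0;
    double best_perf = -1.0;
    for (int f = 1; f <= max_f; f *= 2) {
        if (!fits(p, f))
            continue;
        double pf = perf(p, f);
        if (pf > best_perf) {
            best_perf = pf;
            best = f;
        }
    }
    return best;
}
```

With more parameters the same structure applies, with the loop ranging over every combination of $f_0, \dots, f_i$ and $\theta$.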
  • 23. CASE STUDIES
      APPLICATIONS: Sharpen image filter; Asian option pricing
      CHARACTERISTICS: a compute intensive and a data intensive design; test diverse optimization options; test the use of built-in library functions in the target language
  • 24. EXPERIMENTAL SETTING
      EVALUATED METRICS: speedup over software; hardware resources utilization
      ENVIRONMENT: MaxCompiler; Galava MAX4 board; Stratix V FPGA
  • 25. SHARPENER IMAGE FILTER. [Figure: image BEFORE and AFTER filtering]
  • 26. SHARPENER IMAGE FILTER
      Vector.   Speedup   Total Resources
      Factor              DSP      BRAM     FF       LUT
      1         4.58      5.86     9.81     4.85     7.28
      2         8.86      11.72    13.57    7.07     10.75
      4         15.37     23.44    21.14    11.52    17.55
      8         15.85     46.88    36.72    20.43    31.17
      16        15.62     93.75    67.29    38.16    58.51
      Compared to an Intel® Core™ i7-6700 single threaded implementation compiled with gcc 4.4.7 using -O3
  • 27. ASIAN OPTION PRICING
  • 28. ASIAN OPTION PRICING
      Rerol.    Speedup   DSP/LUT   Total Resources
      Factor              Balance   DSP      BRAM     FF       LUT
      30        15.83     1.0       39.06    37.65    21.86    37.15
      15        31.22     1.0       55.08    42.09    24.47    40.71
      10        45.95     0.1       29.30    41.50    31.55    52.32
      8         56.91     1.0       87.11    54.30    29.60    47.73
      6         74.35     0.1       41.02    59.13    39.51    64.31
      5         88.10     0.1       46.88    63.96    43.31    70.09
      About a day of work vs several weeks*!
      *A. M. Nestorov, E. Reggiani, H. Palikareva, P. Burovskiy, T. Becker, and M. D. Santambrogio, “A scalable dataflow implementation of Curran’s approximation algorithm”
  • 29. CONCLUSION AND FUTURE WORK
      CONTRIBUTIONS: translation of high-level language functions into dataflow kernels implemented on FPGA; design space exploration and automatic design optimization; validation of the proposed approach on two different C designs using MaxCompiler as a backend
      FUTURE WORK: expansion of the translation capabilities of the tool; verification on a broader set of case studies; integration of more optimization options and design space parameters
  • 30. Thank you! Francesco Peverelli: francesco1.peverelli@mail.polimi.it; Marco Rabozzi: marco.rabozzi@mail.polimi.it; Emanuele Del Sozzo: emanuele.delsozzo@polimi.it. May 21 2018, Kitsilano C, JW Marriott Parq Vancouver, Vancouver, British Columbia, Canada. https://www.slideshare.net/necstlab /groups/ReconfigurableArchitecturesWorkshop/ https://necst.it/