The document describes OXiGen, a tool that translates C code into optimized FPGA implementations using a dataflow intermediate representation. OXiGen performs (1) automatic design space exploration to optimize for resources and performance, (2) extraction of dataflow computations from C functions, and (3) generation of backend-specific code and FPGA bitstreams. Two case studies on image filtering and option pricing show speedups of up to 88x over software and efficient FPGA resource utilization. Future work aims to support more languages, designs, and optimizations.
OXiGen Dataflow Acceleration from C for FPGA
1. 1 / 30
OXiGen
Dataflow acceleration from C for FPGA
Francesco Peverelli: francesco1.peverelli@mail.polimi.it
Marco Rabozzi: marco.rabozzi@mail.polimi.it
Emanuele Del Sozzo: emanuele.delsozzo@polimi.it
May 21 2018, Kitsilano C
JW Marriott Parq Vancouver
Vancouver, British Columbia CANADA
2. 2 / 30
[Figure: design time vs. performance for x86, GPU, DSP, FPGA with HLS, and FPGA with RTL. Each platform is shown at its first working version and at its optimized version, against the software project design time limit.]
5. 5 / 30
CONTRIBUTIONS
• DIRECT TRANSLATION FROM A WIDELY USED HIGH LEVEL LANGUAGE
• IDENTIFICATION AND EXTRACTION OF DATAFLOW COMPUTATIONS
• AUTOMATIC DESIGN SPACE EXPLORATION AND OPTIMAL SYSTEM IMPLEMENTATION
17. 17 / 30
DESIGN SPACE EXPLORATION
RESOURCE ESTIMATION MODEL
PERFORMANCE MODEL
18. 18 / 30
RESOURCE ESTIMATION MODEL
$$r_t = \sum_{n \in N} c_{n,t,\theta} \cdot q(f_0, \dots, f_i)$$
• $T = \{DSP, BRAM, FF, LUT\}$
• $r_t$: overall use of resources of type $t \in T$
• $c_{n,t,\theta}$: resource consumption of the node given its implementation
• $q(f_0, \dots, f_i)$: number of nodes of type $n$
• $f_0, \dots, f_i$: optimization parameters
• $\theta$: target-specific configuration (e.g. DSP / LUT balance ...)
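The resource estimation model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `estimate_resources` and `toy_cost`, the node types `add` and `mul`, and every cost number are hypothetical and do not come from OXiGen. The node counts are passed in directly instead of being derived from the optimization parameters.

```python
from typing import Callable, Dict

RESOURCE_TYPES = ("DSP", "BRAM", "FF", "LUT")  # the set T from the slide


def estimate_resources(
    node_counts: Dict[str, int],
    cost: Callable[[str, str, float], float],
    theta: float,
) -> Dict[str, float]:
    """Return r_t for every t in T as sum_n c(n, t, theta) * q_n."""
    return {
        t: sum(cost(n, t, theta) * q for n, q in node_counts.items())
        for t in RESOURCE_TYPES
    }


def toy_cost(node: str, resource: str, theta: float) -> float:
    """Hypothetical per-node costs; theta = 1.0 maps multipliers to DSPs."""
    if node == "add":
        return {"DSP": 0.0, "BRAM": 0.0, "FF": 32.0, "LUT": 32.0}[resource]
    if node == "mul":
        return {
            "DSP": 2.0 * theta,
            "BRAM": 0.0,
            "FF": 64.0,
            "LUT": 16.0 + 300.0 * (1.0 - theta),
        }[resource]
    raise ValueError(f"unknown node type: {node}")


# Ten adders and four multipliers, multipliers fully mapped to DSPs.
usage = estimate_resources({"add": 10, "mul": 4}, toy_cost, theta=1.0)
```

Lowering `theta` in this toy cost function moves multiplier cost from DSPs to LUTs, which is the kind of trade-off the DSP / LUT balance parameter is meant to expose.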
19. 19 / 30
OPERATION COUNT (REROLLING)
[Equation not preserved in the slide text. Its annotated terms are:]
• Number of operations of type n
• Occurrences of operations n outside nested loops
• Loops that can be rerolled
• Occurrences of operations n in nested loop l
• Original iterations of nested loop l
• Iterations of l after rerolling
• Rerolling factor
20. 20 / 30
OPERATION COUNT (VECTORIZATION)
[Equation not preserved in the slide text. Its annotated terms are:]
• Number of operations of type n
• Overall occurrences of n
• Vectorization factor
21. 21 / 30
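PERFORMANCE MODEL
Rerolling performance ($f_0$ is the rerolling factor):
$$p(f_0) = \min\left(\frac{R_{out}}{f_0},\; B_{out},\; \frac{R_{out}}{R_{in}} \cdot B_{in}\right)$$
Vectorization performance ($f_0$ is the vectorization factor):
$$p(f_0) = \min\left(f_0 \cdot R_{out},\; B_{out},\; \frac{R_{out}}{R_{in}} \cdot B_{in}\right)$$
• $R_{out}$: output production rate
• $R_{in}$: input consumption rate
• $B_{out}$: maximum output bandwidth
• $B_{in}$: maximum input bandwidth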
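As a minimal sketch, the two performance formulas above translate directly into Python. The function and argument names are chosen here for illustration, and the numeric values in the usage lines are made up. They are not measurements from the talk.

```python
def rerolling_performance(f0: float, r_out: float, r_in: float,
                          b_out: float, b_in: float) -> float:
    """Rerolling by f0 divides the output production rate by f0."""
    return min(r_out / f0, b_out, r_out / r_in * b_in)


def vectorization_performance(f0: float, r_out: float, r_in: float,
                              b_out: float, b_in: float) -> float:
    """Vectorizing by f0 multiplies the output production rate by f0."""
    return min(f0 * r_out, b_out, r_out / r_in * b_in)


# Hypothetical kernel: one output per two inputs, bandwidth of 8 on each side.
rerolled = rerolling_performance(4, r_out=1.0, r_in=2.0, b_out=8.0, b_in=8.0)
vectorized = vectorization_performance(4, r_out=1.0, r_in=2.0, b_out=8.0, b_in=8.0)
saturated = vectorization_performance(16, r_out=1.0, r_in=2.0, b_out=8.0, b_in=8.0)
```

In this toy setting the rerolled design is compute bound, while raising the vectorization factor from 4 to 16 gains nothing because the input bandwidth term already limits the result.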
22. 22 / 30
DESIGN SPACE EXPLORATION
Find the configuration of parameters that maximizes performance.
Optimization function:
$$\arg\max_{f_0, \dots, f_i, \theta} \; p(f_0, \dots, f_i)$$
Resource constraints:
$$r_t \le M_t \quad \forall t \in T$$
• $r_t$: resources of type $t$ used
• $M_t$: maximum available resources of type $t$
• $T$: resource types available on the board
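The constrained maximization above can be illustrated with an exhaustive search over a small parameter grid. This is a sketch under assumptions, not the search strategy OXiGen uses (the slides do not describe it). The `explore` function, the toy performance and resource models, and the resource limits are all hypothetical.

```python
from itertools import product


def explore(factor_choices, theta_choices, perf, resources, limits):
    """Return (performance, factors, theta) of the best feasible point, or None."""
    best = None
    for *factors, theta in product(*factor_choices, theta_choices):
        used = resources(factors, theta)
        # Enforce r_t <= M_t for every resource type t.
        if any(used[t] > limits[t] for t in limits):
            continue
        score = perf(factors)
        if best is None or score > best[0]:
            best = (score, tuple(factors), theta)
    return best


# Toy models: a single rerolling factor f0 and a DSP/LUT balance theta.
def toy_perf(factors):
    return 1.0 / factors[0]


def toy_resources(factors, theta):
    f0 = factors[0]
    return {
        "DSP": 100.0 * theta / f0,
        "LUT": (40.0 + 60.0 * (1.0 - theta)) / f0,
    }


best = explore(
    factor_choices=[(1, 2, 4, 8)],
    theta_choices=(0.1, 1.0),
    perf=toy_perf,
    resources=toy_resources,
    limits={"DSP": 30.0, "LUT": 50.0},
)
```

With these made-up limits, the unrolled design (`f0 = 1`) does not fit for either balance setting, so the search settles on the smallest rerolling factor that satisfies both constraints.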
23. 23 / 30
CASE STUDIES
APPLICATIONS
• Sharpen image filter
• Asian option pricing
CHARACTERISTICS
• A compute-intensive and a data-intensive design
• Test diverse optimization options
• Test the use of built-in library functions in the target language
28. 28 / 30
ASIAN OPTION PRICING
The DSP, BRAM, FF, and LUT columns report total resources.

Rerol. Factor | Speedup | DSP/LUT Balance | DSP   | BRAM  | FF    | LUT
30            | 15.83   | 1.0             | 39.06 | 37.65 | 21.86 | 37.15
15            | 31.22   | 1.0             | 55.08 | 42.09 | 24.47 | 40.71
10            | 45.95   | 0.1             | 29.30 | 41.50 | 31.55 | 52.32
8             | 56.91   | 1.0             | 87.11 | 54.30 | 29.60 | 47.73
6             | 74.35   | 0.1             | 41.02 | 59.13 | 39.51 | 64.31
5             | 88.10   | 0.1             | 46.88 | 63.96 | 43.31 | 70.09

About a day of work vs several weeks*!
*A. M. Nestorov, E. Reggiani, H. Palikareva, P. Burovskiy, T. Becker, and M. D. Santambrogio, “A scalable dataflow implementation of Curran’s approximation algorithm”
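A quick arithmetic check on the reported figures: if speedup scaled exactly as the inverse of the rerolling factor, the product of the two would be constant across rows. The short script below computes that product from the six (rerolling factor, speedup) pairs on this slide. The interpretation is an observation made here, not a claim from the talk.

```python
# (rerolling factor, speedup) pairs as reported on the slide.
rows = [(30, 15.83), (15, 31.22), (10, 45.95), (8, 56.91), (6, 74.35), (5, 88.10)]

# Constant under perfect inverse scaling; drifts if scaling is sub-linear.
products = [round(factor * speedup, 1) for factor, speedup in rows]

# Ideal gain from rerolling factor 30 to 5 would be 30 / 5 = 6.
measured_gain = rows[-1][1] / rows[0][1]

print(products)
print(round(measured_gain, 2))
```

The products fall gradually from roughly 475 to roughly 440, and the measured gain between the two extreme rows is a little under the ideal factor of 6, so the scaling is close to inverse but slightly sub-linear.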
29. 29 / 30
CONCLUSION AND FUTURE WORK
CONTRIBUTIONS
• Translation of high-level language functions into dataflow
kernels implemented on FPGA
• Design space exploration and automatic design optimization
• Validation of the proposed approach on two different C designs
using MaxCompiler as a backend
FUTURE WORK
• Expansion of the translation capabilities of the tool
• Verification on a broader set of case studies
• Integration of more optimization options and design space
parameters
30. 30 / 30
Thank you!
https://www.slideshare.net/necstlab
/groups/ReconfigurableArchitecturesWorkshop/
https://necst.it/