HARDWARE ACCELERATION
Lorenzo Di Tucci
lorenzo.ditucci@mail.polimi.it
Giulia Guidi
giulia.guidi@mail.polimi.it
Xilinx Open Hardware Contest 2016
Profiling
Goal: identify the bottleneck procedures in the program, i.e.
the most compute-intensive functions.
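The slides do not say which profiler was used. As a minimal sketch of the idea, assuming simple wall-clock instrumentation is enough, the following accumulates time per labelled region and reports the hottest one. The names ScopedTimer, profile_table and hottest_region are hypothetical helpers introduced for illustration, not part of any tool.

```cpp
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Accumulated wall-clock time and call count for one labelled region.
struct RegionStats {
    double seconds = 0.0;
    long calls = 0;
};

// Global table of measurements, keyed by region label (hypothetical helper).
inline std::map<std::string, RegionStats>& profile_table() {
    static std::map<std::string, RegionStats> table;
    return table;
}

// RAII timer: measures from construction to destruction and
// adds the elapsed time to the entry for its label.
class ScopedTimer {
public:
    explicit ScopedTimer(std::string label)
        : label_(std::move(label)),
          start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start_;
        RegionStats& stats = profile_table()[label_];
        stats.seconds += elapsed.count();
        stats.calls += 1;
    }

private:
    std::string label_;
    std::chrono::steady_clock::time_point start_;
};

// Returns the label with the largest accumulated time,
// or an empty string if nothing has been measured.
inline std::string hottest_region() {
    std::string best;
    double best_time = -1.0;
    for (const auto& entry : profile_table()) {
        if (entry.second.seconds > best_time) {
            best_time = entry.second.seconds;
            best = entry.first;
        }
    }
    return best;
}
```

Placing a ScopedTimer at the top of each candidate function and calling hottest_region() at the end of a run gives a rough ranking; a dedicated profiler would give finer detail.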
Static Code Analysis
• most computationally intensive function: computePairEnergy
• operational intensity equal to 52 Operations/Byte
• using the Roofline model, we obtain the expected performance
in GOps/s
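As a sketch of how the Roofline estimate is computed: attainable performance is the smaller of the peak compute rate and the memory bandwidth multiplied by the operational intensity. The function name and the platform figures mentioned in the note below are illustrative assumptions; only the 52 Operations/Byte value comes from the analysis above.

```cpp
#include <algorithm>
#include <cassert>

// Attainable performance in GOps/s under the Roofline model:
//   min(peak compute rate, memory bandwidth * operational intensity)
// peak_gops:          peak compute rate of the device, in GOps/s
// bandwidth_gbytes_s: sustained memory bandwidth, in GBytes/s
// ops_per_byte:       operational intensity of the kernel
inline double roofline_gops(double peak_gops,
                            double bandwidth_gbytes_s,
                            double ops_per_byte) {
    assert(peak_gops >= 0.0 && bandwidth_gbytes_s >= 0.0 && ops_per_byte >= 0.0);
    return std::min(peak_gops, bandwidth_gbytes_s * ops_per_byte);
}
```

For example, with a hypothetical bandwidth of 10 GBytes/s and an intensity of 52 Operations/Byte, the memory roof sits at 520 GOps/s, so the kernel is compute-bound on any device whose peak is below that figure.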
Hardware Implementation
1. implementation of the bottleneck function into Vivado HLS
2. integration of the code into SDAccel
• automation of the hardware design flow
• reduced risk of introducing errors
Optimization (1)
[Figure: DRAM and BRAM]
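The slide only names DRAM and BRAM. One common reading, assumed here, is that data is first copied from off-chip DRAM into on-chip BRAM and the computation then works on the local copy. A minimal sketch of that pattern in plain C++ follows; kTile and sum_squares_buffered are hypothetical names, and the computation is a stand-in, not the authors' kernel.

```cpp
#include <cstddef>

// Hypothetical tile size; small fixed-size arrays like this are the kind
// an HLS tool can place in on-chip BRAM.
constexpr std::size_t kTile = 256;

// Sums the squares of kTile input values after staging them locally.
// `in` stands for a buffer that lives in off-chip memory (DRAM).
inline float sum_squares_buffered(const float* in) {
    float local[kTile];  // on-chip copy of the input tile

    // Stage 1: one sequential pass over external memory.
    for (std::size_t i = 0; i < kTile; ++i) {
        local[i] = in[i];
    }

    // Stage 2: the compute loop touches only the local copy.
    float acc = 0.0f;
    for (std::size_t i = 0; i < kTile; ++i) {
        acc += local[i] * local[i];
    }
    return acc;
}
```

Separating the copy from the compute keeps external memory accesses sequential and lets repeated reads hit the faster local memory.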
Optimization (2)
[Figure: loop without pipeline vs. loop with pipeline]
Issue: can’t pipeline all loops
• the FPGA area is not large enough
• RAW (read-after-write) hazards between iterations
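To illustrate the RAW issue, here is a sketch of two loops in plain C++. The function names are hypothetical. In Vivado HLS, pipelining is typically requested with the PIPELINE directive inside the loop body; it is mentioned only in comments so the code compiles with an ordinary compiler.

```cpp
#include <cstddef>

// Independent iterations: out[i] depends only on in[i].
// A loop like this is a good candidate for pipelining
// (in Vivado HLS: "#pragma HLS PIPELINE" inside the loop body).
inline void scale_elements(const float* in, float* out,
                           std::size_t n, float k) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = in[i] * k;
    }
}

// Loop-carried RAW dependency: iteration i reads out[i - 1],
// which iteration i - 1 has just written. A new iteration cannot
// start until the previous result is available, which limits
// how tightly the loop can be pipelined.
inline void running_sum(const float* in, float* out, std::size_t n) {
    if (n == 0) {
        return;
    }
    out[0] = in[0];
    for (std::size_t i = 1; i < n; ++i) {
        out[i] = out[i - 1] + in[i];
    }
}
```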
Optimization (3)
Reduction
• removes the dependency between
iterations of a loop when updating
a variable
• works by writing the result of each
iteration of the loop into a cell of a
temporary array
• further loops and arrays are then
created to sum up the values stored in
the first temporary array
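The steps above can be sketched in plain C++ as follows. This is an illustration under stated assumptions, not the authors' code: the size kTerms, the function name reduce_sum_squares, and the squaring used as the per-iteration work are all hypothetical.

```cpp
#include <cstddef>

// Hypothetical number of loop iterations (kept small and a power of two).
constexpr std::size_t kTerms = 8;

// Sums x[i] * x[i] without a single shared accumulator.
inline float reduce_sum_squares(const float* x) {
    // Step 1: each iteration writes its result to its own cell,
    // so no iteration depends on the one before it.
    float level0[kTerms];
    for (std::size_t i = 0; i < kTerms; ++i) {
        level0[i] = x[i] * x[i];
    }

    // Step 2: further loops and arrays sum the temporary values
    // pairwise; iterations within each loop are again independent.
    float level1[kTerms / 2];
    for (std::size_t i = 0; i < kTerms / 2; ++i) {
        level1[i] = level0[2 * i] + level0[2 * i + 1];
    }

    float level2[kTerms / 4];
    for (std::size_t i = 0; i < kTerms / 4; ++i) {
        level2[i] = level1[2 * i] + level1[2 * i + 1];
    }

    // Final step: combine the last two partial sums.
    return level2[0] + level2[1];
}
```

The price is extra storage for the temporary arrays; in return, every loop has independent iterations and can be pipelined or unrolled.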
Thanks for your attention
Lorenzo Di Tucci
Giulia Guidi
lorenzo.ditucci@mail.polimi.it
giulia.guidi@mail.polimi.it
Follow us on : https://www.facebook.com/profaxnecstlab/
Follow us on : https://twitter.com/ProFAX_NECST
Follow us on : http://www.slideshare.net/ProFAX

ProFAX - Implementation (Short version)