Enabling Transparent Hardware Acceleration on
Zynq SoC for Python Data Science Applications
Luca Stornaiuolo
Dipartimento di Elettronica Informazione e Bioingegneria (DEIB)
luca.stornaiuolo@polimi.it
Marco D. Santambrogio
numPYNQ
Context Definition
Huge amounts of data must be processed in real time to extract aggregated information
Smart Embedded Systems
Existing Devices
[Figure: GPU, FPGA, and CPU compared on Performance, Energy Efficiency, and Easy Integration]
Embedded Devices
[Figure: GPU, FPGA, and CPU compared on Performance, Energy Efficiency, and Easy Integration]
FPGA-based hardware library that offers accelerated versions of scientific functions
to be used transparently on embedded systems.
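The idea of a library that is used "transparently" can be illustrated with a wrapper that keeps a NumPy-style function name and call signature while routing calls to an FPGA-backed implementation when one is attached, and falling back to the CPU otherwise. This is a minimal pure-Python sketch under stated assumptions: `Accelerated`, `cpu_dot`, and the `hardware` hook are hypothetical names, not numPYNQ's API, and `cpu_dot` merely stands in for `numpy.dot`.

```python
from typing import Callable, List, Optional

Matrix = List[List[float]]


def cpu_dot(a: Matrix, b: Matrix) -> Matrix:
    """Reference software matrix product (a stand-in for numpy.dot)."""
    n, k, m = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


class Accelerated:
    """Hypothetical wrapper: use hardware when attached, otherwise the CPU."""

    def __init__(self, software: Callable, hardware: Optional[Callable] = None):
        self.software = software
        self.hardware = hardware  # hypothetical FPGA-backed callable, or None

    def __call__(self, *args):
        if self.hardware is not None:
            try:
                return self.hardware(*args)
            except NotImplementedError:
                pass  # hardware cannot serve this input: fall back to the CPU
        return self.software(*args)


# User-facing name keeps the NumPy-style signature; no FPGA is attached here.
dot = Accelerated(cpu_dot)

print(dot([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Because the wrapper preserves the call signature, code that calls `dot(a, b)` stays unchanged when a hardware callable is later attached.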
Embedded Devices
[Figure: the same comparison on Performance, Energy Efficiency, and Easy Integration, with numPYNQ added alongside GPU, FPGA, and CPU]
Technology
[Figure: overlay diagram with Correlation, Matrix Dot Product, and FFT blocks]
Matrix Dot Product
[Figure: execution time [sec] vs. input size for PYNQ-Z1 (only CPU) and PYNQ-Z1 (CPU+FPGA). Panels: Pipelined Dot Product for Non-Fixed size Matrix; Parallel Dot Product for Fixed size Matrix. Input sizes span 16 to 1024 in one panel and 16 to 150 in the other. Reported speedups: 3.5x and 6.1x.]
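The slide contrasts a parallel kernel for fixed-size matrices with a pipelined kernel for non-fixed-size matrices. One common way to let a fixed-size kernel serve arbitrary shapes is zero-padding plus tiling: the operands are cut into TILE x TILE blocks, each block product goes through the fixed kernel, and partial results are accumulated. This is a swapped-in technique for illustration, not necessarily numPYNQ's design; `TILE = 4`, `fixed_kernel`, `block`, and `tiled_dot` are hypothetical, and the sketch does not model the hardware pipeline itself.

```python
from typing import List

Matrix = List[List[float]]
TILE = 4  # hypothetical fixed size of the hardware kernel


def fixed_kernel(a: Matrix, b: Matrix) -> Matrix:
    """Stand-in for a fixed-size TILE x TILE multiply (e.g. a parallel FPGA core)."""
    return [[sum(a[i][p] * b[p][j] for p in range(TILE)) for j in range(TILE)]
            for i in range(TILE)]


def block(mat: Matrix, r: int, c: int) -> Matrix:
    """Extract the TILE x TILE block starting at (r, c), zero-padding past the edges."""
    rows, cols = len(mat), len(mat[0])
    return [[mat[i][j] if i < rows and j < cols else 0
             for j in range(c, c + TILE)]
            for i in range(r, r + TILE)]


def tiled_dot(a: Matrix, b: Matrix) -> Matrix:
    """Multiply arbitrary-size matrices by feeding fixed-size tiles to the kernel."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[0] * m for _ in range(n)]
    for r in range(0, n, TILE):
        for c in range(0, m, TILE):
            for p in range(0, k, TILE):
                partial = fixed_kernel(block(a, r, p), block(b, p, c))
                # Accumulate only the part of the tile that lies inside the result.
                for i in range(min(TILE, n - r)):
                    for j in range(min(TILE, m - c)):
                        out[r + i][c + j] += partial[i][j]
    return out


print(tiled_dot([[1, 2, 3]], [[1], [2], [3]]))  # [[14]]
```

Zero padding keeps the kernel's shape constant, at the cost of wasted work on partially filled edge tiles; this is one reason a fixed-size kernel tends to pay off most when the input already matches its size.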
Runtime Adaptivity
[Figure: numPYNQ diagram with Runtime Input Analysis leading to the Target Implementation]
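Runtime input analysis can be pictured as a dispatcher that inspects operand shapes at call time and picks a target implementation. This is a minimal sketch under stated assumptions: the thresholds `MIN_WORTHWHILE` and `MAX_PIPELINED` and the set `FIXED_SIZES` are hypothetical placeholders, not values from numPYNQ, and the target names only mirror the two dot-product kernels named on the Matrix Dot Product slide.

```python
from typing import Tuple

# Hypothetical capabilities of the hardware cores (not taken from numPYNQ).
FIXED_SIZES = {16, 32, 64}  # square sizes assumed to be served by the parallel kernel
MAX_PIPELINED = 1024        # largest dimension the pipelined kernel is assumed to accept
MIN_WORTHWHILE = 16         # below this, data-transfer overhead is assumed to dominate


def choose_implementation(shape_a: Tuple[int, int], shape_b: Tuple[int, int]) -> str:
    """Inspect operand shapes at call time and return a target implementation name."""
    (n, k), (k2, m) = shape_a, shape_b
    if k != k2:
        raise ValueError(f"shapes {shape_a} and {shape_b} not aligned")
    dims = (n, k, m)
    if max(dims) < MIN_WORTHWHILE:
        return "cpu"              # too small to amortize the transfer to the FPGA
    if n == k == m and n in FIXED_SIZES:
        return "fpga-parallel"    # exact match for the fixed-size kernel
    if max(dims) <= MAX_PIPELINED:
        return "fpga-pipelined"   # arbitrary shape within the kernel's limits
    return "cpu"                  # outside hardware limits: software fallback


print(choose_implementation((64, 64), (64, 64)))   # fpga-parallel
print(choose_implementation((100, 50), (50, 30)))  # fpga-pipelined
```

Keeping the decision in one function means the user-facing call never changes; only the dispatcher's rules need tuning as hardware kernels are added.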
Case Study
FFT and Correlation
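FFT and correlation are natural companions: by the correlation theorem, the circular cross-correlation of two real sequences equals the inverse FFT of conj(FFT(x)) * FFT(y), which costs O(N log N) instead of the O(N^2) of the direct sum. The pure-Python sketch below illustrates that relationship only; it is not a description of how numPYNQ's Correlation core works, and it computes circular correlation, which differs from the linear correlation returned by `numpy.correlate`.

```python
import cmath
from typing import List


def fft(x: List[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(x: List[complex]) -> List[complex]:
    """Inverse FFT via conjugation: ifft(X) = conj(fft(conj(X))) / N."""
    n = len(x)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in x])]


def circular_correlation(x: List[float], y: List[float]) -> List[float]:
    """r[k] = sum_n x[n] * y[(n + k) % N], computed as IFFT(conj(FFT(x)) * FFT(y))."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    fx = fft([complex(v) for v in x])
    fy = fft([complex(v) for v in y])
    return [v.real for v in ifft([a.conjugate() * b for a, b in zip(fx, fy)])]


print([round(v, 9) for v in circular_correlation([1, 2, 3, 4], [0, 1, 0, 0])])
# [2.0, 1.0, 4.0, 3.0]
```

Sharing one FFT engine between both operations is a plausible motivation for grouping them in a single case study, though the slides themselves do not state it.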
Experimental Results
[Figure: reported speedups of 12.4x and 5.5x]
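The results are reported as multiplicative factors. Assuming the usual definition of speedup as the ratio of execution times between the two configurations named in the plots, PYNQ-Z1 (only CPU) and PYNQ-Z1 (CPU+FPGA):

```latex
S = \frac{T_{\text{CPU}}}{T_{\text{CPU+FPGA}}},
\qquad
\frac{T_{\text{CPU+FPGA}}}{T_{\text{CPU}}} = \frac{1}{S}
```

Under that assumption, S = 5.5 means the accelerated run takes about 1/5.5, roughly 18% of the CPU-only time, and S = 12.4 means about 1/12.4, roughly 8%.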
https://necst.it/
https://www.slideshare.net/necstlab

numPYNQ: accelerating NumPy on PYNQ