PLASTER
PYNQ-based abandoned object detection using a
map-reduce approach on a multi-FPGA cluster
Daniele Valentino De Vincenti - danielevalentino.devincenti@mail.polimi.it
Lorenzo Farinelli - lorenzo.farinelli@mail.polimi.it
Luca Stornaiuolo - luca.stornaiuolo@polimi.it
Rolando Brondolin - rolando.brondolin@polimi.it
July 19th 2020
VNGC project presentation
Context
Neural networks running on
embedded devices are:
▪ Computationally intensive
▪ Strongly memory bound
▪ Resource hungry
▪ Power consuming
https://www.flaticon.com/authors/freepik
Our solution
▪ PYNQ-based multi-FPGA cluster
for:
○ Flexibility of the infrastructure
○ Reliability and redundancy
○ Portability and ease of setup
(e.g. events)
○ High computational power
○ Heterogeneous design
○ Embedded system
▪ Abandoned object detection
using accelerated YOLO detector
https://www.flaticon.com/authors/eucalyp
▪ C++ node manager for fast
communication
▪ End-user Python libraries for
ease of use
https://www.flaticon.com/free-icon/purchase-summary_1949624
https://github.com/dhm2013724/yolov2_xilinx_fpga
Multi-FPGA cluster
▪ Distributed system of accelerators
▪ Self-managed cluster of PYNQ-Z1s
▪ On-the-fly reconfiguration of FPGAs
to start new tasks and jobs
Cluster design
Rendering of a three-board PYNQ-Z1 configuration of the cluster, with the boards facing outwards to improve heat dissipation.
A fan will be placed on top, and the sides of the cluster will be enclosed by a plexiglass pane to force the airflow
from bottom to top. The central hole will host a network switch that connects the boards.
Application
▪ Video input is gathered and split
into frames
▪ Frame chunks are sent to multiple
boards for classification
▪ Results from the classification stage
are sent to a second analysis stage
▪ Final results are aggregated and
sent back to the user
[Diagram: data exchange between User and Cluster]
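The steps above can be sketched as a small map-reduce pipeline. This is only an illustration under stated assumptions: the function names, the thread pool standing in for the boards, the pre-labelled frames standing in for the YOLO detector, and the "seen in N consecutive frames" criterion are all hypothetical and not taken from the project code.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(frames, n_chunks):
    """Split frames into at most n_chunks contiguous chunks of near-equal size."""
    size, extra = divmod(len(frames), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        if end > start:
            chunks.append(frames[start:end])
        start = end
    return chunks


def classify_chunk(chunk):
    """Map stage stand-in. On a real board this would run the accelerated
    detector; here each frame already carries its labels (an assumption)."""
    return [(index, set(labels)) for index, labels in chunk]


def analyze(detections, min_frames):
    """Second stage (toy criterion): flag labels that appear in at least
    min_frames consecutive frames."""
    streak, flagged = {}, set()
    for _, labels in sorted(detections, key=lambda d: d[0]):
        for label in list(streak):
            if label not in labels:
                streak[label] = 0  # streak broken
        for label in labels:
            streak[label] = streak.get(label, 0) + 1
            if streak[label] >= min_frames:
                flagged.add(label)
    return flagged


def run_pipeline(frames, n_boards, min_frames):
    """Split, classify chunks in parallel, then aggregate the results."""
    chunks = split_into_chunks(frames, n_boards)
    with ThreadPoolExecutor(max_workers=n_boards) as pool:
        mapped = list(pool.map(classify_chunk, chunks))
    detections = [d for chunk in mapped for d in chunk]
    return sorted(analyze(detections, min_frames))


if __name__ == "__main__":
    frames = [
        (0, ["person", "bag"]), (1, ["person", "bag"]),
        (2, ["bag"]), (3, ["bag"]), (4, ["bag"]), (5, []),
    ]
    print(run_pipeline(frames, n_boards=3, min_frames=4))
```

The map stage is independent per chunk, which is what makes it possible to spread the work over several boards; only the second stage needs the detections back in frame order.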
Node manager
https://creativemarket.com/Becris
▪ Fast underlying communication
layer
▪ Easy reconfigurability of nodes for
different tasks
▪ Simple rearrangement of the cluster
in case of failures
▪ Ease of use through a set of Python
APIs
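As an illustration of what "rearrangement of the cluster in case of failures" could look like, the sketch below moves the tasks of a failed node onto the least-loaded surviving nodes. The slides do not describe the actual policy of the C++ node manager; the node names, task names, and the least-loaded rule are assumptions made for this example.

```python
from typing import Dict, List


def reassign(assignment: Dict[str, List[str]], failed: str) -> Dict[str, List[str]]:
    """Return a new node -> tasks mapping with the failed node removed and
    its tasks handed to the least-loaded survivors (hypothetical policy)."""
    survivors = sorted(node for node in assignment if node != failed)
    if not survivors:
        raise RuntimeError("no surviving nodes to take over the tasks")
    # Copy the task lists so the caller's mapping is left untouched.
    new_assignment = {node: list(assignment[node]) for node in survivors}
    for task in assignment.get(failed, []):
        # Pick the node with the fewest tasks; ties are broken by name.
        target = min(survivors, key=lambda n: (len(new_assignment[n]), n))
        new_assignment[target].append(task)
    return new_assignment


if __name__ == "__main__":
    cluster = {
        "pynq-0": ["classify-a"],
        "pynq-1": ["classify-b", "analyze"],
        "pynq-2": ["classify-c"],
    }
    print(reassign(cluster, failed="pynq-1"))
```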
Python libraries
▪ Python APIs to build applications
running on the cluster
▪ Easy configuration and assignment
of bit-streams to the boards
▪ Dedicated functions for the
communication and file exchange
between nodes
▪ Transparent C++ management layer
Results / user APIs
UML class diagrams of the base class representing a task of a distributed application and specifying the
APIs provided by the Python library to interact with the cluster and manage the execution of apps.
When developing a distributed application, a user only has to import PlasterTask in the app’s Python
code and implement a concrete subclass to define a custom behavior for the task.
ExecutorWrapper exposes methods that let tasks interact with the cluster (e.g. send/receive data).
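The subclassing pattern described above might look roughly like the sketch below. The slides name PlasterTask and ExecutorWrapper but do not give their methods or signatures, so both classes here are self-contained stand-ins: the run and send methods, the constructor argument, and the "master" destination are hypothetical and should not be read as the library's real API.

```python
from abc import ABC, abstractmethod


class ExecutorWrapper:
    """In-memory stand-in (hypothetical): records what a task sends
    instead of talking to a real cluster."""

    def __init__(self):
        self.outbox = []

    def send(self, destination, data):
        self.outbox.append((destination, data))


class PlasterTask(ABC):
    """Stand-in base class (hypothetical signature): a task receives an
    executor and must implement run()."""

    def __init__(self, executor):
        self.executor = executor

    @abstractmethod
    def run(self, data):
        """Process the input data and return a result."""


class CountFramesTask(PlasterTask):
    """Concrete task: counts the frames it receives and reports the count."""

    def run(self, data):
        result = len(data)
        self.executor.send("master", result)
        return result


if __name__ == "__main__":
    executor = ExecutorWrapper()
    task = CountFramesTask(executor)
    print(task.run(["frame-0", "frame-1", "frame-2"]), executor.outbox)
```

Keeping the cluster interaction behind the executor object means the task body can be exercised locally, as done here, without any board attached.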
Results / transfer times
File size    Plaster transfer time    Python transfer time
50 MB        194.094 ms               1.0 s
100 MB       300.680 ms               2.0 s
200 MB       600.392 ms               3.976 s
500 MB       1866.889 ms              9.679 s
Transfer times to move a file of the specified size from one board to another. The values refer to the time between
the first file transfer request (from recipient to owner) and the reception (and writing to disk) of the last chunk of
the file. They compare the communication APIs of the cluster (Plaster column) with a file transfer function written
from scratch in Python (Python column).
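To make the comparison easier to read, the measurements above can be turned into speedup and effective throughput figures. The snippet below only does arithmetic on the reported values; the function and variable names are made up for this example, and "MB" is used exactly as written in the table.

```python
# (file size in MB, Plaster time in ms, plain-Python time in s), as reported above.
measurements = [
    (50, 194.094, 1.0),
    (100, 300.680, 2.0),
    (200, 600.392, 3.976),
    (500, 1866.889, 9.679),
]


def summarize(rows):
    """Return (size_mb, speedup, plaster_throughput_mb_per_s) for each row."""
    summary = []
    for size_mb, plaster_ms, python_s in rows:
        plaster_s = plaster_ms / 1000.0  # bring both timings to seconds
        speedup = python_s / plaster_s
        throughput = size_mb / plaster_s
        summary.append((size_mb, round(speedup, 2), round(throughput, 1)))
    return summary


if __name__ == "__main__":
    for size_mb, speedup, throughput in summarize(measurements):
        print(f"{size_mb:>4} MB  speedup {speedup}x  {throughput} MB/s")
```

On these figures the cluster's communication APIs come out between about 5x and 6.7x faster than the plain Python transfer.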
Thank you
Lorenzo Farinelli
MSc Computer Science & Engineering
lorenzo.farinelli@mail.polimi.it
Luca Stornaiuolo
PhD - Computer Science & Engineering
luca.stornaiuolo@polimi.it
Daniele Valentino De Vincenti
Ba Biomedical Engineering
danielevalentino.devincenti@mail.polimi.it
Project code available at: https://bitbucket.org/necst/pynq-cluster/
Rolando Brondolin
PhD - Information Technology
rolando.brondolin@polimi.it
