Presentation given at the 14th International Symposium on Applied Reconfigurable Computing (ARC 2018) in Santorini, Greece, on May 2-4, 2018, on a novel scheme for accelerating Alternating Least Squares (ALS)-based collaborative filtering for recommendation engines, which can significantly reduce processing time and also reduce the energy consumption of computing platforms.
Efficient hardware acceleration of recommendation engines: a use case on collaborative filtering
1. Konstantinos Katsantonis, Christoforos Kachris, Dimitrios Soudris
kachris@microlab.ntua.gr
ICCS-National Technical University of Athens
ARC, May 2018
2. www.vineyard-h2020.eu
Recommendation engines Market size
• The global recommendation engine market is
expected to grow from USD 801.1 million in 2017 to
USD 4,414.8 million by 2022, at a Compound Annual
Growth Rate (CAGR) of 40.7% during the forecast
period.
• The collaborative filtering technique uses a large
volume of information, such as users' behavior,
preferences, and activities from the past records to
segment users based on similarity of likings
• https://www.prnewswire.com/news-releases/global-recommendation-engine-market-2018-2022---key-innovators-include-fuzzyai--infinite-analytics-300624258.html
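As a quick consistency check on the market figures quoted above, the growth rate implied by the two endpoints can be recomputed. This is only a sketch: it assumes the forecast period spans the five years from 2017 to 2022 and uses the standard CAGR definition, neither of which the slide states explicitly.

```python
# Endpoints quoted on the slide, in USD million.
start_value = 801.1    # 2017
end_value = 4414.8     # 2022
years = 2022 - 2017    # assumption: five compounding periods

# Standard definition: CAGR = (end / start) ** (1 / years) - 1
cagr = (end_value / start_value) ** (1 / years) - 1

print(f"Implied CAGR: {cagr:.1%}")
```

The implied rate rounds to the 40.7% quoted on the slide.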
Christoforos Kachris, ICCS-NTUA, ARC 2018
3. www.vineyard-h2020.eu
Objectives
• Design space exploration of a recommendation system using
matrix factorization trained by Alternating Least Squares (ALS).
• Efficient mapping onto reconfigurable computing using
High-Level Synthesis (HLS).
• Performance and power evaluation.
• Creation of a Python interface for the accelerator.
• Integration with the Spark framework through Python.
5. www.vineyard-h2020.eu
Architecture
• For every user and
every movie solve
the following system
• Efficient mapping on
BRAM and computational
resources
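The system shown on the slide did not survive extraction. As a rough illustration only, the sketch below uses the textbook regularized normal equations for explicit-feedback ALS, (YᵀY + λI)·x = Yᵀr, solved once per user and once per movie. The function names, the regularization value, the rank and the toy rating matrix are all assumptions made for illustration, not the authors' kernel.

```python
import numpy as np

def solve_factor(other_factors, ratings, reg):
    """Solve (Y^T Y + reg*I) x = Y^T r for one user (or one movie)."""
    rank = other_factors.shape[1]
    gram = other_factors.T @ other_factors + reg * np.eye(rank)
    rhs = other_factors.T @ ratings
    return np.linalg.solve(gram, rhs)

def als(R, mask, rank=2, reg=0.1, iters=10, seed=0):
    """Alternate between user and movie solves over the observed ratings."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    X = rng.normal(scale=0.1, size=(n_users, rank))  # user factors
    Y = rng.normal(scale=0.1, size=(n_items, rank))  # movie factors
    for _ in range(iters):
        for u in range(n_users):
            rated = mask[u]                 # movies rated by user u
            X[u] = solve_factor(Y[rated], R[u, rated], reg)
        for i in range(n_items):
            raters = mask[:, i]             # users who rated movie i
            Y[i] = solve_factor(X[raters], R[raters, i], reg)
    return X, Y

# Toy rating matrix (hypothetical data); 0 marks a missing rating.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
mask = R > 0
X, Y = als(R, mask)
predictions = X @ Y.T
```

Each solve involves only a small rank-by-rank matrix, which is the kind of fixed-size dense computation that could plausibly be mapped onto BRAM and arithmetic resources.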
6. www.vineyard-h2020.eu
Prototype on Zedboard
Software - Kernel interface version 1
• AXI4-Stream
• Hand-written driver
• 2-dimensional zero padding
• Fixed input data window dimensions
Software - Kernel interface version 2
• AXI4-Stream
• Xilinx IP "Accelerator FIFO Adapter"
• 2-dimensional zero padding
• Fixed input data window dimensions
Software - Kernel interface version 3
• AXI4-Stream
• Xilinx IP "Accelerator FIFO Adapter"
• 1-dimensional zero padding
• Variable-size input windows, determined at runtime
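The slides do not say what exactly is zero-padded. One plausible reading, sketched below as an assumption, is that a variable-length block of factor rows is padded with zero rows to fill a fixed-size input window. The sketch pads along one dimension only and shows that this leaves the Gram matrix YᵀY and the right-hand side Yᵀr of a least-squares solve unchanged. The helper name, window height and random data are hypothetical.

```python
import numpy as np

def pad_rows(block, window_rows):
    """Append zero rows so the block fills a fixed-height window."""
    missing = window_rows - block.shape[0]
    if missing < 0:
        raise ValueError("block is larger than the window")
    # np.pad defaults to constant mode with zeros.
    return np.pad(block, ((0, missing), (0, 0)))

rng = np.random.default_rng(0)
factors = rng.normal(size=(3, 4))   # 3 rated items, rank 4 (made-up sizes)
ratings = rng.normal(size=3)

window_rows = 8                     # hypothetical fixed window height
padded_factors = pad_rows(factors, window_rows)
padded_ratings = np.pad(ratings, (0, window_rows - ratings.shape[0]))

# Zero rows contribute nothing to either product.
same_gram = np.allclose(padded_factors.T @ padded_factors,
                        factors.T @ factors)
same_rhs = np.allclose(padded_factors.T @ padded_ratings,
                       factors.T @ ratings)
```

Padding costs extra transfers and multiply-accumulates on zeros, which may be one reason for the move to variable-size windows in version 3.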
7. www.vineyard-h2020.eu
Pynq: Python Productivity for Zynq
• An open-source project from Xilinx that
makes it easy to design embedded
systems with Zynq MPSoCs.
• The APSoC is programmed using
Python.
• The code is developed and tested
directly on the PYNQ-Z1 board.
• The programmable logic circuits are
imported as hardware libraries and
programmed through their APIs in
essentially the same way as the
software libraries.
8. www.vineyard-h2020.eu
Apache Spark
The largest open source project in
data processing.
• Structured Data
• Streaming Analytics
• Machine Learning
• Graph Computation
Provides an interface for
programming entire clusters with
implicit data parallelism and fault tolerance.
14. www.vineyard-h2020.eu
Results
• Kernel integration with the ALS algorithm: acceleration
achieved on the Zedboard with the MovieLens 100K and 1M
input datasets, compared with ARM-only execution.
15. www.vineyard-h2020.eu
Results
• System integration with the ALS algorithm: acceleration
achieved on PYNQ with the MovieLens 100K and 1M
input datasets, compared with ARM-only execution.
16. www.vineyard-h2020.eu
Xeon vs 4-node Pynq cluster
• The Xeon outperformed the Spark cluster running the proposed
scheme, because of the large data transfers that ALS requires
over Ethernet and the additional software computation
introduced by Spark.
19. www.vineyard-h2020.eu
VINEYARD Framework
• Accelerators are stored
in an app store
• Cloud users request
accelerators based
on application
requirements
• Hardware and software
designers are decoupled
[Figure: VINEYARD framework. Labels: Cloud computing applications; Cloud tenants; VINEYARD Cloud Resource Manager (Accelerator Controller, Accelerator Virtualization, Scheduler, Accelerator API); IP accelerators' app store (library of hardware accelerators as IP blocks, 3rd-party IP developers); Heterogeneous data center (Processors, Dataflow, Proc.+FPGA; DFE and Acc blocks); Performance; Energy.]
20. www.vineyard-h2020.eu
Main goals
VINEYARD AIMS TO
• Build an integrated platform for energy-efficient data
centres based on novel programmable hardware
accelerators
• Develop a high-level programming framework and big
data infrastructure that allow end-users to seamlessly
utilize these accelerators in heterogeneous computing
systems by employing typical data-centre programming
frameworks (e.g. Spark).
• Foster the establishment of an
ecosystem that will empower open innovation based on
hardware accelerators as data-centre plugins for
marketplace, thereby enabling innovative enterprises
(large industries, SMEs, and creative start-ups) to
develop novel solutions using VINEYARD's
leading-edge developments.
21. • Speed up your application seamlessly
• An integrated framework for the seamless utilization of
hardware accelerators in HPC and data centers
Contact details: kachris@microlab.ntua.gr