Christoforos Kachris
ICCS-NTUA
Speedup Spark
Applications using FPGA
Accelerators on the cloud
#EUres8
Performance in Spark
• 91% if Spark
users care about
performance
2Christoforos Kachris, www.inaccel.com, 2017, #EUres8
FEATURES USERS CONSIDER IMPORTANT
Accelerators versus Processors
3Christoforos Kachris, www.inaccel.com, 2017, #EUres8
[Source: National Instruments: Smart Grid Ready Instrumentation”
Flexibility versus Performance
4Christoforos Kachris, www.inaccel.com, 2017, #EUres8
CPU: High flexibility, lower efficiency Accelerators: High throughput, higher efficiency
GPUs and FPGAs can provide massive parallelism and higher efficiency
than CPUs for certain categories of applications
[Source: Amazon AWS f1]
Hardware Acceleration
5Christoforos Kachris, www.inaccel.com, 2017, #EUres8
module fi lter1 ( clock, rst, st rm_in, out)
for (i =0; i<N UMUNITS ; i=i+1 )
always @(posed ge cloc k)
intege r i,j; //index for lo ops
tm p_kerne l[j] = k[i*OFF SETX];
FPGA handles compute-
intensive, deeply pipelined,
hardware-accelerated
operations
CPU handles the rest
application
[Source: Amazon AWS f1]
Process flow using Accelerators
6Christoforos Kachris, www.inaccel.com, 2017, #EUres8
user
c4, m4
F1 (FPGA)
Amazon
Marketplace
Download
Accelerator
from Marketplace
Run your code on CPU
Offload hard
work on F1
Amazon Resources
7
Amazon
Machine
Image (AMI)
Amazon FPGA
Image (AFI)
EC2 F1
Instance
CPU
Application
on F1
DDR-4
Attached
Memory
DDR-4
Attached
Memory
DDR-4
Attached
Memory
DDR-4
Attached
Memory
DDR-4
Attached
Memory
DDR-4
Attached
Memory
DDR-4
Attached
Memory
DDR-4
Attached
Memory
FPGA Link
PCIe
DDR
Controllers
Launch Instance
and Load AFI
[Source: Amazon AWS f1]
FPGAs in Amazon AWS
8
Amazon EC2 FPGA
Deployment via Marketplace
Amazon
Machine
Image (AMI)
Amazon FPGA Image
(AFI)
AFI is secured, encrypted,
dynamically loaded into the
FPGA - can’t be copied or
downloaded
Customers
AWS Marketplace
[Source: Amazon AWS f1]
A use case on Logistic regression
LR is used for building
predictive models for many
complex pattern-matching and
classification problems.
It can be applied widely in such
diverse areas as
• bioinformatics,
• finance and
• data analytics.
One of the most popular
Machine Learning techniques.
9Christoforos Kachris, www.inaccel.com, 2017, #EUres8
Hardware accelerator for LR
10Christoforos Kachris, www.inaccel.com, 2017, #EUres8
Accelerated Mllib
• Extension of Spark libraries
11Christoforos Kachris, www.inaccel.com, 2017, #EUres8
Spark – FPGA interface
• Spark worker
• Python API
• C API
Support for:
12Christoforos Kachris, www.inaccel.com, 2017, #EUres8
ML acceleration library
13Christoforos Kachris, www.inaccel.com, 2017, #EUres8
Evaluation in cluster
14Christoforos Kachris, www.inaccel.com, 2017, #EUres8
• Evaluation on a Handwritten Digit
Recognition Problem
– Take an image of a handwritten single digit,
and determine what that digit is.
• 10 classes, 786 features
• 40k lines train file
• Given a new data point, models will be
run, and the class with largest probability
will be chosen as the predicted class.
Evaluation
• Evaluation on a
Handwritten Digit
Recognition Problem
– Take an image of a
handwritten single
digit, and determine
what that digit is.
• Features of evaluated
platforms
15Christoforos Kachris, www.inaccel.com, 2017, #EUres8
Spark implementation
16Christoforos Kachris, www.inaccel.com, 2017, #EUres8
Cluster on FPGA nodes
17Christoforos Kachris, www.inaccel.com, 2017, #EUres8
18Christoforos Kachris, www.inaccel.com, 2017, #EUres8
Spark running on FPGA-based cluster
19Christoforos Kachris, www.inaccel.com, 2017, #EUres8
Speedup
• Xeon vs
Xeon +
small
FPGA
cluster
20Christoforos Kachris, www.inaccel.com, 2017, #EUres8
Comparison
21Christoforos Kachris, www.inaccel.com, 2017, #EUres8
SW (Intel Xeon)
SW+HW
(Accel on FPGA SoCs)
Spark Accelerators
22Christoforos Kachris, www.inaccel.com, 2017, #EUres8
• Develop	hardware	components	
as	IP	cores	for	widely	used	
applications
• Spark
• Logistic	regression
• Recommendation	 (ALS)
• K-means
• Linear	regression
• PageRank
• Graph	computing
Source: C. Kachris et al.,
Spark acceleration using
FPGAs, best paper award,
IEEE MOCAST 2017
3x – 50x speedup
Comparison with Amazon AWS EC
• c4 (36 cores)
• m4 (16 cores)
• f1 with our
Accelerator
23Christoforos Kachris, www.inaccel.com, 2017, #EUres8
0
10
20
30
40
50
60
c4	(36) m4	(16) f1	(Accel)
Logistic	regression	comparison
3x	- 10x	
speedup
Mllib library to be covered
MLlib contains many algorithms and utilities.
• Classification: logistic regression, naive Bayes,...
• Regression: generalized linear regression, survival
regression,...
• Recommendation: alternating least squares (ALS)
• Clustering: K-means, Gaussian mixtures (GMMs),...
• Topic modeling: latent Dirichlet allocation (LDA)
• Frequent itemsets, association rules, and sequential pattern
mining
24Christoforos Kachris, www.inaccel.com, 2017, #EUres8
VINEYARD framework
• Utilize FPGA
resources
seamlessly
More info :
vineyardh2020.eu
25Christoforos Kachris, www.inaccel.com, 2017, #EUres8
www.vineyard-
h2020.eu
26
Overview - InAccel
27Christoforos Kachris, www.inaccel.com, 2017, #EUres8
InAccel provides high performance accelerators for your
application based on novel hardware reconfigurable engines as IP
blocks.
The hardware accelerators can be used to improve the
performance of your applications both in terms of throughput and
execution time (latency).
Spark integration
The hardware accelerators of InAccel are compatible with
the Apache Spark framework and allow faster execution of
Spark applications based on MLlib (machine learning) and
GraphX libraries.
Overview - InAccel
28Christoforos Kachris, www.inaccel.com, 2017, #EUres8
Amazon compatibility
InAccel develops hardware accelerators that are compatible
with the Amazon AWS F1 platform. Use the hardware
accelerators as an IP to speedup your application, seamlessly,
without the purchase of any hardware.​
No need for code changes
InAccel provides all the
required APIs for the
seamless integration of the
accelerators without any
modifications on your
original code.
Thank you
Questions?
kachris@microlab.ntua.gr
29Christoforos Kachris, www.inaccel.com, 2017, #EUres8

Hardware Acceleration of Apache Spark on Energy-Efficient FPGAs with Christoforos Kachris

  • 1.
    Christoforos Kachris ICCS-NTUA Speedup Spark Applicationsusing FPGA Accelerators on the cloud #EUres8
  • 2.
    Performance in Spark •91% if Spark users care about performance 2Christoforos Kachris, www.inaccel.com, 2017, #EUres8 FEATURES USERS CONSIDER IMPORTANT
  • 3.
    Accelerators versus Processors 3ChristoforosKachris, www.inaccel.com, 2017, #EUres8 [Source: National Instruments: Smart Grid Ready Instrumentation”
  • 4.
    Flexibility versus Performance 4ChristoforosKachris, www.inaccel.com, 2017, #EUres8 CPU: High flexibility, lower efficiency Accelerators: High throughput, higher efficiency GPUs and FPGAs can provide massive parallelism and higher efficiency than CPUs for certain categories of applications [Source: Amazon AWS f1]
  • 5.
    Hardware Acceleration 5Christoforos Kachris,www.inaccel.com, 2017, #EUres8 module fi lter1 ( clock, rst, st rm_in, out) for (i =0; i<N UMUNITS ; i=i+1 ) always @(posed ge cloc k) intege r i,j; //index for lo ops tm p_kerne l[j] = k[i*OFF SETX]; FPGA handles compute- intensive, deeply pipelined, hardware-accelerated operations CPU handles the rest application [Source: Amazon AWS f1]
  • 6.
    Process flow usingAccelerators 6Christoforos Kachris, www.inaccel.com, 2017, #EUres8 user c4, m4 F1 (FPGA) Amazon Marketplace Download Accelerator from Marketplace Run your code on CPU Offload hard work on F1
  • 7.
    Amazon Resources 7 Amazon Machine Image (AMI) AmazonFPGA Image (AFI) EC2 F1 Instance CPU Application on F1 DDR-4 Attached Memory DDR-4 Attached Memory DDR-4 Attached Memory DDR-4 Attached Memory DDR-4 Attached Memory DDR-4 Attached Memory DDR-4 Attached Memory DDR-4 Attached Memory FPGA Link PCIe DDR Controllers Launch Instance and Load AFI [Source: Amazon AWS f1]
  • 8.
    FPGAs in AmazonAWS 8 Amazon EC2 FPGA Deployment via Marketplace Amazon Machine Image (AMI) Amazon FPGA Image (AFI) AFI is secured, encrypted, dynamically loaded into the FPGA - can’t be copied or downloaded Customers AWS Marketplace [Source: Amazon AWS f1]
  • 9.
    A use caseon Logistic regression LR is used for building predictive models for many complex pattern-matching and classification problems. It can be applied widely in such diverse areas as • bioinformatics, • finance and • data analytics. One of the most popular Machine Learning techniques. 9Christoforos Kachris, www.inaccel.com, 2017, #EUres8
  • 10.
    Hardware accelerator forLR 10Christoforos Kachris, www.inaccel.com, 2017, #EUres8
  • 11.
    Accelerated Mllib • Extensionof Spark libraries 11Christoforos Kachris, www.inaccel.com, 2017, #EUres8
  • 12.
    Spark – FPGAinterface • Spark worker • Python API • C API Support for: 12Christoforos Kachris, www.inaccel.com, 2017, #EUres8
  • 13.
    ML acceleration library 13ChristoforosKachris, www.inaccel.com, 2017, #EUres8
  • 14.
    Evaluation in cluster 14ChristoforosKachris, www.inaccel.com, 2017, #EUres8 • Evaluation on a Handwritten Digit Recognition Problem – Take an image of a handwritten single digit, and determine what that digit is. • 10 classes, 786 features • 40k lines train file • Given a new data point, models will be run, and the class with largest probability will be chosen as the predicted class.
  • 15.
    Evaluation • Evaluation ona Handwritten Digit Recognition Problem – Take an image of a handwritten single digit, and determine what that digit is. • Features of evaluated platforms 15Christoforos Kachris, www.inaccel.com, 2017, #EUres8
  • 16.
    Spark implementation 16Christoforos Kachris,www.inaccel.com, 2017, #EUres8
  • 17.
    Cluster on FPGAnodes 17Christoforos Kachris, www.inaccel.com, 2017, #EUres8
  • 18.
  • 19.
    Spark running onFPGA-based cluster 19Christoforos Kachris, www.inaccel.com, 2017, #EUres8
  • 20.
    Speedup • Xeon vs Xeon+ small FPGA cluster 20Christoforos Kachris, www.inaccel.com, 2017, #EUres8
  • 21.
    Comparison 21Christoforos Kachris, www.inaccel.com,2017, #EUres8 SW (Intel Xeon) SW+HW (Accel on FPGA SoCs)
  • 22.
    Spark Accelerators 22Christoforos Kachris,www.inaccel.com, 2017, #EUres8 • Develop hardware components as IP cores for widely used applications • Spark • Logistic regression • Recommendation (ALS) • K-means • Linear regression • PageRank • Graph computing Source: C. Kachris et al., Spark acceleration using FPGAs, best paper award, IEEE MOCAST 2017 3x – 50x speedup
  • 23.
    Comparison with AmazonAWS EC • c4 (36 cores) • m4 (16 cores) • f1 with our Accelerator 23Christoforos Kachris, www.inaccel.com, 2017, #EUres8 0 10 20 30 40 50 60 c4 (36) m4 (16) f1 (Accel) Logistic regression comparison 3x - 10x speedup
  • 24.
    Mllib library tobe covered MLlib contains many algorithms and utilities. • Classification: logistic regression, naive Bayes,... • Regression: generalized linear regression, survival regression,... • Recommendation: alternating least squares (ALS) • Clustering: K-means, Gaussian mixtures (GMMs),... • Topic modeling: latent Dirichlet allocation (LDA) • Frequent itemsets, association rules, and sequential pattern mining 24Christoforos Kachris, www.inaccel.com, 2017, #EUres8
  • 25.
    VINEYARD framework • UtilizeFPGA resources seamlessly More info : vineyardh2020.eu 25Christoforos Kachris, www.inaccel.com, 2017, #EUres8
  • 26.
  • 27.
    Overview - InAccel 27ChristoforosKachris, www.inaccel.com, 2017, #EUres8 InAccel provides high performance accelerators for your application based on novel hardware reconfigurable engines as IP blocks. The hardware accelerators can be used to improve the performance of your applications both in terms of throughput and execution time (latency). Spark integration The hardware accelerators of InAccel are compatible with the Apache Spark framework and allow faster execution of Spark applications based on MLlib (machine learning) and GraphX libraries.
  • 28.
    Overview - InAccel 28ChristoforosKachris, www.inaccel.com, 2017, #EUres8 Amazon compatibility InAccel develops hardware accelerators that are compatible with the Amazon AWS F1 platform. Use the hardware accelerators as an IP to speedup your application, seamlessly, without the purchase of any hardware.​ No need for code changes InAccel provides all the required APIs for the seamless integration of the accelerators without any modifications on your original code.
  • 29.