Next Generation Workshop Car Diagnostics at BMW Powered by Apache Spark with ... – Databricks
Current workshop diagnostics are based on manually generated decision trees. This approach is increasingly reaching its limits due to growing variant diversity and the increasing complexity of vehicle systems. This session will describe BMW’s new Apache Spark-enabled approach: use the available data from cars and workshops to train models that can predict the right part to replace, or the action to take.
You’ll get an overview of BMW’s complete pipeline, including ETL, model training based on Spark 2.1, serializing results along with metadata, and serving the gained insights as a web app. You’ll also hear how Spark helped BMW leverage the information from millions of observations and thousands of features, and learn what pitfalls they experienced (e.g. setting up a working dev toolchain, working with 50K features, parallelizing well) and how you can avoid them.
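The serialization step in that pipeline (a model artifact plus its metadata) can be sketched roughly as follows; the model object, metadata fields, and file layout are illustrative stand-ins, not BMW’s actual format:

```python
import json
import pickle
import tempfile
from pathlib import Path

# Hypothetical trained artifact: a mapping from observed fault-code patterns to
# the part most likely to need replacement (stands in for a real Spark ML model).
model = {("P0301", "misfire"): "ignition coil",
         ("P0420",): "catalytic converter"}

# Metadata serialized alongside the model, as the talk describes.
metadata = {
    "spark_version": "2.1",       # training environment, per the abstract
    "n_observations": 1_000_000,  # illustrative figure
    "n_features": 50_000,         # order of magnitude mentioned in the talk
}

out_dir = Path(tempfile.mkdtemp())
(out_dir / "model.pkl").write_bytes(pickle.dumps(model))
(out_dir / "metadata.json").write_text(json.dumps(metadata, indent=2))

# The serving web app would load both and route predictions:
restored = pickle.loads((out_dir / "model.pkl").read_bytes())
print(restored[("P0420",)])  # catalytic converter
```

Keeping the metadata in a separate, human-readable file lets the serving layer check which Spark version and feature set produced a given model before loading it.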
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS... – Databricks
GPU acceleration has been at the heart of scientific computing and artificial intelligence for many years now. GPUs provide the computational power needed for the most demanding applications, such as deep neural networks and nuclear or weather simulation. Since the launch of RAPIDS in mid-2018, this vast computational resource has become available for data science workloads too. The RAPIDS toolkit, which is now available on the Databricks Unified Analytics Platform, is a GPU-accelerated drop-in replacement for utilities such as pandas, NumPy, scikit-learn, and XGBoost. Through its use of Dask wrappers, the platform allows for true large-scale computation with minimal, if any, code changes.
This talk discusses RAPIDS: its functionality, its architecture, and the way it integrates with Spark, on many occasions providing several orders of magnitude of acceleration versus CPU-only counterparts.
Analyzing IoT Data in Apache Spark Across Data Centers and Cloud with NetApp ... – Databricks
This session will explain how NetApp simplifies the process of analyzing IoT data with Apache Spark clusters across data centers and the cloud, using NetApp Private Storage (NPS) for AWS/Azure, NetApp Data Fabric, and NetApp Connectors for NFS and S3. IoT data originates at the edge in different geographical locations, and it can arrive at different data centers or the cloud depending on sensor location. The challenge is how to combine these different data streams across different data centers to generate wider insights.
Learn how NetApp Data Fabric helps solve this challenge. In the Data Fabric architecture, the IoT data is ingested via Kafka into an Apache Spark cluster running in AWS/Azure, but the data is stored in an NPS-provisioned NFS share through the NFS Connector. The IoT data in NPS can then be moved to on-prem data centers, or on-prem IoT data can be moved to NPS or ONTAP Cloud for processing in AWS/Azure using NetApp SnapMirror, FlexClone, or the NFS Connector. We’ll also review how NetApp StorageGRID object storage maintains IoT data for archival purposes using an S3 target. The above options allow you to analyze IoT data from AWS, StorageGRID, HDFS, or NFS, providing a feasible solution for deploying Spark clusters across data centers.
Takeaways will include identifying Spark challenges that can be remedied by extending your Spark environment to take advantage of NPS; understanding how NPS and StorageGRID can provide a cost-effective alternative for dev/test and DR for Spark analytics; and understanding Spark architecture and deployment options that utilize data from multiple locations, including on-prem and cloud-based repositories.
SQream DB - Bigger Data On GPUs: Approaches, Challenges, Successes – Arnon Shimoni
This talk will present SQream’s journey to building an analytics data warehouse powered by GPUs. SQream DB is an SQL data warehouse designed for larger-than-main-memory datasets (up to petabytes). It’s an on-disk database that combines novel ideas and algorithms to rapidly analyze trillions of rows with the help of high-throughput GPUs. We will explore some of SQream’s ideas and approaches to developing its analytics database – from simple prototypes and tech demos, to a fully functional data warehouse product containing the most important features for enterprise deployment. We will also describe the challenges of working with exotic hardware like GPUs, and what choices had to be made in order to combine CPU and GPU capabilities to achieve industry-leading performance – complete with real-world use-case comparisons.
As part of this discussion, we will also share some of the real issues that were discovered, and the engineering decisions that led to the creation of SQream DB’s high-speed columnar storage engine, designed specifically to take advantage of streaming architectures like GPUs.
Auto Scaling Systems With Elastic Spark Streaming: Spark Summit East talk by ... – Spark Summit
Come explore a feature we’ve created that is not supported out-of-the-box: the ability to add or remove nodes to always-on, real-time Spark Streaming jobs. Elastic Spark Streaming jobs can automatically adjust to the demands of traffic or volume. Using a set of configurable utility classes, these jobs scale down when lulls are detected and scale up when load is too high. We process multiple TBs per day with billions of events. Our traffic pattern experiences natural peaks and valleys with the occasional sustained, unexpected spike. Elastic jobs have freed us from manual intervention, given back developer time, and made a large financial impact through maximized resource utilization.
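The scale-up/scale-down decision described above can be sketched as follows; the thresholds, function name, and doubling/halving policy are illustrative assumptions, not the speakers’ actual utility classes:

```python
# Sketch of an elastic-scaling decision for a streaming job: compare batch
# processing time against the batch interval and adjust the executor count.

def desired_executors(current, batch_secs, interval_secs,
                      lo=0.5, hi=0.9, min_ex=2, max_ex=64):
    """Scale up when batches nearly overrun the interval, down during lulls."""
    load = batch_secs / interval_secs
    if load > hi:                      # falling behind: add capacity
        return min(max_ex, current * 2)
    if load < lo:                      # lull detected: shed capacity
        return max(min_ex, current // 2)
    return current                     # within the comfort band

print(desired_executors(8, batch_secs=58, interval_secs=60))  # overloaded -> 16
print(desired_executors(8, batch_secs=12, interval_secs=60))  # lull -> 4
```

The hysteresis band between `lo` and `hi` keeps the job from oscillating between sizes on ordinary load fluctuations.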
When OLAP Meets Real-Time, What Happens in eBay? – DataWorks Summit
OLAP cubes are about pre-aggregation: they reduce query latency by spending more time and resources on data preparation. But for real-time analytics, data preparation and visibility latency are critical. What happens when the OLAP cube meets real-time use cases?
Can we pre-build the cubes in real time, in a quicker and more cost-effective way? This is hard, but still doable.
At eBay, we built our own real-time OLAP solution based on Apache Kylin and Apache Kafka. We read unbounded events from a Kafka cluster, then divide the streaming data into three stages: an In-Memory Stage (continuous in-memory aggregation), an On-Disk Stage (flushed to disk, with columnar storage and indexes), and a Full Cubing Stage (built with MapReduce or Spark and saved to HBase). Data is aggregated into different layers at each stage, but remains queryable throughout. Data moves from one stage to the next automatically and transparently to the user.
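The In-Memory Stage’s continuous aggregation can be sketched in a few lines; the event shape and cube dimensions here are illustrative, not eBay’s schema:

```python
from collections import defaultdict

# Sketch of the In-Memory Stage: events streaming in from Kafka are
# continuously aggregated per dimension tuple before any flush to disk.
cube = defaultdict(lambda: {"count": 0, "latency_sum": 0})

def ingest(event):
    key = (event["site"], event["page"])        # cube dimensions
    cell = cube[key]
    cell["count"] += 1                          # measure: event count
    cell["latency_sum"] += event["latency_ms"]  # measure: total latency

for e in [{"site": "US", "page": "deal", "latency_ms": 120},
          {"site": "US", "page": "deal", "latency_ms": 80},
          {"site": "DE", "page": "home", "latency_ms": 200}]:
    ingest(e)

# Queryable at any time, before the on-disk or full-cubing stages run:
print(cube[("US", "deal")])  # {'count': 2, 'latency_sum': 200}
```

The later stages would persist these cells in columnar form and eventually fold them into the full cube, which is why the data stays queryable at every stage.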
This solution was built to support quite a few real-time analytics use cases at eBay; in this session we will also share some of them, such as site-speed monitoring and eBay site deal performance.
Speaker:
Qiaoneng Qian, Senior Product Manager, eBay
Elastify Cloud-Native Spark Application with Persistent Memory – Databricks
Cloud-native deployment has become one of the major trends for large-scale big data analytics. Compared to an on-premises data center, the cloud offers much stronger scalability and higher elasticity to big data applications. However, the cloud is also considered less performant than on-premises alternatives due to virtualization and cluster resource disaggregation. We present a new cloud-native Spark application architecture backed by persistent memory technology. The key ingredient of this architecture is a novel acceleration engine that uses Intel’s 3D XPoint technology as external memory. We discuss how the performance of multiple aspects of data processing can be improved using this new architecture. As a key takeaway, the audience will gain an understanding of the benefits of the latest persistent memory technology, and how such new technology could be leveraged in a cloud data processing architecture.
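The tiering idea behind such an acceleration engine can be sketched as a two-level store; here a temporary directory stands in for the persistent-memory device, and the whole API is hypothetical:

```python
import pickle
import tempfile
from pathlib import Path

# Sketch of memory tiering: hot blocks stay in DRAM, colder blocks spill to a
# persistent-memory-backed store (simulated by a temp directory).
class TieredStore:
    def __init__(self, dram_capacity=2):
        self.dram, self.capacity = {}, dram_capacity
        self.pmem = Path(tempfile.mkdtemp())   # stand-in for the pmem mount

    def put(self, key, value):
        if len(self.dram) >= self.capacity:    # spill oldest block to pmem
            old_key, old_val = next(iter(self.dram.items()))
            (self.pmem / old_key).write_bytes(pickle.dumps(old_val))
            del self.dram[old_key]
        self.dram[key] = value

    def get(self, key):
        if key in self.dram:
            return self.dram[key]              # DRAM hit
        return pickle.loads((self.pmem / key).read_bytes())  # pmem fallback

store = TieredStore()
for i in range(4):
    store.put(f"block{i}", list(range(i, i + 3)))
print(store.get("block0"))  # spilled to pmem, still readable -> [0, 1, 2]
```

The appeal of persistent memory in this role is that the “spilled” tier is far faster than disk, so working sets larger than DRAM degrade gracefully instead of falling off a performance cliff.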
Amazon EC2 provides resizable compute capacity in the cloud and makes web-scale computing easier for customers. It offers a wide variety of compute instances well suited to every imaginable use case, from static websites to on-demand high-performance supercomputing, all available via highly flexible pricing options. This session covers the latest EC2 features and capabilities, including new instance families available in Amazon EC2, the differences among their hardware types and capabilities, and their optimal use cases. We will also cover some best practices on how you can optimize your spend on EC2 to make the most of your EC2 instances, saving time and money.
Koalas: Making an Easy Transition from Pandas to Apache Spark – Databricks
In this talk, we present Koalas, a new open-source project that aims at bridging the gap between big data and small data for data scientists, and at simplifying Apache Spark for people who are already familiar with the pandas library in Python.
Pandas is the standard tool for data science in Python, and it is typically the first tool data scientists reach for to explore and manipulate a data set. The problem is that pandas does not scale well to big data: it was designed for small data sets that a single machine can handle.
When data scientists work today with very large data sets, they either have to migrate to PySpark to leverage Spark or downsample their data so that they can use pandas. This presentation will give a deep dive into the conversion between Spark and pandas dataframes.
Through live demonstrations and code samples, you will understand: – how to effectively leverage both pandas and Spark inside the same code base – how to leverage powerful pandas concepts such as lightweight indexing with Spark – technical considerations for unifying the different behaviors of Spark and pandas
Apache Kylin 2.0: From Classic OLAP to Real-Time Data Warehouse – Yang Li
Apache Kylin, which started as a big data OLAP engine, is reaching its v2.0. Yang Li explains how, armed with snowflake schema support, a full SQL interface, Spark cubing, and the ability to consume real-time streaming data, Apache Kylin is closing the gap to becoming a real-time data warehouse.
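What cubing buys you can be shown with a toy star schema: pre-aggregate the fact table once, then answer group-by queries from the much smaller cube. The schema and data below are illustrative, and SQLite stands in for Kylin’s HBase-backed storage:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE sales(day TEXT, region TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('2017-01-01','EU',10), ('2017-01-01','EU',15),
        ('2017-01-01','US',20), ('2017-01-02','US',5);
    -- The "cube": one row per (day, region) instead of one per sale.
    CREATE TABLE cube AS
        SELECT day, region, SUM(amount) AS total, COUNT(*) AS n
        FROM sales GROUP BY day, region;
""")
row = db.execute(
    "SELECT total FROM cube WHERE day='2017-01-01' AND region='EU'").fetchone()
print(row[0])  # 25.0 -- answered from the pre-built aggregate
```

At scale, the query never touches the raw fact rows; the latency cost has been paid up front during cube construction, which is exactly the preparation-time trade-off the abstract describes.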
Analyzing 2TB of Raw Trace Data from a Manufacturing Process: A First Use Cas... – Databricks
With the development of semiconductor devices, manufacturing systems have improved the productivity and efficiency of wafer fabrication. Owing to such improvements, the number of wafers yielded by the fabrication process has been rapidly increasing. However, current software systems for semiconductor wafers are not designed to process large numbers of wafers. To resolve this issue, BISTel (a world-class provider of manufacturing intelligence solutions and services for manufacturers) has built several big data products, such as Trace Analyzer (TA) and Map Analyzer (MA), using Apache Spark. TA analyzes raw trace data from a manufacturing process: it captures details on all variable changes, big and small, and gives a statistical summary of the traces (min, max, slope, average, etc.). Several of BISTel’s customers, among them top-tier semiconductor companies, use TA to analyze massive raw trace data from their manufacturing processes. In particular, TA is able to manage terabytes of data by applying Apache Spark’s APIs. MA is an advanced pattern recognition tool that sorts wafer yield maps and automatically identifies common yield-loss patterns. Some semiconductor companies also use MA to identify clustering patterns across more than 100,000 wafers, which can be considered big data in the semiconductor field. This talk will introduce these two products, which were developed on Apache Spark, and present how to handle large-scale semiconductor data from a software engineering perspective.
Speakers: Seungchul Lee, Daeyoung Kim
Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr... – Databricks
You will learn how CERN has implemented an Apache Spark-based data pipeline to support deep learning research work in High Energy Physics (HEP). HEP is a data-intensive domain. For example, the amount of data flowing through the online systems at LHC experiments is currently of the order of 1 PB/s, with particle collision events happening every 25 ns. Filtering is applied before storing data for later processing.
Improvements in the accuracy of the online event filtering system are key to optimizing the usage and cost of compute and storage resources. A novel prototype of an event filtering system, based on a classifier trained using deep neural networks, has recently been proposed. This presentation covers how we implemented the data pipeline to train the neural network classifier using solutions from the Apache Spark and big data ecosystem, integrated with tools, software, and platforms familiar to scientists and data engineers at CERN. Data preparation and feature engineering make use of PySpark, Spark SQL, and Python code run via Jupyter notebooks.
We will discuss key integrations and libraries that make Apache Spark able to ingest data stored using HEP data format (ROOT) and the integration with CERN storage and compute systems. You will learn about the neural network models used, defined using the Keras API, and how the models have been trained in a distributed fashion on Spark clusters using BigDL and Analytics Zoo. We will discuss the implementation and results of the distributed training, as well as the lessons learned.
GPU databases - How to use them and what the future holds – Arnon Shimoni
GPU databases are the hottest new thing, with about 7 different companies producing their own variant. In this session, we will discuss why they were created, how they are already disrupting the database world, and what the future of computing holds for them.
This presentation demonstrates how the power of NVIDIA GPUs can be leveraged to both accelerate speed to insight and to scale the amount of hot and warm data analyzed to meet the increasing demands of data scientists and business intelligence professionals alike, as well as to find tactical and strategic insights with greater speed on exponentially growing datasets.
Organizations commonly believe that they are advancing in analytical capabilities due to the rise in the data science profession and the myriad of technologies available for analytics, business intelligence, artificial intelligence and machine learning. However, if you do the math, they are actually falling behind as the increases in the rates of data collection volume far outpace the rate of increases in hot and warm data used for analytics. This is causing organizations to rely on an ever-decreasing percentage of their information assets for decision making.
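A worked version of “do the math”, with illustrative growth rates (these figures are assumptions for the example, not numbers from the talk):

```python
# If raw collection grows 40% a year while the hot/warm tier actually analyzed
# grows only 10% a year, the share of data used for decisions shrinks quickly.
collected, analyzed = 1.0, 0.5        # start: half of all data is analyzed
for year in range(5):
    collected *= 1.40
    analyzed *= 1.10
share = analyzed / collected
print(f"{share:.1%}")                 # roughly 15% after five years
```

Any gap between the two growth rates produces the same qualitative result: the analyzed fraction decays geometrically, which is the “falling behind” the paragraph describes.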
We talk about why GPU databases were created and share what sets SQream apart from other GPU databases, MPP solutions, and in-memory and Hadoop-based analytics alternatives.
We will also outline how an organization can use GPU databases to thrive in the information revolution by using a significantly greater percentage of its data for analytical purposes, obtaining insights that are desired today, and will remain cost-effective into the next few years when data lakes are expected to balloon from petabytes to exabytes.
Open Source RAPIDS GPU Platform to Accelerate Predictive Data Analytics – inside-BigData.com
Today NVIDIA announced a GPU-acceleration platform for data science and machine learning, with broad adoption from industry leaders, that enables even the largest companies to analyze massive amounts of data and make accurate business predictions at unprecedented speed.
“Data analytics and machine learning are the largest segments of the high performance computing market that have not been accelerated — until now,” said Jensen Huang, founder and CEO of NVIDIA, who revealed RAPIDS in his keynote address at the GPU Technology Conference. “The world’s largest industries run algorithms written by machine learning on a sea of servers to sense complex patterns in their market and environment, and make fast, accurate predictions that directly impact their bottom line.”
"RAPIDS open-source software gives data scientists a giant performance boost as they address highly complex business challenges, such as predicting credit card fraud, forecasting retail inventory and understanding customer buying behavior. Reflecting the growing consensus about the GPU’s importance in data analytics, an array of companies is supporting RAPIDS — from pioneers in the open-source community, such as Databricks and Anaconda, to tech leaders like Hewlett Packard Enterprise, IBM and Oracle."
Learn more: https://insidehpc.com/2018/10/open-source-rapids-gpu-platform-accelerate-predictive-data-analytics/
TensorFlowOnSpark: Scalable TensorFlow Learning on Spark Clusters – DataWorks Summit
In recent releases, TensorFlow has been enhanced for distributed learning and HDFS access. Outside of the Google cloud, however, users still needed a dedicated cluster for TensorFlow applications. There are several community projects wiring TensorFlow onto Apache Spark clusters. Unfortunately, they are limited to support synchronous distributed learning only, and don’t allow TensorFlow servers to communicate with each other directly.
In this talk, we will introduce a new framework, TensorFlowOnSpark, for scalable TensorFlow learning, which will be open sourced in Q1 2017. This new framework enables easy experimentation for algorithm designs, and supports scalable training & inferencing on Spark clusters. It supports all TensorFlow functionalities including synchronous & asynchronous learning, model & data parallelism, and TensorBoard. It provides architectural flexibility for data ingestion to TensorFlow and network protocols for server-to-server communication. With a few lines of code changes, an existing TensorFlow algorithm can be transformed into a scalable application.
Hardware Acceleration of Apache Spark on Energy-Efficient FPGAs with Christof... – Spark Summit
In this talk, we will present SPynq: a framework for the efficient mapping and acceleration of Spark applications on heterogeneous all-programmable MPSoC-based platforms, such as Zynq. Spark has been mapped to the Pynq platform, and the proposed framework allows seamless utilization of the programmable logic for hardware acceleration of computationally intensive Spark kernels. We have also developed the required Spark libraries, by extending MLlib, that hide the accelerator’s details to minimize the design effort needed to utilize the accelerators. A cluster of four worker nodes based on all-programmable MPSoCs has been implemented, and the proposed platform is evaluated on a typical machine learning application based on logistic regression. The logistic regression kernel has been developed as an accelerator and incorporated into Spark. The developed system is compared to a high-performance Xeon cluster of the kind typically used in cloud computing. The performance evaluation shows that the heterogeneous accelerator-based MPSoC can achieve up to 2.3x system speedup compared with the Xeon system (at 90% accuracy) and 20x better energy efficiency. For embedded applications, the proposed system can achieve up to 40x speedup compared to a software-only implementation on low-power embedded processors, with 30x lower energy consumption.
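In software terms, the logistic regression kernel that gets offloaded to the FPGA fabric is a dot product followed by a sigmoid; this minimal sketch omits the fixed-point arithmetic and pipelining of the real accelerator, and the weights are made up for illustration:

```python
import math

def predict(weights, bias, features):
    """Per-sample logistic regression inference: dot product plus sigmoid."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))   # sigmoid: probability of class 1

p = predict([0.8, -1.2], bias=0.1, features=[2.0, 0.5])
print(round(p, 3))  # 0.75
```

Because this inner loop is the same for every sample and has no data-dependent control flow, it maps naturally onto programmable logic, which is what makes it a good candidate kernel for acceleration.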
We have been offering many internet services and smartphone applications for over 20 years in Japan, and Cassandra has been used by our services since 2010. In this presentation, I will explain some issues and solutions around Cassandra, and our next-generation infrastructure for Cassandra.
About the Speaker
Satoshi Konno, Technical Manager, Yahoo Japan Corporation
Satoshi Konno is a software engineer with 20 years of experience. He has worked at Yahoo Japan as a programmer for 10 years, including the past 4 years on their NoSQL team, and he is currently in a computer science doctoral course studying distributed computing.
Coral is a framework that allows the distributed acceleration of large data sets across clusters of FPGA resources using simple programming models. It is designed to scale up from single devices to multiple FPGAs, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thus delivering a highly available service on top of a cluster of FPGAs, each of which may be prone to failures.
Coral abstracts FPGA resources (device, memory), enabling fault-tolerant heterogeneous distributed systems to easily be built and run effectively.
It allows:
- instant scalability to multiple FPGAs
- seamless virtualization of the FPGA cluster
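The application-layer failure handling described above can be sketched as a simple failover loop; the devices and failure model here are simulated, not Coral’s actual API:

```python
class DeviceFailure(Exception):
    """Raised when a simulated FPGA device is lost mid-task."""

def run_with_failover(task, devices):
    """Try each device in turn; a failure just moves the task along."""
    for dev in devices:
        try:
            return dev(task)
        except DeviceFailure:
            continue                  # mark device bad, reschedule the task
    raise RuntimeError("all devices failed")

def flaky(task):
    raise DeviceFailure("device lost")

def healthy(task):
    return task * 2                   # stand-in for the offloaded kernel

print(run_with_failover(21, [flaky, healthy]))  # failover succeeds -> 42
```

Handling failure in the scheduling layer like this is what lets the cluster stay available without redundant hardware: a lost device costs a retry, not an outage.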
Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an... – confluent
Watch this talk here: https://www.confluent.io/online-talks/scaling-security-on-100s-of-millions-of-mobile-devices-using-kafka-and-scylla-on-demand
Join mobile cybersecurity leader Lookout as they talk through their data ingestion journey.
Lookout enables enterprises to protect their data by evaluating threats and risks at post-perimeter endpoint devices and providing access to corporate data after conditional security scans. Their continuous assessment of device health creates a massive amount of telemetry data, forcing new approaches to data ingestion. Learn how Lookout changed its approach in order to grow from 1.5 million devices to 100 million devices and beyond, by implementing Confluent Platform and switching to Scylla.
Elastify Cloud-Native Spark Application with Persistent MemoryDatabricks
Cloud native deployment has become one of the major trends for large scale Big Data analytics. Compared to on-premise data center, cloud offers much stronger scalability and higher elasticity to Big Data applications. However, cloud is also considered to be less performance than on-premise alternatives due to virtualization and cluster resource disaggregation. We present a new cloud native Spark application architecture backed by persistent memory technology. The key ingredient of this architecture is a novel acceleration engine that uses Intel's 3DXPoint technology as external memory. We discuss how the performance of multiple aspects of data processing can be improved using this new architecture. As a key takeaway, audience will gain understanding on the benefits of latest persistent memory technology, and how such new technology could be leveraged in cloud data processing architecture.
Amazon EC2 provides resizable compute capacity in the cloud and makes web scale computing easier for customers. It offers a wide variety of compute instances is well suited to every imaginable use case, from static websites to high performance supercomputing on-demand, all available via highly flexible pricing options. This session covers the latest EC2 features and capabilities, including new instance families available in Amazon EC2, the differences among their hardware types and capabilities, and their optimal use cases. We also will cover some best practices on how you can optimize your spend on EC2 to make the most of your EC2 instances, saving time and money.
Koalas: Making an Easy Transition from Pandas to Apache SparkDatabricks
In this talk, we present Koalas, a new open-source project that aims at bridging the gap between the big data and small data for data scientists and at simplifying Apache Spark for people who are already familiar with the pandas library in Python.
Pandas is the standard tool for data science in python, and it is typically the first step to explore and manipulate a data set by data scientists. The problem is that pandas does not scale well to big data. It was designed for small data sets that a single machine could handle.
When data scientists work today with very large data sets, they either have to migrate to PySpark to leverage Spark or downsample their data so that they can use pandas. This presentation will give a deep dive into the conversion between Spark and pandas dataframes.
Through live demonstrations and code samples, you will understand: – how to effectively leverage both pandas and Spark inside the same code base – how to leverage powerful pandas concepts such as lightweight indexing with Spark – technical considerations for unifying the different behaviors of Spark and pandas
Apache kylin 2.0: from classic olap to real-time data warehouseYang Li
Apache Kylin, which started as a big data OLAP engine, is reaching its v2.0. Yang Li explains how, armed with snowflake schema support, a full SQL interface, spark cubing, and the ability to consume real-time streaming data, Apache Kylin is closing the gap to becoming a real-time data warehouse.
Analyzing 2TB of Raw Trace Data from a Manufacturing Process: A First Use Cas...Databricks
As the development of semiconductor devices, manufacturing system leads to improve productivity and efficiency for wafer fabrication. Owing to such improvement, the number of wafers yielded from the fabrication process has been rapidly increasing. However, current software systems for semiconductor wafers are not aimed at processing large number of wafers. To resolve this issue, the BISTel (a world-class provider of manufacturing intelligence solutions and services for manufacturers) tries to build several products for big data such as Trace Analyzer (TA) and Map Analyzer (MA) using Apache Spark. TA is to analyze raw trace data from a manufacturing process. It captures details on all variable changes, big and small and give the traces' statistical summary (i.e.: min, max, slope, average, etc.). Several BISTel's customers, which are the top-tier semiconductor company in the world use the TA to analyze the massive raw trace data from their manufacturing process. Especially, TA is able to manage terabytes of data by applying Apache Spark's APIs. MA is an advanced pattern recognition tool that sorts wafer yield maps and automatically identify common yield loss patterns. Also, some semiconductor companies use MA to identify clustering patterns for more than 100,000 wafers, which can be considered as big data in the semiconductor area. This talk will introduce these two products which are developed based on the Apache Spark and present how to handle the large-scale semiconductor data in the aspects of software techniques.
Speakers: Seungchul Lee, Daeyoung Kim
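The per-trace statistical summary the abstract lists (min, max, slope, average) can be sketched in plain Python. This is an illustrative stand-in, not BISTel's implementation; in TA the equivalent aggregation would run per trace across a Spark cluster, and the slope here is assumed to be a least-squares fit of value against sample index.

```python
from statistics import mean

def trace_summary(values):
    """Summarize one sensor trace: min, max, average, and slope.

    Slope is the least-squares fit of value against sample index.
    """
    n = len(values)
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(values)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values)) \
        / sum((x - x_bar) ** 2 for x in xs)
    return {"min": min(values), "max": max(values),
            "avg": y_bar, "slope": slope}

# A perfectly linear trace rising by 2 per sample has slope exactly 2.0.
print(trace_summary([1.0, 3.0, 5.0, 7.0]))
```

In Spark, a function like this could be applied per sensor trace (for example inside a grouped aggregation), so terabytes of traces are summarized in parallel.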
Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...Databricks
You will learn how CERN has implemented an Apache Spark-based data pipeline to support deep learning research work in High Energy Physics (HEP). HEP is a data-intensive domain. For example, the amount of data flowing through the online systems at LHC experiments is currently of the order of 1 PB/s, with particle collision events happening every 25 ns. Filtering is applied before storing data for later processing.
Improvements in the accuracy of the online event filtering system are key to optimize usage and cost of compute and storage resources. A novel prototype of event filtering system based on a classifier trained using deep neural networks has recently been proposed. This presentation covers how we implemented the data pipeline to train the neural network classifier using solutions from the Apache Spark and Big Data ecosystem, integrated with tools, software, and platforms familiar to scientists and data engineers at CERN. Data preparation and feature engineering make use of PySpark, Spark SQL and Python code run via Jupyter notebooks.
We will discuss key integrations and libraries that make Apache Spark able to ingest data stored using HEP data format (ROOT) and the integration with CERN storage and compute systems. You will learn about the neural network models used, defined using the Keras API, and how the models have been trained in a distributed fashion on Spark clusters using BigDL and Analytics Zoo. We will discuss the implementation and results of the distributed training, as well as the lessons learned.
GPU databases - How to use them and what the future holds - Arnon Shimoni
GPU databases are the hottest new thing, with about 7 different companies producing their own variant. In this session, we will discuss why they were created, how they are already disrupting the database world, and what the future of computing holds for them.
This presentation demonstrates how the power of NVIDIA GPUs can be leveraged to both accelerate speed to insight and to scale the amount of hot and warm data analyzed to meet the increasing demands of data scientists and business intelligence professionals alike, as well as to find tactical and strategic insights with greater speed on exponentially growing datasets.
Organizations commonly believe that they are advancing in analytical capabilities due to the rise in the data science profession and the myriad of technologies available for analytics, business intelligence, artificial intelligence and machine learning. However, if you do the math, they are actually falling behind as the increases in the rates of data collection volume far outpace the rate of increases in hot and warm data used for analytics. This is causing organizations to rely on an ever-decreasing percentage of their information assets for decision making.
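To make that claim concrete, here is a hypothetical back-of-the-envelope calculation; the 40% and 15% growth rates are illustrative assumptions, not figures from the talk.

```python
# Hypothetical growth rates, for illustration only: data collection
# grows 40% per year, while analyzed (hot/warm) data grows 15% per year.
collected, analyzed = 100.0, 50.0  # petabytes in year 0
shares = []
for year in range(6):
    share = 100 * analyzed / collected
    shares.append(share)
    print(f"year {year}: {share:.0f}% of collected data is analyzed")
    collected *= 1.40
    analyzed *= 1.15

# The analyzed share falls every year even though absolute
# analytics capacity keeps growing.
```

Under these assumed rates, the organization starts by analyzing half of its data and covers a steadily shrinking fraction each subsequent year.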
We talk about why GPU databases were created and share what sets SQream apart from other GPU databases, MPP solutions, and in-memory and Hadoop-based analytic alternatives.
We will also outline how an organization can use GPU databases to thrive in the information revolution by using a significantly greater percentage of its data for analytical purposes, obtaining insights that are desired today, and will remain cost-effective into the next few years when data lakes are expected to balloon from petabytes to exabytes.
Open Source RAPIDS GPU Platform to Accelerate Predictive Data Analytics - inside-BigData.com
Today NVIDIA announced a GPU-acceleration platform for data science and machine learning, with broad adoption from industry leaders, that enables even the largest companies to analyze massive amounts of data and make accurate business predictions at unprecedented speed.
“Data analytics and machine learning are the largest segments of the high performance computing market that have not been accelerated — until now,” said Jensen Huang, founder and CEO of NVIDIA, who revealed RAPIDS in his keynote address at the GPU Technology Conference. “The world’s largest industries run algorithms written by machine learning on a sea of servers to sense complex patterns in their market and environment, and make fast, accurate predictions that directly impact their bottom line.”
"RAPIDS open-source software gives data scientists a giant performance boost as they address highly complex business challenges, such as predicting credit card fraud, forecasting retail inventory and understanding customer buying behavior. Reflecting the growing consensus about the GPU’s importance in data analytics, an array of companies is supporting RAPIDS — from pioneers in the open-source community, such as Databricks and Anaconda, to tech leaders like Hewlett Packard Enterprise, IBM and Oracle."
Learn more: https://insidehpc.com/2018/10/open-source-rapids-gpu-platform-accelerate-predictive-data-analytics/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
TensorFlowOnSpark: Scalable TensorFlow Learning on Spark Clusters - DataWorks Summit
In recent releases, TensorFlow has been enhanced for distributed learning and HDFS access. Outside of the Google cloud, however, users still needed a dedicated cluster for TensorFlow applications. There are several community projects wiring TensorFlow onto Apache Spark clusters. Unfortunately, they support only synchronous distributed learning, and don’t allow TensorFlow servers to communicate with each other directly.
In this talk, we will introduce a new framework, TensorFlowOnSpark, for scalable TensorFlow learning, which will be open sourced in Q1 2017. This new framework enables easy experimentation for algorithm designs, and supports scalable training & inferencing on Spark clusters. It supports all TensorFlow functionalities including synchronous & asynchronous learning, model & data parallelism, and TensorBoard. It provides architectural flexibility for data ingestion to TensorFlow and network protocols for server-to-server communication. With a few lines of code changes, an existing TensorFlow algorithm can be transformed into a scalable application.
Hardware Acceleration of Apache Spark on Energy-Efficient FPGAs with Christof...Spark Summit
In this talk, we will present SPynq: a framework for the efficient mapping and acceleration of Spark applications on heterogeneous all-programmable MPSoC-based platforms, such as Zynq. Spark has been mapped to the Pynq platform, and the proposed framework allows seamless utilization of the programmable logic for hardware acceleration of computationally intensive Spark kernels. We have also developed the required Spark libraries, extending MLlib, that hide the accelerator's details to minimize the design effort needed to utilize the accelerators. A cluster of 4 worker nodes based on all-programmable MPSoCs has been implemented, and the proposed platform is evaluated on a typical machine learning application based on logistic regression. The logistic regression kernel has been developed as an accelerator and incorporated into Spark. The developed system is compared to a high-performance Xeon cluster of the kind typically used in cloud computing. The performance evaluation shows that the heterogeneous accelerator-based MPSoC can achieve up to 2.3x system speedup compared with the Xeon system (at 90% accuracy) and 20x better energy efficiency. For embedded applications, the proposed system can achieve up to 40x speedup compared to a software-only implementation on low-power embedded processors, with 30x lower energy consumption.
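For reference, here is what a logistic regression kernel of the kind mentioned above computes, as a minimal software-only sketch in plain Python. This is illustrative only, not the SPynq accelerator code; the toy data, learning rate, and epoch count are invented.

```python
import math

def train_logreg(points, labels, lr=0.1, epochs=1000):
    """Stochastic gradient descent for 2-feature logistic regression."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            z = w[0] * x[0] + w[1] * x[1] + b
            p = 1.0 / (1.0 + math.exp(-z))   # sigmoid
            err = p - y
            w[0] -= lr * err * x[0]
            w[1] -= lr * err * x[1]
            b -= lr * err
    return w, b

def predict(w, b, x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

# Linearly separable toy data: label 1 when the feature values are large.
pts = [(0.0, 0.0), (1.0, 0.0), (3.0, 3.0), (2.5, 3.0)]
lbl = [0, 0, 1, 1]
w, b = train_logreg(pts, lbl)
print([predict(w, b, p) for p in pts])  # → [0, 0, 1, 1]
```

The inner loop (dot product, sigmoid, weight update) is exactly the kind of regular, compute-heavy kernel that benefits from being moved into programmable logic.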
We have been offering many internet services and smart phone applications for over 20 years in Japan, and Cassandra has been used by our services since 2010. In this presentation, I will explain some issues and solutions about Cassandra, and our next generation infrastructure for Cassandra.
About the Speaker
Satoshi Konno Technical Manager, Yahoo Japan Corporation
Satoshi Konno is a software engineer with 20 years of experience. He has worked at Yahoo Japan as a programmer for 10 years, including the past 4 years on their NoSQL team, and he is currently enrolled in a computer science doctoral course, studying distributed computing.
Coral is a framework that allows the distributed acceleration of large data sets across clusters of FPGA resources using simple programming models. It is designed to scale up from single devices to multiple FPGAs, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thus delivering a highly available service on top of a cluster of FPGAs, each of which may be prone to failures.
Coral abstracts FPGA resources (device, memory), enabling fault-tolerant heterogeneous distributed systems to easily be built and run effectively.
It allows:
- instant scalability to multiple FPGAs
- seamless virtualization of the FPGA cluster
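The application-layer failure handling described above might look, in spirit, like the following Python sketch. This is hypothetical; Coral's actual API is not shown here, and the device handles and kernel function are invented.

```python
# Application-layer fault tolerance: the library, not the hardware,
# detects a failed device and reruns the task on another FPGA.
class DeviceFailure(Exception):
    pass

def run_with_failover(task, devices):
    """Try the task on each abstracted FPGA handle until one succeeds."""
    for dev in devices:
        try:
            return task(dev)
        except DeviceFailure:
            continue  # treat this device as unhealthy, try the next one
    raise RuntimeError("all devices failed")

# Toy cluster: device 0 is broken, device 1 works.
def kernel(dev):
    if dev == 0:
        raise DeviceFailure
    return f"result from device {dev}"

print(run_with_failover(kernel, [0, 1]))  # → result from device 1
```

Keeping the retry logic in the library layer is what lets the cluster stay available even when individual FPGAs fail.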
Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...confluent
Watch this talk here: https://www.confluent.io/online-talks/scaling-security-on-100s-of-millions-of-mobile-devices-using-kafka-and-scylla-on-demand
Join mobile cybersecurity leader Lookout as they talk through their data ingestion journey.
Lookout enables enterprises to protect their data by evaluating threats and risks at post-perimeter endpoint devices and providing access to corporate data after conditional security scans. Their continuous assessment of device health creates a massive amount of telemetry data, forcing new approaches to data ingestion. Learn how Lookout changed its approach in order to grow from 1.5 million devices to 100 million devices and beyond, by implementing Confluent Platform and switching to Scylla.
Instant scaling and deployment of Vitis Libraries on Alveo clusters using InA...Christoforos Kachris
In this webinar we will show how to address the 3 main challenges in the domain of FPGA deployment.
First, we will show you how you can deploy your applications from high-level frameworks like Python, Java, scikit-learn and Keras more easily than ever.
Then, we will show how you can scale out your applications from frameworks like Keras and scikit-learn across multiple FPGA Alveo cards automatically.
Finally, we will show you how multiple users can share the available FPGA resources seamlessly. We will show how InAccel allows dynamic resource management of the FPGA resources in order to allow multi-application and multi-tenant deployment.
Backend.AI Technical Introduction (19.09 / 2019 Autumn) - Lablup Inc.
This slide introduces technical specs and details about Backend.AI 19.09.
* On-premise clustering / container orchestration / scaling on cloud
* Container-level fractional GPU technology that lets one GPU be used as many GPUs by many containers at the same time.
* NVIDIA GPU Cloud integrations
* Enterprise features
Harnessing the virtual realm for successful real world artificial intelligence - Alison B. Lowndes
Artificial Intelligence is impacting all areas of society, from healthcare and transportation to smart cities and energy. This talk describes how NVIDIA invests in both internal pure research and accelerated computation to enable its diverse customer base across gaming & extended reality, graphics, AI, robotics, simulation, high performance scientific computing, healthcare & more. You will be introduced to the GPU computing platform and shown real-world, successfully deployed applications, as well as given a glimpse of the current state of the art across academia, enterprise and startups.
Get Your Head in the Cloud - Lessons in GPU Computing with Schlumberger - inside-BigData.com
In this presentation from the GPU Technology Conference, Wyatt Gorman from Google and Abhishek Gupta from Schlumberger present: Get Your Head in the Cloud - Lessons in GPU Computing with Schlumberger.
"Demand for GPUs in High Performance Computing is only growing, and it is costly and difficult to keep pace in an entirely on-premise environment. We will hear from Schlumberger on why and how they are utilizing cloud-based GPU-enabled computing resources from Google Cloud to supply their users with the computing power they need, from exploration and modeling to visualization."
Watch the video: https://wp.me/p3RLHQ-kcl
Learn more: https://www.blog.google/products/google-cloud/schlumberger-chooses-gcp-to-deliver-new-oil-and-gas-technology-platform/
and
https://www.nvidia.com/en-us/gtc/
Give Your Confluent Platform Superpowers! (Sandeep Togrika, Intel and Bert Ha...HostedbyConfluent
Whether you are a die-hard DC comic enthusiast, mad for Marvel, or completely clueless when it comes to comic books, at the end of the day each of us would love to possess the superpower to transform data in seconds versus minutes or days. But architects and developers are challenged with designing and managing platforms that scale elastically and combine event streams with stored data, to enable more contextually rich data analytics. This is made even more complex by data coming from hundreds of sources, at hundreds of terabytes, or even petabytes, per day.
Now, with Apache Kafka and Intel hardware technology advances, organizations can turn massive volumes of disparate data into actionable insights with the ability to filter, enrich, join and process data instream. Let's consider Information Security. IT leaders need to ensure all company data and IP is secured against threats and vulnerabilities. A combination of real-time event streaming with Confluent Platform and Intel Architecture has enabled threat detection efforts that once took hours to be completed in seconds, while simultaneously reducing technical debt and data processing and storage costs.
In this session, Confluent and Intel architects will share detailed performance benchmarking results and new joint reference architecture. We’ll detail ways to remove Kafka performance bottlenecks, and improve platform resiliency and ensure high availability using Confluent Control Center and Multi-Region Clusters. And we’ll offer up tips for addressing challenges that you may be facing in your own super heroic efforts to design, deploy, and manage your organization’s data platforms.
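The filter/enrich/flag pattern described above can be illustrated with a minimal Python generator over an in-memory event stream. This is not Confluent's or Lookout's code; the event fields, the threat list, and the detection rule are all invented for illustration.

```python
# Hypothetical security-event stream processed record by record:
# filter, enrich with a lookup table, and flag threats in-stream,
# rather than in a later batch job.
THREAT_IPS = {"203.0.113.9"}  # enrichment "table" (think: a joined table)
events = [
    {"user": "alice", "src_ip": "198.51.100.4", "action": "login"},
    {"user": "bob",   "src_ip": "203.0.113.9",  "action": "login"},
    {"user": "carol", "src_ip": "198.51.100.7", "action": "logout"},
]

def detect_threats(stream):
    for event in stream:
        if event["action"] != "login":      # filter: only logins matter here
            continue
        enriched = {**event, "known_threat": event["src_ip"] in THREAT_IPS}
        if enriched["known_threat"]:        # flag as soon as the event arrives
            yield enriched

alerts = list(detect_threats(events))
print(alerts)  # bob's login from the known-threat IP is the only alert
```

In a real deployment the stream would be a Kafka topic and the enrichment a join against reference data, but the per-event shape of the computation is the same.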
This presentation gives an overview of InAccel's Accelerated Machine Learning Suite, which lets you speed up ML applications on the cloud without changing your code. The Accelerated ML Suite is fully compatible with Apache Spark and allows the seamless utilization of FPGAs on AWS to speed up your applications. More at https://www.inaccel.com/
This lecture aims to give some food for thought on how current High Performance Computing systems (hardware and software) tend to merge with Big Data ones (Machine Learning, Analytics and Enterprise workloads) in order to meet both workloads' demands while sharing the same clusters.
AWS re:Invent 2016: Large-Scale, Cloud-Based Analysis of Cancer Genomes: Less...Amazon Web Services
The PanCancer Analysis of Whole Genomes (PCAWG) project is a large-scale, highly distributed research collaboration designed to identify common patterns of mutations across 2,800 cancer genomes. The use of public and private clouds were instrumental in analyzing this dataset using current best practice containerized pipelines. This session describes the technical infrastructure built for the project, how we leveraged cloud environments to perform the “core” analysis, and the lessons learned along the way.
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...DataWorks Summit
Hadoop is becoming a standard platform for building critical financial applications such as risk reporting, trading and fraud detection. These applications require demanding SLAs (service-level agreements) in terms of RPO (Recovery Point Objective) and RTO (Recovery Time Objective). To achieve these SLAs, organizations need to build a disaster recovery plan that covers several layers, ranging from the infrastructure through the platform to the clients and the applications. In this talk, we will present the different architecture blueprints for disaster recovery, as well as their corresponding SLA objectives. Then, we will focus on the stretch-cluster solution that Crédit Agricole CIB is using in production. We will discuss the solution's advantages and drawbacks and the impact of this approach on the global architecture. Finally, we will explain in detail how to configure and deploy this solution and how to integrate each layer (storage layer, processing layer...) into the architecture.
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...Matej Misik
Graphics cards (GPUs) open up new ways of processing and analytics over big data, showing millisecond selections over billions of rows, as well as telling stories about data. #QikkDB
How to present data to be understood by everyone? Data analysis is for scientists, but data storytelling is for everyone. For managers, product owners, sales teams, the general public. #TellStory
Learn about high performance computing with GPUs and how to present data, with a rich Covid-19 data story example, in the upcoming webinar.
Computação de Alto Desempenho - Fator chave para a competitividade do País, d...Igor José F. Freitas
Video: https://www.youtube.com/watch?v=8cFqNwhQ7uE
A key factor for the competitiveness of the country, of science, and of industry.
Talk delivered during Intel Innovation Week 2015.
Application Report: Big Data - Big Cluster Interconnects - IT Brand Pulse
As a leading analytics platform that runs on industry-standard hardware and integrates industry-standard database tools and applications, one of ParAccel's biggest challenges is to architect and test hardware (servers, storage, interconnects) that makes their software perform at its peak. In this case, they achieved their mission of eliminating a cluster bottleneck by implementing 10GbE NICs to provide the bandwidth needed today, and well into the future.
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio - Alluxio, Inc.
Alluxio Global Online Meetup
Apr 23, 2020
For more Alluxio events: https://www.alluxio.io/events/
Speakers:
Jiao (Jennie) Wang, Intel
Tsai Louie, Intel
Bin Fan, Alluxio
Today, many people run deep learning applications with training data from separate storage such as object storage or remote data centers. This presentation will demo the Intel Analytics Zoo + Alluxio stack, an architecture that enables high performance while keeping cost and resource efficiency balanced, without the network becoming an I/O bottleneck.
Intel Analytics Zoo is a unified data analytics and AI platform open-sourced by Intel. It seamlessly unites TensorFlow, Keras, PyTorch, Spark, Flink, and Ray programs into an integrated pipeline, which can transparently scale from a laptop to large clusters to process production big data. Alluxio, as an open-source data orchestration layer, accelerates data loading and processing in Analytics Zoo deep learning applications.
In this talk, we will go over:
- What is Analytics Zoo and how it works
- How to run Analytics Zoo with Alluxio in deep learning applications
- Initial performance benchmark results using the Analytics Zoo + Alluxio stack
Similar to 17th Athens Big Data Meetup - 1st Talk - Speedup Machine Learning Applications on the Cloud Using Hardware Accelerators
22nd Athens Big Data Meetup - 1st Talk - MLOps Workshop: The Full ML Lifecycl...Athens Big Data
Title: MLOps Workshop: The Full ML Lifecycle - How to Use ML in Production
Speakers: Spyros Cavadias (https://www.linkedin.com/in/spyros-cavadias/), Konstantinos Pittas (https://www.linkedin.com/in/konstantinos-pittas-83310270/), Thanos Gkinakos (https://www.linkedin.com/in/thanos-gkinakos-03582a128/)
Date: Saturday, December 17, 2022
Event: https://www.meetup.com/athens-big-data/events/289927468/
21st Athens Big Data Meetup - 2nd Talk - Dive into ClickHouse storage system - Athens Big Data
Title: Dive into ClickHouse storage system
Speaker: Alexander Sapin (https://github.com/alesapin/)
Date: Thursday, March 5, 2020
Event: https://meetup.com/Athens-Big-Data/events/268379195/
19th Athens Big Data Meetup - 2nd Talk - NLP: From news recommendation to wor...Athens Big Data
Title: NLP: From news recommendation to word prediction
Speaker: Mihalis Papakonstantinou (https://linkedin.com/in/mihalispapak/)
Date: Thursday, December 12, 2019
Event: https://meetup.com/Athens-Big-Data/events/266762191/
21st Athens Big Data Meetup - 1st Talk - Fast and simple data exploration wit...Athens Big Data
Title: Fast and simple data exploration with ClickHouse
Speaker: Alexander Kuzmenkov (https://github.com/akuzm/)
Date: Thursday, March 5, 2020
Event: https://meetup.com/Athens-Big-Data/events/268379195/
20th Athens Big Data Meetup - 2nd Talk - Druid: under the covers - Athens Big Data
Title: Druid: under the covers
Speaker: Peter Marshall (https://linkedin.com/in/amillionbytes/)
Date: Tuesday, January 28, 2020
Event: https://meetup.com/Athens-Big-Data/events/266900242/
20th Athens Big Data Meetup - 1st Talk - Druid: the open source, performant, ...Athens Big Data
Title: Druid: the open source, performant, real-time, analytical datastore
Speaker: Peter Marshall (https://linkedin.com/in/amillionbytes/)
Date: Tuesday, January 28, 2020
Event: https://meetup.com/Athens-Big-Data/events/266900242/
18th Athens Big Data Meetup - 2nd Talk - Run Spark and Flink Jobs on Kubernetes - Athens Big Data
Title: Run Spark and Flink Jobs on Kubernetes
Speaker: Chaoran Yu (https://linkedin.com/in/chaoran-yu-97b1144a/)
Date: Thursday, November 14, 2019
Event: https://meetup.com/Athens-Big-Data/events/265957761/
18th Athens Big Data Meetup - 1st Talk - Timeseries Forecasting as a Service - Athens Big Data
Title: Timeseries Forecasting as a Service
Speaker: Thanassis Spyrou (https://linkedin.com/in/thanassis-spyrou-92911959/)
Date: Thursday, November 14, 2019
Event: https://meetup.com/Athens-Big-Data/events/265957761/
17th Athens Big Data Meetup - 2nd Talk - Data Flow Building and Calculation P...Athens Big Data
Title: Data Flow Building and Calculation Pipelines via PySpark and ML Modeling via Python
Speaker: Theodoros Michalareas (https://linkedin.com/in/theodorosmichalareas/)
Date: Tuesday, September 24, 2019
Event: https://meetup.com/Athens-Big-Data/events/264702584/
16th Athens Big Data Meetup - 2nd Talk - A Focus on Building and Optimizing M...Athens Big Data
Title: A Focus on Building and Optimizing ML Models with Amazon SageMaker
Speaker: Pavlos Mitsoulis-Ntompos (https://linkedin.com/in/pavlosmitsoulis/)
Date: Thursday, March 14, 2019
Event: https://www.meetup.com/Athens-Big-Data/events/259091496/
16th Athens Big Data Meetup - 1st Talk - An Introduction to Machine Learning ...Athens Big Data
Title: An Introduction to Machine Learning with Python and Scikit-Learn
Speaker: Julien Simon (https://linkedin.com/in/juliensimon/)
Date: Thursday, March 14, 2019
Event: https://www.meetup.com/Athens-Big-Data/events/259091496/
15th Athens Big Data Meetup - 1st Talk - Running Spark On Mesos - Athens Big Data
Title: Running Spark On Mesos
Speaker: Mr. Chris Sidiropoulos (https://linkedin.com/in/chris-sidiropoulos-2a6b156a//)
Date: Tuesday, November 27, 2018
Event: https://meetup.com/Athens-Big-Data/events/256098657/
5th Athens Big Data Meetup - PipelineIO Workshop - Real-Time Training and Dep...Athens Big Data
Title: Real-Time Training and Deploying Spark ML Recommendations With Kafka and NetflixOSS
Speaker: Chris Fregly (https://linkedin.com/in/cfregly/)
Date: Monday, October 17, 2016
Event: https://meetup.com/Athens-Big-Data/events/234546355/
13th Athens Big Data Meetup - 2nd Talk - Training Neural Networks With Enterp...Athens Big Data
Title: Training Neural Networks With Enterprise Relational Data
Speaker: Dr. Christos Malliopoulos (https://linkedin.com/in/cmalliopoulos//)
Date: Thursday, June 21, 2018
Event: https://meetup.com/Athens-Big-Data/events/251411957/
11th Athens Big Data Meetup - 2nd Talk - Beyond Bitcoin; Blockchain Technolog...Athens Big Data
Title: Beyond Bitcoin; Blockchain Technologies And How They Disrupt Every Industry
Speaker: Mr. Dimitrios Kouzis-Loukas (https://linkedin.com/in/lookfwd//)
Date: Wednesday, December 27, 2017
Event: https://meetup.com/Athens-Big-Data/events/246033156/
9th Athens Big Data Meetup - 2nd Talk - Lead Scoring And Grading - Athens Big Data
Title: Lead Scoring And Lead Grading
Speaker: Dr. Lefteris Mantelas (https://linkedin.com/in/lefterismantelas//)
Date: Tuesday, October 24, 2017
Event: https://meetup.com/Athens-Big-Data/events/243829781/
Essentials of Automations: The Art of Triggers and Actions in FME - Safe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
The Art of the Pitch: WordPress Relationships and Sales - Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers, without pulling teeth or pulling your hair out. Practical tips and strategies for successful relationship building that leads to closing the deal.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs - Alex Pruden
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
Pushing the limits of ePRTC: 100ns holdover for 100 days - Adtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Enhancing adoption of Open Source Libraries. A case study on Albumentations.AI - Vladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster and ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
Removing Uninteresting Bytes in Software Fuzzing - Aftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speed up fuzzing campaigns by pinpointing and eliminating uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries: Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security-analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns, and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024 - Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
DevOps and Testing slides at DASA Connect - Kari Kakkonen
Slides by me and Rik Marselis from the DASA Connect conference on 30.5.2024. We discuss what testing is, what agile testing is, and finally what testing in DevOps looks like. We also held a lovely workshop in which participants explored different ways to think about quality and testing in different parts of the DevOps infinity loop.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor... - SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
7. www.inaccel.com™
Hyperscale Data Centers

Data Center    Site              Sq ft
Facebook       Santa Clara       86,000
Google         South Carolina    200,000
HP             Atlanta           200,000
IBM            Colorado          300,000
Microsoft      Chicago           700,000

[Source: "How Clean is Your Cloud?", Greenpeace 2011]
For comparison, Wembley Stadium: 172,000 square ft.
8. www.inaccel.com™
Data Center Power Consumption

˃ Data centers consumed 330 billion kWh in 2007, and consumption is expected to reach 1,012 billion kWh in 2020.

               2007 (billion kWh)    2020 (billion kWh)
Data Centers   330                   1,012
Telecoms       293                   951
Total Cloud    623                   1,963

[Source: "How Clean is Your Data Center", Greenpeace, 2012]
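The projected growth in the table above implies a steady compound annual growth rate. As a quick sanity check on the Greenpeace figures (the numbers come from the table; the calculation itself is plain arithmetic):

```python
# Implied compound annual growth rate (CAGR) of data center consumption:
# from 330 billion kWh in 2007 to a projected 1,012 billion kWh in 2020,
# i.e. over 13 years.
start, end, years = 330.0, 1012.0, 2020 - 2007

cagr = (end / start) ** (1 / years) - 1
print(f"Implied annual growth: {cagr:.1%}")  # roughly 9% per year
```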
10. www.inaccel.com™
Specialization

CPU: high speed, lower efficiency. GPU/FPGA: high throughput, higher efficiency.
GPUs and FPGAs can provide massive parallelism and higher efficiency than CPUs for certain categories of applications.
16. www.inaccel.com™
Market Size

˃ The data center accelerator market is expected to reach USD 21.19 billion by 2023 from USD 2.84 billion in 2018, at a CAGR of 49.47% from 2018 to 2023.
˃ The FPGA market is expected to grow at the highest CAGR during the forecast period, owing to the increasing adoption of FPGAs for the acceleration of enterprise workloads.
˃ The global MLaaS market is expected to register a CAGR of about 43.46% during 2018-2023 (the forecast period), reaching USD 8.315 billion by 2023 from USD 0.932 billion as of 2017.
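The quoted accelerator-market CAGR is internally consistent with the two endpoint figures, which is easy to verify:

```python
# Check that USD 2.84 billion (2018) growing at 49.47% per year for
# 5 years reaches roughly USD 21.19 billion (2023), as stated above.
start, cagr, years = 2.84, 0.4947, 5

end = start * (1 + cagr) ** years
print(f"Projected 2023 market: USD {end:.2f} billion")  # ~21.19
```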
17. www.inaccel.com™
Open FPGA Data Science

˃ We help data scientists and engineers run their applications up to 15x faster without changing their code.
18. www.inaccel.com™
Integration Solution for Application Acceleration

InAccel Scalable FPGA Resource Manager: accelerated ML suite, on-premise and cloud.

• Higher performance: up to 16x speedup compared to highly optimized libraries
• Lower cost: up to 4x lower TCO
• Zero code changes: seamless integration with widely used frameworks
• Easy deployment: Docker-based container for seamless integration
• On-prem or on cloud: available on cloud and on-prem

"Automated deployment, scaling and management of FPGA clusters"
20. www.inaccel.com™
Current Framework for FPGAs on the Cloud

Current limitations without the InAccel Coral Manager:
˃ Only one application can talk to each FPGA accelerator through OpenCL.
˃ Every application can talk to only a single FPGA.
˃ Complex device sharing
• from multiple threads/processes
• even from the same thread
˃ Explicit allocation of the resources (memory/compute units)

[Diagram: App1 talks through the vendor drivers to a single FPGA.]
21. InAccel FPGA Resource Manager
www.inaccel.com

Seamless integration with C/C++, Python, Java and Scala
Automatic configuration and management of the FPGA bitstreams and memory
Seamless sharing of the FPGA cluster from multiple applications
Automatic virtualization and scheduling of the applications to the FPGA cluster
Fully scalable: scale-up (multiple FPGAs per node) and scale-out (multiple FPGA-based servers over Spark)

[Diagram: applications call into the InAccel Coral Resource Manager, which runs on the InAccel Runtime (resource isolation) above the FPGA drivers on each server, driving the FPGA kernels.]

"Automating deployment, scaling, and management of FPGAs"
22. www.inaccel.com™
InAccel Coral FPGA Resource Manager

˃ Coral abstracts FPGA resources (device, memory), enabling fault-tolerant heterogeneous distributed systems to be built easily and run effectively.

World's first FPGA orchestrator: program against your FPGAs as if they were a single pool of accelerators.

[Diagram: same Coral architecture as on the previous slide.]
23. www.inaccel.com™
InAccel's Coral FPGA Manager

Acceleration abstraction layer to virtualize, manage and monitor the FPGA resources:
˃ Management
• Automatic device (re-)configurations and efficient memory transfers
˃ Fault-tolerance
• Highly available service on top of a cluster of FPGAs, each of which may be prone to failures
˃ Scalability
• Automatic scale-up from single devices (e.g. f1.2x) to multi-FPGA systems (e.g. f1.4x, f1.16x)

[Diagram: App1/App2/App3 connect over a C++/Java socket to the InAccel FPGA Manager, which drives the FPGA cluster through OpenCL/OPAE on top of the vendor RTE/drivers (Intel/Xilinx).]

Documentation: https://docs.inaccel.com/latest/
24. www.inaccel.com™
FPGA Manager Features

Ease of use:
˃ Write applications quickly in C/C++, Java, Scala and Python. InAccel offers all the required high-level functions that make it easy to build and accelerate parallel apps. No need to modify your application to use an unfamiliar parallel programming language (like OpenCL).

[Diagram: same manager architecture as on the previous slide.]
26. www.inaccel.com™
Software Simplicity

• 30x simpler code
• Software-like function invocation
• No need for OpenCL directives

https://github.com/Xilinx/AWS-F1-Developer-Labs/blob/master/helloworld_ocl/src/host.cpp
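To illustrate the "software-like function invocation" claim, here is a minimal sketch of what a high-level accelerator call can look like, in contrast to the OpenCL host code linked above (platform discovery, contexts, queues, buffers, kernel setup). The `CoralStub` class and its `submit` method are hypothetical stand-ins, not the actual InAccel API:

```python
# Hypothetical sketch: a high-level "submit a kernel, get a result" call.
# In raw OpenCL host code the same operation takes dozens of lines.

class CoralStub:
    """Stand-in for a resource-manager client; NOT the real InAccel API."""

    def submit(self, kernel: str, *args):
        # A real manager would schedule `kernel` on a free FPGA and copy
        # `args` to device memory; this stub just computes on the CPU.
        if kernel == "vadd":
            a, b = args
            return [x + y for x, y in zip(a, b)]
        raise ValueError(f"unknown kernel: {kernel}")

coral = CoralStub()
result = coral.submit("vadd", [1, 2, 3], [10, 20, 30])
print(result)  # [11, 22, 33]
```

The point of the pattern is that device selection, configuration and memory movement live behind the manager, so application code stays a plain function call.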
27. www.inaccel.com™
Example of Scaling to 2 FPGAs Using the Resource Manager for Logistic Regression

1.86x speedup using 2 FPGAs, simply by changing a line. You specify how many FPGAs you want to use:

inaccel start --fpga=xilinx:0,xilinx:1
or
inaccel start --fpga=all
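A 1.86x speedup on 2 FPGAs corresponds to a parallel scaling efficiency of about 93%, which is a one-line check:

```python
# Scaling efficiency when going from 1 to 2 FPGAs:
# measured speedup divided by the number of devices.
speedup, fpgas = 1.86, 2

efficiency = speedup / fpgas
print(f"Scaling efficiency: {efficiency:.0%}")  # 93%
```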
28. www.inaccel.com™
InAccel's Coral Manager Integrated with Spark

˃ Integrated solution that allows:
• Scale-up (1, 2, or 8 FPGAs per node)
• Scale-out to multiple nodes (using the Spark API)
• Seamless integration
• Docker-based deployment

[Diagram: a driver program (SparkContext) talks to a cluster manager, which schedules tasks onto executors on worker nodes (f1 instances); each worker runs the FPGA RTE and management interface on top of its own FPGA cluster.]
30. www.inaccel.com™
Bitstream Repository

˃ The FPGA Resource Manager is integrated with a JFrog bitstream repository that is used to store FPGA bitstreams.

[Diagram: the application pulls bitstreams from the FPGA bitstream repository and deploys them to the FPGA cluster.]

https://store.inaccel.com
31. www.inaccel.com™
Performance Evaluation on Machine Learning

˃ Up to 15x speedup for LR ML (7.5x overall)
˃ Up to 14x speedup for K-means ML (6.2x overall)
˃ Spark-GPU* achieves 3.8x-5.7x
˃ f1.4x: 16 cores + 2 FPGAs (InAccel)
˃ r5d.4x: 16 cores

[Charts: logistic regression and K-means clustering execution time on MNIST (24 GB, 100 iterations), r5d.4x vs f1.4x (InAccel), broken down into data preprocessing, data transformation and ML training; 15x and 14x ML-training speedups respectively.]

*[Spark-GPU: An Accelerated In-Memory Data Processing Engine on Clusters]
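The gap between the 15x training speedup and the 7.5x end-to-end speedup is ordinary Amdahl's-law behavior: the unaccelerated preprocessing and transformation stages bound the overall gain. The 15x and 7.5x figures come from the slide; the accelerated-fraction value below is inferred from them, not stated in the deck:

```python
# Amdahl's law: overall = 1 / ((1 - f) + f / s), where f is the fraction
# of runtime that is accelerated and s is the speedup of that fraction.
def overall_speedup(f: float, s: float) -> float:
    return 1.0 / ((1.0 - f) + f / s)

# With a 15x kernel speedup, a 7.5x overall speedup implies ML training
# accounted for roughly 93% of the original runtime.
print(f"{overall_speedup(0.9286, 15.0):.2f}x overall")  # ~7.50x
```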
32. www.inaccel.com™
Demo on Logistic Regression

˃ 14x faster training of logistic regression in a Python Jupyter notebook, instantly
https://www.inaccel.com/wp-content/uploads/InAccel_ML_accelerate_sound.mp4
34. www.inaccel.com™
Personnel Cost Savings

˃ Reduced complexity of programming the FPGAs
˃ Over 4x fewer lines of code (LoC)
˃ Significant savings in time to run and debug the FPGAs
35. www.inaccel.com™
FPGA Manager Deployment

Easy to deploy:
˃ Launch a container with InAccel's Docker image, or even deploy it as a DaemonSet on a Kubernetes cluster, and enjoy acceleration services at the drop of a hat.
˃ https://hub.docker.com/u/inaccel/

FPGA Manager:
• Easy deployment
• Easy scalability
• Easy integration
38. www.inaccel.com™
Serverless Deployment

˃ Integrated framework for serverless deployment
˃ Compatible with Kubernetes
˃ Compatible with Kubeless and Knative
˃ Users only have to upload the images to the S3 bucket; InAccel's FPGA Manager then automatically deploys the cluster of FPGAs, processes the data and stores the results back to the S3 bucket.
˃ Users do not have to know anything about the FPGA execution.

[Diagram: an upload to Amazon S3 triggers the InAccel FPGA Resource Manager, which runs functions from the f1 library of accelerated functions on a cluster of Amazon EC2 f1 instances and writes the results back to Amazon S3.]

https://medium.com/@inaccel/fpgas-goes-serverless-on-kubernetes-55c1d39c5e30
39. www.inaccel.com™
Integration with Arrow

˃ Seamless experience for the application developer writing software using Arrow-backed dataframes.
˃ Arrow's growing adoption makes our integration even more valuable.
˃ Zero extra overhead for other operations supported by Arrow, e.g. serialization.

"Apache Arrow specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. The Arrow memory format supports zero-copy reads for efficient data access without serialization overhead." (Apache Software Foundation)
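The zero-copy property quoted above is what makes "zero extra overhead" possible: consumers read the very buffer the producer wrote, with no serialization step in between. The principle can be illustrated with Python's standard-library buffer protocol (Arrow's own buffers work the same way, exposed via pyarrow):

```python
# Zero-copy access in miniature: a memoryview exposes the producer's
# buffer directly, so reading through it copies no bytes -- the same
# idea Arrow's columnar buffers rely on when handing data to consumers.
import array

column = array.array("d", [1.0, 2.0, 3.0, 4.0])  # a "column" of float64 values
view = memoryview(column)                         # zero-copy view, no serialization

# The view aliases the original memory: a mutation of the column is
# visible through the view without any copy or conversion step.
column[0] = 42.0
print(view[0])      # 42.0
print(view.nbytes)  # 32 (4 x 8-byte float64)
```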
41. www.inaccel.com™
Job Openings

˃ Software engineer: Java, Python, Scala, Go, C++
˃ DevOps: Kubernetes, Knative, Anthos, Mesos, Docker

• Collaboration with hyperscale companies
• State-of-the-art technology
• International exposure

Interested? jobs@inaccel.com