7. www.inaccel.com™
Hyperscale Data Centers
Data center site             Sq ft
Facebook (Santa Clara)       86,000
Google (South Carolina)      200,000
HP (Atlanta)                 200,000
IBM (Colorado)               300,000
Microsoft (Chicago)          700,000
[Source: “How Clean is Your Cloud?”, Greenpeace 2011]
Wembley Stadium: 172,000 sq ft
8. www.inaccel.com™
Data Center Power consumption
˃ Data centers consumed 330 billion kWh in 2007 and are expected
to reach 1,012 billion kWh in 2020
Sector           2007 (billion kWh)   2020 (billion kWh)
Data centers     330                  1,012
Telecoms         293                  951
Total cloud      623                  1,963
[Source: How Clean is Your Data Center, Greenpeace, 2012]
10. www.inaccel.com™
Specialization
CPU: high speed, lower efficiency
GPU/FPGA: high throughput, higher efficiency
GPUs and FPGAs can provide massive parallelism and higher
efficiency than CPUs for certain categories of applications
16. www.inaccel.com™
Market size
˃ The data center accelerator market is expected to grow from USD 2.84
billion in 2018 to USD 21.19 billion by 2023, at a CAGR of 49.47% from
2018 to 2023.
˃ The market for FPGA is expected to grow at the highest CAGR during
the forecast period owing to the increasing adoption of FPGAs for the
acceleration of enterprise workloads.
˃ The global MLaaS market is expected to register a CAGR of about
43.46% during the 2018-2023 forecast period, reaching USD 8.315 billion
by 2023, up from USD 0.932 billion in 2017.
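As a quick sanity check on the accelerator market figures above, the compound annual growth rate can be recomputed from the two endpoint values. This is a minimal sketch: the function name `cagr` is our own, and the only inputs are the numbers quoted on the slide.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Data center accelerator market: USD 2.84 billion (2018) to USD 21.19 billion (2023).
accelerator_cagr = cagr(2.84, 21.19, 2023 - 2018)
print(f"Accelerator market CAGR: {accelerator_cagr:.2%}")
```

The result agrees with the quoted 49.47% to two decimal places, so the start value, end value and growth rate on the slide are mutually consistent.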
17. www.inaccel.com™
Open FPGA Data Science
˃ We help data scientists and engineers run their applications up to 15x faster
without changing their code
18. www.inaccel.com™
Integration solution for Application Acceleration
InAccel Scalable FPGA Resource Manager
Accelerated ML suite
On-premise Cloud
Higher performance: up to 16x speedup compared to highly optimized libraries
Lower cost: up to 4x lower TCO
Zero code changes: seamless integration with widely used frameworks
Easy deployment: Docker-based container for seamless integration
On-prem or in the cloud: available in the cloud and on-prem
“Automated deployment, scaling and management of FPGA clusters”
20. www.inaccel.com™
Current Framework for FPGAs on the cloud
Current limitations without the InAccel
Coral Manager
˃ Currently only one application can talk to
each FPGA accelerator through OpenCL
˃ Each application can talk to only a single
FPGA
˃ Complex device sharing
• From multiple threads/processes
• Even from the same thread
˃ Explicit allocation of the resources
(memory/compute units)
[Diagram: App1 → vendor drivers → single FPGA]
21. www.inaccel.com™
InAccel FPGA Resource Manager
Seamless integration with C/C++, Python,
Java and Scala
Automatic configuration and management
of the FPGA bitstreams and memory
Seamless sharing of the FPGA cluster from
multiple applications
Automatic virtualization and scheduling of
the applications to the FPGA cluster
Fully scalable: Scale-up (multiple FPGAs
per node) and Scale-out (multiple FPGA-
based servers over Spark)
[Diagram: stack showing applications, the InAccel Coral Resource Manager, the InAccel runtime (resource isolation), FPGA drivers, and a server with FPGA kernels]
“Automating deployment, scaling, and management of FPGAs”
22. www.inaccel.com™
InAccel Coral FPGA Resource Manager
˃ Coral abstracts FPGA resources
(device, memory), enabling fault-tolerant
heterogeneous distributed systems to
easily be built and run effectively.
World’s first FPGA orchestrator:
Program against your FPGAs as if they were
a single pool of accelerators
[Diagram: stack showing applications, the InAccel Coral Resource Manager, the InAccel runtime (resource isolation), FPGA drivers, and a server with FPGA kernels]
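To make the "single pool of accelerators" idea concrete, here is a toy sketch in plain Python. It is not InAccel's API: `AcceleratorPool`, `submit`, `vector_add` and the device names `fpga0`/`fpga1` are hypothetical, and the "kernel" is ordinary Python running in a thread. It only illustrates the scheduling pattern, in which applications hand requests to a shared queue and never choose a device themselves.

```python
import queue
import threading
from concurrent.futures import Future


class AcceleratorPool:
    """Toy stand-in for a resource manager: one shared queue, one worker per device."""

    def __init__(self, device_names):
        self._requests = queue.Queue()
        self._workers = [
            threading.Thread(target=self._serve, args=(name,), daemon=True)
            for name in device_names
        ]
        for worker in self._workers:
            worker.start()

    def submit(self, kernel, *args):
        """Queue a request; the caller never picks a device explicitly."""
        future = Future()
        self._requests.put((kernel, args, future))
        return future

    def _serve(self, device_name):
        # Each worker models one device pulling the next pending request.
        while True:
            kernel, args, future = self._requests.get()
            try:
                future.set_result((device_name, kernel(*args)))
            except Exception as exc:
                future.set_exception(exc)


def vector_add(a, b):
    # Plain Python standing in for an accelerated kernel (hypothetical).
    return [x + y for x, y in zip(a, b)]


pool = AcceleratorPool(["fpga0", "fpga1"])
# Two "applications" share the same pool without coordinating with each other.
app1 = pool.submit(vector_add, [1, 2], [3, 4])
app2 = pool.submit(vector_add, [10, 20], [30, 40])
device1, result1 = app1.result(timeout=5)
device2, result2 = app2.result(timeout=5)
print(device1, result1)
print(device2, result2)
```

Which device serves which request depends on thread timing, so only the results are deterministic. A real manager would also handle bitstream loading, memory transfers and failures, which this sketch leaves out.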
23. www.inaccel.com™
InAccel’s Coral FPGA Manager
Acceleration abstraction layer to virtualize,
manage and monitor the FPGA resources
˃ Management
• Automatic device (re-)configurations and efficient
memory transfers
˃ Fault-tolerance
• Highly-available service on top of a cluster of
FPGAs, each of which may be prone to failures
˃ Scalability
• Automatic scale-up from single devices (e.g. f1.2xlarge) to
multi-FPGA systems (e.g. f1.4xlarge, f1.16xlarge)
[Diagram: App1, App2, App3 → C++/Java socket → InAccel FPGA Manager → RTE/drivers (Intel/Xilinx; OpenCL / OPAE) → FPGA cluster]
Documentation: https://docs.inaccel.com/latest/
24. www.inaccel.com™
FPGA Manager features
Ease of Use
˃ Write applications quickly in C/C++, Java,
Scala and Python.
InAccel offers all the required high-level
functions that make it easy to build and
accelerate parallel apps. No need to modify
your application to use an unfamiliar parallel
programming language (like OpenCL)
[Diagram: App1, App2, App3 → C++/Java socket → InAccel FPGA Manager → RTE/drivers (Intel/Xilinx; OpenCL / OPAE) → FPGA cluster]
26. www.inaccel.com™
Software simplicity
• 30x simpler code
• Software-like function invocation
• No need for OpenCL directives
https://github.com/Xilinx/AWS-F1-Developer-Labs/blob/master/helloworld_ocl/src/host.cpp
27. www.inaccel.com™
Example: scaling logistic regression to 2 FPGAs using the resource
manager
1.86x speedup using 2 FPGAs
simply by changing one line
You specify how many
FPGAs you want to use:
inaccel start --fpga=xilinx:0,xilinx:1
or
inaccel start --fpga=all
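The 1.86x figure can be read as a parallel efficiency, i.e. the fraction of ideal linear scaling that was achieved. The helper below is a minimal sketch; the name `parallel_efficiency` is our own and the inputs are the numbers from this slide.

```python
def parallel_efficiency(speedup: float, devices: int) -> float:
    """Achieved speedup as a fraction of ideal linear scaling."""
    return speedup / devices


efficiency = parallel_efficiency(1.86, 2)
print(f"Parallel efficiency on 2 FPGAs: {efficiency:.0%}")
```

An efficiency of 0.93 means the second FPGA contributes almost a full device's worth of extra throughput for this workload.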
28. www.inaccel.com™
InAccel’s Coral manager integrated with Spark
˃ Integrated solution that allows
Scale Up (1, 2, or 8 FPGAs per node)
Scale Out to multiple nodes (using Spark API)
Seamless integration
Docker-based deployment
[Diagram: the driver program (SparkContext) and cluster manager dispatch tasks to executors on two f1 worker nodes; on each node the tasks reach the local FPGA cluster through the FPGA RTE and manager interface]
30. www.inaccel.com™
Bitstream repository
˃ The FPGA Resource Manager is integrated with a JFrog bitstream
repository that stores the FPGA bitstreams
[Diagram: application, FPGA bitstream repository, FPGA cluster]
https://store.inaccel.com
31. www.inaccel.com™
Performance evaluation on Machine Learning
˃ Up to 15x speedup for logistic regression (LR) training
(7.5x overall)
˃ Up to 14x speedup for K-means
training (6.2x overall)
˃ Spark-GPU* (3.8x to 5.7x)
˃ f1.4xlarge: 16 cores + 2 FPGAs (InAccel)
˃ r5d.4xlarge: 16 cores
[Chart: Logistic regression execution time, MNIST 24 GB, 100 iterations (secs); r5d.4x vs. f1.4x (InAccel), split into data preprocessing, data transformation and ML training; 15x speedup]
[Chart: K-means clustering execution time, MNIST 24 GB, 100 iterations (secs); r5d.4x vs. f1.4x (InAccel), split into data preprocessing, data transformation and ML training; 14x speedup]
*[Spark-GPU: An Accelerated In-Memory Data Processing Engine on Clusters]
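The gap between the stage speedup and the overall speedup is what Amdahl's law predicts when only part of the pipeline is accelerated. The sketch below assumes (this is our reading, not something the slide states) that the "up to" figure applies to the ML training stage alone, that preprocessing and transformation run at their original speed, and that "overall" is the end-to-end speedup. The function names are our own.

```python
def overall_speedup(accelerated_fraction: float, stage_speedup: float) -> float:
    """Amdahl's law: end-to-end speedup when only a fraction of the run is sped up."""
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / stage_speedup)


def accelerated_fraction(stage_speedup: float, overall: float) -> float:
    """Invert Amdahl's law: share of the original runtime spent in the accelerated stage."""
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / stage_speedup)


# Figures from the slide: (stage speedup, overall speedup).
lr_fraction = accelerated_fraction(15.0, 7.5)
kmeans_fraction = accelerated_fraction(14.0, 6.2)
print(f"Logistic regression: {lr_fraction:.1%} of CPU runtime in training")
print(f"K-means: {kmeans_fraction:.1%} of CPU runtime in training")
```

Under these assumptions, roughly nine tenths of the CPU-only runtime would be ML training in both benchmarks, and the remaining unaccelerated stages are what cap the end-to-end gain.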
32. www.inaccel.com™
Demo on logistic regression
˃ 14x faster logistic regression training from a Python Jupyter notebook, instantly
https://www.inaccel.com/wp-content/uploads/InAccel_ML_accelerate_sound.mp4
34. www.inaccel.com™
Personnel cost savings
˃ Reduced complexity of programming the FPGAs
˃ Over 4x fewer lines of code (LoC)
˃ Significant savings in the time needed to run and debug applications on the FPGAs
35. www.inaccel.com™
FPGA Manager deployment
Easy to Deploy
˃ Launch a container with InAccel's Docker
image or even deploy it as a daemonset on a
Kubernetes cluster and enjoy acceleration
services at the drop of a hat.
˃ https://hub.docker.com/u/inaccel/
FPGA Manager
• Easy deployment
• Easy scalability
• Easy integration
38. www.inaccel.com™
Serverless deployment
˃ Integrated framework for serverless
deployment
˃ Compatible with Kubernetes
˃ Compatible with Kubeless, Knative
˃ Users only have to upload the images to
the S3 bucket; InAccel’s FPGA Manager then
automatically deploys the cluster of FPGAs,
processes the data, and stores the results
back in the S3 bucket.
˃ Users do not have to know anything about
the FPGA execution.
[Diagram: files uploaded to Amazon S3 trigger the InAccel FPGA Resource Manager, which runs an accelerated function from the f1 library of accelerated functions on a cluster of Amazon EC2 f1 instances; the results are downloaded from Amazon S3]
https://medium.com/@inaccel/fpgas-goes-serverless-on-kubernetes-55c1d39c5e30
39. www.inaccel.com™
Integration with Arrow
˃ Seamless experience for the application
developer writing software using Arrow-
backed dataframes.
˃ Arrow's growing adoption makes our
integration even more valuable.
˃ Zero extra overhead for other operations
supported by Arrow, e.g. serialization
Apache Arrow specifies a standardized language-
independent columnar memory format for flat and
hierarchical data, organized for efficient analytic
operations on modern hardware. The Arrow memory
format supports zero-copy reads for efficient data access
without serialization overhead. (Apache Software)
41. www.inaccel.com™
Job openings
˃ Software engineer: Java, Python, Scala, Go, C++
˃ DevOps: Kubernetes, Knative, Anthos, Mesos, Docker
• Collaboration with Hyperscale companies
• State of the art technology
• International exposure
Interested?
jobs@inaccel.com