Leading Application Acceleration
www.inaccel.com™
InAccel helps companies speed up their applications
by providing ready-to-use accelerators-as-a-service
in the cloud or on-prem
15x Speedup
4x Lower TCO
Zero code changes
Applications and Platforms
• Applications
• Platforms
• Partnerships
Machine learning Financial Analytics Genomics
Open FPGA Data Science
˃ We help data scientists and engineers run their applications up to 15x faster
without changing their code
Integration solution for Application Acceleration
InAccel Scalable FPGA Resource Manager
Accelerated ML suite
On-premise / Cloud
˃ Higher performance: up to 16x speedup compared to highly optimized libraries
˃ Lower cost: up to 4x lower TCO
˃ Zero code changes: seamless integration with widely used frameworks
˃ Easy deployment: Docker-based container for seamless integration
˃ On-prem or in the cloud: available in the cloud and on-prem
InAccel Coral FPGA Resource Manager
˃ Coral abstracts FPGA resources
(device, memory), enabling fault-tolerant
heterogeneous distributed systems to
easily be built and run effectively.
World's first FPGA orchestrator: program against your FPGAs as if they were
a single pool of accelerators
[Diagram: applications run on the InAccel Coral Resource Manager and the InAccel Runtime (resource isolation), above the FPGA drivers and the server's FPGA kernels]
Current Framework for FPGAs on the cloud
Current limitations without the InAccel
Coral Manager
˃ Only one application at a time can talk to
each FPGA accelerator
˃ Each application can talk to only a single
FPGA
˃ Complex device sharing
• From multiple threads/processes
• Even from the same thread
˃ Explicit allocation of the resources
(memory/compute units)
[Diagram: App1, vendor drivers, single FPGA]
InAccel’s Coral FPGA Manager
Acceleration abstraction layer to virtualize,
manage and monitor the FPGA resources
˃ Management
• Automatic device (re-)configurations and efficient
memory transfers
˃ Fault-tolerance
• Highly-available service on top of a cluster of
FPGAs, each of which may be prone to failures
˃ Scalability
• Automatic scale-up from single devices (e.g. f1.x2) to
multi-FPGA systems (e.g. f1.x4, f1.x16)
[Diagram: App1, App2, App3 → C++/Java socket → InAccel FPGA Manager → RTE/Drivers (Intel/Xilinx; OpenCL / OPAE) → FPGA Cluster]
Documentation: https://docs.inaccel.com/latest/
FPGA Manager features
Ease of Use
˃ Write applications quickly in C/C++, Java,
Scala and Python.
InAccel offers all the required high-level
functions that make it easy to build and
accelerate parallel apps. No need to modify
your application to use an unfamiliar parallel
programming language (like OpenCL)
FPGA Manager features
Runs Anywhere
˃ Runs on any FPGA platform (Intel), giving
you the freedom to take full advantage of
on-premises or public cloud (Alibaba, Nimbix,
etc.) infrastructure.
FPGA Manager features
Resource Management
˃ Automatic resource configuration and
task scheduling across entire FPGA
clusters in private datacenters or public
cloud environments.
Coral examines the state of the FPGAs and
implements load-balancing policies across
them, efficiently taking care of all the required
device configurations and memory transfers.
Privacy / Isolation
˃ Coral allows the secure sharing of the
hardware resources among different
users and multiple processes or threads.
First class isolation support for accelerator
cores and FPGA memory.
FPGA Manager use case
Abstracts away FPGA cluster
˃ App1: 2 threads that use LR
Just send LR requests
‒ The FPGA manager sends the requests to
FPGAs programmed with LR (no need to
specify which FPGA/kernel)
˃ App2: uses both LR and KM
Just send LR-KM requests
‒ The FPGA manager sends the requests to
FPGAs programmed with LR and KM
kernels
˃ App3: 1 thread that uses KM
Just send KM requests
‒ The FPGA manager sends the requests
to FPGAs programmed with KM kernels
[Diagram: App1 sends LR requests, App2 sends LR-KM requests, and App3 sends KM requests (sync or async) to the InAccel FPGA Manager, which forwards them through the RTE/drivers (OpenCL / OPAE) to the FPGA cluster. FPGA1: LR, FPGA2: KM, FPGA3: LR, FPGA4: LR-KM. LR = Logistic Regression, KM = KMeans clustering.]
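The routing idea in this use case can be sketched in a few lines of Python. This is only a toy model, not the InAccel API: the class name `ToyFpgaManager`, the `submit` method, and the round-robin choice among eligible FPGAs are all assumptions made for illustration (the slides mention load-balancing policies but do not say which one is used). The cluster layout is the one shown on the slide.

```python
from collections import defaultdict
from itertools import cycle


class ToyFpgaManager:
    """Hypothetical stand-in for a resource manager; not the InAccel API."""

    def __init__(self, fpga_kernels):
        # Group FPGAs by the kernels they are programmed with.
        hosts = defaultdict(list)
        for fpga, kernels in fpga_kernels.items():
            for kernel in kernels:
                hosts[kernel].append(fpga)
        # Assumption: plain round-robin among the FPGAs hosting a kernel.
        self._next_host = {k: cycle(v) for k, v in hosts.items()}

    def submit(self, kernel):
        """Return the FPGA chosen for a request that names only a kernel."""
        if kernel not in self._next_host:
            raise LookupError(f"no FPGA is programmed with {kernel!r}")
        return next(self._next_host[kernel])


# Cluster layout taken from the slide's diagram.
manager = ToyFpgaManager({
    "FPGA1": ["LR"],
    "FPGA2": ["KM"],
    "FPGA3": ["LR"],
    "FPGA4": ["LR", "KM"],
})
lr_targets = [manager.submit("LR") for _ in range(4)]
km_targets = [manager.submit("KM") for _ in range(3)]
```

The caller names only the kernel it needs; which device serves the request is decided by the manager, which is the point the slide makes with "no need to specify which FPGA/kernel".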
Advantages
˃ Distribution of multi-threaded applications across multiple clusters
˃ Resource sharing among multiple applications
Graphical monitoring tool
[Screenshot: FPGA utilization per kernel]
Bitstream repository
˃ The FPGA Resource Manager is
integrated with an artifact
repository that is used to
store FPGA bitstreams
[Diagram: Application, FPGA bitstream repository, FPGA cluster]
https://store.inaccel.com
Pricing model (enterprise)
FPGA Resource manager
˃ Pricing model per node (server)
˃ Each node can have 1 to 8 FPGAs
ML Accelerators on-prem
˃ Pricing model per node (server)

FPGA Resource manager   20 nodes or less   More than 20 nodes
Monthly                 $200/node          $150/node
Yearly                  $2,000/node        $1,500/node

ML Accelerator          20 nodes or less   More than 20 nodes
Hourly                  $2/node            $1.5/node
Yearly                  $10,000/node       $8,000/node
Performance evaluation on Machine Learning
˃ Up to 15x speedup for LR ML training
(7.5x overall)
˃ Up to 14x speedup for KMeans ML
training (6.2x overall)
˃ Spark-GPU* (3.8x – 5.7x)
˃ f1.4x: 16 cores + 2 FPGAs (InAccel)
˃ r5d.4x: 16 cores
[Chart: Logistic Regression execution time in seconds (MNIST 24GB, 100 iterations) on r5d.4x vs. f1.4x (InAccel), split into data preprocessing, data transformation, and ML training; 15x speedup]
[Chart: K-Means clustering execution time in seconds (MNIST 24GB, 100 iterations) on r5d.4x vs. f1.4x (InAccel), split into data preprocessing, data transformation, and ML training; 14x speedup]
*[Spark-GPU: An Accelerated In-Memory Data Processing Engine on Clusters]
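The gap between the "up to" figure and the overall figure is what Amdahl's law predicts when only one stage is accelerated. The sketch below is a reading of the slide, not a result from it: it assumes the 15x (LR) and 14x (KMeans) figures apply to the ML training stage alone and that preprocessing and transformation times are unchanged. The function names are illustrative.

```python
def overall_speedup(accelerated_fraction, kernel_speedup):
    """Amdahl's law: end-to-end speedup when only a fraction p is accelerated."""
    p, s = accelerated_fraction, kernel_speedup
    return 1.0 / ((1.0 - p) + p / s)


def implied_fraction(overall, kernel_speedup):
    """Invert Amdahl's law: 1/overall = (1 - p) + p/s  =>  solve for p."""
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / kernel_speedup)


# Share of the original CPU runtime that training would have to occupy
# for the quoted pairs of numbers to be consistent with each other.
lr_fraction = implied_fraction(overall=7.5, kernel_speedup=15)
km_fraction = implied_fraction(overall=6.2, kernel_speedup=14)
```

Under these assumptions, training accounts for roughly 93% (LR) and 90% (KMeans) of the CPU-only runtime, so the remaining stages cap the end-to-end gain.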
Cost reduction
˃ Up to 2.6x lower cost and 15x speedup
˃ f1.4x ($3.3/hour): 16 cores + 2 FPGAs (InAccel)
˃ r5d.4x ($1.15/hour): 16 cores
[Chart: Total cost of execution ($) for logistic regression and K-means clustering, f1.4x (InAccel) vs. r5d.4x; 2.6x lower OpEx]
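The cost figure follows from the hourly prices and the end-to-end speedups quoted in the performance evaluation. As a minimal sketch, assuming cost is simply the hourly price multiplied by the execution time (the function name `cost_ratio` is illustrative):

```python
def cost_ratio(cpu_rate, fpga_rate, overall_speedup):
    """How many times cheaper one job is on the FPGA instance.

    CPU job cost  = cpu_rate  * T
    FPGA job cost = fpga_rate * T / overall_speedup
    """
    return (cpu_rate / fpga_rate) * overall_speedup


# Hourly prices and overall speedups as quoted on the slides.
lr_ratio = cost_ratio(cpu_rate=1.15, fpga_rate=3.3, overall_speedup=7.5)
km_ratio = cost_ratio(cpu_rate=1.15, fpga_rate=3.3, overall_speedup=6.2)
```

With the 7.5x overall speedup for logistic regression this gives about 2.6x, matching the slide; the 6.2x overall speedup for K-means gives about 2.2x.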
Cost reduction
˃ For a cluster of 30 nodes on AWS (24/7 – 1 year):
r5d.4x ($1.15/hour): $302,220
˃ For a cluster of only 2 f1 nodes on AWS (24/7 – 1 year) providing the same
performance:
f1.4x ($3.3/hour): $57,816
+ $20k license for the ML accelerators
˃ Over $200k savings per year
˃ 16x smaller footprint
[Chart: Yearly OpEx for 30 nodes ($), Current vs. InAccel; $200k savings]
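The yearly totals can be reproduced with a few lines of arithmetic. This sketch assumes 24 × 365 = 8,760 billed hours per year and takes the $20k license as 2 nodes at the $10,000/node yearly ML accelerator price from the pricing slide. The $302,220 figure corresponds to 30 r5d.4x nodes, matching the chart title.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, running 24/7


def yearly_cost(nodes, hourly_rate, license_cost=0.0):
    """Instance cost for a year of continuous use, plus any license fee."""
    return nodes * hourly_rate * HOURS_PER_YEAR + license_cost


cpu_cluster = yearly_cost(nodes=30, hourly_rate=1.15)
fpga_instances = yearly_cost(nodes=2, hourly_rate=3.3)
fpga_cluster = yearly_cost(nodes=2, hourly_rate=3.3, license_cost=2 * 10_000)
savings = cpu_cluster - fpga_cluster
```

Rounded to the dollar, this gives $302,220 for the CPU cluster, $57,816 for the two f1 instances before licensing, and savings of about $224k per year.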
Personnel cost savings
˃ Reduced complexity of programming the FPGAs
˃ Over 4x fewer lines of code (LoC)
˃ Significant savings in the time needed to run and debug the FPGAs
Simpler software for FPGA communication
[Side-by-side code snapshots: InAccel software using the FPGA manager vs. vanilla C code with typical use of OpenCL for FPGAs]
The InAccel FPGA Manager lets you keep your code simple while retaining
the performance efficiency of the FPGA
Software simplicity
30x simpler code
https://github.com/Xilinx/AWS-F1-Developer-Labs/blob/master/helloworld_ocl/src/host.cpp
Example of scaling to 2 FPGAs using the resource
manager for logistic regression
1.86x speedup using 2 FPGAs
simply by changing a line
You specify how many FPGAs you want to use:
inaccel start --fpga=xilinx:0,xilinx:1
or
inaccel start --fpga=all
FPGA Manager deployment
Easy to Deploy
˃ Launch a container with InAccel's Docker
image or even deploy it as a daemonset on a
Kubernetes cluster and enjoy acceleration
services at the drop of a hat.
˃ https://hub.docker.com/u/inaccel/
FPGA Manager
• Easy deployment
• Easy scalability
• Easy integration
Serverless deployment
˃ Integrated framework for serverless
deployment
˃ Compatible with Kubernetes
˃ Compatible with Kubeless, Knative
˃ Users only have to upload the images to
the S3 bucket; InAccel's FPGA Manager
then automatically deploys the cluster
of FPGAs, processes the data, and stores
the results back in the S3 bucket.
˃ Users do not have to know anything about
the FPGA execution.
[Diagram: uploading files to Amazon S3 triggers the InAccel FPGA Resource Manager on a cluster of Amazon EC2 f1 instances, which runs the accelerated function from the f1 library of accelerated functions; the results are then downloaded from Amazon S3]
https://medium.com/@inaccel/fpgas-goes-serverless-on-kubernetes-55c1d39c5e30
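The upload, trigger, process, store flow can be illustrated with an in-memory toy. Nothing here is the InAccel, Kubernetes, or Amazon S3 API: `ToyBucket`, `wire_pipeline`, and `accelerated_function` are hypothetical names, and the "accelerated" function is a plain Python placeholder that inverts each byte.

```python
class ToyBucket:
    """In-memory stand-in for an object-store bucket (hypothetical)."""

    def __init__(self):
        self.objects = {}
        self._handlers = []

    def on_upload(self, handler):
        # Register a callback that fires whenever an object is stored.
        self._handlers.append(handler)

    def put(self, key, data):
        self.objects[key] = data
        for handler in self._handlers:
            handler(key, data)


def accelerated_function(data):
    """Placeholder for an FPGA-accelerated function: inverts each byte."""
    return bytes(255 - b for b in data)


def wire_pipeline(input_bucket, output_bucket):
    """Run the function on every upload and store the result."""
    def handle(key, data):
        output_bucket.put(key, accelerated_function(data))
    input_bucket.on_upload(handle)


uploads, results = ToyBucket(), ToyBucket()
wire_pipeline(uploads, results)
uploads.put("image-001", bytes([0, 16, 255]))
```

The uploader only calls `put`; it never refers to the function or to the hardware behind it, which mirrors the claim that users do not need to know anything about the FPGA execution.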
InAccel’s Coral manager integrated with Spark
˃ Integrated solution that allows
• Scale up (1, 2, or 8 FPGAs per node)
• Scale out to multiple nodes (using the Spark API)
• Seamless integration
• Docker-based deployment
[Diagram: a Spark driver program (SparkContext) and cluster manager dispatch tasks to executors on f1 worker nodes; each worker node has an FPGA RTE and manager interface in front of its FPGA cluster]
Speedup comparison - Scalability
˃ Up to 10x speedup compared to 32 cores, based on f1.x2
[Chart: Speedup on a cluster of f1.x2 using InAccel. Cluster of 4 f1.x2large, software only (32 cores): 1x; cluster of 4 f1.x2large with InAccel ML accelerators (32 cores + InAccel): 10.2x]
Speed up
˃ Up to 12x speedup compared to 64 cores on f1.x16
[Chart: Speedup of f1.x16 with 8 InAccel FPGA kernels. f1.16xlarge, software only (64 cores): 1.00; f1.16xlarge with 64 cores + 8 FPGAs running InAccel ML accelerators: 12.14]
Speedup comparison
˃ 3x speedup compared to r4
˃ 2x lower OpEx
[Chart: Speedup comparison normalized on cost for a cluster of 4 nodes ($2/hour/node). Cluster of 4 r4, software only (32 cores each, 128 cores total): 1.00; cluster of 4 f1.x2 with InAccel ML accelerators: 3.18]
InAccel Docker Service
˃ Sustain FPGA driver
compatibility between the host
and the containers
• discover available resources
• mount/isolate visible devices
‒ forget --privileged
• resolve library dependencies
[Diagram: a server with FPGAs (Intel/Xilinx) and the FPGA runtime on the host OS; the Docker engine with the InAccel container runtime runs the App containers alongside InAccel's Coral device plugin]
Integration with Arrow
˃ Seamless experience for the application
developer writing software using
Arrow-backed dataframes.
˃ Arrow's growing adoption makes our
integration even more valuable.
˃ Zero extra overhead for other operations
supported by Arrow, e.g. serialization
Apache Arrow specifies a standardized language-
independent columnar memory format for flat and
hierarchical data, organized for efficient analytic
operations on modern hardware. The Arrow memory
format supports zero-copy reads for efficient data-access
without serialization overhead. —Apache Software
Integration with Arrow
˃ Minimize copy overheads to convey data
to the accelerator data plane.
˃ Single DMA operation from host memory
to DDR memory on FPGA.
˃ Shared memory between CPU and
FPGA (CAPI, CCIX) will reduce the overhead to zero.
Arrow Integration - Allocation flow
˃ Trigger the inaccel Arrow allocator by
supplying metadata on a per-column
basis.
˃ Arrow columns enabled for acceleration
are mapped internally to shared
memory.
˃ Coral - our FPGA manager - accesses
Arrow-backed data with zero-copy.
Apache Arrow Summing up
˃ Seamless Arrow integration
˃ Page-aligned columnar format
˃ Native memory map
˃ Zero-copy operations
[Diagram: App1, App2, and App3 share columnar-format structures in DRAM with the Coral FPGA Resource Manager and the FPGA cluster]
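What "page-aligned", "memory map", and "zero-copy" mean in practice can be shown with Python's standard library alone. This is not Apache Arrow and not InAccel's allocator: `allocate_column` and `column_view` are illustrative names, and an anonymous `mmap` stands in for the shared memory that both the application and the manager would see.

```python
import mmap

FLOAT64_SIZE = 8  # bytes per value in this toy column


def allocate_column(num_values):
    """Allocate an anonymous memory map padded to whole pages."""
    nbytes = num_values * FLOAT64_SIZE
    padded = -(-nbytes // mmap.PAGESIZE) * mmap.PAGESIZE  # round up
    return mmap.mmap(-1, padded)


def column_view(buf, num_values):
    """Return a float64 view over the buffer without copying any data."""
    return memoryview(buf)[: num_values * FLOAT64_SIZE].cast("d")


column = allocate_column(1000)
producer = column_view(column, 1000)  # e.g. the application filling the column
consumer = column_view(column, 1000)  # e.g. a manager reading the same memory

producer[0] = 3.5
producer[999] = -1.0
seen_by_consumer = (consumer[0], consumer[999])
```

Both views address the same pages, so a value written through one is visible through the other without serialization or an intermediate copy. A real deployment would still need one transfer from host memory to the FPGA's DDR, as the slides note.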
Example
˃ Example of logistic
regression on top of
the FPGA resource
manager
https://docs.inaccel.com/latest/manager/examples/
InAccel, Inc. Corporate overview
˃ Founded in January 2018
˃ Registered in Delaware, USA
˃ Membership:
Application Acceleration, seamlessly
www.inaccel.com
info@inaccel.com
USA:
500 Delaware Ave STE 1, #1960
Wilmington, DE 19801
USA
Europe (Design Center):
Formionos 47
Kesariani 116 33
Athens, Greece

InAccel FPGA resource manager

  • 2. www.inaccel.com™ helps companies speedup their applications by providing ready-to-use accelerators-as-a-service in the cloud or on-prem 15x Speedup 4x Lower TCO Zero code changes 2
  • 3. www.inaccel.com™ Applications and Platforms • Applications • Platforms • Partnerships Machine learning Financial Analytics Genomics 3
  • 4. www.inaccel.com™ Open FPGA Data Science ˃ We help Data science/engineers run up to 15x faster their applications without changing their code 4
  • 5. www.inaccel.com™ Integration solution for Application Acceleration 5 InAccel Scalable FPGA Resource Manager Accelerated ML suite On-premise Cloud Higher Performance Up to 16x Speedup compared to highly optimized libraries Lower Cost Up to 4x lower TCO Zero-code changes Seamless integration to widely used frameworks Easy deployment Docker-based container for seamless integration On-prem or on cloud Available on cloud and on-prem
  • 6. www.inaccel.com™ InAccel Coral FPGA Resource Manager ˃ Coral abstracts FPGA resources (device, memory), enabling fault-tolerant heterogeneous distributed systems to easily be built and run effectively. 7 Worlds’ first FPGA Orchestrator: Program against your FPGAs like it’s a single pool of accelerators InAccel Coral Resource Manager InAccel Runtime - Resource isolation Applications FPGA drivers Server FPGA Kernels
  • 7. www.inaccel.com™ Current Framework for FPGAs on the cloud Current limitations without the InAccel Coral Manager ˃ Currently only one application can talk to each FPGA accelerator ˃ Every application can talk to a single FPGA. ˃ Complex device sharing • From multiple threads/processes • Even from the same thread ˃ Explicit allocation of the resources (memory/compute units) App1 Vendor drivers Single FPGA 8
  • 8. www.inaccel.com™ InAccel’s Coral FPGA Manager Acceleration abstraction layer to virtualize, manage and monitor the FPGA resources ˃ Management • Automatic device (re-)configurations and efficient memory transfers ˃ Fault-tolerance • Highly-available service on top of a cluster of FPGAs, each of which may be prone to failures ˃ Scalability • Automatic scale-up from single devices (e.g. f1.x2) to multi-FPGA systems (e.g. f1.x4, f1.x16) App1 InAccel FPGA Manager FPGA Cluster C++/Java socket App2 App3 RTE/Drivers (Intel/Xilinx) OpenCL / OPAE 9 Documentation: https://docs.inaccel.com/latest/
  • 9. www.inaccel.com™ FPGA Manager features Ease of Use ˃ Write applications quickly in C/C++, Java, Scala and Python. InAccel offers all the required high-level functions that make it easy to build and accelerate parallel apps. No need to modify your application to use an unfamiliar parallel programming language (like OpenCL) App1 InAccel FPGA Manager FPGA Cluster C++/Java socket App2 App3 RTE/Drivers (Intel/Xilinx) OpenCL / OPAE 10
  • 10. www.inaccel.com™ FPGA Manager features Runs Anywhere ˃ Runs on any FPGA platform (Intel), giving you the freedom to take full advantage of on- premises, or public Cloud (Alibaba, Nimbix, etc.) infrastructure. On-premise App1 InAccel FPGA Manager FPGA Cluster App2 App3 RTE/Drivers (Intel/Xilinx) OpenCL / OPAE 11
  • 11. www.inaccel.com™ FPGA Manager features Resource Management ˃ Automatic resource configuration and task scheduling across entire FPGA clusters in private datacenters or public cloud environments. Coral examines the state of the FPGAs and implements load-balancing policies across them, efficiently taking care of all the required device configurations and memory transfers. Privacy / Isolation ˃ Coral allows the secure sharing of the hardware resources among different users and multiple processes or threads. First class isolation support for accelerator cores and FPGA memory. App1 InAccel FPGA Manager FPGA Cluster App2 App3 RTE/Drivers (Intel/Xilinx) OpenCL / OPAE 12
  • 12. www.inaccel.com™ FPGA Manager use case Abstracts away FPGA cluster ˃ App1: 2 threads that use LR Just send LR requests ‒ The FPGA manager send the requests to FPGAs programmed with LR (no need to specify which FPGA/kernel) ˃ App2: Uses both LR and KM Just send LR-KM requests ‒ The FPGA manager send the requests to FPGAs programmed with LR and KM kernels ˃ App3: 1 thread uses KM Just send Km requests ‒ The FPGA manager sends the requested to FPGAs programmed with KM kernels InAccel FPGA Manager FPGA Cluster RTE/Drivers OpenCL / OPAE 13 Logistic Regression KMeans clustering LR LR LR LR LR KM LR KM KM KM KM KM To LR To LR-KM To KM LR LR LR LR FPGA1: LR FPGA3: LR FPGA4: LR-KM FPGA2: KM Send sync or async requests App3App2 App1
  • 13. Advantages ˃ Distribution of multi-threaded applications across multiple clusters ˃ Resource sharing among multiple applications
  • 15. Bitstream repository ˃ The FPGA Resource Manager is integrated with an artifact repository that is used to store FPGA bitstreams: https://store.inaccel.com [Diagram: Application, FPGA bitstream repository, FPGA cluster.]
  • 16. Pricing model (enterprise) ˃ FPGA Resource Manager: priced per node (server); each node can have 1 to 8 FPGAs. Monthly: $200/node (20 nodes or less), $150/node (more than 20 nodes). Yearly: $2,000/node (20 nodes or less), $1,500/node (more than 20 nodes). ˃ ML Accelerators on-prem: priced per node (server). Hourly: $2/node (20 nodes or less), $1.5/node (more than 20 nodes). Yearly: $10,000/node (20 nodes or less), $8,000/node (more than 20 nodes).
  • 17. Performance evaluation on Machine Learning ˃ Up to 15x speedup for LR ML (7.5x overall) ˃ Up to 14x speedup for KMeans ML (6.2x overall) ˃ Spark-GPU* (3.8x – 5.7x) ˃ f1.4x: 16 cores + 2 FPGAs (InAccel) ˃ r5d.4x: 16 cores [Charts: Logistic Regression execution time and K-Means clustering execution time on MNIST 24GB, 100 iterations (secs), r5d.4x vs. f1.4x (InAccel), each split into data preprocessing, data transformation and ML training; 15x and 14x speedup respectively.] *[Spark-GPU: An Accelerated In-Memory Data Processing Engine on Clusters]
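The gap between the per-stage and overall figures can be explored with Amdahl's law. The sketch below assumes that only the ML training stage is accelerated and that data preprocessing and transformation take the same time on both instances; the slide does not state this, and the function names are hypothetical.

```python
def overall_speedup(fraction, stage_speedup):
    """Amdahl's law: only `fraction` of the baseline runtime is accelerated."""
    return 1.0 / ((1.0 - fraction) + fraction / stage_speedup)


def implied_fraction(overall, stage_speedup):
    """Invert Amdahl's law to get the accelerated share of the baseline runtime."""
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / stage_speedup)


if __name__ == "__main__":
    # (workload, stage speedup, overall speedup) as quoted on the slide.
    for name, stage, overall in [("LR", 15.0, 7.5), ("KMeans", 14.0, 6.2)]:
        share = implied_fraction(overall, stage)
        print(f"{name}: training would be about {share:.1%} of the baseline runtime")
```

Under that assumption, training would account for roughly 93% of the baseline LR runtime and roughly 90% of the baseline KMeans runtime.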
  • 18. Cost reduction ˃ Up to 2.6x lower cost and 15x speedup ˃ f1.4x ($3.3/hour): 16 cores + 2 FPGAs (InAccel) ˃ r5d.4x ($1.15/hour): 16 cores [Chart: Total cost of execution ($) for logistic regression and K-means clustering, f1.4x (InAccel) vs. r5d.4x; 2.6x lower OpEx.]
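The cost figure follows from the instance prices and the overall speedups quoted on the performance slide, since the cost of a run is the hourly price times the runtime. The short Python sketch below works that through; the function name is hypothetical, and the assumption is that billing is proportional to runtime.

```python
R5D_PRICE = 1.15  # $/hour for r5d.4x, from the slide
F1_PRICE = 3.30   # $/hour for f1.4x, from the slide


def cost_reduction(overall_speedup, baseline_price=R5D_PRICE, accel_price=F1_PRICE):
    """Ratio of baseline cost to accelerated cost for the same job.

    cost = hourly price * runtime, and accelerated runtime = baseline / speedup,
    so the ratio is speedup * baseline_price / accel_price.
    """
    return overall_speedup * baseline_price / accel_price


if __name__ == "__main__":
    print(f"Logistic regression: {cost_reduction(7.5):.2f}x lower cost")
    print(f"K-means clustering:  {cost_reduction(6.2):.2f}x lower cost")
```

With the 7.5x overall speedup for logistic regression this gives about 2.6x, which matches the headline number; the 6.2x overall speedup for K-means gives about 2.2x.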
  • 19. Cost reduction ˃ For a cluster of 30 nodes on AWS (24/7 – 1 year): r5d.4x ($1.15/hour): $302,220 ˃ For a cluster of only 2 f1 nodes on AWS (24/7 – 1 year) providing the same performance: f1.4x ($3.3/hour): $57,816 + $20k license of ML accelerators ˃ Over $200k savings per year ˃ 16x smaller footprint [Chart: Yearly OpEx for 30 nodes ($), Current vs. InAccel; $200k savings.]
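The yearly figures can be reproduced with a few lines of Python. This is a sketch of the arithmetic only: it takes the 30 nodes named in the chart title as the baseline cluster size (the $302,220 total corresponds to 30 nodes at $1.15/hour) and the $10,000/node yearly ML accelerator price from the pricing slide; the helper name is hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours when running 24/7


def yearly_cost(nodes, price_per_hour, license_per_node=0.0):
    """Yearly cost of a cluster: instance hours plus an optional per-node license."""
    return nodes * (price_per_hour * HOURS_PER_YEAR + license_per_node)


baseline = yearly_cost(30, 1.15)                         # r5d.4x cluster
accelerated = yearly_cost(2, 3.30, license_per_node=10_000)  # f1.4x + license
savings = baseline - accelerated

if __name__ == "__main__":
    print(f"Baseline:    ${baseline:,.0f}")
    print(f"Accelerated: ${accelerated:,.0f}")
    print(f"Savings:     ${savings:,.0f}")
```

The result is $302,220 for the baseline, $77,816 for the two f1 nodes including the license, and about $224,000 in yearly savings, in line with the "over $200k" claim.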
  • 20. Personnel cost savings ˃ Reduced complexity of programming the FPGAs ˃ Over 4x fewer lines of code (LoC) ˃ Significant savings in the time needed to run and debug on the FPGAs
  • 21. Simpler software for FPGA communication ˃ InAccel SW snapshot: using the FPGA manager ˃ Vanilla C code snapshot: typical use of OpenCL for FPGAs ˃ The InAccel FPGA Manager keeps the simplicity and the performance efficiency of the FPGA without adding complexity
  • 22. Software simplicity ˃ 30x simpler code ˃ https://github.com/Xilinx/AWS-F1-Developer-Labs/blob/master/helloworld_ocl/src/host.cpp
  • 23. Example on scaling to 2 FPGAs using the resource manager for logistic regression ˃ 1.86x speedup using 2 FPGAs simply by changing a line ˃ You specify how many FPGAs you want to use: inaccel start --fpga=xilinx:0,xilinx:1 or inaccel start --fpga=all
  • 24. FPGA Manager deployment: Easy to Deploy ˃ Launch a container with InAccel's Docker image, or deploy it as a DaemonSet on a Kubernetes cluster, and enjoy acceleration services at the drop of a hat. ˃ https://hub.docker.com/u/inaccel/ ˃ FPGA Manager: • Easy deployment • Easy scalability • Easy integration
  • 25. Serverless deployment ˃ Integrated framework for serverless deployment ˃ Compatible with Kubernetes ˃ Compatible with Kubeless, Knative ˃ Users only have to upload the images to the S3 bucket; InAccel’s FPGA Manager then automatically deploys the cluster of FPGAs, processes the data and stores the results back in the S3 bucket. ˃ Users do not have to know anything about the FPGA execution. [Diagram: files uploaded to Amazon S3 trigger the InAccel FPGA Resource Manager on a cluster of Amazon EC2 f1 instances, which runs an accelerated function from the f1 library of accelerated functions and writes the results to Amazon S3 for download.] https://medium.com/@inaccel/fpgas-goes-serverless-on-kubernetes-55c1d39c5e30
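The upload, trigger, process and store flow on this slide can be sketched in plain Python. Everything here is a stand-in: the `Bucket` class is an in-memory substitute for an S3 bucket, `accelerated_function` is a placeholder for work that would run on the FPGAs, and `wire` is a hypothetical helper. None of these names come from InAccel, Kubeless, Knative or AWS.

```python
def accelerated_function(data: bytes) -> bytes:
    """Placeholder for FPGA work; here it simply reverses the bytes."""
    return data[::-1]


class Bucket:
    """In-memory stand-in for an object-store bucket with an upload trigger."""

    def __init__(self):
        self.objects = {}
        self.on_upload = None  # optional callback invoked as on_upload(key, data)

    def upload(self, key, data):
        self.objects[key] = data
        if self.on_upload is not None:
            self.on_upload(key, data)

    def download(self, key):
        return self.objects[key]


def wire(input_bucket, output_bucket):
    """Connect the buckets: every upload is processed and its result stored."""
    def handler(key, data):
        output_bucket.upload(key, accelerated_function(data))

    input_bucket.on_upload = handler


if __name__ == "__main__":
    uploads, results = Bucket(), Bucket()
    wire(uploads, results)
    uploads.upload("img-001", b"raw image bytes")
    print(results.download("img-001"))
```

The user-facing surface is just `upload` and `download`; the handler in the middle is where the accelerated execution would be hidden.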
  • 26. InAccel’s Coral manager integrated with Spark ˃ Integrated solution that allows: • Scale up (1, 2, or 8 FPGAs per node) • Scale out to multiple nodes (using the Spark API) • Seamless integration • Docker-based deployment [Diagram: a Spark Driver Program (SparkContext) and Cluster Manager schedule tasks on executors running on f1 worker nodes; each worker node reaches its FPGA cluster through the FPGA RTE and the RTE & manager interface.]
  • 27. Speedup comparison - Scalability ˃ Up to 10x speedup compared to 32 cores, based on f1.x2 [Chart: Speedup on a cluster of f1.x2 using InAccel: 4x f1.x2large (32 cores, SW) = 1; 4x f1.x2large (32 cores + InAccel ML Accel) = 10.2x.]
  • 28. Speedup ˃ Up to 12x speedup compared to 64 cores on f1.x16 [Chart: Speedup of f1.x16 with 8 InAccel FPGA kernels: f1.16xlarge (SW, 64 cores) = 1.00; f1.16xlarge (SW + 8 InAccel cores: 64 cores + 8 FPGAs with InAccel MLAccel) = 12.14.]
  • 29. Speedup comparison ˃ 3x speedup compared to r4 ˃ 2x lower OpEx [Chart: Speedup comparison normalized on cost for a cluster of 4 nodes ($2/hour/node): cluster of 4 r4 (SW; 32 cores each, 128 cores total) = 1.00; cluster of 4 f1.x2large (SW + InAccel ML Accel) = 3.18.]
  • 30. InAccel Docker Service ˃ Sustain FPGA driver compatibility between the host and the containers • discover available resources • mount/isolate visible devices ‒ forget --privileged • resolve library dependencies [Diagram: App containers run on the Docker engine with the InAccel Container Runtime and InAccel’s Coral Device Plugin, on top of the host OS, the FPGA runtime and the server's FPGAs (Intel/Xilinx).]
  • 31. Integration with Arrow ˃ Seamless experience for the application developer writing software using Arrow-backed dataframes. ˃ Arrow's growing adoption makes our integration even more valuable. ˃ Zero extra overhead for other operations supported by Arrow, e.g. serialization. [Quote, attributed on the slide to "Apache Software": "Apache Arrow specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. The Arrow memory format supports zero-copy reads for efficient data-access without serialization overhead."]
  • 32. Integration with Arrow ˃ Minimize copy overheads to convey data to the accelerator data plane. ˃ Single DMA operation from host memory to DDR memory on the FPGA. ˃ Shared memory between CPU and FPGA (CAPI – CCIX) will reduce the overhead to zero.
  • 33. Arrow Integration - Allocation flow ˃ Trigger the InAccel Arrow allocator by supplying metadata on a per-column basis. ˃ Arrow columns enabled for acceleration are mapped internally to shared memory. ˃ Coral, our FPGA manager, accesses Arrow-backed data with zero-copy.
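The idea of backing a column with shared memory so that a second party can access it without copying can be illustrated with Python's standard library alone. This is a loose analogy, not Arrow's allocator or Coral's interface: `allocate_column`, `scale_in_place` and `read_column` are hypothetical names, and `multiprocessing.shared_memory` stands in for whatever mapping the real integration uses.

```python
from multiprocessing import shared_memory

ITEM = 8  # bytes per float64 value


def allocate_column(values):
    """Producer: allocate a non-empty float64 column directly in shared memory."""
    if not values:
        raise ValueError("column must not be empty")
    shm = shared_memory.SharedMemory(create=True, size=ITEM * len(values))
    # The segment may be rounded up to a page, so slice to the exact length.
    with shm.buf[:ITEM * len(values)] as raw, raw.cast("d") as col:
        for i, value in enumerate(values):
            col[i] = value
    return shm


def scale_in_place(name, length, factor):
    """Consumer: attach to the segment by name and update it without copying."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        with shm.buf[:ITEM * length] as raw, raw.cast("d") as col:
            for i in range(length):
                col[i] *= factor
    finally:
        shm.close()


def read_column(shm, length):
    """Read the column back through the producer's own mapping."""
    with shm.buf[:ITEM * length] as raw, raw.cast("d") as col:
        return col.tolist()


if __name__ == "__main__":
    column = allocate_column([1.0, 2.0, 3.0])
    try:
        scale_in_place(column.name, 3, 10.0)
        print(read_column(column, 3))
    finally:
        column.close()
        column.unlink()
```

The consumer's writes are visible to the producer because both sides map the same memory; only the segment name and the column length are passed between them. A separate process could attach by the same name in the same way.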
  • 34. Apache Arrow: Summing up ˃ Seamless Arrow integration ˃ Page-aligned columnar format ˃ Native memory map ˃ Zero-copy operations [Diagram: App1, App2 and App3 share a columnar format structure in DRAM with the Coral FPGA Resource Manager, which serves the FPGA cluster.]
  • 35. Example ˃ Example of logistic regression on top of the FPGA resource manager: https://docs.inaccel.com/latest/manager/examples/
  • 36. InAccel, Inc. Corporate overview ˃ Founded in January 2018 ˃ Registered in Delaware, USA ˃ Membership:
  • 37. Application Acceleration, seamlessly www.inaccel.com info@inaccel.com USA: 500 Delaware Ave STE 1, #1960 Wilmington, DE 19801 USA Europe (Design Center): Formionos 47 Kesariani 116 33 Athens, Greece