Introduction to Acceleration with OpenCAPI
SCFE 2020 - March 24th 2020 - A.CASTELLANE
What Do You Need?
Current Computing Landscape
CPU technology advances have slowed, and with them the historical cost/performance improvements seen
over the last several decades => New CPU chips alone cannot handle current challenges!
Over-burdened CPUs + slow/complex algorithms & functions + network & data access rates
(computation + data access)
= Current technology and processing overload!
Bad News, it’s only going to get worse!
Next Set Of Challenges Is Here!
Exponential Data Growth
Compute Intensive Algorithms
Diverse Data Structures & Types
Decreasing Time To Results: Hours .. Minutes .. Seconds .. Real Time
Compute
• AI, Machine / Deep Learning
• Video Processing
• Database / Big Data Analytics
Storage
• Scale-out Storage
• Petabytes of new data
• Intelligent / Compute SSDs
Networking
• Network Security
• Low-latency Networking
• Open vSwitch offload
• Software Defined Networking Acceleration
Next Challenges Affect All Computing Fields
Bank / Finance
• Risk analysis / Faster trading: Monte Carlo libraries
• Credit card fraud detection
• Blockchain acceleration
Video / Analytics
• Smart video surveillance from multiple video feeds
• 3D video stream from multi-angle video streams
• Image search / Object tracking / Scene recreation
• Multi-jpeg compression
Machine Learning / Deep learning
• Machine learning inference
• Accelerate frequently used ML / DL algorithms
Algorithm acceleration
• Compression on network path or storage
• Encryption on the fly to various memory types
• String match
But what if you could have the best of both worlds?
Options: Software or Hardware?
• Software:
• Advantages:
• More rapid development leading to faster time to market
• Lower non-recurring engineering costs. Software can be reused easily.
• Heightened portability
• Ease of updating features or patching bugs
• Disadvantages:
• Slower run time
• Hardware
• Advantages:
• Much faster execution of functions
• Reduced power consumption
• Lower latency
• Increased parallelism and bandwidth
• Better utilization of area and functional components available on an integrated circuit (IC)
• Disadvantages:
• Lower ability to update designs once etched onto silicon
• Difficult to share Verilog/VHDL source code between different hardware platforms
• Higher costs of functional verification
• Longer development process and time to market
So, what’s the solution?
Hardware Acceleration
The use of computer hardware specially designed to perform functions more efficiently than is possible in software alone running on a general-purpose CPU.
Two Options
• GPU: thousands of tiny CPUs using high parallelization → compute-intensive applications
• FPGA (Field Programmable Gate Array): logic + IOs are customized exactly for the application's needs → very low and predictable latency applications
The Better Choice?
Due to the inherent logic and IO flexibility, speed, and
predictably low latency, FPGAs have a clear advantage.
FPGA Acceleration
FPGA = Field Programmable Gate Array
Historically programmed using Verilog/VHDL → compiled → mapped to FPGA HW logic
What is an FPGA?
• A re-programmable computer chip with lots of configurable logic elements based on Lookup-Tables (LUT)
• Programmable switch matrix routing
• Configurable I/O and high-speed serial links
• Integrated Hard IP (Multiply/Add, SRAM, PLL, PCIe, Ethernet, DRAM,...)
• Advantages in flexibility, speed, and low latency due to:
  • Limited instruction set
  • High parallelism
  • Deep pipelines
Logical View: programmable logic elements connected by programmable switches
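The LUT-based logic element mentioned above can be pictured with a small software model. This is only an illustrative sketch, not vendor tooling: the helper names `make_lut`, `lut_eval`, and `full_adder` are hypothetical, and real devices typically use wider LUTs plus dedicated carry logic. The idea it shows is that "programming" a LUT means filling a small truth table, and evaluating it is a table lookup addressed by the input bits.

```python
from itertools import product


def make_lut(func, n_inputs):
    """Precompute the truth table (the 'configuration bits') of an n-input LUT."""
    return [func(*bits) & 1 for bits in product((0, 1), repeat=n_inputs)]


def lut_eval(table, *inputs):
    """Evaluate a LUT: the input bits form an address into the stored table."""
    index = 0
    for bit in inputs:
        index = (index << 1) | (bit & 1)  # first input is the most significant bit
    return table[index]


# "Program" two 3-input LUTs as the sum and carry outputs of a full adder.
sum_lut = make_lut(lambda a, b, cin: a ^ b ^ cin, 3)
carry_lut = make_lut(lambda a, b, cin: (a & b) | (cin & (a ^ b)), 3)


def full_adder(a, b, cin):
    """Return (sum, carry) using only table lookups, as a logic element would."""
    return lut_eval(sum_lut, a, b, cin), lut_eval(carry_lut, a, b, cin)


print(full_adder(1, 1, 0))
```

Reconfiguring the device amounts to loading different tables (and different routing), which is why the same chip can implement very different functions.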
FPGA Example (Bittware 250-SoC)
Multipurpose Converged Network / Storage
• Xilinx Zynq UltraScale+ FPGA ZU19EG (64-bit Cortex-A53 ARM core)
• Two 4GB DDR4 (for FPGA and ARM)
• PCIe Gen3 x16 / Gen4 x8 → CAPI2
• Up to 4 x8 OCuLink ports supporting NVMe, 100GbE and OpenCAPI
• 2x 100GbE QSFP28 cages
• Half Height - Half Length format
Basics of HW Acceleration
Standard CPU Setup (No Acceleration)
(diagram labels: Application, Function, Host Memory)
• CPU manages all data, memory access, functions, and flows
• Over-burdened CPU
• Slow functions
• Congested memory and output card access
• The load grows with increased data, computing, storage, and network challenges
HW Acceleration with FPGA
Standard CPU Setup (No Acceleration)
(diagram labels: Application, Function, Host Memory)
• CPU manages all data, memory access, functions, and flows
• Over-burdened CPU
• Congested memory and output card access
• Slow functions
Classic Acceleration with FPGA
(diagram labels: Application, Function, Host Memory)
• Faster functions on FPGA
• Only the function itself is taken off the CPU
• CPU is used to manage FPGA memory access and data copying
• No data coherency (host memory copied to FPGA)
• FPGA historically programmed using Verilog/VHDL
• CPU still handles all memory and data access
Addressing Classic FPGA Acceleration Issues
• What is OpenCAPI?
  • Open Coherent Accelerator Processor Interface
  • OpenCAPI is an open interface architecture that allows any microprocessor to attach to:
    • Coherent user-level accelerators and I/O devices
    • Advanced memories accessible via read/write or user-level direct memory access (DMA) semantics
  • Agnostic to processor architecture
• What is OC-Accel?
  • OpenCAPI Acceleration Framework to program FPGAs using C/C++ instead of Verilog or VHDL
OpenCAPI 3.0
OC 3.1
OpenCAPI specifications are downloadable from www.opencapi.org
HW Acceleration with FPGA + OpenCAPI
Classic Acceleration with FPGA
(diagram labels: Application, Function, Host Memory)
• CPU is used to manage FPGA memory access
• No data coherency (host memory copied to FPGA)
• FPGA historically programmed using Verilog/VHDL
• CPU still handles all memory and data access
Acceleration with FPGA + OpenCAPI
(diagram labels: Application, Function, OpenCAPI, Host Memory)
• OpenCAPI IO interface on FPGA accesses host memory directly
• Function accesses only needed host memory data
• Data coherency (data does not need to be copied to FPGA)
• Address translation (@function = @application)
• FPGA programmed with C/C++ using OC-Accel Framework
FPGAs + OpenCAPI + OC-Accel Address All Issues
Each original point is followed by how FPGA + OpenCAPI + OC-Accel delivers or addresses it.
• Software
  • Advantages:
    • More rapid development → App. Eng. writing C/C++ functions (OC-Accel)
    • Lower non-recurring engineering costs → C/C++ code is reusable
    • Heightened portability → C/C++ code is portable
    • Ease of updating features or patching bugs → FPGA reconfigurable with C/C++ updates
  • Disadvantages:
    • Slower run time → Function executed faster on FPGA + CPU relief
• Hardware
  • Advantages:
    • Much faster execution of functions → Using FPGA instead of CPU
    • Reduced power consumption → FPGA is function specific only
    • Lower latency → FPGA is fast + OpenCAPI direct memory access
    • Increased parallelism and bandwidth → FPGA can have parallel logic
    • Better IC area and function utilization → FPGA uses function logic only
  • Disadvantages:
    • Lower ability to update design hardware → FPGA easily reconfigurable with C/C++ updates
    • Difficult to share source code between FPGAs → C/C++ easily recompiled for different FPGAs
    • Higher costs of functional verification → C/C++ code simulated and debugged
    • Longer development process and time to market → C/C++ code can be easier to write and upload
Ex: Monte-Carlo (FPGA Accelerated)
Monte Carlo analysis is a risk management technique used in the financial and insurance industries to conduct a quantitative analysis of risks.
Running 1 million iterations
Results: At least 50x faster with CAPI and FPGA technology on a POWER server
By using CAPI with an FPGA, the C/C++ code was reduced by 40x on the application side, and 33% of memory and CPU was freed up (versus a non-CAPI FPGA).
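For readers unfamiliar with the technique, here is a minimal software-only sketch of a Monte Carlo risk estimate. It is not the benchmarked workload: the function name `monte_carlo_var`, the normally distributed returns, and the parameter values are all assumptions made for illustration. It estimates a one-period 95% Value at Risk by drawing many independent random returns and reading off the loss at the 5th percentile.

```python
import random


def monte_carlo_var(n_iter, mean, stdev, confidence=0.95, seed=0):
    """Estimate Value at Risk as the loss exceeded in (1 - confidence) of trials.

    Assumes returns follow a normal distribution; real risk models differ.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    returns = sorted(rng.gauss(mean, stdev) for _ in range(n_iter))
    cutoff = int((1.0 - confidence) * n_iter)  # index of the 5th-percentile return
    return -returns[cutoff]  # report the loss as a positive number


# 100,000 trials keep the example quick; the slide's run used 1 million.
estimate = monte_carlo_var(100_000, mean=0.0, stdev=0.02)
print(f"Estimated 95% VaR: {estimate:.4f}")
```

Every trial is independent of the others, which is the property that makes this kind of workload a natural fit for parallel hardware.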
Ex: PostgreSQL regex Matching Accelerated
PostgreSQL is a powerful, open source object-relational database system. SQL (Structured Query Language) is used to communicate with a database.
PostgreSQL + OpenCAPI shows a compelling “regex” performance increase by leveraging the bandwidth and virtual addressing of OpenCAPI technology. In fact, accelerating the SQL with OpenCAPI-regex can be 4x to 10x faster than the best PostgreSQL built-in functions (CPU multi-threads enabled).
Command example:
SELECT * FROM table WHERE pkt ~ pattern;
Basically: search the db for all pkt that match pattern
Actual example single search run times:
• CPU parallel Seq Scan: ~698ms
• Custom Scan (PFCAPI): ~161ms
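To make the query semantics and the quoted timings concrete, the sketch below emulates the `pkt ~ pattern` filter in plain Python and computes the speedup implied by the two run times on the slide. The rows, the pattern, and the helper name `regex_scan` are made up for illustration, and Python's `re` syntax is not identical to PostgreSQL's POSIX regular expressions, so treat it as an approximation of the behavior rather than of the accelerated implementation.

```python
import re


def regex_scan(rows, pattern):
    """Return rows whose 'pkt' field matches the pattern anywhere in the string."""
    compiled = re.compile(pattern)
    return [row for row in rows if compiled.search(row["pkt"])]


# Hypothetical table contents.
rows = [
    {"id": 1, "pkt": "GET /index.html HTTP/1.1"},
    {"id": 2, "pkt": "POST /login HTTP/1.1"},
    {"id": 3, "pkt": "GET /img/logo.png HTTP/1.1"},
]
matches = regex_scan(rows, r"^GET .*\.(html|png)")
print([row["id"] for row in matches])

# Speedup implied by the single-search run times quoted on the slide.
cpu_ms, capi_ms = 698, 161
speedup = cpu_ms / capi_ms
print(f"Speedup: {speedup:.1f}x")
```

The ratio of the two quoted times comes out a little above 4x, which sits at the lower end of the 4x to 10x range stated above.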
Ex: Ultra Fast Data Acquisition (X-Ray Crystallography)
Goal: Real-time mapping of biological structure by examining molecule scatter plots of a protein crystal struck by x-rays
GPU + PCIe Configuration (Today)
• Digital camera sensors: 4 MPixels @ 1.1 kHz
• Raw data reaches the GPU over PCIe at 9 GBps
• Processing steps:
  1 Data acquisition
  2 Raw data to real image conversion
  3 Decimate / sort images
  4 Data compression
• Output: compressed data; protein molecule mapped to a real image
Ex: Ultra Fast Data Acquisition (X-Ray Crystallography)
Goal: Real-time mapping of biological structure by examining molecule scatter plots of a protein crystal struck by x-rays
FPGA w/ OpenCAPI (Goal)
• Digital camera sensors: 10 MPixels @ 2.2 kHz
• Dual FPGAs in parallel, attached over OpenCAPI 3.0
• 22 GBps links for raw data and for image data (unfiltered and filtered images)
• GPU or FPGA or both
• Host with NX-gzip embedded HW accelerator produces the compressed data
• Processing steps: data acquisition, raw data to real image conversion, decimate / sort images, data compression
• Output: compressed data; protein molecule mapped to a real image
OpenCAPI breaks the 9 GBps PCIe bottleneck!
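The bandwidth figures on the two crystallography slides can be sanity-checked with simple arithmetic. The sketch below assumes 2 bytes per pixel and an even split of the stream across the two FPGAs; neither assumption is stated on the slides, and the function name `data_rate_bytes_per_s` is hypothetical. Under those assumptions the numbers line up with the quoted 9 GBps and 22 GBps.

```python
def data_rate_bytes_per_s(pixels, frames_per_s, bytes_per_pixel=2):
    """Raw sensor data rate; bytes_per_pixel=2 is an assumption, not a slide value."""
    return pixels * frames_per_s * bytes_per_pixel


# Today: 4 MPixels @ 1.1 kHz.
today = data_rate_bytes_per_s(4_000_000, 1_100)

# Goal: 10 MPixels @ 2.2 kHz, assumed to be split evenly over two FPGAs.
goal_total = data_rate_bytes_per_s(10_000_000, 2_200)
goal_per_fpga = goal_total // 2

print(f"Today: {today / 1e9:.1f} GB/s")
print(f"Goal:  {goal_total / 1e9:.1f} GB/s total, {goal_per_fpga / 1e9:.1f} GB/s per FPGA")
```

If the pixel depth or the split differs in the real setup, the totals change accordingly, but the order of magnitude shows why a link of roughly 9 GBps cannot carry the target sensor stream.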
Ex: Pull Quote
“The benefit of using POWER interfaces, i.e., NVLink and OpenCAPI, is not only bandwidth, but these interfaces allow also for coherent memory access. FPGA board connected via OpenCAPI or GPGPU connected via NVLink sees host (CPU) virtual memory space exactly like the process running on the CPU, reducing the burden of writing reliable and secure applications. Memory coherency can be also available for PCIe FPGA accelerators installed in POWER9 servers via OpenCAPI predecessor, the Coherent Accelerator Processor Interface (CAPI). IBM also provides optimized software to benefit from the architecture, including the CAPI Storage, Network, and Analytics Programming (SNAP) framework that simplifies the integration of FPGA designs with POWER9, as well as optimized ML and data analysis routines for GPGPUs or FPGAs.”
Structural Dynamics 7, 014305 (2020); https://doi.org/10.1063/1.5143480
Ex: Memory Coherency
Scenario: 2MB of data scattered in host memory is processed in an FPGA.
« Classic » PCIe FPGA card
• Step 1: Gathering data (SW memcopy)
• Step 2: 1 transaction of a big amount of data to FPGA (2MB)
« CAPI-enabled » FPGA card
• Step 1: 1 transaction of 8kB for AddrSet from host memory to FPGA
• Step 2: 1024 transactions of 2kB from host memory to FPGA. Directly reads required data at random addresses.
Results: CAPI-enabled was 2-3x faster than Classic method
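The two flows in this scenario can be modeled in a few lines to check the sizes involved. This is a toy model under stated assumptions: addresses are taken to be 64 bits wide (which makes 1024 addresses exactly 8 kB), host memory is a plain byte string, and the names `classic_transfer` and `capi_transfer` are hypothetical. It only counts bytes and transactions; it does not reproduce the measured 2-3x speedup.

```python
import random
import struct

BLOCK = 2 * 1024   # 2 kB per scattered block
N_BLOCKS = 1024    # 1024 blocks -> 2 MB of payload
N_SLOTS = 4096     # model an 8 MB host memory made of 2 kB slots

# Fill each slot with its own index so that every block is distinguishable.
host_memory = b"".join(struct.pack("<I", i) * (BLOCK // 4) for i in range(N_SLOTS))

# Pick 1024 scattered block addresses (fixed seed for reproducibility).
rng = random.Random(0)
addresses = [slot * BLOCK for slot in rng.sample(range(N_SLOTS), N_BLOCKS)]


def classic_transfer(memory, addrs):
    """Classic card: software gathers the blocks, then one big copy to the FPGA."""
    staging = b"".join(memory[a:a + BLOCK] for a in addrs)  # step 1: SW memcopy
    return staging, 1                                       # step 2: one 2 MB transaction


def capi_transfer(memory, addrs):
    """CAPI-enabled card: send only the address set, then read blocks in place."""
    addr_set = struct.pack(f"<{len(addrs)}Q", *addrs)       # step 1: one 8 kB transaction
    unpacked = struct.unpack(f"<{len(addrs)}Q", addr_set)
    blocks = [memory[a:a + BLOCK] for a in unpacked]        # step 2: 1024 direct reads
    return b"".join(blocks), addr_set, 1 + len(blocks)


classic_data, classic_tx = classic_transfer(host_memory, addresses)
capi_data, addr_set, capi_tx = capi_transfer(host_memory, addresses)
print(len(addr_set), len(capi_data), classic_tx, capi_tx)
```

Both paths deliver the same 2 MB to the function. The difference is that the classic path first has the CPU copy every block into a staging buffer, while the CAPI-enabled path replaces that copy with an 8 kB address list and lets the accelerator read host memory directly.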
Ease of FPGA Programming (OC-Accel)
Benefits:
• Faster Time To Market: Port a function to an FPGA in days, not months
• No Obsolescence: Simply recompile unchanged C/C++ code for a different FPGA
• No Link Constraint: Moving from a CAPI (over PCIe) link to OpenCAPI is just a matter of recompiling, with no code change
• No Specific Hardware Skills Needed: A C/C++ coder can focus on functionality, as all the resources are managed by the framework.
• Open-Source Framework: The code can be modified and improved by any user.
Example:
• Customer ported and optimized SHA3 C code within 10 days using the SNAP* framework, versus 4 months in VHDL without SNAP
• *Note: SNAP is the predecessor to OC-Accel, and overall flow and performance are equivalent.
Development Plans:
• OC-Accel with OpenCAPI today, OC-Accel with other emerging standards like CXL tomorrow!
FPGAs + OpenCAPI + OC-Accel Has It All
Very high bandwidth
Faster development and time
to market with OC-Accel
Developers Aren’t Where We Need Them
Scripting
Interpreted App (Python / Rails / Java)
Non-Interpreted App (C++ / Java JRE)
Procedural App (C / C++)
High Level OS (C / C++)
Firmware
HW API (C, ASM)
Kernel (C, AS)
HDL
Chart content courtesy of Aaron Sullivan @Rackspace
Spreading the CAPI Love (OC-Accel)
Developers Where We Need Them
Interpreted App (Python / Rails / Java)
Non-Interpreted App (C++ / Java JRE)
Procedural App (C / C++)
High Level OS (C / C++)
Kernel (C, AS)
HW API (C, ASM)
Firmware
Scripting
HDL
Chart labels: Application, New Abstraction, Soft-Hardware
Chart content courtesy of Aaron Sullivan @Rackspace
- Know more about accelerators?
- See a live demonstration?
- Do a benchmark?
- Get answers to your questions?
Contact us
alexandre.castellane@fr.ibm.com
bruno.mesnet@fr.ibm.com
fabrice_moyen@fr.ibm.com
luyong@cn.ibm.com
shgoupf@cn.ibm.com
Thank You!

More Related Content

What's hot

Covid-19 Response Capability with Power Systems
Covid-19 Response Capability with Power SystemsCovid-19 Response Capability with Power Systems
Covid-19 Response Capability with Power Systems
Ganesan Narayanasamy
 
MIT's experience on OpenPOWER/POWER 9 platform
MIT's experience on OpenPOWER/POWER 9 platformMIT's experience on OpenPOWER/POWER 9 platform
MIT's experience on OpenPOWER/POWER 9 platform
Ganesan Narayanasamy
 
OpenPOWER Webinar
OpenPOWER Webinar OpenPOWER Webinar
OpenPOWER Webinar
Ganesan Narayanasamy
 
Deeplearningusingcloudpakfordata
DeeplearningusingcloudpakfordataDeeplearningusingcloudpakfordata
Deeplearningusingcloudpakfordata
Ganesan Narayanasamy
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
inside-BigData.com
 
Overview of HPC Interconnects
Overview of HPC InterconnectsOverview of HPC Interconnects
Overview of HPC Interconnects
inside-BigData.com
 
Hardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and MLHardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and ML
inside-BigData.com
 
OpenPOWER Latest Updates
OpenPOWER Latest UpdatesOpenPOWER Latest Updates
OpenPOWER Latest Updates
Ganesan Narayanasamy
 
NNSA Explorations: ARM for Supercomputing
NNSA Explorations: ARM for SupercomputingNNSA Explorations: ARM for Supercomputing
NNSA Explorations: ARM for Supercomputing
inside-BigData.com
 
IBM Data Centric Systems & OpenPOWER
IBM Data Centric Systems & OpenPOWERIBM Data Centric Systems & OpenPOWER
IBM Data Centric Systems & OpenPOWER
inside-BigData.com
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
inside-BigData.com
 
OpenPOWER Webinar on Machine Learning for Academic Research
OpenPOWER Webinar on Machine Learning for Academic Research OpenPOWER Webinar on Machine Learning for Academic Research
OpenPOWER Webinar on Machine Learning for Academic Research
Ganesan Narayanasamy
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computing
inside-BigData.com
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
inside-BigData.com
 
Exceeding the Limits of Air Cooling to Unlock Greater Potential in HPC
Exceeding the Limits of Air Cooling to Unlock Greater Potential in HPCExceeding the Limits of Air Cooling to Unlock Greater Potential in HPC
Exceeding the Limits of Air Cooling to Unlock Greater Potential in HPC
inside-BigData.com
 
SNAP MACHINE LEARNING
SNAP MACHINE LEARNINGSNAP MACHINE LEARNING
SNAP MACHINE LEARNING
Ganesan Narayanasamy
 
State of ARM-based HPC
State of ARM-based HPCState of ARM-based HPC
State of ARM-based HPC
inside-BigData.com
 
DOME 64-bit μDataCenter
DOME 64-bit μDataCenterDOME 64-bit μDataCenter
DOME 64-bit μDataCenter
inside-BigData.com
 
2018 bsc power9 and power ai
2018   bsc power9 and power ai 2018   bsc power9 and power ai
2018 bsc power9 and power ai
Ganesan Narayanasamy
 
CFD on Power
CFD on Power CFD on Power
CFD on Power
Ganesan Narayanasamy
 

What's hot (20)

Covid-19 Response Capability with Power Systems
Covid-19 Response Capability with Power SystemsCovid-19 Response Capability with Power Systems
Covid-19 Response Capability with Power Systems
 
MIT's experience on OpenPOWER/POWER 9 platform
MIT's experience on OpenPOWER/POWER 9 platformMIT's experience on OpenPOWER/POWER 9 platform
MIT's experience on OpenPOWER/POWER 9 platform
 
OpenPOWER Webinar
OpenPOWER Webinar OpenPOWER Webinar
OpenPOWER Webinar
 
Deeplearningusingcloudpakfordata
DeeplearningusingcloudpakfordataDeeplearningusingcloudpakfordata
Deeplearningusingcloudpakfordata
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
 
Overview of HPC Interconnects
Overview of HPC InterconnectsOverview of HPC Interconnects
Overview of HPC Interconnects
 
Hardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and MLHardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and ML
 
OpenPOWER Latest Updates
OpenPOWER Latest UpdatesOpenPOWER Latest Updates
OpenPOWER Latest Updates
 
NNSA Explorations: ARM for Supercomputing
NNSA Explorations: ARM for SupercomputingNNSA Explorations: ARM for Supercomputing
NNSA Explorations: ARM for Supercomputing
 
IBM Data Centric Systems & OpenPOWER
IBM Data Centric Systems & OpenPOWERIBM Data Centric Systems & OpenPOWER
IBM Data Centric Systems & OpenPOWER
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
 
OpenPOWER Webinar on Machine Learning for Academic Research
OpenPOWER Webinar on Machine Learning for Academic Research OpenPOWER Webinar on Machine Learning for Academic Research
OpenPOWER Webinar on Machine Learning for Academic Research
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computing
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
 
Exceeding the Limits of Air Cooling to Unlock Greater Potential in HPC
Exceeding the Limits of Air Cooling to Unlock Greater Potential in HPCExceeding the Limits of Air Cooling to Unlock Greater Potential in HPC
Exceeding the Limits of Air Cooling to Unlock Greater Potential in HPC
 
SNAP MACHINE LEARNING
SNAP MACHINE LEARNINGSNAP MACHINE LEARNING
SNAP MACHINE LEARNING
 
State of ARM-based HPC
State of ARM-based HPCState of ARM-based HPC
State of ARM-based HPC
 
DOME 64-bit μDataCenter
DOME 64-bit μDataCenterDOME 64-bit μDataCenter
DOME 64-bit μDataCenter
 
2018 bsc power9 and power ai
2018   bsc power9 and power ai 2018   bsc power9 and power ai
2018 bsc power9 and power ai
 
CFD on Power
CFD on Power CFD on Power
CFD on Power
 

Similar to SCFE 2020 OpenCAPI presentation as part of OpenPWOER Tutorial

CAPI and OpenCAPI Hardware acceleration enablement
CAPI and OpenCAPI Hardware acceleration enablementCAPI and OpenCAPI Hardware acceleration enablement
CAPI and OpenCAPI Hardware acceleration enablement
Ganesan Narayanasamy
 
Using a Field Programmable Gate Array to Accelerate Application Performance
Using a Field Programmable Gate Array to Accelerate Application PerformanceUsing a Field Programmable Gate Array to Accelerate Application Performance
Using a Field Programmable Gate Array to Accelerate Application Performance
Odinot Stanislas
 
OpenCAPI next generation accelerator
OpenCAPI next generation accelerator OpenCAPI next generation accelerator
OpenCAPI next generation accelerator
Ganesan Narayanasamy
 
6 open capi_meetup_in_japan_final
6 open capi_meetup_in_japan_final6 open capi_meetup_in_japan_final
6 open capi_meetup_in_japan_final
Yutaka Kawai
 
OpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC SystemsOpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC Systems
HPCC Systems
 
00 opencapi acceleration framework yonglu_ver2
00 opencapi acceleration framework yonglu_ver200 opencapi acceleration framework yonglu_ver2
00 opencapi acceleration framework yonglu_ver2
Yutaka Kawai
 
Deep learning with FPGA
Deep learning with FPGADeep learning with FPGA
Deep learning with FPGA
Ayush Singh, MS
 
FPGA MeetUp
FPGA MeetUpFPGA MeetUp
FPGA MeetUp
Moya Brannan
 
OpenCAPI Technology Ecosystem
OpenCAPI Technology EcosystemOpenCAPI Technology Ecosystem
OpenCAPI Technology Ecosystem
Ganesan Narayanasamy
 
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
Cesar Maciel
 
Software Network Data Plane - Satisfying the need for speed - FD.io - VPP and...
Software Network Data Plane - Satisfying the need for speed - FD.io - VPP and...Software Network Data Plane - Satisfying the need for speed - FD.io - VPP and...
Software Network Data Plane - Satisfying the need for speed - FD.io - VPP and...
Haidee McMahon
 
Using FPGA in Embedded Devices
Using FPGA in Embedded DevicesUsing FPGA in Embedded Devices
Using FPGA in Embedded Devices
GlobalLogic Ukraine
 
FPGAs in the cloud? (October 2017)
FPGAs in the cloud? (October 2017)FPGAs in the cloud? (October 2017)
FPGAs in the cloud? (October 2017)
Julien SIMON
 
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
Spark Summit
 
GTC15-Manoj-Roge-OpenPOWER
GTC15-Manoj-Roge-OpenPOWERGTC15-Manoj-Roge-OpenPOWER
GTC15-Manoj-Roge-OpenPOWERAchronix
 
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese..."Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...
Edge AI and Vision Alliance
 
Heterogeneous Computing : The Future of Systems
Heterogeneous Computing : The Future of SystemsHeterogeneous Computing : The Future of Systems
Heterogeneous Computing : The Future of Systems
Anand Haridass
 
ProjectVault[VivekKumar_CS-C_6Sem_MIT].pptx
ProjectVault[VivekKumar_CS-C_6Sem_MIT].pptxProjectVault[VivekKumar_CS-C_6Sem_MIT].pptx
ProjectVault[VivekKumar_CS-C_6Sem_MIT].pptxVivek Kumar
 

Similar to SCFE 2020 OpenCAPI presentation as part of OpenPWOER Tutorial (20)

CAPI and OpenCAPI Hardware acceleration enablement
CAPI and OpenCAPI Hardware acceleration enablementCAPI and OpenCAPI Hardware acceleration enablement
CAPI and OpenCAPI Hardware acceleration enablement
 
Using a Field Programmable Gate Array to Accelerate Application Performance
Using a Field Programmable Gate Array to Accelerate Application PerformanceUsing a Field Programmable Gate Array to Accelerate Application Performance
Using a Field Programmable Gate Array to Accelerate Application Performance
 
OpenCAPI next generation accelerator
OpenCAPI next generation accelerator OpenCAPI next generation accelerator
OpenCAPI next generation accelerator
 
6 open capi_meetup_in_japan_final
6 open capi_meetup_in_japan_final6 open capi_meetup_in_japan_final
6 open capi_meetup_in_japan_final
 
OpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC SystemsOpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC Systems
 
00 opencapi acceleration framework yonglu_ver2
00 opencapi acceleration framework yonglu_ver200 opencapi acceleration framework yonglu_ver2
00 opencapi acceleration framework yonglu_ver2
 
Deep learning with FPGA
Deep learning with FPGADeep learning with FPGA
Deep learning with FPGA
 
FPGA MeetUp
FPGA MeetUpFPGA MeetUp
FPGA MeetUp
 
OpenCAPI Technology Ecosystem
OpenCAPI Technology EcosystemOpenCAPI Technology Ecosystem
OpenCAPI Technology Ecosystem
 
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
 
Software Network Data Plane - Satisfying the need for speed - FD.io - VPP and...
Software Network Data Plane - Satisfying the need for speed - FD.io - VPP and...Software Network Data Plane - Satisfying the need for speed - FD.io - VPP and...
Software Network Data Plane - Satisfying the need for speed - FD.io - VPP and...
 
Using FPGA in Embedded Devices
Using FPGA in Embedded DevicesUsing FPGA in Embedded Devices
Using FPGA in Embedded Devices
 
FPGAs in the cloud? (October 2017)
FPGAs in the cloud? (October 2017)FPGAs in the cloud? (October 2017)
FPGAs in the cloud? (October 2017)
 
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
 
GTC15-Manoj-Roge-OpenPOWER
GTC15-Manoj-Roge-OpenPOWERGTC15-Manoj-Roge-OpenPOWER
GTC15-Manoj-Roge-OpenPOWER
 
Infrastructure et serveurs HP
Infrastructure et serveurs HPInfrastructure et serveurs HP
Infrastructure et serveurs HP
 
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese..."Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...
 
Xilinx track g
Xilinx   track gXilinx   track g
Xilinx track g
 
Heterogeneous Computing : The Future of Systems
Heterogeneous Computing : The Future of SystemsHeterogeneous Computing : The Future of Systems
Heterogeneous Computing : The Future of Systems
 
ProjectVault[VivekKumar_CS-C_6Sem_MIT].pptx
ProjectVault[VivekKumar_CS-C_6Sem_MIT].pptxProjectVault[VivekKumar_CS-C_6Sem_MIT].pptx
ProjectVault[VivekKumar_CS-C_6Sem_MIT].pptx
 

More from Ganesan Narayanasamy

Chip Design Curriculum development Residency program
Chip Design Curriculum development Residency programChip Design Curriculum development Residency program
Chip Design Curriculum development Residency program
Ganesan Narayanasamy
 
Basics of Digital Design and Verilog
Basics of Digital Design and VerilogBasics of Digital Design and Verilog
Basics of Digital Design and Verilog
Ganesan Narayanasamy
 
180 nm Tape out experience using Open POWER ISA
180 nm Tape out experience using Open POWER ISA180 nm Tape out experience using Open POWER ISA
180 nm Tape out experience using Open POWER ISA
Ganesan Narayanasamy
 
Workload Transformation and Innovations in POWER Architecture
Workload Transformation and Innovations in POWER Architecture Workload Transformation and Innovations in POWER Architecture
Workload Transformation and Innovations in POWER Architecture
Ganesan Narayanasamy
 
OpenPOWER Workshop at IIT Roorkee
OpenPOWER Workshop at IIT RoorkeeOpenPOWER Workshop at IIT Roorkee
OpenPOWER Workshop at IIT Roorkee
Ganesan Narayanasamy
 
Deep Learning Use Cases using OpenPOWER systems
Deep Learning Use Cases using OpenPOWER systemsDeep Learning Use Cases using OpenPOWER systems
Deep Learning Use Cases using OpenPOWER systems
Ganesan Narayanasamy
 
IBM BOA for POWER
IBM BOA for POWER IBM BOA for POWER
IBM BOA for POWER
Ganesan Narayanasamy
 
SCFE 2020 OpenCAPI presentation as part of OpenPOWER Tutorial

  • 1. Introduction To Acceleration with OpenCAPI. SCFE 2020 - March 24th 2020 - A.CASTELLANE
  • 2. What Do You Need? (Figure with "Out" and "In" labels)
  • 3. Current Computing Landscape. CPU technology advances have slowed the historical cost/performance improvements seen over the last several decades => new CPU chips alone cannot handle current challenges! Computation (overburdened CPUs, slow/complex algorithms and functions) + data access (network and data access rates) = current technology and processing overload! Bad news: it’s only going to get worse!
  • 4. Next Set Of Challenges Is Here! Exponential data growth, compute-intensive algorithms, diverse data structures and types, and decreasing time to results (hours .. minutes .. seconds .. real time).
      • Compute: AI, machine / deep learning; video processing; database / big data analytics
      • Storage: scale-out storage; petabytes of new data; intelligent / compute SSDs
      • Networking: network security; low-latency networking; Open vSwitch offload; software-defined networking acceleration
  • 5. Next Challenges Affect All Computing Fields
      • Bank / Finance: risk analysis and faster trading (Monte Carlo libraries); credit card fraud detection; blockchain acceleration
      • Video / Analytics: smart video surveillance from multiple video feeds; 3D video stream from multi-angle video streams; image search / object tracking / scene recreation; multi-JPEG compression
      • Machine Learning / Deep Learning: machine learning inference; acceleration of frequently used ML / DL algorithms
      • Algorithm acceleration: compression on network path or storage; encryption on the fly to various memory types; string matching
  • 6. Options: Software or Hardware?
      • Software advantages: more rapid development leading to faster time to market; lower non-recurring engineering costs (software can be reused easily); heightened portability; ease of updating features or patching bugs
      • Software disadvantages: slower run time
      • Hardware advantages: much faster execution of functions; reduced power consumption; lower latency; increased parallelism and bandwidth; better utilization of area and functional components available on an integrated circuit (IC)
      • Hardware disadvantages: lower ability to update designs once etched onto silicon; difficult to share Verilog/VHDL source code between different hardware platforms; higher costs of functional verification; longer development process and time to market
      • But what if you could have the best of both worlds?
  • 7. So, what’s the solution? Hardware acceleration: the use of computer hardware specially designed to perform functions more efficiently than is possible in software alone running on a general-purpose CPU. Two options:
      • GPU: thousands of tiny CPUs using high parallelization => compute-intensive applications
      • FPGA (Field Programmable Gate Array): logic and IOs are customized exactly for the application's needs => very low and predictable latency applications
  • 8. The Better Choice? FPGA acceleration: due to their inherent logic and IO flexibility, speed, and predictably low latency, FPGAs have a clear advantage. An FPGA (Field Programmable Gate Array) has historically been programmed using Verilog/VHDL, which is compiled and mapped to FPGA hardware logic.
  • 9. What is an FPGA? (Field Programmable Gate Array)
      • A re-programmable computer chip with lots of configurable logic elements based on lookup tables (LUTs)
      • Programmable switch matrix routing
      • Configurable I/O and high-speed serial links
      • Integrated hard IP (multiply/add, SRAM, PLL, PCIe, Ethernet, DRAM, ...)
      • Advantages in flexibility, speed, and low latency due to: limited instruction set; high parallelism; deep pipelines
      • (Figure: logical view showing programmable switches and programmable logic elements)
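To make the lookup-table idea on this slide concrete, here is a minimal Python sketch of a 4-input LUT, assuming the common model in which a LUT is a small truth table indexed by its input bits. The class name `Lut4` and its methods are hypothetical and invented for illustration; real FPGA logic elements also contain flip-flops, carry logic, and routing that are not modelled here.

```python
class Lut4:
    """Toy model of a 4-input lookup table (hypothetical, for illustration only)."""

    def __init__(self, truth_table: int):
        # 16 configuration bits: bit i holds the output for input pattern i.
        if not 0 <= truth_table < (1 << 16):
            raise ValueError("a 4-input LUT holds exactly 16 configuration bits")
        self.truth_table = truth_table

    @classmethod
    def from_function(cls, fn):
        """Build the configuration bits by evaluating fn on all 16 input patterns."""
        bits = 0
        for index in range(16):
            a, b, c, d = ((index >> k) & 1 for k in range(4))
            if fn(a, b, c, d):
                bits |= 1 << index
        return cls(bits)

    def evaluate(self, a: int, b: int, c: int, d: int) -> int:
        """Look up the output: the four inputs form an index into the truth table."""
        index = a | (b << 1) | (c << 2) | (d << 3)
        return (self.truth_table >> index) & 1


# Loading different configuration bits changes the function of the same element.
xor4 = Lut4.from_function(lambda a, b, c, d: a ^ b ^ c ^ d)
and4 = Lut4.from_function(lambda a, b, c, d: a & b & c & d)
```

In this model, "reprogramming" amounts to loading a different 16-bit constant, which is one way to picture why the same chip can implement very different functions.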
  • 10. FPGA Example: Bittware 250-SoC (Multipurpose Converged Network / Storage)
      • Xilinx Zynq UltraScale+ FPGA ZU19EG (64-bit Cortex-A53 ARM core)
      • Two 4GB DDR4 (for FPGA and ARM)
      • PCIe Gen3 x16 / Gen4 x8 => CAPI2
      • Up to 4 x8 OCuLink ports supporting NVMe, 100GbE and OpenCAPI
      • 2x 100GbE QSFP28 cages
      • Half Height - Half Length format
  • 11. Basics of HW Acceleration. Standard CPU setup (no acceleration): the CPU manages all data, memory access, functions, and flows. With increased data, computing, storage, and network challenges, the result is an overburdened CPU, slow functions, and congested memory and output card access. (Figure: application and function on the CPU, with host memory)
  • 12. Basics of HW Acceleration (continued). Standard CPU setup (no acceleration): the CPU manages all data, memory access, functions, and flows => overburdened CPU, slow functions, congested memory and output card access.
  • 13. HW Acceleration with FPGA
      • Standard CPU setup (no acceleration): the CPU manages all data, memory access, functions, and flows => overburdened CPU, slow functions, congested memory and output card access
      • Classic acceleration with FPGA: functions run faster on the FPGA, but only the function itself is taken off the CPU; the CPU still handles FPGA memory access and data copying; no data coherency; FPGA historically programmed using Verilog/VHDL
  • 14. HW Acceleration with FPGA (continued)
      • Standard CPU setup (no acceleration): the CPU manages all data, memory access, functions, and flows => overburdened CPU, slow functions, congested memory and output card access
      • Classic acceleration with FPGA: the CPU is used to manage FPGA memory access; no data coherency (host memory is copied to the FPGA); FPGA historically programmed using Verilog/VHDL; the CPU still handles all memory and data access
  • 15. Addressing Classic FPGA Acceleration Issues
      • What is OpenCAPI? Open Coherent Accelerator Processor Interface
          • An open interface architecture that allows any microprocessor to attach to coherent user-level accelerators and I/O devices, and to advanced memories accessible via read/write or user-level direct memory access (DMA) semantics
          • Agnostic to processor architecture
      • What is OC-Accel? The OpenCAPI Acceleration Framework, used to program FPGAs in C/C++ instead of Verilog or VHDL
      • OpenCAPI specifications (OpenCAPI 3.0, OC 3.1) are downloadable from www.opencapi.org
  • 16. HW Acceleration with FPGA + OpenCAPI
      • Classic acceleration with FPGA: the CPU is used to manage FPGA memory access; no data coherency (host memory is copied to the FPGA); FPGA historically programmed using Verilog/VHDL; the CPU still handles all memory and data access
      • Acceleration with FPGA + OpenCAPI: the OpenCAPI IO interface on the FPGA accesses host memory directly; the function accesses only the host memory data it needs; data coherency (data does not need to be copied to the FPGA); address translation (@function = @application); FPGA programmed with C/C++ using the OC-Accel framework
  • 17. FPGAs + OpenCAPI + OC-Accel Address All Issues (original point => how it is met with FPGA + OpenCAPI + OC-Accel)
      • Software advantages:
          • More rapid development => application engineers write C/C++ functions (OC-Accel)
          • Lower non-recurring engineering costs => C/C++ code is reusable
          • Heightened portability => C/C++ code is portable
          • Ease of updating features or patching bugs => FPGA reconfigurable with C/C++ updates
      • Software disadvantages:
          • Slower run time => function executed faster on FPGA + CPU relief
      • Hardware advantages:
          • Much faster execution of functions => using FPGA instead of CPU
          • Reduced power consumption => FPGA is function specific only
          • Lower latency => FPGA is fast + OpenCAPI direct memory access
          • Increased parallelism and bandwidth => FPGA can have parallel logic
          • Better IC area and function utilization => FPGA uses function logic only
      • Hardware disadvantages:
          • Lower ability to update design hardware => FPGA easily reconfigurable with C/C++ updates
          • Difficult to share source code between FPGAs => C/C++ easily recompiled for different FPGAs
          • Higher costs of functional verification => C/C++ code simulated and debugged
          • Longer development process and time to market => C/C++ code can be easier to write and upload
  • 18. Ex: Monte Carlo (FPGA Accelerated). Monte Carlo analysis is a risk management technique used in the financial and insurance industries for conducting a quantitative analysis of risks. By using CAPI with an FPGA, the C/C++ code was reduced by 40x on the application side, and 33% of memory and CPU was freed up (versus a non-CAPI FPGA). Results when running 1 million iterations: at least 50x faster with CAPI and FPGA technology on a POWER server.
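For readers unfamiliar with the technique, the following is a minimal software-only sketch of a Monte Carlo risk estimate. It is not the accelerated code behind the slide's results: the loss model (independent standard normal draws), the Value-at-Risk metric, the function name `monte_carlo_var`, and the iteration count are all assumptions chosen for illustration.

```python
import random


def monte_carlo_var(num_iterations: int, confidence: float, seed: int = 0) -> float:
    """Estimate Value-at-Risk as a quantile of simulated losses.

    Assumption: each scenario's loss is an independent standard normal draw,
    standing in for a real pricing or risk model.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    losses = sorted(rng.gauss(0.0, 1.0) for _ in range(num_iterations))
    # VaR at the given confidence: the loss not exceeded in that fraction of scenarios.
    index = min(int(confidence * num_iterations), num_iterations - 1)
    return losses[index]


# 100,000 scenarios keeps the example quick; the slide's run used 1 million iterations.
var_95 = monte_carlo_var(num_iterations=100_000, confidence=0.95)
```

Each scenario is computed independently of the others, which is the property that makes this kind of workload a natural candidate for parallel hardware.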
  • 19. Ex: PostgreSQL regex Matching Accelerated. PostgreSQL is a powerful, open source object-relational database system; SQL (Structured Query Language) is used to communicate with a database. PostgreSQL + OpenCAPI shows a compelling “regex” performance increase by leveraging the bandwidth and virtual addressing of OpenCAPI technology. In fact, accelerating the SQL with OpenCAPI-regex can be 4x to 10x faster than the best PostgreSQL built-in functions (CPU multi-threads enabled).
      • Command example: SELECT * FROM table WHERE pkt ~ pattern; (basically: search the db for all pkt that match pattern)
      • Actual example, single search run times: CPU parallel Seq Scan: ~698ms; Custom Scan (PFCAPI): ~161ms
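As a rough picture of the work a CPU sequential scan does for the query above, here is a small Python sketch that tests every row's `pkt` value against a regular expression. The function name `seq_scan_regex` and the table contents are hypothetical, and Python's `re` dialect differs in details from PostgreSQL's regular expressions, so this illustrates the shape of the work rather than the database's implementation.

```python
import re


def seq_scan_regex(rows, column, pattern):
    """Return the rows whose `column` value matches `pattern` anywhere in the string."""
    compiled = re.compile(pattern)  # compile once, reuse for every row
    return [row for row in rows if compiled.search(row[column])]


# Hypothetical table contents, invented for this example.
packets = [
    {"id": 1, "pkt": "GET /index.html HTTP/1.1"},
    {"id": 2, "pkt": "POST /login HTTP/1.1"},
    {"id": 3, "pkt": "GET /images/logo.png HTTP/1.1"},
]
matches = seq_scan_regex(packets, "pkt", r"^GET .*\.png")
```

Every row is visited and matched independently, so the cost grows with table size; this per-row matching is the part the slide describes offloading to the accelerator.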
  • 20. Ex: Ultra Fast Data Acquisition (X-Ray Crystallography). Goal: real-time mapping of biological structure by examining molecule scatter plots of a protein crystal struck by x-rays. GPU + PCIe configuration (today): digital camera sensors deliver 4 MPixels @ 1.1 kHz, giving 9 GBps of raw data over PCIe to a GPU. (Figure: numbered processing steps for data acquisition, raw data to real image conversion, decimating / sorting images, and data compression, ending in compressed data and the mapped real image of the protein molecule)
  • 21. Ex: Ultra Fast Data Acquisition (X-Ray Crystallography). Goal: real-time mapping of biological structure by examining molecule scatter plots of a protein crystal struck by x-rays. FPGA w/ OpenCAPI (goal): digital camera sensors deliver 10 MPixels @ 2.2 kHz. OpenCAPI breaks the 9GBps PCIe bottleneck! (Figure labels: raw data at 22 GBps; dual FPGAs in parallel; OpenCAPI 3.0 at 22 GBps; image data at 22 GBps; unfiltered image and filtered image; GPU or FPGA or both; host with NX-gzip embedded HW accelerator; compressed data; numbered processing steps for data acquisition, raw data to real image conversion, decimating / sorting images, and data compression)
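The sensor figures on the two crystallography slides can be related to the quoted bandwidths with a little arithmetic. The sketch below assumes 1 MPixel = 10^6 pixels and 1 GBps = 10^9 bytes per second, neither of which the slides state; the function names are hypothetical. It derives the bytes per pixel implied by each quoted bandwidth rather than assuming a pixel depth.

```python
def pixel_rate(megapixels: float, frame_rate_khz: float) -> float:
    """Pixels per second for a sensor of `megapixels` read out at `frame_rate_khz`."""
    return megapixels * 1e6 * frame_rate_khz * 1e3


def implied_bytes_per_pixel(bandwidth_gbps: float, megapixels: float,
                            frame_rate_khz: float) -> float:
    """Bytes per pixel implied by a quoted bandwidth (GBps taken as 1e9 bytes/s)."""
    return bandwidth_gbps * 1e9 / pixel_rate(megapixels, frame_rate_khz)


today = implied_bytes_per_pixel(9, 4, 1.1)    # GPU + PCIe configuration: about 2 bytes/pixel
goal = implied_bytes_per_pixel(22, 10, 2.2)   # a single 22 GBps figure: about 1 byte/pixel
rate_increase = pixel_rate(10, 2.2) / pixel_rate(4, 1.1)  # about 5x more pixels per second
```

Under these assumptions the target sensor produces about five times as many pixels per second as today's. The slide also shows dual FPGAs in parallel, so the 22 GBps figure may describe one link rather than the total; the slide does not spell this out.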
  • 22. Ex: Pull Quote. “The benefit of using POWER interfaces, i.e., NVLink and OpenCAPI, is not only bandwidth, but these interfaces allow also for coherent memory access. FPGA board connected via OpenCAPI or GPGPU connected via NVLink sees host (CPU) virtual memory space exactly like the process running on the CPU, reducing the burden of writing reliable and secure applications. Memory coherency can be also available for PCIe FPGA accelerators installed in POWER9 servers via OpenCAPI predecessor, the Coherent Accelerator Processor Interface (CAPI). IBM also provides optimized software to benefit from the architecture, including the CAPI Storage, Network, and Analytics Programming (SNAP) framework [51,52] that simplifies the integration of FPGA designs with POWER9, as well as optimized ML and data analysis routines for GPGPUs or FPGAs [53].” Source: Structural Dynamics 7, 014305 (2020); https://doi.org/10.1063/1.5143480
  • 23. Ex: Memory Coherency. Scenario: 2MB of data scattered in host memory is processed in an FPGA.
      • « Classic » PCIe FPGA card: (1) the data is gathered in software (SW memcopy); (2) 1 transaction moves a big amount of data (2MB) to the FPGA
      • « CAPI-enabled » FPGA card: (1) 1 transaction of 8kB transfers the AddrSet from host memory to the FPGA; (2) 1024 transactions of 2kB go from host memory to the FPGA, which directly reads the required data at random addresses
      • Results: the CAPI-enabled card was 2-3x faster than the classic method
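The transaction counts in this scenario can be checked with a small bookkeeping model. The sketch below assumes the 2MB consists of 1024 scattered blocks of 2kB and that the address set holds one 64-bit (8-byte) address per block, which is consistent with the 8kB figure but is not stated on the slide. `TransferPlan`, `classic_plan`, and `capi_plan` are hypothetical names; the model counts bytes and transactions only and does not attempt to predict the measured speed-up.

```python
from dataclasses import dataclass

KB = 1024
MB = 1024 * KB
ADDRESS_BYTES = 8  # assumption: one 64-bit address per scattered block


@dataclass
class TransferPlan:
    cpu_copied_bytes: int  # bytes the host CPU must memcpy before the transfer starts
    transactions: list     # size in bytes of each host-to-FPGA transaction


def classic_plan(num_blocks: int, block_bytes: int) -> TransferPlan:
    """Classic PCIe card: software gathers the blocks, then sends one big buffer."""
    total = num_blocks * block_bytes
    return TransferPlan(cpu_copied_bytes=total, transactions=[total])


def capi_plan(num_blocks: int, block_bytes: int) -> TransferPlan:
    """CAPI-enabled card: send the address set, then the FPGA reads each block in place."""
    address_set = num_blocks * ADDRESS_BYTES
    return TransferPlan(cpu_copied_bytes=0,
                        transactions=[address_set] + [block_bytes] * num_blocks)


classic = classic_plan(num_blocks=1024, block_bytes=2 * KB)
capi = capi_plan(num_blocks=1024, block_bytes=2 * KB)
```

In this model the CAPI-enabled path moves slightly more data in total (the extra 8kB address set) and uses many more transactions, but it removes the 2MB software gather from the CPU entirely.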
  • 24. Ease of FPGA Programming (OC-Accel)
      • Benefits:
          • Faster time to market: port a function to an FPGA in days, not months
          • No obsolescence: simply recompile unchanged C/C++ code for a different FPGA
          • No link constraint: moving from a CAPI (over PCIe) link to OpenCAPI is just a matter of recompiling, with no code change
          • No specific hardware skills needed: a C/C++ coder can focus on functionality, as all the resources are managed by the framework
          • Open-source framework: the code can be modified and improved by any user
      • Example: a customer ported and optimized SHA3 C code within 10 days using the SNAP framework, versus 4 months in VHDL without SNAP. Note: SNAP is the predecessor to OC-Accel, and overall flow and performance are equivalent.
      • Development plans: OC-Accel with OpenCAPI today, OC-Accel with other emerging standards like CXL tomorrow!
  • 25. FPGAs + OpenCAPI + OC-Accel Has It All: very high bandwidth; faster development and time to market with OC-Accel
  • 26. Spreading the CAPI Love (OC-Accel): Developers Aren’t Where We Need Them. (Chart of development layers: Scripting; Interpreted App (Python / Rails / Java); Non-Interpreted App (C++ / Java JRE); Procedural App (C / C++); High Level OS (C / C++); Kernel (C, AS); HW API (C, ASM); Firmware; HDL. Chart content courtesy of Aaron Sullivan @Rackspace)
  • 27. Spreading the CAPI Love (OC-Accel): Developers Where We Need Them. (Same chart of development layers, from Scripting through Interpreted App, Non-Interpreted App, Procedural App, High Level OS, Kernel, HW API, and Firmware to HDL, now annotated with "Application", "New Abstraction", and "Soft-Hardware" labels. Chart content courtesy of Aaron Sullivan @Rackspace)
  • 28. Want to know more about accelerators, see a live demonstration, do a benchmark, or get answers to your questions? Contact us: alexandre.castellane@fr.ibm.com, bruno.mesnet@fr.ibm.com, fabrice_moyen@fr.ibm.com, luyong@cn.ibm.com, shgoupf@cn.ibm.com