Introduction to Acceleration with OpenCAPI
SCFE 2020 - March 24th 2020 - A.CASTELLANE
What Do You Need?
Current Computing Landscape
Advances in CPU technology have slowed, and with them the cost/performance improvements seen
over the last several decades => New CPU chips alone cannot handle current challenges!
Overburdened CPUs + Slow/Complex Algorithms & Functions + Network & Data Access Rates
(Computation + Data Access)
= Current Technology and Processing Overload!
Bad News, it’s only going to get worse!
Next Set Of Challenges Is Here!
Exponential Data Growth Compute Intensive Algorithms
Diverse Data Structures & Types Decreasing Time To Results
Hours .. Minutes .. Seconds .. Real Time
Compute
• AI, Machine / Deep Learning
• Video Processing
• Database / Big Data Analytics
Storage
• Scale-out Storage
• Petabytes of new data
• Intelligent / Compute SSDs
Networking
• Network Security
• Low-latency Networking
• Open vSwitch offload
• Software Defined Networking Acceleration
Next Challenges Affect All Computing Fields
Bank / Finance
• Risk analysis / Faster trading: Monte Carlo libraries
• Credit card fraud detection
• Blockchain acceleration
Video / Analytics
• Smart video surveillance from multiple video feeds
• 3D video stream from multi-angle video streams
• Image search / Object tracking / Scene recreation
• Multi-jpeg compression
Machine Learning / Deep learning
• Machine learning inference
• Accelerate frequently used ML / DL algorithms
Algorithm acceleration
• Compression on network path or storage
• Encryption on the fly to various memory types
• String match
But what if you could have the best of both worlds?
Options: Software or Hardware?
• Software:
• Advantages:
• More rapid development leading to faster time to market
• Lower non-recurring engineering costs. Software can be reused easily.
• Heightened portability
• Ease of updating features or patching bugs
• Disadvantages:
• Slower run time
• Hardware
• Advantages:
• Much faster execution of functions
• Reduced power consumption
• Lower latency
• Increased parallelism and bandwidth
• Better utilization of area and functional components available on an integrated circuit (IC)
• Disadvantages:
• Lower ability to update designs once etched onto silicon
• Difficult to share Verilog/VHDL source code between different hardware platforms
• Higher costs of functional verification
• Longer development process and time to market
So, what’s the solution?
Hardware Acceleration
The use of computer hardware specially designed to perform functions more
efficiently than is possible in software alone running on a general-purpose CPU.
Two Options
• GPU: thousands of tiny CPUs using high parallelization
=> compute-intensive applications
• FPGA (Field Programmable Gate Array): logic + IOs are customized exactly for the application's needs
=> very low and predictable latency applications
The Better Choice?
Due to the inherent logic and IO flexibility, speed, and
predictably low latency, FPGAs have a clear advantage.
FPGA Acceleration
FPGA = Field Programmable Gate Array
Historically programmed
using Verilog/VHDL
Compiled
Mapped to FPGA HW Logic
What is an FPGA?
Field Programmable Gate Array
• A re-programmable computer chip with lots of configurable logic
elements based on Lookup-Tables (LUT)
• Programmable switch matrix routing
• Configurable I/O and high-speed serial links
• Integrated Hard IP (Multiply/Add, SRAM, PLL, PCIe, Ethernet, DRAM,...)
• Advantages in flexibility, speed, and low latency due to:
• Limited instruction set
• High parallelism
• Deep pipelines
Logical view (figure): programmable logic elements connected by programmable switches
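The lookup-table idea can be sketched in software. This is only an illustration of the principle, not how any vendor tool configures a device: the helper names `make_lut` and `lut_eval` are hypothetical, and a 4-input LUT is assumed. The logic element stores one output bit per input combination, and "programming" it means loading a different table.

```python
from itertools import product

def make_lut(func, n_inputs=4):
    """Precompute the truth table of `func`: one stored bit per input combination."""
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=n_inputs)]

def lut_eval(table, *bits):
    """Evaluate the LUT: the input bits form the address of the stored output bit."""
    index = 0
    for bit in bits:
        index = (index << 1) | (bit & 1)  # first input is the most significant bit
    return table[index]

# The same kind of element is "reprogrammed" simply by loading another table.
xor4 = make_lut(lambda a, b, c, d: a ^ b ^ c ^ d)
majority = make_lut(lambda a, b, c, d: (a + b + c + d) >= 3)

parity_bit = lut_eval(xor4, 1, 0, 1, 1)
majority_bit = lut_eval(majority, 1, 1, 0, 1)
```

Evaluation is a single table read whatever the function is, which is one way to see why latency in such logic is predictable.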
FPGA Example (Bittware 250-SoC)
Multipurpose Converged Network / Storage
• Xilinx Zynq UltraScale+ FPGA ZU19EG (64-bit Cortex-A53 ARM core)
• Two 4GB DDR4 (for FPGA and ARM)
• PCIe Gen3 x16 / Gen4 x8 => CAPI2
• Up to 4 x8 OCuLink ports supporting NVMe, 100GbE and OpenCAPI
• 2x 100GbE QSFP28 cages
• Half Height - Half Length format
Basics of HW Acceleration
Standard CPU Setup (No Acceleration)
• CPU manages all data, memory access, functions, and flows
With increased data, computing, storage, and network challenges:
• Overburdened CPU
• Slow functions
• Congested memory and output card access
(Figure: Application and Function run on the CPU, attached to Host Memory)
HW Acceleration with FPGA
Standard CPU Setup (No Acceleration)
• CPU manages all data, memory access, functions, and flows
• Overburdened CPU
• Congested memory and output card access
• Slow functions
Classic Acceleration with FPGA
• Faster functions on FPGA
• Only the function is offloaded, so the CPU is relieved of that burden alone
• CPU is used to manage FPGA memory access and data copying
• No data coherency (host memory copied to FPGA)
• FPGA historically programmed using Verilog/VHDL
• CPU still handles all memory and data access
Addressing Classic FPGA Acceleration Issues
• What is OpenCAPI?
• Open Coherent Accelerator Processor
Interface
• OpenCAPI is an open interface
architecture that allows any
microprocessor to attach to:
• Coherent user-level accelerators and
I/O devices
• Advanced memories accessible via
read/write or user-level direct
memory access (DMA) semantics
• Agnostic to processor architecture
• What is OC-Accel?
• OpenCAPI Acceleration Framework to
program FPGAs using C/C++ instead of
Verilog or VHDL
OpenCAPI 3.0
OC 3.1
OpenCAPI specifications are downloadable from www.opencapi.org
HW Acceleration with FPGA + OpenCAPI
Classic Acceleration with FPGA
• CPU is used to manage FPGA memory access
• No data coherency (host memory copied to FPGA)
• FPGA historically programmed using Verilog/VHDL
• CPU still handles all memory and data access
Acceleration with FPGA + OpenCAPI
• OpenCAPI IO interface on FPGA accesses host memory directly
• Function accesses only needed host memory data
• Data coherency (data does not need to be copied to FPGA)
• Address translation (@function = @application)
• FPGA programmed with C/C++ using OC-Accel Framework
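The difference between copying host data and sharing it can be illustrated with a pure-software analogy. This is not OpenCAPI or OC-Accel code: `classic_offload` and `coherent_attach` are hypothetical names, and a Python `memoryview` merely stands in for a function that addresses the application's own buffer.

```python
def classic_offload(host_memory: bytearray) -> bytes:
    # Classic model: the CPU copies host data into a separate device buffer.
    return bytes(host_memory)

def coherent_attach(host_memory: bytearray) -> memoryview:
    # Coherent model (analogy only): the function works on the host buffer itself.
    return memoryview(host_memory)

host = bytearray(b"AAAA")
device_copy = classic_offload(host)
shared_view = coherent_attach(host)

host[0] = ord("Z")  # the application updates its data after the hand-off

stale = device_copy[0]  # the copy still holds the old value
fresh = shared_view[0]  # the shared view sees the update
```

In the copy model somebody has to notice the change and copy again; in the shared model both sides use the same address for the same data.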
FPGAs + OpenCAPI + OC-Accel Address All Issues
(original point => how FPGA + OpenCAPI + OC-Accel addresses it)
• Hardware
• Advantages:
• Much faster execution of functions => Using FPGA instead of CPU
• Reduced power consumption => FPGA is function specific only
• Lower latency => FPGA is fast + OpenCAPI direct memory access
• Increased parallelism and bandwidth => FPGA can have parallel logic
• Better IC area and function utilization => FPGA uses function logic only
• Disadvantages:
• Lower ability to update design hardware => FPGA easily reconfigurable with C/C++ updates
• Difficult to share source code between FPGAs => C/C++ easily recompiled for different FPGAs
• Higher costs of functional verification => C/C++ code simulated and debugged
• Longer development process and time to market => C/C++ code can be easier to write and upload
• Software
• Advantages:
• More rapid development => Application engineers write C/C++ functions (OC-Accel)
• Lower non-recurring engineering costs => C/C++ code is reusable
• Heightened portability => C/C++ code is portable
• Ease of updating features or patching bugs => FPGA reconfigurable with C/C++ updates
• Disadvantages:
• Slower run time => Function executed faster on FPGA + CPU relief
Ex: Monte-Carlo (FPGA Accelerated)
Monte Carlo analysis is a risk management technique used in the financial and insurance
industries for conducting a quantitative analysis of risks.
Running 1 million iterations
Results: At least 50x faster with CAPI and FPGA technology on POWER server
By using CAPI with an FPGA, the C/C++ code on the application side was reduced by 40x,
and 33% of memory and CPU was freed up (versus a non-CAPI FPGA).
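For readers unfamiliar with the technique, here is a minimal Monte Carlo risk sketch. It is not the benchmarked library: the random-walk model, the parameter values, and the names `simulate_final_value` and `value_at_risk` are all assumptions made for illustration.

```python
import random

def simulate_final_value(rng, start=100.0, steps=250, mu=0.0002, sigma=0.01):
    """Walk one random price path and return its final value."""
    value = start
    for _ in range(steps):
        value *= 1.0 + rng.gauss(mu, sigma)  # one random daily return
    return value

def value_at_risk(n_paths=1000, confidence=0.95, seed=42, start=100.0):
    """Estimate the loss that is not exceeded in `confidence` of the simulated paths."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    losses = sorted(start - simulate_final_value(rng, start) for _ in range(n_paths))
    index = min(int(confidence * n_paths), n_paths - 1)
    return losses[index]

var_95 = value_at_risk(confidence=0.95)
var_99 = value_at_risk(confidence=0.99)
```

Each simulated path is independent of the others, which is the property that lets this kind of workload spread across parallel hardware.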
Ex: PostgreSQL regex Matching Accelerated
PostgreSQL is a powerful, open source object-relational database system. SQL (Structured Query Language)
is used to communicate with a database.
PostgreSQL + OpenCAPI shows a compelling "regex" performance increase by leveraging the bandwidth and virtual
addressing of OpenCAPI technology. In fact, accelerating the SQL with OpenCAPI-regex can be 4x to 10x faster than the
best PostgreSQL built-in functions (CPU multi-threads enabled).
Command example:
SELECT * FROM table WHERE pkt ~ pattern;
Basically: search the db for all pkt that match pattern
Actual Example Single Search Run Times:
• CPU parallel Seq Scan: ~698ms
• Custom Scan (PFCAPI): ~161ms
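What the query asks for can be sketched in a few lines, together with the speedup implied by the two run times quoted on this slide. The sample rows and the `regex_select` helper are made up for illustration, and Python's `re` syntax differs in places from PostgreSQL's regular expressions; the sketch only mirrors the "keep rows whose pkt column matches the pattern somewhere" behaviour of the `~` operator.

```python
import re

def regex_select(rows, pattern, column="pkt"):
    """Return the rows whose `column` value matches `pattern` anywhere in the string."""
    compiled = re.compile(pattern)
    return [row for row in rows if compiled.search(row[column])]

# Hypothetical table content.
rows = [
    {"id": 1, "pkt": "GET /index.html HTTP/1.1"},
    {"id": 2, "pkt": "POST /login HTTP/1.1"},
    {"id": 3, "pkt": "GET /admin/config HTTP/1.0"},
]
matches = regex_select(rows, r"^GET .*HTTP/1\.[01]$")

# Speedup implied by the run times on the slide (~698 ms vs ~161 ms).
cpu_ms, capi_ms = 698, 161
speedup = cpu_ms / capi_ms
```

The ratio works out to a little over 4x, at the low end of the 4x to 10x range quoted on the slide.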
Ex: Ultra Fast Data Acquisition (X-Ray Crystallography)
Goal: Real-time mapping of biological structure by examining molecule scatter plots of protein crystal struck by x-rays
GPU + PCIe Configuration (Today)
• Digital camera sensors: 4 MPixels @ 1.1 kHz
• Raw data reaches the GPU over PCIe at 9 GBps
Processing steps:
1 Data acquisition
2 Raw data to real image conversion
3 Decimate / sort images
4 Data compression
Output: protein molecule mapped (real image), compressed data
Ex: Ultra Fast Data Acquisition (X-Ray Crystallography)
Goal: Real-time mapping of biological structure by examining molecule scatter plots of protein crystal struck by x-rays
FPGA w/ OpenCAPI (Goal)
• Digital camera sensors: 10 MPixels @ 2.2 kHz
• Dual FPGAs in parallel, attached over OpenCAPI 3.0 at 22 GBps
• 22 GBps data paths for raw data, image data, and compressed data
• Unfiltered and filtered images
• GPU or FPGA or both
• Host with NX-gzip embedded HW accelerator
OpenCAPI breaks the 9 GBps PCIe bottleneck!
Processing steps:
1 Data acquisition
2 Raw data to real image conversion
3 Decimate / sort images
4 Data compression
Output: protein molecule mapped (real image), compressed data
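The sensor figures on these two slides can be turned into data rates with a little arithmetic. The slides do not state the pixel depth, so `bytes_per_pixel=2` below is an assumption; under that assumption the current setup lands near the 9 GBps figure, and the target setup would need about 44 GBps in total, or 22 GBps on each of two parallel FPGAs.

```python
def sensor_rate_gb_per_s(megapixels, frame_rate_hz, bytes_per_pixel=2):
    """Raw sensor output in GB/s; bytes_per_pixel is an assumed value, not from the slides."""
    return megapixels * 1e6 * frame_rate_hz * bytes_per_pixel / 1e9

today = sensor_rate_gb_per_s(4, 1100)   # 4 MPixels @ 1.1 kHz
goal = sensor_rate_gb_per_s(10, 2200)   # 10 MPixels @ 2.2 kHz
per_fpga = goal / 2                     # split across two FPGAs in parallel
```

A different pixel depth would scale all three numbers proportionally.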
Ex: Pull Quote
The benefit of using POWER interfaces, i.e., NVLink and OpenCAPI, is
not only bandwidth, but these interfaces allow also for coherent
memory access. FPGA board connected via OpenCAPI or GPGPU
connected via NVLink sees host (CPU) virtual memory space exactly like
the process running on the CPU, reducing the burden of writing
reliable and secure applications. Memory coherency can be also
available for PCIe FPGA accelerators installed in POWER9 servers via
OpenCAPI predecessor, the Coherent Accelerator Processor Interface
(CAPI). IBM also provides optimized software to benefit from the
architecture, including the CAPI Storage, Network, and Analytics
Programming (SNAP) framework [51,52] that simplifies the integration of
FPGA designs with POWER9, as well as optimized ML and data analysis
routines for GPGPUs or FPGAs [53].
Structural Dynamics 7, 014305 (2020); https://doi.org/10.1063/1.5143480
Ex: Memory Coherency
Scenario: 2 MB of data scattered in host memory are processed in an FPGA.
"Classic" PCIe FPGA card
1 Gathering data (SW memcopy)
2 1 transaction of a big amount of data to FPGA (2 MB)
"CAPI-enabled" FPGA card
1 1 transaction of 8 kB for AddrSet from host memory to FPGA
2 1024 transactions of 2 kB from host memory to FPGA. Directly reads required data at random addresses.
Results: CAPI-enabled was 2-3x faster than Classic method
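The two flows in this scenario can be mimicked in software to check the numbers: 1024 blocks of 2 kB make up the 2 MB, and an 8 kB address set for 1024 blocks implies 8 bytes per address (an inference, the slide does not say so). The functions `classic_transfer` and `capi_transfer` are hypothetical, count transactions only, and say nothing about real timing.

```python
import random

BLOCK = 2 * 1024   # 2 kB per block (from the slide)
N_BLOCKS = 1024    # 1024 blocks -> 2 MB in total
ADDR_BYTES = 8     # inferred: 8 kB AddrSet / 1024 addresses

def classic_transfer(host, addresses):
    staging = bytearray()
    for addr in addresses:              # step 1: software gather (memcopy) by the CPU
        staging += host[addr:addr + BLOCK]
    return bytes(staging), 1            # step 2: one large transaction to the FPGA

def capi_transfer(host, addresses):
    view = memoryview(host)             # the function reads host memory in place
    received = bytearray()
    transactions = 1                    # step 1: one transaction carries the address set
    for addr in addresses:              # step 2: one 2 kB read per scattered block
        received += view[addr:addr + BLOCK]
        transactions += 1
    return bytes(received), transactions

# 4 MB of host memory with a non-trivial byte pattern, and 1024 scattered block addresses.
size = 4 * 1024 * 1024
host = bytearray((bytes(range(251)) * (size // 251 + 1))[:size])
rng = random.Random(0)
addresses = rng.sample(range(0, size - BLOCK, BLOCK), N_BLOCKS)

classic_data, classic_tx = classic_transfer(host, addresses)
capi_data, capi_tx = capi_transfer(host, addresses)
```

Both paths deliver the same 2 MB; the CAPI-style path uses more, smaller transactions but skips the CPU-side gather and staging copy.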
Ease of FPGA Programming (OC-Accel)
Benefits:
• Faster Time To Market: Port a function to an FPGA in days, not months
• No Obsolescence: Simply recompile unchanged C/C++ code for a different FPGA
• No Link Constraint: Moving from a CAPI (over PCIe) link to OpenCAPI is just a matter of recompiling, with no code change
• No Specific Hardware Skills Needed: The C/C++ coder can focus on functionality, as all the resources are
managed by the framework.
• Open-Source Framework: The code can be modified and improved by any user.
Example:
• Note: SNAP is the predecessor to OC-Accel; overall flow and performance are equivalent.
• Customer ported and optimized SHA3 C code within 10 days using the SNAP framework versus
4 months in VHDL without SNAP
Development Plans:
• OC-Accel with OpenCAPI today, OC-Accel with other emerging standards like CXL tomorrow!
FPGAs + OpenCAPI + OC-Accel Has It All
Very high bandwidth
Faster development and time to market with OC-Accel
Developers Aren't Where We Need Them
• Scripting
• Interpreted App (Python / Rails / Java)
• Non-Interpreted App (C++ / Java JRE)
• Procedural App (C / C++)
• High Level OS (C / C++)
• Kernel (C, AS)
• HW API (C, ASM)
• Firmware
• HDL
Chart content courtesy of Aaron Sullivan @Rackspace
Spreading the CAPI Love (OC-Accel)
Developers Where We Need Them
• Scripting
• Interpreted App (Python / Rails / Java)
• Non-Interpreted App (C++ / Java JRE)
• Procedural App (C / C++)
• High Level OS (C / C++)
• Kernel (C, AS)
• HW API (C, ASM)
• Firmware
• HDL
Chart labels: Application, New Abstraction, Soft-Hardware
- Know more about accelerators?
- See a live demonstration?
- Do a benchmark?
- Get answers to your questions?
Contact us
alexandre.castellane@fr.ibm.com
bruno.mesnet@fr.ibm.com
fabrice_moyen@fr.ibm.com
luyong@cn.ibm.com
shgoupf@cn.ibm.com
Thank You!

More Related Content

What's hot

Covid-19 Response Capability with Power Systems
Covid-19 Response Capability with Power SystemsCovid-19 Response Capability with Power Systems
Covid-19 Response Capability with Power SystemsGanesan Narayanasamy
 
MIT's experience on OpenPOWER/POWER 9 platform
MIT's experience on OpenPOWER/POWER 9 platformMIT's experience on OpenPOWER/POWER 9 platform
MIT's experience on OpenPOWER/POWER 9 platformGanesan Narayanasamy
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuninginside-BigData.com
 
Hardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and MLHardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and MLinside-BigData.com
 
NNSA Explorations: ARM for Supercomputing
NNSA Explorations: ARM for SupercomputingNNSA Explorations: ARM for Supercomputing
NNSA Explorations: ARM for Supercomputinginside-BigData.com
 
IBM Data Centric Systems & OpenPOWER
IBM Data Centric Systems & OpenPOWERIBM Data Centric Systems & OpenPOWER
IBM Data Centric Systems & OpenPOWERinside-BigData.com
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...inside-BigData.com
 
OpenPOWER Webinar on Machine Learning for Academic Research
OpenPOWER Webinar on Machine Learning for Academic Research OpenPOWER Webinar on Machine Learning for Academic Research
OpenPOWER Webinar on Machine Learning for Academic Research Ganesan Narayanasamy
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computinginside-BigData.com
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...inside-BigData.com
 
Exceeding the Limits of Air Cooling to Unlock Greater Potential in HPC
Exceeding the Limits of Air Cooling to Unlock Greater Potential in HPCExceeding the Limits of Air Cooling to Unlock Greater Potential in HPC
Exceeding the Limits of Air Cooling to Unlock Greater Potential in HPCinside-BigData.com
 

What's hot (20)

Covid-19 Response Capability with Power Systems
Covid-19 Response Capability with Power SystemsCovid-19 Response Capability with Power Systems
Covid-19 Response Capability with Power Systems
 
MIT's experience on OpenPOWER/POWER 9 platform
MIT's experience on OpenPOWER/POWER 9 platformMIT's experience on OpenPOWER/POWER 9 platform
MIT's experience on OpenPOWER/POWER 9 platform
 
OpenPOWER Webinar
OpenPOWER Webinar OpenPOWER Webinar
OpenPOWER Webinar
 
Deeplearningusingcloudpakfordata
DeeplearningusingcloudpakfordataDeeplearningusingcloudpakfordata
Deeplearningusingcloudpakfordata
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
 
Overview of HPC Interconnects
Overview of HPC InterconnectsOverview of HPC Interconnects
Overview of HPC Interconnects
 
Hardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and MLHardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and ML
 
OpenPOWER Latest Updates
OpenPOWER Latest UpdatesOpenPOWER Latest Updates
OpenPOWER Latest Updates
 
NNSA Explorations: ARM for Supercomputing
NNSA Explorations: ARM for SupercomputingNNSA Explorations: ARM for Supercomputing
NNSA Explorations: ARM for Supercomputing
 
IBM Data Centric Systems & OpenPOWER
IBM Data Centric Systems & OpenPOWERIBM Data Centric Systems & OpenPOWER
IBM Data Centric Systems & OpenPOWER
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
 
OpenPOWER Webinar on Machine Learning for Academic Research
OpenPOWER Webinar on Machine Learning for Academic Research OpenPOWER Webinar on Machine Learning for Academic Research
OpenPOWER Webinar on Machine Learning for Academic Research
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computing
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
 
Exceeding the Limits of Air Cooling to Unlock Greater Potential in HPC
Exceeding the Limits of Air Cooling to Unlock Greater Potential in HPCExceeding the Limits of Air Cooling to Unlock Greater Potential in HPC
Exceeding the Limits of Air Cooling to Unlock Greater Potential in HPC
 
SNAP MACHINE LEARNING
SNAP MACHINE LEARNINGSNAP MACHINE LEARNING
SNAP MACHINE LEARNING
 
State of ARM-based HPC
State of ARM-based HPCState of ARM-based HPC
State of ARM-based HPC
 
DOME 64-bit μDataCenter
DOME 64-bit μDataCenterDOME 64-bit μDataCenter
DOME 64-bit μDataCenter
 
2018 bsc power9 and power ai
2018   bsc power9 and power ai 2018   bsc power9 and power ai
2018 bsc power9 and power ai
 
CFD on Power
CFD on Power CFD on Power
CFD on Power
 

Similar to SCFE 2020 OpenCAPI presentation as part of OpenPWOER Tutorial

CAPI and OpenCAPI Hardware acceleration enablement
CAPI and OpenCAPI Hardware acceleration enablementCAPI and OpenCAPI Hardware acceleration enablement
CAPI and OpenCAPI Hardware acceleration enablementGanesan Narayanasamy
 
Using a Field Programmable Gate Array to Accelerate Application Performance
Using a Field Programmable Gate Array to Accelerate Application PerformanceUsing a Field Programmable Gate Array to Accelerate Application Performance
Using a Field Programmable Gate Array to Accelerate Application PerformanceOdinot Stanislas
 
OpenCAPI next generation accelerator
OpenCAPI next generation accelerator OpenCAPI next generation accelerator
OpenCAPI next generation accelerator Ganesan Narayanasamy
 
6 open capi_meetup_in_japan_final
6 open capi_meetup_in_japan_final6 open capi_meetup_in_japan_final
6 open capi_meetup_in_japan_finalYutaka Kawai
 
OpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC SystemsOpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC SystemsHPCC Systems
 
00 opencapi acceleration framework yonglu_ver2
00 opencapi acceleration framework yonglu_ver200 opencapi acceleration framework yonglu_ver2
00 opencapi acceleration framework yonglu_ver2Yutaka Kawai
 
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...Cesar Maciel
 
Software Network Data Plane - Satisfying the need for speed - FD.io - VPP and...
Software Network Data Plane - Satisfying the need for speed - FD.io - VPP and...Software Network Data Plane - Satisfying the need for speed - FD.io - VPP and...
Software Network Data Plane - Satisfying the need for speed - FD.io - VPP and...Haidee McMahon
 
FPGAs in the cloud? (October 2017)
FPGAs in the cloud? (October 2017)FPGAs in the cloud? (October 2017)
FPGAs in the cloud? (October 2017)Julien SIMON
 
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang Spark Summit
 
GTC15-Manoj-Roge-OpenPOWER
GTC15-Manoj-Roge-OpenPOWERGTC15-Manoj-Roge-OpenPOWER
GTC15-Manoj-Roge-OpenPOWERAchronix
 
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese..."Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...Edge AI and Vision Alliance
 
Heterogeneous Computing : The Future of Systems
Heterogeneous Computing : The Future of SystemsHeterogeneous Computing : The Future of Systems
Heterogeneous Computing : The Future of SystemsAnand Haridass
 
ProjectVault[VivekKumar_CS-C_6Sem_MIT].pptx
ProjectVault[VivekKumar_CS-C_6Sem_MIT].pptxProjectVault[VivekKumar_CS-C_6Sem_MIT].pptx
ProjectVault[VivekKumar_CS-C_6Sem_MIT].pptxVivek Kumar
 

Similar to SCFE 2020 OpenCAPI presentation as part of OpenPWOER Tutorial (20)

CAPI and OpenCAPI Hardware acceleration enablement
CAPI and OpenCAPI Hardware acceleration enablementCAPI and OpenCAPI Hardware acceleration enablement
CAPI and OpenCAPI Hardware acceleration enablement
 
Using a Field Programmable Gate Array to Accelerate Application Performance
Using a Field Programmable Gate Array to Accelerate Application PerformanceUsing a Field Programmable Gate Array to Accelerate Application Performance
Using a Field Programmable Gate Array to Accelerate Application Performance
 
OpenCAPI next generation accelerator
OpenCAPI next generation accelerator OpenCAPI next generation accelerator
OpenCAPI next generation accelerator
 
6 open capi_meetup_in_japan_final
6 open capi_meetup_in_japan_final6 open capi_meetup_in_japan_final
6 open capi_meetup_in_japan_final
 
OpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC SystemsOpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC Systems
 
00 opencapi acceleration framework yonglu_ver2
00 opencapi acceleration framework yonglu_ver200 opencapi acceleration framework yonglu_ver2
00 opencapi acceleration framework yonglu_ver2
 
Deep learning with FPGA
Deep learning with FPGADeep learning with FPGA
Deep learning with FPGA
 
FPGA MeetUp
FPGA MeetUpFPGA MeetUp
FPGA MeetUp
 
OpenCAPI Technology Ecosystem
OpenCAPI Technology EcosystemOpenCAPI Technology Ecosystem
OpenCAPI Technology Ecosystem
 
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
 
Software Network Data Plane - Satisfying the need for speed - FD.io - VPP and...
Software Network Data Plane - Satisfying the need for speed - FD.io - VPP and...Software Network Data Plane - Satisfying the need for speed - FD.io - VPP and...
Software Network Data Plane - Satisfying the need for speed - FD.io - VPP and...
 
Using FPGA in Embedded Devices
Using FPGA in Embedded DevicesUsing FPGA in Embedded Devices
Using FPGA in Embedded Devices
 
FPGAs in the cloud? (October 2017)
FPGAs in the cloud? (October 2017)FPGAs in the cloud? (October 2017)
FPGAs in the cloud? (October 2017)
 
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
 
GTC15-Manoj-Roge-OpenPOWER
GTC15-Manoj-Roge-OpenPOWERGTC15-Manoj-Roge-OpenPOWER
GTC15-Manoj-Roge-OpenPOWER
 
Infrastructure et serveurs HP
Infrastructure et serveurs HPInfrastructure et serveurs HP
Infrastructure et serveurs HP
 
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese..."Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...
 
Xilinx track g
Xilinx   track gXilinx   track g
Xilinx track g
 
Heterogeneous Computing : The Future of Systems
Heterogeneous Computing : The Future of SystemsHeterogeneous Computing : The Future of Systems
Heterogeneous Computing : The Future of Systems
 
ProjectVault[VivekKumar_CS-C_6Sem_MIT].pptx
ProjectVault[VivekKumar_CS-C_6Sem_MIT].pptxProjectVault[VivekKumar_CS-C_6Sem_MIT].pptx
ProjectVault[VivekKumar_CS-C_6Sem_MIT].pptx
 

More from Ganesan Narayanasamy

Chip Design Curriculum development Residency program
Chip Design Curriculum development Residency programChip Design Curriculum development Residency program
Chip Design Curriculum development Residency programGanesan Narayanasamy
 
Basics of Digital Design and Verilog
Basics of Digital Design and VerilogBasics of Digital Design and Verilog
Basics of Digital Design and VerilogGanesan Narayanasamy
 
180 nm Tape out experience using Open POWER ISA
180 nm Tape out experience using Open POWER ISA180 nm Tape out experience using Open POWER ISA
180 nm Tape out experience using Open POWER ISAGanesan Narayanasamy
 
Workload Transformation and Innovations in POWER Architecture
Workload Transformation and Innovations in POWER Architecture Workload Transformation and Innovations in POWER Architecture
Workload Transformation and Innovations in POWER Architecture Ganesan Narayanasamy
 
Deep Learning Use Cases using OpenPOWER systems
Deep Learning Use Cases using OpenPOWER systemsDeep Learning Use Cases using OpenPOWER systems
Deep Learning Use Cases using OpenPOWER systemsGanesan Narayanasamy
 
AI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systems
AI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systemsAI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systems
AI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systemsGanesan Narayanasamy
 
AI in Health Care using IBM Systems/OpenPOWER systems
AI in Health Care using IBM Systems/OpenPOWER systemsAI in Health Care using IBM Systems/OpenPOWER systems
AI in Health Care using IBM Systems/OpenPOWER systemsGanesan Narayanasamy
 
AI in Healh Care using IBM POWER systems
AI in Healh Care using IBM POWER systems AI in Healh Care using IBM POWER systems
AI in Healh Care using IBM POWER systems Ganesan Narayanasamy
 
Graphical Structure Learning accelerated with POWER9
Graphical Structure Learning accelerated with POWER9Graphical Structure Learning accelerated with POWER9
Graphical Structure Learning accelerated with POWER9Ganesan Narayanasamy
 
OpenPOWER Foundation Introduction
OpenPOWER Foundation Introduction OpenPOWER Foundation Introduction
OpenPOWER Foundation Introduction Ganesan Narayanasamy
 
Open Hardware and Future Computing
Open Hardware and Future ComputingOpen Hardware and Future Computing
Open Hardware and Future ComputingGanesan Narayanasamy
 

More from Ganesan Narayanasamy (20)

Chip Design Curriculum development Residency program
Chip Design Curriculum development Residency programChip Design Curriculum development Residency program
Chip Design Curriculum development Residency program
 
Basics of Digital Design and Verilog
Basics of Digital Design and VerilogBasics of Digital Design and Verilog
Basics of Digital Design and Verilog
 
180 nm Tape out experience using Open POWER ISA
SCFE 2020 OpenCAPI presentation as part of OpenPOWER Tutorial

  • 1. Introduction To Acceleration with OpenCAPI SCFE 2020 - March 24th 2020 - A.CASTELLANE
  • 2. What Do You Need? 2 Out In
  • 3. Current Computing Landscape CPU technology advances have slowed the historical cost/performance improvements seen over the last several decades => New CPU chips alone cannot handle current challenges! Over Burdened CPUs (Computation) + Slow/Complex Algorithms & Functions + Network & Data Access Rates (Data Access) = Current Technology and Processing Overload! Bad News, it’s only going to get worse!
  • 4. Next Set Of Challenges Is Here! • Exponential Data Growth • Compute Intensive Algorithms • Diverse Data Structures & Types • Decreasing Time To Results: Hours .. Minutes .. Seconds .. Real Time Compute • AI, Machine / Deep Learning • Video Processing • Database / Big Data Analytics Storage • Scale-out Storage • Petabytes of new data • Intelligent / Compute SSDs Networking • Network Security • Low-latency Networking • Open vSwitch offload • Software Defined Networking Acceleration
  • 5. Next Challenges Affect All Computing Fields Bank / Finance • Risk analysis / Faster trading: Monte Carlo libraries • Credit card fraud detection • Blockchain acceleration Video / Analytics • Smart video surveillance from multiple video feeds • 3D video stream from multi-angle video streams • Image search / Object tracking / Scene recreation • Multi-jpeg compression Machine Learning / Deep learning • Machine learning inference • Accelerate frequently used ML / DL algorithms Algorithm acceleration • Compression on network path or storage • Encryption on the fly to various memory types • String match
  • 6. Options: Software or Hardware? • Software: • Advantages: • More rapid development leading to faster time to market • Lower non-recurring engineering costs. Software can be reused easily. • Heightened portability • Ease of updating features or patching bugs • Disadvantages: • Slower run time • Hardware • Advantages: • Much faster execution of functions • Reduced power consumption • Lower latency • Increased parallelism and bandwidth • Better utilization of area and functional components available on an integrated circuit (IC) • Disadvantages: • Lower ability to update designs once etched onto silicon • Difficult to share Verilog/VHDL source code between different hardware platforms • Higher costs of functional verification • Longer development process and time to market But, what if you could have the best of both worlds!
  • 7. So, what’s the solution? Hardware Acceleration: The use of computer hardware specially designed to perform functions more efficiently than is possible in software alone running on a general-purpose CPU. Two Options: • GPU: Thousands of tiny CPUs using high parallelization => compute intensive applications • FPGA (Field Programmable Gate Array): Logic + IOs are customized exactly for the application's needs => very low and predictable latency applications
  • 8. The Better Choice? FPGA Acceleration: Due to the inherent logic and IO flexibility, speed, and predictably low latency, FPGAs have a clear advantage. FPGA = Field Programmable Gate Array. Historically programmed using Verilog/VHDL => Compiled => Mapped to FPGA HW Logic
  • 9. What is an FPGA? (Field Programmable Gate Array) • A re-programmable computer chip with lots of configurable logic elements based on Lookup-Tables (LUT) • Programmable switch matrix routing • Configurable I/O and high-speed serial links • Integrated Hard IP (Multiply/Add, SRAM, PLL, PCIe, Ethernet, DRAM,...) • Advantages in flexibility, speed, and low latency due to: • Limited instruction set • High parallelism • Deep pipelines (Figure: logical view showing programmable switches and programmable logic elements)
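The lookup-table idea from slide 9 can be illustrated in software. The sketch below is only a toy model, not vendor tooling: the `LookupTable` class and its method names are made up for this example. It shows why a k-input LUT can implement any Boolean function of k inputs: "programming" it just means filling a table of 2^k output bits, and evaluating it is a single indexed read.

```python
from itertools import product


class LookupTable:
    """Toy model of a k-input LUT: one stored output bit per input combination."""

    def __init__(self, n_inputs, truth_table):
        if len(truth_table) != 2 ** n_inputs:
            raise ValueError("truth table must have 2**n_inputs entries")
        self.n_inputs = n_inputs
        self.truth_table = list(truth_table)

    @staticmethod
    def _index(bits):
        # Treat the inputs as a binary address; the first input is the least significant bit.
        return sum(bit << position for position, bit in enumerate(bits))

    @classmethod
    def from_function(cls, n_inputs, fn):
        # "Programming" the LUT: evaluate fn once for every input combination.
        table = [0] * (2 ** n_inputs)
        for bits in product((0, 1), repeat=n_inputs):
            table[cls._index(bits)] = int(bool(fn(*bits)))
        return cls(n_inputs, table)

    def __call__(self, *bits):
        # Evaluation is a single table read, whatever the function is.
        return self.truth_table[self._index(bits)]


# The same structure holds two unrelated functions; only the table contents differ.
xor3 = LookupTable.from_function(3, lambda a, b, c: a ^ b ^ c)
majority = LookupTable.from_function(3, lambda a, b, c: (a + b + c) >= 2)
```

Reconfiguring an FPGA amounts, roughly, to rewriting many such tables and the routing between them, which is why the same chip can be repurposed for different functions.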
  • 10. FPGA Example (Bittware 250-SoC) Multipurpose Converged Network / Storage • Xilinx Zynq UltraScale+ FPGA ZU19EG (64-bit Cortex-A53 ARM core) • Two 4GB DDR4 (for FPGA and ARM) • PCIe Gen3 x16 / Gen4 x8 => CAPI2 • Up to 4 x8 OCuLink ports supporting NVMe, 100GbE and OpenCAPI • 2x 100GbE QSFP28 cages • Half Height - Half Length format
  • 11. Basics of HW Acceleration 11 Standard CPU Setup (No Acceleration) Host Memory Over burdened CPU Slow functions Congested memory and output card access CPU manages all data, memory access, functions, and flows With increased data, computing, storage, and network challenges Function Application
  • 12. Basics of HW Acceleration Standard CPU Setup (No Acceleration): Application, Function, Host Memory • CPU manages all data, memory access, functions, and flows • Over burdened CPU • Slow functions • Congested memory and output card access
  • 13. HW Acceleration with FPGA Standard CPU Setup (No Acceleration): Application, Function, Host Memory • CPU manages all data, memory access, functions, and flows • Over burdened CPU • Congested memory and output card access • Slow functions Classic Acceleration with FPGA: Application, Function, Host Memory • Faster functions on FPGA • Relieved function only from CPU burden • CPU still handles FPGA memory access and data copying • No Data Coherency • Historically programmed using Verilog/VHDL
  • 14. HW Acceleration with FPGA Standard CPU Setup (No Acceleration): Application, Function, Host Memory • CPU manages all data, memory access, functions, and flows • Over burdened CPU • Congested memory and output card access • Slow functions Classic Acceleration with FPGA: Application, Function, Host Memory • CPU is used to manage FPGA memory access • No Data Coherency (Host memory copied to FPGA) • FPGA historically programmed using Verilog/VHDL • CPU still handles all memory and data access
  • 15. Addressing Classic FPGA Acceleration Issues 15 • What is OpenCAPI? • Open Coherent Accelerator Processor Interface • OpenCAPI is an open interface architecture that allows any microprocessor to attach to: • Coherent user-level accelerators and I/O devices • Advanced memories accessible via read/write or user-level direct memory access (DMA) semantics • Agnostic to processor architecture • What is OC-Accel? • OpenCAPI Acceleration Framework to program FPGAs using C/C++ instead of Verilog or VHDL OpenCAPI 3.0 OC 3.1 OpenCAPI specifications are downloadable from www.opencapi.org
  • 16. HW Acceleration with FPGA + OpenCAPI Classic Acceleration with FPGA: Application, Function, Host Memory • CPU is used to manage FPGA memory access • No Data Coherency (Host memory copied to FPGA) • FPGA historically programmed using Verilog/VHDL • CPU still handles all memory and data access Acceleration with FPGA + OpenCAPI: Application, Function, Host Memory, OpenCAPI • OpenCAPI IO interface on FPGA accesses host memory directly • Function accesses only needed host memory data • Data Coherency (Data does not need to be copied to FPGA) • Address translation (@function=@application) • FPGA programmed with C/C++ using OC-Accel Framework
  • 17. FPGAs + OpenCAPI + OC-Accel Address All Issues (each original point => how it is kept or addressed with FPGA + OpenCAPI + OC-Accel) • Software • Advantages: • More rapid development => App. Eng. writing C/C++ functions (OC-Accel) • Lower non-recurring engineering costs => C/C++ code is reusable • Heightened portability => C/C++ code is portable • Ease of updating features or patching bugs => FPGA reconfigurable with C/C++ updates • Disadvantages: • Slower run time => Function executed faster on FPGA + CPU relief • Hardware • Advantages: • Much faster execution of functions => Using FPGA instead of CPU • Reduced power consumption => FPGA is function specific only • Lower latency => FPGA is fast + OpenCAPI direct memory access • Increased parallelism and bandwidth => FPGA can have parallel logic • Better IC area and function utilization => FPGA uses function logic only • Disadvantages: • Lower ability to update design hardware => FPGA easily reconfigurable with C/C++ updates • Difficult to share source code between FPGAs => C/C++ easily recompiled for different FPGAs • Higher costs of functional verification => C/C++ code simulated and debugged • Longer development process and time to market => C/C++ code can be easier to write and upload
  • 18. Ex: Monte-Carlo (FPGA Accelerated) Monte Carlo Analysis is a risk management technique used in the financial and insurance industries for conducting a quantitative analysis of risks. By using CAPI with an FPGA, the C/C++ code was reduced by 40x on the application side, and 33% of memory and CPU was freed up (versus a non-CAPI FPGA). Running 1 million iterations. Results: At least 50x faster with CAPI and FPGA technology on a POWER server
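For readers unfamiliar with the technique named on slide 18, here is a minimal, generic Monte Carlo risk sketch. It is not the benchmark's code: the return model (normally distributed daily returns), the parameter values, and the function names are all assumptions chosen for illustration. It draws many random outcomes and reads a 95% value-at-risk off the sorted losses.

```python
import random


def simulate_losses(n_iterations, mean_return=0.0005, volatility=0.02, seed=42):
    """Draw n_iterations random daily returns and express each as a loss.

    The normal-return model and its parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    return [-rng.gauss(mean_return, volatility) for _ in range(n_iterations)]


def value_at_risk(losses, confidence=0.95):
    """Loss level that is not exceeded in `confidence` of the simulated cases."""
    ordered = sorted(losses)
    index = min(int(confidence * len(ordered)), len(ordered) - 1)
    return ordered[index]


# 100,000 iterations keeps the pure-Python run short; the slide's run used 1 million.
losses = simulate_losses(100_000)
var_95 = value_at_risk(losses)
print(f"95% value-at-risk: {var_95:.4f} (as a fraction of portfolio value)")
```

Every iteration is independent of the others, which is the property that makes this kind of workload a natural fit for parallel hardware.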
  • 19. Ex: PostgreSQL regex Matching Accelerated PostgreSQL is a powerful, open source object-relational database system. SQL (Structured Query Language) is used to communicate with a database. PostgreSQL + OpenCAPI shows a compelling “regex” performance increase by leveraging the bandwidth and virtual addressing of OpenCAPI technology. In fact, accelerating the SQL with OpenCAPI-regex can be 4x to 10x faster than the best PostgreSQL built-in functions (CPU multi-threads enabled). Command example: SELECT * FROM table WHERE pkt ~ pattern; Basically: search the db for all pkt that match pattern. Actual Example Single Search Run Times: • CPU parallel Seq Scan: ~698ms • Custom Scan (PFCAPI): ~161ms
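To make the query on slide 19 concrete, the sketch below mimics what `WHERE pkt ~ pattern` selects and then checks the speedup implied by the two quoted run times. The sample rows, the pattern, and the `regex_select` helper are hypothetical, and Python's `re` syntax is not identical to PostgreSQL's regular expression dialect, so treat this as an analogy only.

```python
import re

# Hypothetical rows standing in for the `pkt` column.
packets = ["GET /index.html", "POST /login", "GET /img/logo.png", "PUT /api/item/7"]


def regex_select(rows, pattern):
    """Rough analogue of: SELECT * FROM table WHERE pkt ~ pattern;

    Like the `~` operator, re.search is case-sensitive and matches anywhere
    in the string unless the pattern is anchored.
    """
    compiled = re.compile(pattern)
    return [row for row in rows if compiled.search(row)]


matches = regex_select(packets, r"^GET .*\.(html|png)$")

# Single-search run times quoted on the slide, in milliseconds.
cpu_parallel_seq_scan_ms = 698.0
custom_scan_pfcapi_ms = 161.0
speedup = cpu_parallel_seq_scan_ms / custom_scan_pfcapi_ms
print(f"matches: {matches}")
print(f"speedup: {speedup:.2f}x")
```

The ratio of the two quoted timings comes out a little above 4x, at the lower end of the 4x to 10x range stated on the slide.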
  • 20. Ex: Ultra Fast Data Acquisition (X-Ray Crystallography) Goal: Real-time mapping of biological structure by examining molecule scatter plots of protein crystal struck by x-rays. GPU + PCIe Configuration (Today): Digital Camera Sensors, 4 MPixels @ 1.1kHz => Raw Data at 9GBps => PCIe => GPU => Compressed Data. Processing steps: Data acquisition, Raw data to real image conversion, Decimate / sort images, Data compression. Result: Protein Molecule Mapped Real Image
  • 21. Ex: Ultra Fast Data Acquisition (X-Ray Crystallography) Goal: Real-time mapping of biological structure by examining molecule scatter plots of protein crystal struck by x-rays. FPGA w/ OpenCAPI (Goal): Digital Camera Sensors, 10 MPixels @ 2.2 kHz => Raw Data at 22GBps => Dual FPGAs In Parallel (Unfiltered Image => Filtered Image) => OpenCAPI 3.0 at 22GBps => Image Data at 22GBps => GPU or FPGA or both, Host with NX-gzip Embedded HW Accelerator => Compressed Data. Processing steps: Data acquisition, Raw data to real image conversion, Decimate / sort images, Data compression. Result: Protein Molecule Mapped Real Image. OpenCAPI breaks the 9GBps PCIe bottleneck!
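The data rates on slides 20 and 21 can be sanity-checked from the sensor figures. The slides do not state the pixel depth, so the sketch below assumes 2 bytes per pixel and an even split of the stream across the two FPGAs; both are assumptions of this example, as is the helper name `raw_rate_gb_per_s`. Under those assumptions the arithmetic lands close to the quoted 9GBps and 22GBps figures.

```python
BYTES_PER_PIXEL = 2  # assumption: not stated on the slides


def raw_rate_gb_per_s(megapixels, frame_rate_khz, bytes_per_pixel=BYTES_PER_PIXEL):
    """Raw sensor data rate in GB/s (decimal gigabytes)."""
    pixels_per_second = megapixels * 1e6 * frame_rate_khz * 1e3
    return pixels_per_second * bytes_per_pixel / 1e9


# Slide 20: 4 MPixels @ 1.1 kHz (GPU + PCIe configuration).
today_rate = raw_rate_gb_per_s(4, 1.1)

# Slide 21: 10 MPixels @ 2.2 kHz, assumed to be shared evenly by two FPGAs.
goal_rate = raw_rate_gb_per_s(10, 2.2)
per_fpga_rate = goal_rate / 2

print(f"today: {today_rate:.1f} GB/s, goal: {goal_rate:.1f} GB/s, per FPGA: {per_fpga_rate:.1f} GB/s")
```

If the real pixel depth or the way the stream is split differs, the numbers change accordingly; the point is only that the target sensor produces several times more data than the current one.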
  • 22. Ex: Pull Quote "The benefit of using POWER interfaces, i.e., NVLink and OpenCAPI, is not only bandwidth, but these interfaces allow also for coherent memory access. FPGA board connected via OpenCAPI or GPGPU connected via NVLink sees host (CPU) virtual memory space exactly like the process running on the CPU, reducing the burden of writing reliable and secure applications. Memory coherency can be also available for PCIe FPGA accelerators installed in POWER9 servers via OpenCAPI predecessor, the Coherent Accelerator Processor Interface (CAPI). IBM also provides optimized software to benefit from the architecture, including the CAPI Storage, Network, and Analytics Programming (SNAP) framework [51,52] that simplifies the integration of FPGA designs with POWER9, as well as optimized ML and data analysis routines for GPGPUs or FPGAs [53]." Source: Structural Dynamics 7, 014305 (2020); https://doi.org/10.1063/1.5143480
  • 23. Ex: Memory Coherency Scenario: 2MB of data scattered in host memory is processed in an FPGA. "Classic" PCIe FPGA card (Server, Application, Function): 1. Gathering data (SW memcopy) 2. 1 transaction of a big amount of data to FPGA (2MB). "CAPI-enabled" FPGA card (Server, Application, Function, CAPI): 1. 1 transaction of 8kB for AddrSet from host memory to FPGA 2. 1024 transactions of 2kB from host memory to FPGA; the FPGA directly reads the required data at random addresses. Results: CAPI-enabled was 2-3x faster than the Classic method
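The two flows on slide 23 can be modelled in a few lines. This is a bookkeeping sketch, not a timing benchmark and not OC-Accel code: host memory is a plain byte string, the function names are invented, and the size of the simulated memory is an assumption. It shows that both flows deliver the same 2MB to the function, but only the classic flow requires the CPU to copy that 2MB first. The sizes also fit together: 1024 blocks of 2kB make 2MB, and 8kB of address set for 1024 blocks is 8 bytes per address.

```python
import random

BLOCK_SIZE = 2 * 1024         # 2 kB per scattered block (from the slide)
NUM_BLOCKS = 1024             # 1024 blocks, 2 MB in total (from the slide)
ADDR_SET_SIZE = 8 * 1024      # 8 kB address set (from the slide)
HOST_SIZE = 8 * 1024 * 1024   # assumption: size of the simulated host memory

# Simulated host memory: every 2 kB slot is filled with its own slot number,
# so blocks taken from different addresses have different contents.
slots = HOST_SIZE // BLOCK_SIZE
host_memory = b"".join(
    slot.to_bytes(4, "little") * (BLOCK_SIZE // 4) for slot in range(slots)
)

# 1024 non-overlapping, block-aligned addresses picked at random.
rng = random.Random(0)
addresses = [slot * BLOCK_SIZE for slot in rng.sample(range(slots), NUM_BLOCKS)]


def classic_offload(memory, addrs):
    """Classic PCIe card: the CPU gathers the blocks, then ships one big buffer."""
    staging = b"".join(memory[a:a + BLOCK_SIZE] for a in addrs)  # step 1: SW memcopy
    transfers = [len(staging)]                                   # step 2: one 2 MB transaction
    return staging, len(staging), transfers


def capi_offload(memory, addrs):
    """CAPI-enabled card: send the address set, then the card reads each block itself."""
    transfers = [ADDR_SET_SIZE]                                  # step 1: one 8 kB transaction
    received = bytearray()
    for a in addrs:                                              # step 2: 1024 reads of 2 kB
        received += memory[a:a + BLOCK_SIZE]
        transfers.append(BLOCK_SIZE)
    return bytes(received), 0, transfers                         # no CPU-side gather copy


classic_data, classic_cpu_copied, classic_transfers = classic_offload(host_memory, addresses)
capi_data, capi_cpu_copied, capi_transfers = capi_offload(host_memory, addresses)
```

The model counts bytes and transactions only. The 2-3x result on the slide is a measurement from the presentation and cannot be reproduced by this sketch.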
  • 24. Ease of FPGA Programming (OC-Accel) Benefits: • Faster Time To Market: Port a function to an FPGA in days, not months • No Obsolescence: Simply recompile unchanged C/C++ code for a different FPGA • No Link Constraint: Moving from a CAPI (over PCIe) link to OpenCAPI is just a matter of recompiling - no code change • No Specific Hardware Skills Needed: A C/C++ coder can focus on functionality, as all the resources are managed by the framework. • Open-Source Framework: The code can be modified and improved by any user. Example: • Note: SNAP is the predecessor to OC-Accel, and the overall flow and performance are equivalent. • Customer ported and optimized SHA3 C code within 10 days using the SNAP framework, versus 4 months in VHDL without SNAP Development Plans: • OC-Accel with OpenCAPI today, OC-Accel with other emerging standards like CXL tomorrow!
  • 25. FPGAs + OpenCAPI + OC-Accel Has It All 25 Very high bandwidth Faster development and time to market with OC-Accel
  • 26. Developers Aren’t Where We Need Them Scripting Interpreted App (Python / Rails / Java) Non-Interpreted App (C++ / Java JRE) Procedural App (C / C++) High Level OS (C / C++) Firmware HW API (C, ASM) Kernel (C, AS) HDL Chart content courtesy of Aaron Sullivan @Rackspace Spreading the CAPI Love (OC-Accel) 26
  • 27. Interpreted App (Python / Rails / Java) Non-Interpreted App (C++ / Java JRE) Procedural App (C / C++) High Level OS (C / C++) Kernel (C, AS) HW API (C, ASM) Firmware Scripting HDL Application Application New Abstraction New Abstraction New Abstraction New Abstraction Soft-Hardware Soft-Hardware Soft-Hardware Spreading the CAPI Love (OC-Accel) Developers Where We Need Them Chart content courtesy of Aaron Sullivan @Rackspace 27
  • 28. Want to: - Know more about accelerators? - See a live demonstration? - Do a benchmark? - Get answers to your questions? Contact us: alexandre.castellane@fr.ibm.com bruno.mesnet@fr.ibm.com fabrice_moyen@fr.ibm.com luyong@cn.ibm.com shgoupf@cn.ibm.com