SlideShare a Scribd company logo
Offloading Deep Dive
Efstathios Efstathiou
Agenda
Introduction
Definition of offloading (DB view)
Offloading techniques we can use
Demo-Time ☺
Findings
Q&A
Introduction
About me
Married
Linux since 1998
Oracle since 2000
OCM & OCP
Master Database Engineer @BIT since 2014
Definition of offloading (DB view)
In general:
«Everything, that saves resources on
the database server»
Definition of offloading (DB view)
Examples of offloading implementations
NIC (TCP/IP Offload, iSCSI Offload, Infiniband RDMA, NVMe)
Storage Adatapets (RAID Calculation, SCSI)
Math Co-Processors
FPGAs
DMA-Engines
Distributed Computing (e.g. using MPI)
Remote DB Engine (Hadoop Connector, Gluent)
Definition of offloading (DB view)
How is it done the Exadata?
Offloading via DMA-Engine of the Infiniband HCA
Enables Remote-DMA (RDMA) Operations (DB to Cell)
The storage cell can be acessed at near zero cpu cost
Latency of a DMA operation is higher than PIO via CPU therefore good for large
amounts of data e.g. DWH, but worse for OLTP
The task can be distributed
Order e.g. to execute a sub-query on a node via MPI-call and to transmit the start
or end memory address to the requester (DB server)
The DB server now only needs to merge the partial results.
The DB server is in this sense more acting as a client
Offloading techniques we can use
The following devices have a DMA engine:
RDMA-enabled network adapters and Infiniband cards
Intel IOATDMA chip on Xeon boards (for NVMe SSDs
PCIe switch cards
PLX-based NVMe controllers
Or the PCIe chip in your Intel Xeon computer ;-)
Lowest latency
Offloading techniques we can use
The following protocols have (R) DMA support:
iSCSI over RMDA
NFS over RDMA
NVMe over Fabrics (RDMA-based) or RDMA Block Device
Needs the least CPU
Good starting point
Offloading techniques we can use
Comparison (Native PCIe fabric vs. NVMe over Fabrics)
Native PCIe fabric has significantly less latency
Setup with PCIe-JBOF is less complex than NVMe over Fabrics
Throughput is identical
Offloading techniques we can use
That PCIe is quite cool… What other tricks can it do?
DMA-Engine like Infiniband
Connect multiple PCIe root complexes via Non-Transparent Bridge
Network protocol IPoPCIe analogous to IPoIB, but performs way better
Device Sharing via I / O Virtualization (SR-IOV, MR-IOV)
Offloading techniques we can use
How do we get the system really fast?
Answer: Memory!
The only question is:
Which memory?
Where is it located?
How is it structured?
Demo-Time ☺
Demo 1: Device Sharing
Description
Host 1 has a SR-IOV capable NIC
Host 1 initializes a Virtual Function
Through Non-Transparent Bridge
(NTB) Host 2 can access that
function by loading the device driver
for the NIC
https://www.youtube.com/watch?v=GPh0Ms3dfPo
Demo-Time ☺
Demo 1: Device Sharing
Expected behaviour
Works as designed ☺
Depending on the approach PCIe switch chip, there is device driver dependencies
Demo-Time ☺
Demo 2: DMA-Transfer
Description
Host 1 and Host2 are fitted with a
PCIe Switch based host card and
connected back to back
PLXSDK comes with a Sample
Program supporting PIO and DMA
transfer
We measure the overall throughput
and cpu load
https://www.youtube.com/watch?v=LNPBr3WvuNg
Demo-Time ☺
Demo 2: DMA-Transfer
Expected behaviour
Large data transfer benefits from DMA (DWH) ☺
Small, time critical transfers have less latency with PIO (OLTP)
You’ll need both modes
Demo-Time ☺
Demo 3: Fabric Attached Memory (PCIe) and Oracle RAC
Description
Database and Memory hosts are fitted
with a PCIe Switch based host card and
connected to a central PCIe Switch
Memory hosts’s physical DRAM is
expanded with OptaneGrid 3DXpoint
into an SDM Pool (mirrored via PCIe
NTB)
Database Servers expose a tiered
PMEM Device using local DRAM
(mirrored via PCIe NTB) and the remote
SDM Pool accessed over PCIe NTB)
ASM High Redudancy on top of PMEM
Devices with preferred mirror read and
device mapper path swapping
db0 db1 db2
mem0 mem1 mem2
SDM
DRAM
Optane
GRID
SDM
DRAM
Optane
GRID
SDM
DRAM
Optane
GRID
ASM
PMEM
DRAM
Expansion
PMEM
DRAM
Expansion
PMEM
DRAM
Expansion
PCIe Switch
RAC
NTB
Domain
Demo-Time ☺
Demo 3: Fabric Attached Memory (PCIe) and Oracle RAC
16 GB/s throughput per licensable core (4cores, 8 threads per db node)
85 % of native aggregated memory controller performance
Findings
Generic offloading is possible per se, but different than expected :
Fabric Attached Memory
Yes, the DB is running in memory (mirrored)
Question is:
In which server’s memory (local or remote)?
How do we acccess it (local memory extension or DMA call)?
How is it constructed (DRAM or Software Defined Memory)?
Using the right PCIe-Switch and storage module combination you
get it to work
Any PCIe-capable host can use Fabric Attached Memory per se
An OpenMCCA-compatible PCIe switch (PLX 9700) and high-performance M.2 SSDs
such as Optane Memory or fast NVMe modules are required
Q&A
Thanks to our supporters
Contact Information
elgreco@linux.com
Thanks

More Related Content

What's hot

SOUG IMDT Oracle In-Memory
SOUG IMDT Oracle In-MemorySOUG IMDT Oracle In-Memory
SOUG IMDT Oracle In-Memory
UniFabric
 
SOUG_GV_Flashgrid_V4
SOUG_GV_Flashgrid_V4SOUG_GV_Flashgrid_V4
SOUG_GV_Flashgrid_V4UniFabric
 
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
In-Memory Computing Summit
 
IMCSummit 2015 - Day 1 Developer Session - The Science and Engineering Behind...
IMCSummit 2015 - Day 1 Developer Session - The Science and Engineering Behind...IMCSummit 2015 - Day 1 Developer Session - The Science and Engineering Behind...
IMCSummit 2015 - Day 1 Developer Session - The Science and Engineering Behind...
In-Memory Computing Summit
 
Webinar: What’s Your Path to NVMe?
Webinar: What’s Your Path to NVMe?Webinar: What’s Your Path to NVMe?
Webinar: What’s Your Path to NVMe?
Storage Switzerland
 
TDS-16489U - Dual Processor
TDS-16489U - Dual ProcessorTDS-16489U - Dual Processor
TDS-16489U - Dual Processor
Fernando Barrientos
 
Ceph Day San Jose - Ceph at Salesforce
Ceph Day San Jose - Ceph at Salesforce Ceph Day San Jose - Ceph at Salesforce
Ceph Day San Jose - Ceph at Salesforce
Ceph Community
 
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA ArchitectureCeph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Danielle Womboldt
 
Enterprise Storage NAS - Dual Controller
Enterprise Storage NAS - Dual ControllerEnterprise Storage NAS - Dual Controller
Enterprise Storage NAS - Dual Controller
Fernando Barrientos
 
Introduction to NVMe Over Fabrics-V3R
Introduction to NVMe Over Fabrics-V3RIntroduction to NVMe Over Fabrics-V3R
Introduction to NVMe Over Fabrics-V3RSimon Huang
 
Webinar: NVMe, NVMe over Fabrics and Beyond - Everything You Need to Know
Webinar: NVMe, NVMe over Fabrics and Beyond - Everything You Need to KnowWebinar: NVMe, NVMe over Fabrics and Beyond - Everything You Need to Know
Webinar: NVMe, NVMe over Fabrics and Beyond - Everything You Need to Know
Storage Switzerland
 
A Key-Value Store for Data Acquisition Systems
A Key-Value Store for Data Acquisition SystemsA Key-Value Store for Data Acquisition Systems
A Key-Value Store for Data Acquisition Systems
Intel® Software
 
Webinar: How NVMe Will Change Flash Storage
Webinar: How NVMe Will Change Flash StorageWebinar: How NVMe Will Change Flash Storage
Webinar: How NVMe Will Change Flash Storage
Storage Switzerland
 
Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server
Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server
Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server
Ceph Community
 
Ceph Day Beijing - Ceph RDMA Update
Ceph Day Beijing - Ceph RDMA UpdateCeph Day Beijing - Ceph RDMA Update
Ceph Day Beijing - Ceph RDMA Update
Danielle Womboldt
 
Ceph Day Beijing - Storage Modernization with Intel and Ceph
Ceph Day Beijing - Storage Modernization with Intel and CephCeph Day Beijing - Storage Modernization with Intel and Ceph
Ceph Day Beijing - Storage Modernization with Intel and Ceph
Danielle Womboldt
 
Ceph Day KL - Delivering cost-effective, high performance Ceph cluster
Ceph Day KL - Delivering cost-effective, high performance Ceph clusterCeph Day KL - Delivering cost-effective, high performance Ceph cluster
Ceph Day KL - Delivering cost-effective, high performance Ceph cluster
Ceph Community
 
Disrupt the Storage & Memory Hierarchy
Disrupt the Storage & Memory HierarchyDisrupt the Storage & Memory Hierarchy
Disrupt the Storage & Memory Hierarchy
Intel® Software
 
Bridging Big - Small, Fast - Slow with Campaign Storage
Bridging Big - Small, Fast - Slow with Campaign StorageBridging Big - Small, Fast - Slow with Campaign Storage
Bridging Big - Small, Fast - Slow with Campaign Storage
inside-BigData.com
 
OWF14 - Plenary Session : Thibaud Besson, IBM POWER Systems Specialist
OWF14 - Plenary Session : Thibaud Besson, IBM POWER Systems SpecialistOWF14 - Plenary Session : Thibaud Besson, IBM POWER Systems Specialist
OWF14 - Plenary Session : Thibaud Besson, IBM POWER Systems Specialist
Paris Open Source Summit
 

What's hot (20)

SOUG IMDT Oracle In-Memory
SOUG IMDT Oracle In-MemorySOUG IMDT Oracle In-Memory
SOUG IMDT Oracle In-Memory
 
SOUG_GV_Flashgrid_V4
SOUG_GV_Flashgrid_V4SOUG_GV_Flashgrid_V4
SOUG_GV_Flashgrid_V4
 
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
IMCSummit 2015 - Day 1 Developer Track - Evolution of non-volatile memory exp...
 
IMCSummit 2015 - Day 1 Developer Session - The Science and Engineering Behind...
IMCSummit 2015 - Day 1 Developer Session - The Science and Engineering Behind...IMCSummit 2015 - Day 1 Developer Session - The Science and Engineering Behind...
IMCSummit 2015 - Day 1 Developer Session - The Science and Engineering Behind...
 
Webinar: What’s Your Path to NVMe?
Webinar: What’s Your Path to NVMe?Webinar: What’s Your Path to NVMe?
Webinar: What’s Your Path to NVMe?
 
TDS-16489U - Dual Processor
TDS-16489U - Dual ProcessorTDS-16489U - Dual Processor
TDS-16489U - Dual Processor
 
Ceph Day San Jose - Ceph at Salesforce
Ceph Day San Jose - Ceph at Salesforce Ceph Day San Jose - Ceph at Salesforce
Ceph Day San Jose - Ceph at Salesforce
 
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA ArchitectureCeph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
 
Enterprise Storage NAS - Dual Controller
Enterprise Storage NAS - Dual ControllerEnterprise Storage NAS - Dual Controller
Enterprise Storage NAS - Dual Controller
 
Introduction to NVMe Over Fabrics-V3R
Introduction to NVMe Over Fabrics-V3RIntroduction to NVMe Over Fabrics-V3R
Introduction to NVMe Over Fabrics-V3R
 
Webinar: NVMe, NVMe over Fabrics and Beyond - Everything You Need to Know
Webinar: NVMe, NVMe over Fabrics and Beyond - Everything You Need to KnowWebinar: NVMe, NVMe over Fabrics and Beyond - Everything You Need to Know
Webinar: NVMe, NVMe over Fabrics and Beyond - Everything You Need to Know
 
A Key-Value Store for Data Acquisition Systems
A Key-Value Store for Data Acquisition SystemsA Key-Value Store for Data Acquisition Systems
A Key-Value Store for Data Acquisition Systems
 
Webinar: How NVMe Will Change Flash Storage
Webinar: How NVMe Will Change Flash StorageWebinar: How NVMe Will Change Flash Storage
Webinar: How NVMe Will Change Flash Storage
 
Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server
Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server
Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server
 
Ceph Day Beijing - Ceph RDMA Update
Ceph Day Beijing - Ceph RDMA UpdateCeph Day Beijing - Ceph RDMA Update
Ceph Day Beijing - Ceph RDMA Update
 
Ceph Day Beijing - Storage Modernization with Intel and Ceph
Ceph Day Beijing - Storage Modernization with Intel and CephCeph Day Beijing - Storage Modernization with Intel and Ceph
Ceph Day Beijing - Storage Modernization with Intel and Ceph
 
Ceph Day KL - Delivering cost-effective, high performance Ceph cluster
Ceph Day KL - Delivering cost-effective, high performance Ceph clusterCeph Day KL - Delivering cost-effective, high performance Ceph cluster
Ceph Day KL - Delivering cost-effective, high performance Ceph cluster
 
Disrupt the Storage & Memory Hierarchy
Disrupt the Storage & Memory HierarchyDisrupt the Storage & Memory Hierarchy
Disrupt the Storage & Memory Hierarchy
 
Bridging Big - Small, Fast - Slow with Campaign Storage
Bridging Big - Small, Fast - Slow with Campaign StorageBridging Big - Small, Fast - Slow with Campaign Storage
Bridging Big - Small, Fast - Slow with Campaign Storage
 
OWF14 - Plenary Session : Thibaud Besson, IBM POWER Systems Specialist
OWF14 - Plenary Session : Thibaud Besson, IBM POWER Systems SpecialistOWF14 - Plenary Session : Thibaud Besson, IBM POWER Systems Specialist
OWF14 - Plenary Session : Thibaud Besson, IBM POWER Systems Specialist
 

Similar to Offloading for Databases - Deep Dive

Accelerating Ceph with iWARP RDMA over Ethernet - Brien Porter, Haodong Tang
Accelerating Ceph with iWARP RDMA over Ethernet - Brien Porter, Haodong TangAccelerating Ceph with iWARP RDMA over Ethernet - Brien Porter, Haodong Tang
Accelerating Ceph with iWARP RDMA over Ethernet - Brien Porter, Haodong Tang
Ceph Community
 
CETH for XDP [Linux Meetup Santa Clara | July 2016]
CETH for XDP [Linux Meetup Santa Clara | July 2016] CETH for XDP [Linux Meetup Santa Clara | July 2016]
CETH for XDP [Linux Meetup Santa Clara | July 2016]
IO Visor Project
 
Realizing Exabyte-scale PM Centric Architectures and Memory Fabrics
Realizing Exabyte-scale PM Centric Architectures and Memory FabricsRealizing Exabyte-scale PM Centric Architectures and Memory Fabrics
Realizing Exabyte-scale PM Centric Architectures and Memory Fabrics
inside-BigData.com
 
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Виталий Стародубцев
 
Optimized HPC/AI cloud with OpenStack acceleration service and composable har...
Optimized HPC/AI cloud with OpenStack acceleration service and composable har...Optimized HPC/AI cloud with OpenStack acceleration service and composable har...
Optimized HPC/AI cloud with OpenStack acceleration service and composable har...
Shuquan Huang
 
Caching methodology and strategies
Caching methodology and strategiesCaching methodology and strategies
Caching methodology and strategies
Tiep Vu
 
Caching Methodology & Strategies
Caching Methodology & StrategiesCaching Methodology & Strategies
Caching Methodology & Strategies
Tiệp Vũ
 
openSUSE storage workshop 2016
openSUSE storage workshop 2016openSUSE storage workshop 2016
openSUSE storage workshop 2016
Alex Lau
 
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Databricks
 
SAN BASICS..Why we will go for SAN?
SAN BASICS..Why we will go for SAN?SAN BASICS..Why we will go for SAN?
SAN BASICS..Why we will go for SAN?
Saroj Sahu
 
DPDK Summit 2015 - Aspera - Charles Shiflett
DPDK Summit 2015 - Aspera - Charles ShiflettDPDK Summit 2015 - Aspera - Charles Shiflett
DPDK Summit 2015 - Aspera - Charles Shiflett
Jim St. Leger
 
Steen_Dissertation_March5
Steen_Dissertation_March5Steen_Dissertation_March5
Steen_Dissertation_March5Steen Larsen
 
NVMe over Fabric
NVMe over FabricNVMe over Fabric
NVMe over Fabric
singh.gurjeet
 
20181210 - PGconf.ASIA Unconference
20181210 - PGconf.ASIA Unconference20181210 - PGconf.ASIA Unconference
20181210 - PGconf.ASIA Unconference
Kohei KaiGai
 
6 open capi_meetup_in_japan_final
6 open capi_meetup_in_japan_final6 open capi_meetup_in_japan_final
6 open capi_meetup_in_japan_final
Yutaka Kawai
 
O que há de novo na plataforma x86 para High Performance por Jefferson de A S...
O que há de novo na plataforma x86 para High Performance por Jefferson de A S...O que há de novo na plataforma x86 para High Performance por Jefferson de A S...
O que há de novo na plataforma x86 para High Performance por Jefferson de A S...Joao Galdino Mello de Souza
 
100 M pps on PC.
100 M pps on PC.100 M pps on PC.
100 M pps on PC.
Redge Technologies
 
Crimson: Ceph for the Age of NVMe and Persistent Memory
Crimson: Ceph for the Age of NVMe and Persistent MemoryCrimson: Ceph for the Age of NVMe and Persistent Memory
Crimson: Ceph for the Age of NVMe and Persistent Memory
ScyllaDB
 
Mellanox Storage Solutions
Mellanox Storage SolutionsMellanox Storage Solutions
Mellanox Storage Solutions
Mellanox Technologies
 
Big Data Glossary of terms
Big Data Glossary of termsBig Data Glossary of terms
Big Data Glossary of terms
Kognitio
 

Similar to Offloading for Databases - Deep Dive (20)

Accelerating Ceph with iWARP RDMA over Ethernet - Brien Porter, Haodong Tang
Accelerating Ceph with iWARP RDMA over Ethernet - Brien Porter, Haodong TangAccelerating Ceph with iWARP RDMA over Ethernet - Brien Porter, Haodong Tang
Accelerating Ceph with iWARP RDMA over Ethernet - Brien Porter, Haodong Tang
 
CETH for XDP [Linux Meetup Santa Clara | July 2016]
CETH for XDP [Linux Meetup Santa Clara | July 2016] CETH for XDP [Linux Meetup Santa Clara | July 2016]
CETH for XDP [Linux Meetup Santa Clara | July 2016]
 
Realizing Exabyte-scale PM Centric Architectures and Memory Fabrics
Realizing Exabyte-scale PM Centric Architectures and Memory FabricsRealizing Exabyte-scale PM Centric Architectures and Memory Fabrics
Realizing Exabyte-scale PM Centric Architectures and Memory Fabrics
 
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
 
Optimized HPC/AI cloud with OpenStack acceleration service and composable har...
Optimized HPC/AI cloud with OpenStack acceleration service and composable har...Optimized HPC/AI cloud with OpenStack acceleration service and composable har...
Optimized HPC/AI cloud with OpenStack acceleration service and composable har...
 
Caching methodology and strategies
Caching methodology and strategiesCaching methodology and strategies
Caching methodology and strategies
 
Caching Methodology & Strategies
Caching Methodology & StrategiesCaching Methodology & Strategies
Caching Methodology & Strategies
 
openSUSE storage workshop 2016
openSUSE storage workshop 2016openSUSE storage workshop 2016
openSUSE storage workshop 2016
 
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
 
SAN BASICS..Why we will go for SAN?
SAN BASICS..Why we will go for SAN?SAN BASICS..Why we will go for SAN?
SAN BASICS..Why we will go for SAN?
 
DPDK Summit 2015 - Aspera - Charles Shiflett
DPDK Summit 2015 - Aspera - Charles ShiflettDPDK Summit 2015 - Aspera - Charles Shiflett
DPDK Summit 2015 - Aspera - Charles Shiflett
 
Steen_Dissertation_March5
Steen_Dissertation_March5Steen_Dissertation_March5
Steen_Dissertation_March5
 
NVMe over Fabric
NVMe over FabricNVMe over Fabric
NVMe over Fabric
 
20181210 - PGconf.ASIA Unconference
20181210 - PGconf.ASIA Unconference20181210 - PGconf.ASIA Unconference
20181210 - PGconf.ASIA Unconference
 
6 open capi_meetup_in_japan_final
6 open capi_meetup_in_japan_final6 open capi_meetup_in_japan_final
6 open capi_meetup_in_japan_final
 
O que há de novo na plataforma x86 para High Performance por Jefferson de A S...
O que há de novo na plataforma x86 para High Performance por Jefferson de A S...O que há de novo na plataforma x86 para High Performance por Jefferson de A S...
O que há de novo na plataforma x86 para High Performance por Jefferson de A S...
 
100 M pps on PC.
100 M pps on PC.100 M pps on PC.
100 M pps on PC.
 
Crimson: Ceph for the Age of NVMe and Persistent Memory
Crimson: Ceph for the Age of NVMe and Persistent MemoryCrimson: Ceph for the Age of NVMe and Persistent Memory
Crimson: Ceph for the Age of NVMe and Persistent Memory
 
Mellanox Storage Solutions
Mellanox Storage SolutionsMellanox Storage Solutions
Mellanox Storage Solutions
 
Big Data Glossary of terms
Big Data Glossary of termsBig Data Glossary of terms
Big Data Glossary of terms
 

Recently uploaded

Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
GetInData
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
dwreak4tg
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
mzpolocfi
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
AnirbanRoy608946
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
u86oixdj
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
eddie19851
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Subhajit Sahu
 

Recently uploaded (20)

Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
 

Offloading for Databases - Deep Dive

  • 2. Agenda Introduction Definition of offloading (DB view) Offloading techniques we can use Demo-Time ☺ Findings Q&A
  • 3. Introduction About me Married Linux since 1998 Oracle since 2000 OCM & OCP Master Database Engineer @BIT since 2014
  • 4. Definition of offloading (DB view) In general: «Everything, that saves resources on the database server»
  • 5. Definition of offloading (DB view) Examples of offloading implementations NIC (TCP/IP Offload, iSCSI Offload, Infiniband RDMA, NVMe) Storage Adatapets (RAID Calculation, SCSI) Math Co-Processors FPGAs DMA-Engines Distributed Computing (e.g. using MPI) Remote DB Engine (Hadoop Connector, Gluent)
  • 6. Definition of offloading (DB view) How is it done the Exadata? Offloading via DMA-Engine of the Infiniband HCA Enables Remote-DMA (RDMA) Operations (DB to Cell) The storage cell can be acessed at near zero cpu cost Latency of a DMA operation is higher than PIO via CPU therefore good for large amounts of data e.g. DWH, but worse for OLTP The task can be distributed Order e.g. to execute a sub-query on a node via MPI-call and to transmit the start or end memory address to the requester (DB server) The DB server now only needs to merge the partial results. The DB server is in this sense more acting as a client
  • 7. Offloading techniques we can use The following devices have a DMA engine: RDMA-enabled network adapters and Infiniband cards Intel IOATDMA chip on Xeon boards (for NVMe SSDs PCIe switch cards PLX-based NVMe controllers Or the PCIe chip in your Intel Xeon computer ;-) Lowest latency
  • 8. Offloading techniques we can use The following protocols have (R) DMA support: iSCSI over RMDA NFS over RDMA NVMe over Fabrics (RDMA-based) or RDMA Block Device Needs the least CPU Good starting point
  • 9. Offloading techniques we can use Comparison (Native PCIe fabric vs. NVMe over Fabrics) Native PCIe fabric has significantly less latency Setup with PCIe-JBOF is less complex than NVMe over Fabrics Throughput is identical
  • 10. Offloading techniques we can use That PCIe is quite cool… What other tricks can it do? DMA-Engine like Infiniband Connect multiple PCIe root complexes via Non-Transparent Bridge Network protocol IPoPCIe analogous to IPoIB, but performs way better Device Sharing via I / O Virtualization (SR-IOV, MR-IOV)
  • 11. Offloading techniques we can use How do we get the system really fast? Answer: Memory! The only question is: Which memory? Where is it located? How is it structured?
  • 12. Demo-Time ☺ Demo 1: Device Sharing Description Host 1 has a SR-IOV capable NIC Host 1 initializes a Virtual Function Through Non-Transparent Bridge (NTB) Host 2 can access that function by loading the device driver for the NIC https://www.youtube.com/watch?v=GPh0Ms3dfPo
  • 13. Demo-Time ☺ Demo 1: Device Sharing Expected behaviour Works as designed ☺ Depending on the approach PCIe switch chip, there is device driver dependencies
  • 14. Demo-Time ☺ Demo 2: DMA-Transfer Description Host 1 and Host2 are fitted with a PCIe Switch based host card and connected back to back PLXSDK comes with a Sample Program supporting PIO and DMA transfer We measure the overall throughput and cpu load https://www.youtube.com/watch?v=LNPBr3WvuNg
  • 15. Demo-Time ☺ Demo 2: DMA-Transfer Expected behaviour Large data transfer benefits from DMA (DWH) ☺ Small, time critical transfers have less latency with PIO (OLTP) You’ll need both modes
  • 16. Demo-Time ☺ Demo 3: Fabric Attached Memory (PCIe) and Oracle RAC Description Database and Memory hosts are fitted with a PCIe Switch based host card and connected to a central PCIe Switch Memory hosts’s physical DRAM is expanded with OptaneGrid 3DXpoint into an SDM Pool (mirrored via PCIe NTB) Database Servers expose a tiered PMEM Device using local DRAM (mirrored via PCIe NTB) and the remote SDM Pool accessed over PCIe NTB) ASM High Redudancy on top of PMEM Devices with preferred mirror read and device mapper path swapping db0 db1 db2 mem0 mem1 mem2 SDM DRAM Optane GRID SDM DRAM Optane GRID SDM DRAM Optane GRID ASM PMEM DRAM Expansion PMEM DRAM Expansion PMEM DRAM Expansion PCIe Switch RAC NTB Domain
  • 17. Demo-Time ☺ Demo 3: Fabric Attached Memory (PCIe) and Oracle RAC 16 GB/s throughput per licensable core (4cores, 8 threads per db node) 85 % of native aggregated memory controller performance
  • 18. Findings Generic offloading is possible per se, but different than expected : Fabric Attached Memory Yes, the DB is running in memory (mirrored) Question is: In which server’s memory (local or remote)? How do we acccess it (local memory extension or DMA call)? How is it constructed (DRAM or Software Defined Memory)? Using the right PCIe-Switch and storage module combination you get it to work Any PCIe-capable host can use Fabric Attached Memory per se An OpenMCCA-compatible PCIe switch (PLX 9700) and high-performance M.2 SSDs such as Optane Memory or fast NVMe modules are required
  • 19. Q&A
  • 20. Thanks to our supporters