A deep dive into offloading techniques for Oracle database servers that takes both hardware and software solutions into consideration. The focus is clearly on boosting the efficiency of the licenses you have already paid for.
4. Definition of offloading (DB view)
In general:
«Everything that saves resources on the database server»
5. Definition of offloading (DB view)
Examples of offloading implementations
NICs (TCP/IP Offload, iSCSI Offload, InfiniBand RDMA, NVMe)
Storage Adapters (RAID Calculation, SCSI)
Math Co-Processors
FPGAs
DMA-Engines
Distributed Computing (e.g. using MPI)
Remote DB Engine (Hadoop Connector, Gluent)
6. Definition of offloading (DB view)
How is it done on Exadata?
Offloading via the DMA engine of the InfiniBand HCA
Enables Remote DMA (RDMA) operations (DB to cell)
The storage cell can be accessed at near-zero CPU cost
The latency of a DMA operation is higher than that of PIO via the CPU; therefore good for large amounts of data (e.g. DWH), but worse for OLTP
The task can be distributed
e.g. instruct a node via an MPI call to execute a sub-query and to transmit the start or end memory address back to the requester (the DB server)
The DB server then only needs to merge the partial results
In this sense, the DB server acts more like a client (see the RDMA read sketch below)
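To make the mechanics concrete, here is a minimal sketch of the client-side data path with libibverbs. It assumes a queue pair that has already been connected (e.g. via librdmacm) and an out-of-band exchange of the remote buffer address and rkey; the function and variable names are illustrative, not any vendor's API. The HCA's DMA engine moves the data, so the remote CPU stays idle:

/* Sketch: client-side RDMA READ data path with libibverbs.
 * Assumes qp/cq/mr were set up and the QP connected beforehand
 * (e.g. via librdmacm); remote_addr and rkey came from the peer.
 * Link with -libverbs. */
#include <stdint.h>
#include <infiniband/verbs.h>

/* Pull 'len' bytes from the remote buffer into local_buf.
 * The HCA's DMA engine moves the data; this host's CPU only
 * posts the work request and polls for its completion. */
int rdma_read(struct ibv_qp *qp, struct ibv_cq *cq, struct ibv_mr *mr,
              void *local_buf, uint32_t len,
              uint64_t remote_addr, uint32_t rkey)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)local_buf,
        .length = len,
        .lkey   = mr->lkey,              /* local protection key */
    };
    struct ibv_send_wr wr = {
        .wr_id      = 1,
        .sg_list    = &sge,
        .num_sge    = 1,
        .opcode     = IBV_WR_RDMA_READ,  /* one-sided: remote CPU stays idle */
        .send_flags = IBV_SEND_SIGNALED,
        .wr.rdma    = { .remote_addr = remote_addr, .rkey = rkey },
    };
    struct ibv_send_wr *bad_wr;

    if (ibv_post_send(qp, &wr, &bad_wr))
        return -1;

    /* Busy-poll the completion queue until the read has landed. */
    struct ibv_wc wc;
    int n;
    while ((n = ibv_poll_cq(cq, 1, &wc)) == 0)
        ;
    return (n == 1 && wc.status == IBV_WC_SUCCESS) ? 0 : -1;
}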
7. Offloading techniques we can use
The following devices have a DMA engine:
RDMA-enabled network adapters and Infiniband cards
Intel IOAT DMA chip on Xeon boards (for NVMe SSDs; a channel-listing sketch follows this list)
PCIe switch cards
PLX-based NVMe controllers
Or the PCIe chip in your Intel Xeon computer ;-)
Lowest latency
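As a quick check of what your own box offers, this sketch lists the channels the Linux dmaengine framework has registered under /sys/class/dma; with the ioatdma driver loaded on a Xeon board, the IOAT channels typically show up as dma0chan0, dma0chan1, and so on:

/* Sketch: list the DMA channels the Linux dmaengine framework has
 * registered in sysfs. */
#include <stdio.h>
#include <dirent.h>

int main(void)
{
    DIR *d = opendir("/sys/class/dma");
    if (!d) {
        perror("/sys/class/dma (no dmaengine channels?)");
        return 1;
    }
    struct dirent *e;
    while ((e = readdir(d)) != NULL)
        if (e->d_name[0] != '.')
            printf("%s\n", e->d_name);   /* e.g. dma0chan0 */
    closedir(d);
    return 0;
}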
8. Offloading techniques we can use
The following protocols have (R) DMA support:
iSCSI over RDMA (iSER)
NFS over RDMA (a mount sketch follows this list)
NVMe over Fabrics (RDMA-based) or RDMA Block Device
Needs the least CPU
Good starting point
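As a starting-point example, NFS over RDMA needs nothing more exotic than a mount option. The sketch below calls mount(2) directly; the server address and export path are assumptions, and the server must have its RDMA transport (rpcrdma) active on the standard NFS/RDMA port 20049:

/* Sketch: mount an NFS export over RDMA via the raw mount(2) call.
 * Server address and export path are hypothetical. Run as root. */
#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
    /* proto=rdma moves the RPC transport from TCP to RDMA, so the
     * bulk READ/WRITE payloads bypass the CPU's copy path. */
    const char *opts =
        "nolock,vers=3,proto=rdma,port=20049,addr=192.168.1.10";
    if (mount("192.168.1.10:/export", "/mnt/nfs", "nfs", 0, opts)) {
        perror("mount");
        return 1;
    }
    puts("mounted /mnt/nfs over NFS/RDMA");
    return 0;
}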
9. Offloading techniques we can use
Comparison (Native PCIe fabric vs. NVMe over Fabrics)
Native PCIe fabric has significantly lower latency (a simple probe for comparing both paths is sketched below)
Setup with PCIe-JBOF is less complex than NVMe over Fabrics
Throughput is identical
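A simple way to back such a comparison is to measure 4 KiB random-read latency against the same namespace once attached natively and once imported over the fabric. The device path here is an assumption, and the offset range should be kept within the device size:

/* Sketch: 4 KiB random-read latency probe. Run it against both the
 * native PCIe-attached namespace and the fabric-imported one, then
 * compare the averages. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define IOS 10000                     /* number of probe reads */
#define BS  4096                      /* one 4 KiB block each */

int main(int argc, char **argv)
{
    const char *dev = argc > 1 ? argv[1] : "/dev/nvme0n1";
    int fd = open(dev, O_RDONLY | O_DIRECT);   /* bypass the page cache */
    if (fd < 0) { perror(dev); return 1; }

    void *buf;
    if (posix_memalign(&buf, BS, BS))          /* O_DIRECT needs alignment */
        return 1;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < IOS; i++) {
        off_t off = (off_t)(rand() % 1000000) * BS;  /* ~4 GB span */
        if (pread(fd, buf, BS, off) != BS) { perror("pread"); return 1; }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double us = ((t1.tv_sec - t0.tv_sec) * 1e9 +
                 (t1.tv_nsec - t0.tv_nsec)) / 1e3;
    printf("%s: avg 4K random-read latency %.1f us\n", dev, us / IOS);
    close(fd);
    return 0;
}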
10. Offloading techniques we can use
That PCIe is quite cool… What other tricks can it do?
A DMA engine, like InfiniBand
Connecting multiple PCIe root complexes via a Non-Transparent Bridge (NTB)
A network protocol, IPoPCIe, analogous to IPoIB but performing far better (see the sketch below)
Device sharing via I/O virtualization (SR-IOV, MR-IOV)
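Since all of these tricks hang off the switch silicon, it helps to know where it sits. The sketch below scans sysfs for PCIe functions with the PLX vendor ID 0x10b5; on such an NTB port, the mainline Linux NTB stack (ntb_hw_*, ntb_transport, ntb_netdev) can then expose a network interface, which is one way to get an IPoPCIe-style link:

/* Sketch: find the PLX functions (vendor ID 0x10b5) in this box by
 * scanning sysfs. */
#include <stdio.h>
#include <string.h>
#include <dirent.h>

int main(void)
{
    DIR *d = opendir("/sys/bus/pci/devices");
    if (!d) { perror("/sys/bus/pci/devices"); return 1; }
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (e->d_name[0] == '.') continue;
        char path[300], vendor[8] = "";
        snprintf(path, sizeof path,
                 "/sys/bus/pci/devices/%s/vendor", e->d_name);
        FILE *f = fopen(path, "r");
        if (!f) continue;
        if (fgets(vendor, sizeof vendor, f) &&
            strncmp(vendor, "0x10b5", 6) == 0)    /* PLX Technology */
            printf("PLX function at %s\n", e->d_name);
        fclose(f);
    }
    closedir(d);
    return 0;
}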
11. Offloading techniques we can use
How do we get the system really fast?
Answer: Memory!
The only questions are:
Which memory?
Where is it located?
How is it structured?
12. Demo-Time ☺
Demo 1: Device Sharing
Description
Host 1 has an SR-IOV-capable NIC
Host 1 initializes a Virtual Function (see the sriov_numvfs sketch below)
Through a Non-Transparent Bridge (NTB), Host 2 can access that function by loading the device driver for the NIC
https://www.youtube.com/watch?v=GPh0Ms3dfPo
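Host 1's step is a one-liner against the standard SR-IOV sysfs knob. A sketch, with a hypothetical PCI address; find yours with lspci:

/* Sketch: create one Virtual Function on an SR-IOV capable NIC via
 * the standard sysfs attribute. Run as root. */
#include <stdio.h>

int main(void)
{
    const char *knob = "/sys/bus/pci/devices/0000:03:00.0/sriov_numvfs";
    FILE *f = fopen(knob, "w");
    if (!f) { perror(knob); return 1; }
    fprintf(f, "1\n");                 /* instantiate one VF */
    fclose(f);
    return 0;
}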
13. Demo-Time ☺
Demo 1: Device Sharing
Expected behaviour
Works as designed ☺
Depending on the approach and the PCIe switch chip, there are device-driver dependencies
14. Demo-Time ☺
Demo 2: DMA-Transfer
Description
Host 1 and Host 2 are fitted with a PCIe-switch-based host card and connected back to back
The PLX SDK comes with a sample program supporting PIO and DMA transfers
We measure the overall throughput and CPU load (see the measurement sketch below)
https://www.youtube.com/watch?v=LNPBr3WvuNg
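The measurement itself can be kept simple: wall-clock time yields throughput, getrusage() yields the CPU the transfer burned. The sketch below uses a plain memcpy loop as a stand-in for the PLX SDK sample's PIO path (the SDK's own API is not reproduced here); swapping in its DMA call should show the CPU share collapse while throughput holds:

/* Sketch: measure throughput (wall clock) and CPU share (getrusage)
 * of a bulk transfer. do_transfer() is a memcpy stand-in for the
 * PIO path. */
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/time.h>

#define XFER_BYTES (1ULL << 30)       /* 1 GiB per run */
#define CHUNK      (1UL << 20)        /* 1 MiB pieces */

static char src[CHUNK], dst[CHUNK];

static void do_transfer(unsigned long long bytes)
{
    for (unsigned long long done = 0; done < bytes; done += CHUNK)
        memcpy(dst, src, CHUNK);      /* PIO-like: the CPU moves every byte */
}

int main(void)
{
    struct rusage r0, r1;
    struct timeval w0, w1;

    getrusage(RUSAGE_SELF, &r0);
    gettimeofday(&w0, NULL);
    do_transfer(XFER_BYTES);
    gettimeofday(&w1, NULL);
    getrusage(RUSAGE_SELF, &r1);

    double wall = (w1.tv_sec - w0.tv_sec) + (w1.tv_usec - w0.tv_usec) / 1e6;
    double cpu  = (r1.ru_utime.tv_sec  - r0.ru_utime.tv_sec)
                + (r1.ru_stime.tv_sec  - r0.ru_stime.tv_sec)
                + (r1.ru_utime.tv_usec - r0.ru_utime.tv_usec) / 1e6
                + (r1.ru_stime.tv_usec - r0.ru_stime.tv_usec) / 1e6;

    printf("throughput %.2f GB/s, CPU share %.0f %%\n",
           XFER_BYTES / wall / 1e9, 100.0 * cpu / wall);
    return 0;
}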
15. Demo-Time ☺
Demo 2: DMA-Transfer
Expected behaviour
Large data transfers benefit from DMA (DWH) ☺
Small, time-critical transfers have lower latency with PIO (OLTP)
You’ll need both modes
16. Demo-Time ☺
Demo 3: Fabric Attached Memory (PCIe) and Oracle RAC
Description
Database and memory hosts are fitted with a PCIe-switch-based host card and connected to a central PCIe switch
The memory hosts' physical DRAM is expanded with OptaneGrid 3D XPoint into an SDM pool (mirrored via PCIe NTB)
The database servers expose a tiered PMEM device using local DRAM (mirrored via PCIe NTB) and the remote SDM pool accessed over PCIe NTB
ASM High Redundancy on top of the PMEM devices, with preferred mirror read and device-mapper path swapping (the sketch after the diagram shows how such a PMEM device is accessed)
[Diagram: database servers db0–db2 and memory hosts mem0–mem2 attached to a central PCIe switch in one NTB domain; each memory host builds an SDM pool from DRAM plus an OptaneGrid, each database server exposes a PMEM device with a DRAM expansion tier, and ASM runs across the PMEM devices of the RAC.]
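How a database-side process reaches such a PMEM device can be sketched with plain load/store semantics. The snippet assumes the tiered device is exposed through a DAX-mounted filesystem (e.g. ext4 or xfs mounted with -o dax, Linux 4.15+ for MAP_SYNC); the path and region size are assumptions:

/* Sketch: reach a PMEM device with plain CPU loads/stores. MAP_SYNC
 * keeps stores durable without an msync round-trip. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define LEN (64UL << 20)              /* 64 MiB region */

int main(void)
{
    int fd = open("/mnt/pmem/region0", O_CREAT | O_RDWR, 0600);
    if (fd < 0) { perror("open"); return 1; }
    if (ftruncate(fd, LEN)) { perror("ftruncate"); return 1; }

    /* MAP_SHARED_VALIDATE is mandatory for MAP_SYNC to be honoured. */
    char *p = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
                   MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    strcpy(p, "block written with plain CPU stores");
    munmap(p, LEN);
    close(fd);
    return 0;
}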
17. Demo-Time ☺
Demo 3: Fabric Attached Memory (PCIe) and Oracle RAC
16 GB/s throughput per licensable core (4 cores, 8 threads per DB node)
85 % of the native aggregated memory-controller performance
18. Findings
Generic offloading is possible per se, but turns out differently than expected:
Fabric Attached Memory
Yes, the DB is running in memory (mirrored)
The questions are:
In which server's memory (local or remote)?
How do we access it (local memory extension or DMA call)?
How is it constructed (DRAM or Software Defined Memory)?
With the right combination of PCIe switch and storage module, you can get it to work
Any PCIe-capable host can use Fabric Attached Memory per se
An OpenMCCA-compatible PCIe switch (PLX 9700) and high-performance M.2 SSDs such as Optane Memory or fast NVMe modules are required