CETH for XDP
Common Ethernet Driver Framework
for faster network I/O
Yan Chen (Y.Chen@Huawei.com)
Yunsong Lu (Yunsong.Lu@Huawei.com)
Leveraging IO Visor
• Performance Tuning
• Tracing
• Networking for Container: Dynamic E2E Monitoring
• Cloud Native NFV: Micro Data Path Container (MDPC)
• http://www.slideshare.net/IOVisor/evolving-virtual-networking-with-io-visor-openstack-summit-austin-april-2016
Express I/O for XDP
• Kernel Network I/O has been a performance bottleneck
• Netmap and DPDK claimed 10x performance advantage
• Bypass is not low-hanging fruit
• Could rebuilding EVERYTHING in userspace really do better?
• Unless all bottlenecks are removed, it’s still a long way to go
• The kernel is the place for a better driver/platform ecosystem
• Multi-vendor NICs and accelerators
• X86, ARM, Power, SPARC, etc.
• Programmability of XDP will enable innovation in “Network
Functional Applications”
History of CETH (Common Ethernet Driver Framework)
Designed for Performance and Virtualization:
1. Improve kernel networking performance for
virtualization, particularly vSwitch and virtual I/O
2. Simplify NIC drivers by consolidating common
functions, particularly for new “internal” NICs
and accelerators
3. Standalone module for various kernel versions
Supports:
• Huawei’s EVS (Elastic Virtual Switch)
• NICs:
• Intel ixgbe
• Intel i40e (40G)
• Broadcom bnx2x
• Mellanox mlnx-en
• Emulex be2net
• Accelerators:
• Huawei SNP-lite
• Broadcom XLP
• Ezchip Gx36
• Huawei VDR
• vNIC:
• ctap(tap+vhost)
• virtio-net
• ceth-pair
Design Considerations (before XDP)
1. Efficient Memory/Buffer Management
o Pre-allocated packet buffer pool
o Efficient buffer acquire/recycle mechanism
o Data Prefetching
o Batching packet process
o Optimized for efficient cache usage
o Locking reduction/avoidance
o High performance copy
o Reduction of DMA mapping
o Huge pages, etc.
2. Flexible TX/RX Scheduling
o Threaded_irq
o All-in-interrupt handling
o Optional R2C or Pipeline Threading models
o Feature-triggered mode switching
3. Customizable Meta-data structure
o Cache-friendly data structure
o Hardware/accelerator friendly
o Extensible Metadata format is customizable
o SKB compatible
4. Compatible with Kernel IP stack
o Hardware Offloading friendly
o Checksum, VLAN, etc.
o TSO/GSO, LRO/GRO
o Easy to port existing Linux device drivers
o Reuse most existing non-datapath functions
o Guide for easy driver porting
5. Tools for easy performance tuning
o “ceth” tool to tune all parameters
o sysfs interfaces
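The "Data Prefetching" and "Batching packet process" items above can be sketched as a receive loop that walks a batch and prefetches the next packet while handling the current one. This is a minimal user-space sketch, not CETH code: `struct rx_pkt`, `RX_BATCH`, and `process_batch()` are hypothetical names, and the "processing" is a stand-in that just reads the first byte. `__builtin_prefetch` is the GCC/Clang builtin.

```c
#include <stddef.h>
#include <stdint.h>

#define RX_BATCH 16   /* hypothetical batch size, not taken from the slides */

/* Hypothetical descriptor for one received packet. */
struct rx_pkt {
    const uint8_t *data;
    size_t len;
};

/* Process up to RX_BATCH packets in one pass. While packet i is handled,
 * the data of packet i+1 is prefetched so it is more likely to be in
 * cache when the loop reaches it. Returns the sum of the first bytes as
 * a stand-in for real header parsing. */
unsigned process_batch(const struct rx_pkt *pkts, size_t n)
{
    unsigned acc = 0;

    if (n > RX_BATCH)
        n = RX_BATCH;

    for (size_t i = 0; i < n; i++) {
        if (i + 1 < n)
            __builtin_prefetch(pkts[i + 1].data, 0 /* read */, 3 /* keep in cache */);
        if (pkts[i].len > 0)
            acc += pkts[i].data[0];
    }
    return acc;
}
```

Handling packets in batches amortizes per-call overhead, and the prefetch hides part of the memory latency behind useful work; how much either helps depends on the NIC, driver, and CPU.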
Simplified CETH for XDP
1. Efficient Memory/Buffer Management
o Pre-allocated packet buffer pool
o Efficient buffer acquire/recycle mechanism
o Data Prefetching
o Batching packet process
o Optimized for efficient cache usage
o Locking reduction/avoidance
o High performance copy
o Reduction of DMA mapping
o Huge pages, etc.
2. Flexible TX/RX Scheduling
o Threaded_irq
o All-in-interrupt handling
o Optional R2C or Pipeline Threading models
o Feature-triggered mode switching
3. Customizable Meta-data structure
o Cache-friendly data structure
o Hardware/accelerator friendly
o Extensible Metadata format is customizable
o SKB compatible
4. Compatible with Kernel IP stack
o Hardware Offloading friendly
o Checksum, VLAN, etc.
o TSO/GSO, LRO/GRO
o Easy to port existing Linux device drivers
o Easy driver porting: fewer than 200 LOC per driver
5. Tools for easy performance tuning
o “ceth” tool to tune all parameters
o Sysfs interfaces
Simple interfaces for drivers
• New Functions (CETH module)
o ceth_pkt_acquire()
o ceth_pkt_recycle()
o ceth_pkt_to_skb()
• Kernel modification
o __kfree_skb()
• Driver modifications
• allocate buffers from CETH
• optional: use pkt_t by default
• optimize the driver!
• Performance
 30% performance improvement for packet
switching (br, ovs)
 40% of pktgen performance
 100% improvement for XDP forwarding
 33Mpps XDP dropping rate with 2 CPU threads
 Scalable with multiple hardware queues
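As a rough reading of the 33Mpps drop figure, the per-packet time budget on each thread can be worked out as below. The sketch assumes the aggregate rate is split evenly across the two threads, which the slides do not state; `ns_per_packet()` is a hypothetical helper.

```c
#include <assert.h>

/* Average time budget per packet on each thread, in nanoseconds.
 * Assumption: the aggregate rate is divided evenly across threads. */
double ns_per_packet(double aggregate_mpps, unsigned threads)
{
    assert(aggregate_mpps > 0.0 && threads > 0);
    double per_thread_pps = aggregate_mpps * 1e6 / threads;
    return 1e9 / per_thread_pps;
}
```

With `ns_per_packet(33.0, 2)` each thread handles 16.5 Mpps, which leaves about 60.6 ns per packet.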
Patch available based on latest XDP kernel tree.
Preliminary Performance numbers:
https://docs.google.com/spreadsheets/d/1nT0DO25lfS1QpBLQkdIMm4LJl1v_VMScZVSOcRgkQOI/edit#gid=0
NOTE: all numbers were internally tested for development
purpose only.
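The driver-facing functions listed on this slide could fit into a receive path roughly as follows. The slides give only the function names, so every signature, type, and helper here (`sketch_*`, `struct pkt_sketch`, `struct skb_sketch`, `enum verdict`) is a hypothetical stand-in, and the code is a user-space model rather than kernel code. It assumes the modified `__kfree_skb()` returns the buffer to the CETH pool.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for an XDP-style verdict (drop, pass to stack, transmit). */
enum verdict { VERDICT_DROP, VERDICT_PASS, VERDICT_TX };

struct pkt_sketch { int in_use; };               /* stand-in for pkt_t */
struct skb_sketch { struct pkt_sketch *pkt; };   /* stand-in for sk_buff */

#define POOL_SIZE 4
static struct pkt_sketch pool[POOL_SIZE];

/* Stand-in for ceth_pkt_acquire(): take a free buffer from the pool. */
struct pkt_sketch *sketch_pkt_acquire(void)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (!pool[i].in_use) {
            pool[i].in_use = 1;
            return &pool[i];
        }
    }
    return NULL;
}

/* Stand-in for ceth_pkt_recycle(): hand the buffer back to the pool. */
void sketch_pkt_recycle(struct pkt_sketch *pkt)
{
    assert(pkt->in_use);
    pkt->in_use = 0;
}

/* Stand-in for ceth_pkt_to_skb(): wrap the buffer for the protocol stack. */
struct skb_sketch sketch_pkt_to_skb(struct pkt_sketch *pkt)
{
    struct skb_sketch skb = { .pkt = pkt };
    return skb;
}

/* Stand-in for the modified __kfree_skb(): freeing the skb recycles the buffer. */
void sketch_kfree_skb(struct skb_sketch *skb)
{
    sketch_pkt_recycle(skb->pkt);
    skb->pkt = NULL;
}

size_t sketch_pkts_in_use(void)
{
    size_t n = 0;
    for (size_t i = 0; i < POOL_SIZE; i++)
        n += pool[i].in_use ? 1 : 0;
    return n;
}

/* Receive one packet and act on the verdict. Returns 0, or -1 if the pool is empty. */
int sketch_rx_one(enum verdict v)
{
    struct pkt_sketch *pkt = sketch_pkt_acquire();
    if (!pkt)
        return -1;

    switch (v) {
    case VERDICT_DROP:
        sketch_pkt_recycle(pkt);                 /* drop: recycle right away */
        break;
    case VERDICT_PASS: {
        struct skb_sketch skb = sketch_pkt_to_skb(pkt);
        /* A real driver would call netif_receive_skb() here; the stack
         * frees the skb once it is done with it. */
        sketch_kfree_skb(&skb);
        break;
    }
    case VERDICT_TX:
        /* In a driver the buffer stays in use until TX completion;
         * this model recycles it immediately. */
        sketch_pkt_recycle(pkt);
        break;
    }
    return 0;
}
```

In this model the drop path never builds an skb, which is where a large part of the saving for XDP drop would be expected to come from.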
Memory and Buffer Management
• Separate memory
management layer for
various optimizations,
like huge page
• Per-CPU or per-queue
buffer pool mechanisms
• May use skb by default
(pkt_t as buffer data
structure only)
• Can use non-skb meta-data
across all XDP functions
[Figure: packet, buffer, and memory management layers, with the per-CPU / per device queue ceth_pkt buffer pool. Labels recovered from the diagram:]
• Layers: Packet Management (for XDP and protocol stack), driver, Buffer Management, Memory Management
• RX queues, each with a per-CPU ceth_pkt buffer pool holding ceth_pkt batches (free and in-use ceth_pkt)
• Default: paged memory implementation using buddy allocator (contiguous pages of batch size)
• Optional: huge-page implementation for mapping to user space (contiguous memory frags of batch size)
• Pool contents: current ceth_pkt batch, in-use ceth_pkt batches, recycled batch list of free ceth_pkt batches
• RX queue desc ring and TX queue desc ring holding ceth_pkt
• Calls shown: ceth_pkt_acquire(), ceth_pkt_to_skb(pkt), netif_receive_skb(skb), __kfree_skb(skb), ceth_pkt_recycle(pkt), alloc_pages()
• Packet paths shown: XDP, host protocol stack, forwarding, drop
• If current batch is used up and recycled list is not empty: take the first batch in recycled list
• If recycled list is empty: alloc_pages()
• If recycled list is too long: free the batch directly
• If recycled list idled for too long: free all pkt batches in the list
• Whoever frees the last in-use ceth_pkt in a batch will push the batch to head of recycled list while taking the recycle list lock
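The batch recycling rules labeled in the buffer pool diagram can be sketched in user space as below. This is a single-threaded model under stated assumptions, not the CETH implementation: `malloc()` stands in for `alloc_pages()`, the recycle list lock and the idle timeout are left out, and all names and constants (`pool_acquire`, `pool_recycle`, `BATCH_PKTS`, `MAX_RECYCLED`, the struct layouts) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BATCH_PKTS   4      /* packets per batch (assumed) */
#define PKT_BYTES    4096   /* one page per packet */
#define MAX_RECYCLED 8      /* "too long" threshold (assumed) */

struct pkt_batch;

struct ceth_pkt_sketch {
    struct pkt_batch *batch;   /* owning batch */
    unsigned char *data;       /* this packet's page */
};

struct pkt_batch {
    struct pkt_batch *next;    /* link in the recycled list */
    unsigned in_use;           /* handed out and not yet recycled */
    unsigned next_free;        /* next index to hand out */
    unsigned char *mem;        /* contiguous memory of batch size */
    struct ceth_pkt_sketch pkts[BATCH_PKTS];
};

struct pkt_pool {
    struct pkt_batch *current;    /* batch currently being handed out */
    struct pkt_batch *recycled;   /* head of the recycled batch list */
    unsigned recycled_len;
    unsigned allocs;              /* number of fresh batch allocations */
};

static struct pkt_batch *batch_alloc(struct pkt_pool *pool)
{
    struct pkt_batch *b = calloc(1, sizeof(*b));
    if (!b)
        return NULL;
    b->mem = malloc((size_t)BATCH_PKTS * PKT_BYTES);  /* stands in for alloc_pages() */
    if (!b->mem) {
        free(b);
        return NULL;
    }
    for (unsigned i = 0; i < BATCH_PKTS; i++) {
        b->pkts[i].batch = b;
        b->pkts[i].data = b->mem + (size_t)i * PKT_BYTES;
    }
    pool->allocs++;
    return b;
}

static void batch_free(struct pkt_batch *b)
{
    free(b->mem);
    free(b);
}

struct ceth_pkt_sketch *pool_acquire(struct pkt_pool *pool)
{
    if (!pool->current) {
        if (pool->recycled) {
            /* take the first batch in the recycled list */
            pool->current = pool->recycled;
            pool->recycled = pool->current->next;
            pool->recycled_len--;
            pool->current->next = NULL;
            pool->current->next_free = 0;
        } else {
            /* recycled list is empty: allocate a fresh batch */
            pool->current = batch_alloc(pool);
            if (!pool->current)
                return NULL;
        }
    }
    struct pkt_batch *b = pool->current;
    struct ceth_pkt_sketch *pkt = &b->pkts[b->next_free++];
    b->in_use++;
    if (b->next_free == BATCH_PKTS)
        pool->current = NULL;     /* current batch is used up */
    return pkt;
}

void pool_recycle(struct pkt_pool *pool, struct ceth_pkt_sketch *pkt)
{
    struct pkt_batch *b = pkt->batch;
    assert(b->in_use > 0);
    b->in_use--;
    if (b->in_use != 0 || b->next_free != BATCH_PKTS)
        return;                   /* batch still has packets out or to hand out */
    if (pool->recycled_len >= MAX_RECYCLED) {
        batch_free(b);            /* recycled list is too long: free directly */
        return;
    }
    b->next = pool->recycled;     /* last free pushes the batch to the list head */
    pool->recycled = b;
    pool->recycled_len++;
}
```

Tracking use per batch rather than per packet means the shared list is touched once per batch instead of once per packet, which matches the "Locking reduction/avoidance" goal listed earlier.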
CETH pkt_t Structure
• Use one page for one packet
• Customizable meta data (for XDP)
• Header room for overlay
• SKB data structure ready
• Easy conversion between pkt_t and sk_buff
(with cost)
• Reuse skb_shared_info for fragments
[Figure: ceth_pkt_buffer layout within one 4K page. Labels recovered from the diagram:]
• ceth_pkt fields: handle, data_offset, signature, meta data, list head
• Regions: head room, data, sk_buff, sk_buff2 (sk_buff_fclones, fclone_ref=2), skb_shared_info (frags[17], end)
• skb pointers: head, data, end
• Size labels: 128 (64x2), 232x2+8 (64x8), 320 (64x5), 128 (64x2), 2880 (64x45), 4K (64x64)
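The size labels in the layout figure are all whole multiples of a 64-byte cache line, which can be checked with a little arithmetic. The sketch below only adds up the labels as they appear in the extracted text; which region each label belongs to is not recoverable here, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE 64u
#define PAGE_BYTES (64u * CACHE_LINE)   /* "4K (64x64)" */

/* Size labels from the figure, in cache lines: 64x2, 64x8, 64x5, 64x2, 64x45. */
static const unsigned region_lines[] = { 2, 8, 5, 2, 45 };

unsigned lines_to_bytes(unsigned lines)
{
    return lines * CACHE_LINE;
}

/* Sum of the labeled regions, each counted once. */
unsigned labeled_total_bytes(void)
{
    unsigned total = 0;
    for (size_t i = 0; i < sizeof(region_lines) / sizeof(region_lines[0]); i++)
        total += lines_to_bytes(region_lines[i]);
    return total;
}

/* The "232x2+8 (64x8)" label: do two structures of skb_bytes each,
 * plus 8 more bytes, fit in eight cache lines? */
int fclone_pair_fits(unsigned skb_bytes)
{
    return 2u * skb_bytes + 8u <= lines_to_bytes(8);
}
```

Counted once each, the labels add up to 3968 bytes, so 128 bytes (two cache lines) of the 4096-byte page are not accounted for by the extracted labels. With 232 bytes per structure, 232x2+8 = 472 bytes fits in the 512-byte (64x8) region.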
Next Steps (w/ XDP)
• Ongoing
1. Port more mm/bm features
2. Measure performance with XDP
use cases
3. Optimize performance with
drivers (need help from driver
developers!)
4. Measure performance
improvement of virtio
5. Direct Socket Interface for
userspace applications
• Discussions on mailing lists
1. Meta-data format
2. Offloading features, like TSO
3. Acceleration API
4. Virtualization support

CETH for XDP [Linux Meetup Santa Clara | July 2016]

  • 1. CETH for XDP Common Ethernet Driver Framework for faster network I/O Yan Chen(Y.Chen@Huawei.com) Yunsong Lu (Yunsong.Lu@Huawei.com)
  • 2. Leveraging IO Visor • Performance Tuning • Tracing • Networking for Container: Dynamic E2E Monitoring • Cloud Native NFV: Micro Data Path Container(MDPC) • http://www.slideshare.net/IOVisor/evolving-virtual-networking-with- io-visor-openstack-summit-austin-april-2016
  • 3. Express I/O for XDP • Kernel Network I/O has been a performance bottleneck • Netmap and DPDK claimed 10x performance advantage  • Bypass is not low-hanging fruit • Could rebuilding EVERYTHING in userspace really do better? • Unless all bottlenecks are removed, it’s still a long way to go • Kernel is the place for better driver/platform eco-system • Multi-vendor NICs and accelerators • X86, ARM, Power, SPARC, etc. • Programmability of XDP will enable innovation in “Network Functional Applications”
  • 4. History of CETH (Common Ethernet Driver Framework)
      Designed for Performance and Virtualization:
      1. Improve kernel networking performance for virtualization, particularly vSwitch and virtual I/O
      2. Simplify NIC drivers by consolidating common functions, particularly for “internal” new NICs and accelerators
      3. Standalone module for various kernel versions
      Supports:
      • Huawei’s EVS (Elastic Virtual Switch)
      • NICs:
          o Intel ixgbe
          o Intel i40e (40G)
          o Broadcom bnx2x
          o Mellanox mlnx-en
          o Emulex be2net
      • Accelerators:
          o Huawei SNP-lite
          o Broadcom XLP
          o Ezchip Gx36
          o Huawei VDR
      • vNIC:
          o ctap (tap+vhost)
          o virtio-net
          o ceth-pair
  • 5. Design Considerations (before XDP)
      1. Efficient Memory/Buffer Management
          o Pre-allocated packet buffer pool
          o Efficient buffer acquire/recycle mechanism
          o Data prefetching
          o Batched packet processing
          o Optimized for efficient cache usage
          o Locking reduction/avoidance
          o High-performance copy
          o Reduction of DMA mapping
          o Huge pages, etc.
      2. Flexible TX/RX Scheduling
          o Threaded_irq
          o All-in-interrupt handling
          o Optional R2C or Pipeline Threading models
          o Feature-triggered mode switching
      3. Customizable Meta-data structure
          o Cache-friendly data structure
          o Hardware/accelerator friendly
          o Extensible Metadata format is customizable
          o SKB compatible
      4. Compatible with Kernel IP stack
          o Hardware offloading friendly
          o Checksum, VLAN, etc.
          o TSO/GSO, LRO/GRO
          o Easy to port existing Linux device drivers
          o Reuse most existing non-datapath functions
          o Guide for easy driver porting
      5. Tools for easy performance tuning
          o “ceth” tool to tune all parameters
          o sysfs interfaces
  • 6. Simplified CETH for XDP
      1. Efficient Memory/Buffer Management
          o Pre-allocated packet buffer pool
          o Efficient buffer acquire/recycle mechanism
          o Data prefetching
          o Batched packet processing
          o Optimized for efficient cache usage
          o Locking reduction/avoidance
          o High-performance copy
          o Reduction of DMA mapping
          o Huge pages, etc.
      2. Flexible TX/RX Scheduling
          o Threaded_irq
          o All-in-interrupt handling
          o Optional R2C or Pipeline Threading models
          o Feature-triggered mode switching
      3. Customizable Meta-data structure
          o Cache-friendly data structure
          o Hardware/accelerator friendly
          o Extensible Metadata format is customizable
          o SKB compatible
      4. Compatible with Kernel IP stack
          o Hardware offloading friendly
          o Checksum, VLAN, etc.
          o TSO/GSO, LRO/GRO
          o Easy to port existing Linux device drivers
          o Easy driver porting: less than 200 LOC per driver
      5. Tools for easy performance tuning
          o “ceth” tool to tune all parameters
          o sysfs interfaces
  • 7. Simple interfaces for drivers
      • New functions (CETH module)
          o ceth_pkt_acquire()
          o ceth_pkt_recycle()
          o ceth_pkt_to_skb()
      • Kernel modification
          o __kfree_skb()
      • Driver modifications
          o allocate buffers from CETH
          o optional: use pkt_t by default
          o optimize the driver!
      • Performance
          o 30% performance improvement for packet switching (br, ovs)
          o 40% of pktgen performance
          o 100% improvement for XDP forwarding
          o 33 Mpps XDP dropping rate with 2 CPU threads
          o Scalable with multiple hardware queues
      Patch available based on the latest XDP kernel tree.
      Preliminary performance numbers: https://docs.google.com/spreadsheets/d/1nT0DO25lfS1QpBLQkdIMm4LJl1v_VMScZVSOcRgkQOI/edit#gid=0
      NOTE: all numbers were internally tested for development purposes only.
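The slide names the three CETH entry points and the `__kfree_skb()` hook but gives no signatures. The following is a minimal userspace sketch of how a driver receive loop might use them, assuming hypothetical signatures throughout: `struct ceth_pkt`, `struct fake_skb`, `struct rx_stats`, `run_filter()`, `fake_kfree_skb()` and `rx_poll()` are illustrative stand-ins, not the CETH API, and `calloc()`/`free()` stand in for the buffer pool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins: the slides name these functions but do
 * not give their signatures, so every type and parameter here is assumed. */
struct ceth_pkt { unsigned char data[64]; };
struct fake_skb { struct ceth_pkt *pkt; };   /* stands in for struct sk_buff */
struct rx_stats { unsigned acquired, recycled, to_stack; };

enum verdict { VERDICT_DROP, VERDICT_PASS };

static struct ceth_pkt *ceth_pkt_acquire(struct rx_stats *st)
{
    struct ceth_pkt *pkt = calloc(1, sizeof(*pkt));
    if (pkt)
        st->acquired++;
    return pkt;
}

static void ceth_pkt_recycle(struct rx_stats *st, struct ceth_pkt *pkt)
{
    assert(pkt != NULL);
    st->recycled++;
    free(pkt);
}

/* Wrap the buffer for the host stack without copying the payload. */
static struct fake_skb ceth_pkt_to_skb(struct ceth_pkt *pkt)
{
    struct fake_skb skb = { .pkt = pkt };
    return skb;
}

/* Models the modified __kfree_skb(): freeing the skb returns the buffer. */
static void fake_kfree_skb(struct rx_stats *st, struct fake_skb skb)
{
    ceth_pkt_recycle(st, skb.pkt);
}

/* Arbitrary filter standing in for an XDP program: drop odd first bytes. */
static enum verdict run_filter(const struct ceth_pkt *pkt)
{
    return (pkt->data[0] & 1) ? VERDICT_DROP : VERDICT_PASS;
}

/* Receive n one-byte "frames"; returns 0 on success, -1 if acquire fails. */
static int rx_poll(struct rx_stats *st, const unsigned char *frames, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        struct ceth_pkt *pkt = ceth_pkt_acquire(st);
        if (!pkt)
            return -1;
        pkt->data[0] = frames[i];
        if (run_filter(pkt) == VERDICT_DROP) {
            ceth_pkt_recycle(st, pkt);       /* dropped before any skb exists */
            continue;
        }
        struct fake_skb skb = ceth_pkt_to_skb(pkt);
        st->to_stack++;                      /* netif_receive_skb() would go here */
        fake_kfree_skb(st, skb);             /* the stack is done with the packet */
    }
    return 0;
}
```

In this sketch every acquired buffer is returned exactly once, either directly on drop or through the skb free path, which is the property the `__kfree_skb()` modification on the slide appears intended to provide.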
  • 8. Memory and Buffer Management
      • Separate memory management layer for various optimizations, like huge pages
      • Per-CPU or per-queue buffer pool mechanisms
      • May use skb by default (pkt_t as buffer data structure only)
      • Can use non-skb metadata across all XDP functions
      [Figure: Packet Management for XDP and protocol stack, in three layers: driver, Buffer Management, Memory Management]
      • Driver: RX queues and a TX queue, each with a desc ring of ceth_pkt entries
      • Buffer Management: per-CPU / per device queue ceth_pkt buffer pool, holding a current ceth_pkt batch, in-use ceth_pkt batches, and a recycled batch list of free ceth_pkt batches
      • Memory Management: default paged memory implementation using the buddy allocator (contiguous pages of batch size); optional huge-page implementation for mapping to user space (contiguous memory frags of batch size)
      • Labels on the packet paths: ceth_pkt_acquire(), XDP, drop, forwarding, ceth_pkt_to_skb(pkt), netif_receive_skb(skb), host protocol stack, __kfree_skb(skb), ceth_pkt_recycle(pkt)
      • Batch handling rules shown in the figure:
          o If the current batch is used up and the recycled list is not empty, take the first batch in the recycled list
          o If the recycled list is empty, alloc_pages()
          o If the recycled list is too long, free the batch directly
          o If the recycled list has idled for too long, free all pkt batches in the list
          o Whoever frees the last in-use ceth_pkt in a batch pushes the batch to the head of the recycled list while taking the recycle list lock
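The batch rules from the figure can be sketched as a small single-threaded pool. This is an illustration under stated assumptions, not the CETH implementation: `pool_acquire()`, `pool_recycle()`, `pool_destroy()`, `BATCH_PKTS` and `MAX_RECYCLED` are hypothetical names and values, `calloc()` stands in for `alloc_pages()`, and the recycle list lock and the idle-timeout rule are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BATCH_PKTS   4   /* packets per batch (assumed value) */
#define MAX_RECYCLED 2   /* "too long" threshold for the recycled list (assumed) */

struct pkt_batch;
struct ceth_pkt { struct pkt_batch *batch; };

struct pkt_batch {
    struct ceth_pkt pkts[BATCH_PKTS];
    unsigned next_free;        /* index of the next packet to hand out */
    unsigned in_use;           /* handed out and not yet recycled */
    struct pkt_batch *next;    /* link in the recycled list */
};

struct pkt_pool {
    struct pkt_batch *current;   /* batch currently being handed out */
    struct pkt_batch *recycled;  /* head of the recycled batch list */
    unsigned recycled_len, allocs, frees;
};

static struct ceth_pkt *pool_acquire(struct pkt_pool *pool)
{
    if (!pool->current) {
        struct pkt_batch *b = pool->recycled;
        if (b) {                              /* reuse the first recycled batch */
            pool->recycled = b->next;
            pool->recycled_len--;
        } else {                              /* list empty: allocate a new batch */
            b = calloc(1, sizeof(*b));
            if (!b)
                return NULL;
            pool->allocs++;
        }
        b->next_free = 0;
        b->in_use = 0;
        b->next = NULL;
        for (unsigned i = 0; i < BATCH_PKTS; i++)
            b->pkts[i].batch = b;
        pool->current = b;
    }
    struct pkt_batch *b = pool->current;
    struct ceth_pkt *pkt = &b->pkts[b->next_free++];
    b->in_use++;
    if (b->next_free == BATCH_PKTS)
        pool->current = NULL;                 /* used up: owned by its in-use packets */
    return pkt;
}

static void pool_recycle(struct pkt_pool *pool, struct ceth_pkt *pkt)
{
    struct pkt_batch *b = pkt->batch;
    assert(b->in_use > 0);
    if (--b->in_use > 0 || b->next_free < BATCH_PKTS)
        return;                               /* still in use, or still the current batch */
    if (pool->recycled_len >= MAX_RECYCLED) { /* list too long: free directly */
        free(b);
        pool->frees++;
        return;
    }
    b->next = pool->recycled;                 /* last in-use packet freed: push to head */
    pool->recycled = b;
    pool->recycled_len++;
}

static void pool_destroy(struct pkt_pool *pool)
{
    free(pool->current);
    while (pool->recycled) {
        struct pkt_batch *b = pool->recycled;
        pool->recycled = b->next;
        free(b);
    }
    pool->current = NULL;
    pool->recycled_len = 0;
}
```

Detaching a used-up batch from `current` means the batch is only ever reached through its packets afterwards, so the code that frees the last in-use packet can safely decide between recycling and freeing it. `pool_destroy()` assumes no packets are still outstanding.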
  • 9. CETH pkt_t Structure
      • Use one page for one packet
      • Customizable meta data (for XDP)
      • Header room for overlay
      • SKB data structure ready
      • Easy conversion between pkt_t and sk_buff (with cost)
      • Reuse skb_shared_info for fragments
      [Figure: layout of one ceth_pkt_buffer in a 4K page (64*64)]
      • Region labels: ceth_pkt (list head, handle, data_offset, signature, meta data), sk_buff_fclones (sk_buff, sk_buff2, fclone_ref=2, each with head/data/end pointers), head room, data, skb_shared_info (frags[17])
      • Sizes shown: sk_buff 232x2+8 (64x8), 320 (64x5), 128 (64x2), head room 128 (64x2), 2880 (64x45)
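The sizes in the figure are all multiples of a 64-byte cache line inside a 4K page. The sketch below only reproduces that arithmetic; the pairing of the unlabeled sizes (320, the second 128, 2880) with regions is an assumption, the names `round_up_line()`, `regions_total()` and `page_slack()` are hypothetical, and the size list recovered from the figure may be incomplete.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE 64u
#define PAGE_BYTES 4096u   /* "4K (64*64)" on the slide */

/* Round a size up to a whole number of cache lines. */
static size_t round_up_line(size_t n)
{
    assert(n <= PAGE_BYTES);
    return (n + CACHE_LINE - 1) / CACHE_LINE * CACHE_LINE;
}

struct region { const char *label; size_t bytes; };

/* Sizes as they appear in the figure; labels marked "assumed" are guesses. */
static const struct region page_regions[] = {
    { "sk_buff_fclones (232x2+8)",     232 * 2 + 8 },
    { "skb_shared_info (assumed)",     320 },
    { "head room",                     128 },
    { "unlabeled 128-byte region",     128 },
    { "data (assumed)",                2880 },
};

/* Total bytes occupied once each region is padded to cache lines. */
static size_t regions_total(void)
{
    size_t total = 0;
    for (size_t i = 0; i < sizeof(page_regions) / sizeof(page_regions[0]); i++)
        total += round_up_line(page_regions[i].bytes);
    return total;
}

/* Bytes of the 4K page not covered by the sizes listed above. */
static size_t page_slack(void)
{
    return PAGE_BYTES - regions_total();
}
```

Rounding 232x2+8 = 472 bytes up to a cache-line multiple gives the 64x8 = 512 bytes shown for the sk_buff pair, which matches the figure's own annotation.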
  • 10. Next Steps (w/ XDP)
      • Ongoing
          1. Port more mm/bm features
          2. Measure performance with XDP use cases
          3. Optimize performance with drivers (need help from driver developers!)
          4. Measure performance improvement of virtio
          5. Direct Socket Interface for userspace applications
      • Discussions on mailing lists
          1. Meta-data format
          2. Offloading features, like TSO
          3. Acceleration API
          4. Virtualization support