CETH for XDP [Linux Meetup Santa Clara | July 2016]


  1. CETH for XDP: Common Ethernet Driver Framework for faster network I/O. Yan Chen (Y.Chen@Huawei.com), Yunsong Lu (Yunsong.Lu@Huawei.com)
  2. Leveraging IO Visor
     • Performance tuning
     • Tracing
     • Networking for containers: dynamic E2E monitoring
     • Cloud Native NFV: Micro Data Path Container (MDPC)
     • http://www.slideshare.net/IOVisor/evolving-virtual-networking-with-io-visor-openstack-summit-austin-april-2016
  3. Express I/O for XDP
     • Kernel network I/O has been a performance bottleneck
     • Netmap and DPDK have claimed a 10x performance advantage
     • Bypass is not low-hanging fruit
       o Could rebuilding EVERYTHING in userspace really do better?
       o Unless all bottlenecks are removed, it is still a long way to go
     • The kernel is the place for a better driver/platform ecosystem
       o Multi-vendor NICs and accelerators
       o x86, ARM, Power, SPARC, etc.
     • The programmability of XDP will enable innovation in "Network Function Applications"
  4. History of CETH (Common Ethernet Driver Framework)
     Designed for performance and virtualization:
     1. Improve kernel networking performance for virtualization, particularly vSwitch and virtual I/O
     2. Simplify NIC drivers by consolidating common functions, particularly for "internal" new NIC accelerators
     3. Standalone module for various kernel versions
     Supports:
     • Huawei's EVS (Elastic Virtual Switch)
     • NICs: Intel ixgbe, Intel i40e (40G), Broadcom bnx2x, Mellanox mlnx-en, Emulex be2net
     • Accelerators: Huawei SNP-lite, Broadcom XLP, EZchip Gx36, Huawei VDR
     • vNICs: ctap (tap+vhost), virtio-net, ceth-pair
  5. Design Considerations (before XDP)
     1. Efficient memory/buffer management
        o Pre-allocated packet buffer pool
        o Efficient buffer acquire/recycle mechanism
        o Data prefetching
        o Batched packet processing
        o Optimized for efficient cache usage
        o Lock reduction/avoidance
        o High-performance copy
        o Reduction of DMA mapping
        o Huge pages, etc.
     2. Flexible TX/RX scheduling
        o Threaded IRQ
        o All-in-interrupt handling
        o Optional R2C (run-to-completion) or pipeline threading models
        o Feature-triggered mode switching
     3. Customizable metadata structure
        o Cache-friendly data structure
        o Hardware/accelerator friendly
        o Extensible: metadata format is customizable
        o SKB compatible
     4. Compatible with the kernel IP stack
        o Hardware-offloading friendly: checksum, VLAN, etc.; TSO/GSO, LRO/GRO
        o Easy to port existing Linux device drivers
        o Reuses most existing non-datapath functions
        o Guide for easy driver porting
     5. Tools for easy performance tuning
        o "ceth" tool to tune all parameters
        o sysfs interfaces
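To make the batching and prefetching points on slide 5 concrete, here is a minimal, illustrative C sketch of a batched RX poll loop. The deck does not show CETH's internals, so the batch size, struct rx_queue, rxq_harvest(), rxq_refill(), pkt_data() and handle_pkt() are all hypothetical names used only for illustration.

    /* Illustrative only: batched RX with data prefetching (slide 5, items 1-2).
     * All helper names below are assumptions, not the actual CETH API. */
    #define CETH_BATCH 32

    struct ceth_pkt;                            /* packet descriptor, see slide 9 */

    static int ceth_poll_queue(struct rx_queue *rxq)
    {
        struct ceth_pkt *batch[CETH_BATCH];
        int i, n;

        /* Harvest up to one batch of completed descriptors in one pass. */
        n = rxq_harvest(rxq, batch, CETH_BATCH);

        for (i = 0; i < n; i++) {
            /* Prefetch the next packet's data while processing this one,
             * hiding the cache miss behind useful work. */
            if (i + 1 < n)
                prefetch(pkt_data(batch[i + 1]));
            handle_pkt(rxq, batch[i]);          /* XDP program / forwarding */
        }

        /* Refill the descriptor ring from the per-CPU buffer pool in one shot. */
        rxq_refill(rxq, n);
        return n;
    }

A real driver would also bound this by the NAPI budget; the point of the sketch is that acquire, process, and refill each touch a whole batch rather than one packet at a time.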
  6. Simplified CETH for XDP
     1. Efficient memory/buffer management
        o Pre-allocated packet buffer pool
        o Efficient buffer acquire/recycle mechanism
        o Data prefetching
        o Batched packet processing
        o Optimized for efficient cache usage
        o Lock reduction/avoidance
        o High-performance copy
        o Reduction of DMA mapping
        o Huge pages, etc.
     2. Flexible TX/RX scheduling
        o Threaded IRQ
        o All-in-interrupt handling
        o Optional R2C (run-to-completion) or pipeline threading models
        o Feature-triggered mode switching
     3. Customizable metadata structure
        o Cache-friendly data structure
        o Hardware/accelerator friendly
        o Extensible: metadata format is customizable
        o SKB compatible
     4. Compatible with the kernel IP stack
        o Hardware-offloading friendly: checksum, VLAN, etc.; TSO/GSO, LRO/GRO
        o Easy to port existing Linux device drivers
        o Easy driver porting: less than 200 LOC per driver
     5. Tools for easy performance tuning
        o "ceth" tool to tune all parameters
        o sysfs interfaces
  7. Simple interfaces for drivers
     • New functions (CETH module)
       o ceth_pkt_acquire()
       o ceth_pkt_recycle()
       o ceth_pkt_to_skb()
     • Kernel modification
       o __kfree_skb()
     • Driver modifications
       o Allocate buffers from CETH
       o Optional: use pkt_t by default
       o Optimize the driver!
     • Performance
       o 30% performance improvement for packet switching (br, ovs)
       o 40% of pktgen performance
       o 100% improvement for XDP forwarding
       o 33 Mpps XDP drop rate with 2 CPU threads
       o Scalable with multiple hardware queues
     Patch available based on the latest XDP kernel tree.
     Preliminary performance numbers: https://docs.google.com/spreadsheets/d/1nT0DO25lfS1QpBLQkdIMm4LJl1v_VMScZVSOcRgkQOI/edit#gid=0
     NOTE: all numbers were internally tested for development purposes only.
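A hedged sketch of how a ported driver's RX path could use the three CETH entry points named on slide 7. Only ceth_pkt_acquire(), ceth_pkt_recycle(), ceth_pkt_to_skb() and the __kfree_skb() change come from the slide; the ring helpers, run_xdp() and the struct names are assumptions, and XDP_TX/redirect handling is omitted.

    /* Sketch only: exact CETH signatures are not given in the deck. */
    static void drv_refill_ring(struct drv_rx_ring *ring)
    {
        while (ring_has_room(ring)) {
            struct ceth_pkt *pkt = ceth_pkt_acquire(ring->pool);  /* from CETH pool */
            if (!pkt)
                break;
            ring_post_buffer(ring, pkt);        /* DMA-map and post the descriptor */
        }
    }

    static void drv_rx_one(struct drv_rx_ring *ring, struct ceth_pkt *pkt)
    {
        switch (run_xdp(ring, pkt)) {           /* XDP sees the raw ceth_pkt buffer */
        case XDP_DROP:
            ceth_pkt_recycle(pkt);              /* straight back to the buffer pool */
            break;
        case XDP_PASS:
        default: {
            /* Convert to an skb only when the kernel stack actually needs one;
             * the modified __kfree_skb() later returns the buffer to CETH. */
            struct sk_buff *skb = ceth_pkt_to_skb(pkt);
            if (skb)
                netif_receive_skb(skb);
            break;
        }
        }
    }

This matches the porting claim on slide 6: the RX allocation and free paths change, while the driver's non-datapath code stays untouched.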
  8. Memory and Buffer Management
     • Separate memory-management layer for various optimizations, like huge pages
     • Per-CPU or per-queue buffer pool mechanisms
     • May use skb by default (pkt_t as the buffer data structure only)
     • Can use non-skb metadata across all XDP functions
     [Diagram: packet management for XDP and the protocol stack. A memory-management layer (default paged memory from the buddy allocator, with an optional huge-page implementation for mapping to user space) backs per-CPU / per-device-queue ceth_pkt buffer pools built from contiguous pages of batch size. RX and TX descriptor rings consume a current ceth_pkt batch; whoever frees the last in-use ceth_pkt in a batch pushes that batch to the head of a recycled-batch list while holding the recycle-list lock. When the current batch is used up, the first recycled batch is taken; if the recycled list is empty, alloc_pages() is called; if it grows too long, batches are freed directly; if it idles too long, all batches in it are freed. Drivers refill via ceth_pkt_acquire(); XDP drops call ceth_pkt_recycle(pkt); packets for the host protocol stack go through ceth_pkt_to_skb(pkt) and netif_receive_skb(skb), with the modified __kfree_skb(skb) returning buffers to the pool.]
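The recycle flow described in the slide 8 diagram can be summarized in a short kernel-style sketch. The struct fields, helper functions and the RECYCLE_MAX threshold below are assumptions inferred from the diagram text, not actual CETH code.

    /* Sketch of the batch-recycle idea (per-CPU / per-queue pools, slide 8). */
    struct ceth_batch {
        struct list_head node;          /* linkage on the pool's recycled list  */
        atomic_t         in_use;        /* ceth_pkts from this batch still out  */
        /* ... contiguous pages of batch size, carved into ceth_pkt buffers ... */
    };

    void ceth_pkt_recycle(struct ceth_pkt *pkt)
    {
        struct ceth_batch *b = pkt_to_batch(pkt);       /* assumed helper */
        struct ceth_pool  *pool = batch_to_pool(b);     /* assumed helper */

        /* Whoever frees the LAST in-use ceth_pkt of a batch pushes the whole
         * batch onto the head of the recycled list, so the recycle lock is
         * taken once per batch rather than once per packet. */
        if (atomic_dec_and_test(&b->in_use)) {
            spin_lock(&pool->recycle_lock);
            if (pool->recycled_len < RECYCLE_MAX) {
                list_add(&b->node, &pool->recycled);
                pool->recycled_len++;
            } else {
                free_batch_pages(b);    /* list too long: free the batch directly */
            }
            spin_unlock(&pool->recycle_lock);
        }
    }

    /* RX refill: take the first recycled batch if one exists, otherwise fall
     * back to alloc_pages() for a fresh run of contiguous pages. */
    struct ceth_batch *ceth_get_batch(struct ceth_pool *pool)
    {
        struct ceth_batch *b = pop_recycled(pool);      /* assumed helper */
        return b ? b : alloc_new_batch(pool);
    }

The idle-timeout path that frees batches when the recycled list sits unused for too long is omitted from the sketch.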
  9. CETH pkt_t Structure
     • Use one page for one packet
     • Customizable metadata (for XDP)
     • Header room for overlays
     • SKB data structure ready
     • Easy conversion between pkt_t and sk_buff (with cost)
     • Reuse skb_shared_info for fragments
     [Diagram: layout of a 4K (64x64) ceth_pkt_buffer page, holding the ceth_pkt descriptor (handle, data_offset, signature, metadata, list head), 128 bytes (64x2) of head room, the packet data, and skb_shared_info with frags[17] at the end; shown next to the sk_buff / sk_buff_fclones (fclone_ref=2) layout for size comparison.]
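A rough C rendering of the one-page-per-packet layout on slide 9. The field names echo the diagram labels (handle, data_offset, signature, metadata, list head), but the exact types, sizes and offsets are assumptions.

    /* Sketch: one ceth_pkt per 4 KiB page (64 cache lines of 64 bytes). */
    struct ceth_pkt {
        struct list_head list;          /* batch / free-list linkage            */
        void            *handle;        /* back-pointer to the owning buffer    */
        u16              data_offset;   /* where packet data starts in the page */
        u16              signature;     /* marks the buffer as a ceth_pkt       */
        u8               metadata[48];  /* customizable, e.g. for XDP programs  */
    };

    /* In-page layout of a ceth_pkt_buffer (illustrative):
     *
     *   [ struct ceth_pkt ][ ~128 B head room for encap/overlay headers ]
     *   [ packet data .................. ][ struct skb_shared_info (frags[17]) ]
     *
     * Reserving head room and keeping skb_shared_info at the end of the page is
     * what makes ceth_pkt_to_skb() a relatively cheap, though not free, step. */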
  10. Next Steps (w/ XDP)
      • Ongoing
        1. Port more memory/buffer-management features
        2. Measure performance with XDP use cases
        3. Optimize performance with drivers (help from driver developers is needed!)
        4. Measure the performance improvement of virtio
        5. Direct socket interface for userspace applications
      • Discussions on the mailing lists
        1. Metadata format
        2. Offloading features, like TSO
        3. Acceleration API
        4. Virtualization support
