
DevConf 2014 Kernel Networking Walkthrough


This presentation walks through the Linux kernel networking stack, covering the essentials and recent developments a developer needs to know. Our starting point is the network card driver as it feeds a packet into the stack. We follow the packet as it traverses subsystems such as packet filtering, routing, protocol stacks, and the socket layer, pausing here and there to look into concepts such as segmentation offloading, TCP small queues, and low latency polling. We also cover kernel APIs that go beyond write()/read() on sockets and look at how they are implemented on the kernel side.

Published in: Technology


  1. Kernel Networking Walkthrough. Thomas Graf, Principal Software Engineer, Networking Services, Red Hat. Feb 7, 2014.
  2. Agenda ● How does a packet get in and out of the net stack? (NAPI, Busy Polling, RSS, RPS, XPS, GRO, TSO) ● How does a packet get through the net stack? (RX Handler, IP Processing, TCP Processing, TCP Fast Open) ● How to account for memory and do flow control? (Socket Buffers, Flow Control, TCP Small Queues) ● Q&A
  3. Touring the Network Stack: Expectation vs. Reality
  4. How does a packet get in and out of the Network Stack?
  5. Receive & Transmit Process [Diagram] Receive: NIC -> DMA -> Ring Buffer -> Network Stack (Kernel Space): Parse IP -> Local? -> Parse TCP/UDP -> Socket Buffer -> read() -> Task (Process, User Space). Non-local packets: Forward. Transmit: Task write() -> Socket Buffer -> Construct TCP/UDP -> Construct IP -> Device? -> Ring Buffer -> NIC.
  6. The 3 ways into the Network Stack ● Interrupt Driven (Ring Buffer, Network Stack) ● NAPI based Polling (Ring Buffer, poll(), Network Stack) ● Busy Polling (Ring Buffer, Network Stack, busy_poll(), Task)
  7. RSS – Receive Side Scaling ● The NIC distributes packets across multiple RX queues, allowing for parallel processing. ● Each RX queue has a separate IRQ, so the queue choice also selects the CPU that runs the hardware interrupt handler. [Diagram] filter: RX-queue-1 -> CPU 1, RX-queue-2 -> CPU 3, RX-queue-3 -> CPU 1, RX-queue-4 -> CPU 5
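The queue selection on the RSS slide can be sketched in software. This is a minimal illustration, assuming a Toeplitz hash over the TCP/IPv4 4-tuple and a round-robin indirection table; the key, the table size, and the names `RSS_KEY`, `INDIRECTION`, `toeplitz_hash`, and `rx_queue` are hypothetical and not taken from the talk. Real NICs compute this in hardware with their own key.

```python
import ipaddress
import struct

# Hypothetical 40-byte hash key; a real NIC is programmed with its own key.
RSS_KEY = bytes(range(1, 41))
NUM_QUEUES = 4
# Indirection table: hash bucket -> RX queue (round-robin fill is an assumption).
INDIRECTION = [i % NUM_QUEUES for i in range(128)]


def toeplitz_hash(data: bytes, key: bytes = RSS_KEY) -> int:
    """32-bit Toeplitz hash: for every set input bit, XOR in the 32-bit key
    window that starts at that bit position."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i in range(len(data) * 8):
        byte, bit = divmod(i, 8)
        if data[byte] & (0x80 >> bit):
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def rx_queue(src: str, dst: str, sport: int, dport: int) -> int:
    """Pick the RX queue for a TCP/IPv4 flow from its 4-tuple."""
    data = (
        ipaddress.IPv4Address(src).packed
        + ipaddress.IPv4Address(dst).packed
        + struct.pack("!HH", sport, dport)
    )
    return INDIRECTION[toeplitz_hash(data) % len(INDIRECTION)]
```

Because the hash depends only on the flow's addresses and ports, all packets of one connection land on the same queue, and therefore on the same CPU.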
  8. RPS – Receive Packet Steering ● Software filter to select CPU # for processing ● Use it to ... distribute a single queue to multiple CPUs ... redo the queue - CPU mapping [Diagram] RX-queue-1 to RX-queue-4 steered onto CPU 1, CPU 2, CPU 3
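RPS is configured per RX queue through a CPU bitmask in sysfs. The sketch below only builds the mask string and the file path; the helper names `rps_cpu_mask` and `rps_cpus_path` are hypothetical, and the sketch is limited to CPUs 0 to 31 so that the mask fits a single 32-bit group.

```python
def rps_cpu_mask(cpus):
    """Return the hex bitmask string for a set of CPU numbers (bit n = CPU n)."""
    mask = 0
    for cpu in cpus:
        if not 0 <= cpu < 32:
            raise ValueError("this sketch only handles CPUs 0..31")
        mask |= 1 << cpu
    return format(mask, "x")


def rps_cpus_path(dev, queue):
    """sysfs file that holds the RPS CPU mask of one RX queue."""
    return f"/sys/class/net/{dev}/queues/rx-{queue}/rps_cpus"
```

For example, writing `rps_cpu_mask([1, 2, 3])` (the string `e`) to `rps_cpus_path("eth0", 0)` as root would steer packets from the first RX queue of a device named `eth0` to CPUs 1 to 3; the device name is only an example.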
  9. Hardware Offload ● RX/TX Checksumming: perform CPU intensive checksumming in hardware. ● Virtual LAN filtering and tag stripping: strip 802.1Q header and store VLAN ID in network packet meta data. Filter out unsubscribed VLANs.
  10. Generic Receive Offload [Diagram] NAPI based poll() takes MTU-sized packets from the Ring Buffer; GRO merges them into packets of up to 64K before they enter the Network Stack.
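The merging step on the GRO slide can be illustrated with a toy model. This is a simplified sketch, not the kernel's algorithm: it only merges consecutive packets of the same flow, identifies a flow by an opaque label, and caps the merged size at 65535 bytes (an assumption standing in for the slide's "up to 64K"). The name `gro_merge` is hypothetical.

```python
GRO_MAX_SIZE = 65535  # assumption: largest value of a 16-bit length field


def gro_merge(packets):
    """packets: list of (flow_id, payload_len) in arrival order.

    Merge consecutive packets of the same flow into one larger packet,
    as long as the result stays within GRO_MAX_SIZE.
    """
    merged = []
    for flow, length in packets:
        if merged and merged[-1][0] == flow and merged[-1][1] + length <= GRO_MAX_SIZE:
            merged[-1] = (flow, merged[-1][1] + length)
        else:
            merged.append((flow, length))
    return merged
```

The stack above GRO then processes one large packet instead of many MTU-sized ones, which is where the per-packet savings come from.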
  11. Segmentation Offload [Diagram] The Network Stack emits packets of up to 64K. Generic Segmentation Offload (GSO) splits them into MTU-sized packets before the Ring Buffer; TCP Segmentation Offload (TSO) splits them into MTU-sized packets after the Ring Buffer.
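Segmentation is the inverse of the merge done on receive: one large send is cut into pieces that fit the MTU. A minimal sketch of the size arithmetic, assuming a payload size per segment (`mss`) of 1448 bytes; the function name `segment` and the default value are illustrative, not from the talk.

```python
def segment(payload_len, mss=1448):
    """Split one large send into per-segment payload lengths of at most mss."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    full, rest = divmod(payload_len, mss)
    return [mss] * full + ([rest] if rest else [])
```

With GSO this split happens in software just before the driver; with TSO the same split is left to the NIC.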
  12. How does a packet get through the Network Stack? (c) Karen Sagovac
  13. Packet Processing [Diagram] Link Layer -> Packet Socket (ETH_P_ALL, e.g. tcpdump) -> Ingress QoS -> RX Handler (Bridge, Open vSwitch, Team, Bonding, macvlan, macvtap) -> Proto Handler (IPv4, IPv6, ARP, IPX, ...) or Drop. Caption: Feast of the hungry chicks
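The last stage of the packet processing slide, the protocol handler lookup, amounts to a dispatch on the EtherType field. A minimal sketch, assuming an untagged Ethernet frame; the EtherType values are the ones defined in `<linux/if_ether.h>`, while `PROTO_HANDLERS` and `dispatch` are hypothetical names.

```python
import struct

# EtherType values as defined in <linux/if_ether.h>.
ETH_P_IP = 0x0800
ETH_P_ARP = 0x0806
ETH_P_IPX = 0x8137
ETH_P_IPV6 = 0x86DD

PROTO_HANDLERS = {
    ETH_P_IP: "IPv4",
    ETH_P_IPV6: "IPv6",
    ETH_P_ARP: "ARP",
    ETH_P_IPX: "IPX",
}


def dispatch(frame: bytes) -> str:
    """Return the protocol handler for a frame, or "Drop" if none matches."""
    if len(frame) < 14:  # shorter than an Ethernet header
        return "Drop"
    (ethertype,) = struct.unpack("!H", frame[12:14])
    return PROTO_HANDLERS.get(ethertype, "Drop")
```

Packet sockets bound to ETH_P_ALL and RX handlers see the frame before this lookup, which is why tcpdump also shows packets that are dropped here.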
  14. IP Processing [Diagram] Receive: Link Layer -> IP Handler -> PREROUTING -> Route Lookup, then either Local Delivery -> INPUT -> L4 (TCP, ...) or Forwarding -> FORWARD -> POSTROUTING -> Link Layer. Local Output: User Space -> Route Lookup, IPv4 Construction -> OUTPUT -> POSTROUTING -> Link Layer.
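The hook names on the IP processing slide are the netfilter hooks, and which ones a packet traverses depends on where it comes from and where it is going. A minimal sketch of that mapping; the function name `hooks_traversed` is hypothetical, and the re-entry of locally generated packets through loopback is deliberately left out.

```python
def hooks_traversed(locally_generated: bool, dst_is_local: bool):
    """Return the netfilter hooks a packet passes, in order."""
    if locally_generated:
        # Local output path: the packet leaves from a local socket.
        return ["OUTPUT", "POSTROUTING"]
    if dst_is_local:
        # Local delivery path: the route lookup says the packet is for us.
        return ["PREROUTING", "INPUT"]
    # Forwarding path: the packet is routed to another host.
    return ["PREROUTING", "FORWARD", "POSTROUTING"]
```

A forwarded packet never hits INPUT or OUTPUT, which is a common source of confusion when filter rules are placed on the wrong hook.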
  15. TCP Processing [Diagram] IP -> Parse TCP -> Lookup Socket -> Socket Filter, then one of: Backlog (socket locked), TCP Prequeue (task exists), or Receive Socket Buffer. Processing moves from softirq to process context; the Task consumes data via read() / poll().
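The three queues on the TCP processing slide are chosen by two conditions shown in the diagram: whether the socket is currently locked and whether a task is waiting. A minimal sketch of that decision as described on the slide; the function and the returned labels are illustrative names, not kernel identifiers.

```python
def tcp_rx_queue(socket_locked: bool, reader_waiting: bool) -> str:
    """Pick the queue an incoming segment is placed on in softirq context."""
    if socket_locked:
        # A task in process context holds the socket: defer to the backlog,
        # which that task processes when it releases the socket.
        return "backlog"
    if reader_waiting:
        # A task is blocked waiting for data: hand the segment over so the
        # TCP processing runs in that task's process context.
        return "prequeue"
    # Nobody is waiting: process now and queue on the receive buffer.
    return "receive_queue"
```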
  16. TCP Fast Open (net.ipv4.tcp_fastopen) [Diagram] Regular, 1st and 2nd request: SYN; SYN+ACK; ACK+HTTP GET; Data (2x RTT each). Fast Open, 1st request: SYN; SYN+ACK+Cookie; ACK+HTTP GET; Data (2x RTT). Fast Open, 2nd request: SYN+Cookie+HTTP GET; SYN+ACK+Data (1x RTT).
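From user space, Fast Open is requested per socket. The following is a sketch under assumptions: it targets Linux, falls back to the numeric values of `TCP_FASTOPEN` and `MSG_FASTOPEN` from the kernel headers in case the Python build does not export them, and uses hypothetical helper names (`tfo_listener`, `tfo_request`). Whether data actually travels in the SYN also depends on the net.ipv4.tcp_fastopen sysctl named on the slide and on a cookie from an earlier connection.

```python
import socket

# Linux values from <linux/tcp.h> and <linux/socket.h>, used as fallbacks in
# case this Python build does not export the constants.
TCP_FASTOPEN = getattr(socket, "TCP_FASTOPEN", 23)
MSG_FASTOPEN = getattr(socket, "MSG_FASTOPEN", 0x20000000)


def tfo_listener(host="127.0.0.1", port=0, qlen=16):
    """Return a listening socket that accepts data in the SYN, if supported."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(16)
    try:
        # qlen bounds the number of pending Fast Open requests.
        srv.setsockopt(socket.IPPROTO_TCP, TCP_FASTOPEN, qlen)
    except OSError:
        pass  # no Fast Open support: this stays a regular listener
    return srv


def tfo_request(addr, payload: bytes):
    """Connect and send the first payload in one call, without connect()."""
    cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    cli.sendto(payload, MSG_FASTOPEN, addr)
    return cli
```

The client side replaces the usual `connect()` followed by `send()` with a single `sendto()`, which is what lets the kernel put the request into the SYN on the second and later connections.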
  17. Memory Accounting & Flow Control
  18. Socket Buffers & Flow Control (net.ipv4.tcp_{r|w}mem) [Diagram] Sender (ssh): write() -> wmem overlimit? If so, block or return EWOULDBLOCK; otherwise wmem += packet-size, Socket Buffer -> TCP/IP -> TX Ring Buffer, wmem -= packet-size. Receiver (ssh): RX Ring Buffer -> TCP/IP -> rmem overlimit? If so, reduce TCP window; otherwise rmem += packet-size, Socket Buffer; rmem -= packet-size when the data is consumed.
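The "Block or EWOULDBLOCK" branch on the socket buffer slide can be observed from user space. This is a small experiment rather than a reference implementation: it assumes a Linux-like host with loopback networking, and the function name `fill_send_buffer` and its parameters are hypothetical. The peer never reads, so the receive buffer fills, the TCP window closes, and the sender's buffer eventually hits its limit.

```python
import socket


def fill_send_buffer(chunk=64 * 1024, cap=256 * 1024 * 1024):
    """Write to a TCP socket whose peer never reads.

    Returns (bytes_queued, blocked); blocked is True once a non-blocking
    send() reports EWOULDBLOCK. cap is a safety limit for the loop.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    cli = socket.create_connection(srv.getsockname())
    peer, _ = srv.accept()
    cli.setblocking(False)
    queued, blocked = 0, False
    try:
        while queued < cap:
            try:
                queued += cli.send(b"x" * chunk)
            except BlockingIOError:  # EWOULDBLOCK / EAGAIN
                blocked = True
                break
    finally:
        for s in (cli, peer, srv):
            s.close()
    return queued, blocked
```

The number of bytes accepted before the first EWOULDBLOCK varies between hosts, since it depends on the buffer limits that the tcp_rmem and tcp_wmem sysctls define.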
  19. TCP Small Queues (net.ipv4.tcp_limit_output_bytes) [Diagram] ssh and torrent each write() into their own Socket Buffer; both flows share TCP/IP -> Queuing Discipline -> Driver -> TX Ring Buffer. TSQ: max 128 KB in flight per socket
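The effect of the per-socket limit on the TSQ slide can be shown with simple arithmetic. This is a toy model, assuming a single FIFO queue below TCP and the 128 KB limit from the slide; the function names and the link speed used in the example are illustrative, not from the talk.

```python
TSQ_LIMIT = 128 * 1024  # bytes; the per-socket limit shown on the slide


def bytes_ahead_of_ssh(torrent_pending, limit=None):
    """Bytes of bulk traffic queued below TCP when an ssh packet arrives.

    torrent_pending: bytes the bulk sender has written to its socket.
    limit: per-socket cap on bytes in the qdisc/driver, or None for no cap.
    """
    return torrent_pending if limit is None else min(torrent_pending, limit)


def queueing_delay_s(bytes_ahead, link_bps):
    """Time to drain the bytes queued ahead of a packet at the given rate."""
    return bytes_ahead * 8 / link_bps
```

With 4 MiB written by the bulk sender and a 100 Mbit/s link, `queueing_delay_s(bytes_ahead_of_ssh(4 * 1024 * 1024), 100_000_000)` is about 0.34 s, while the same call with `limit=TSQ_LIMIT` gives about 0.01 s. The remaining data waits in the socket buffer, where it does not delay other flows.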
  20. Q&A ● Feedback Page ● Coming Up Next: NetworkManager for Enterprise, Dan Williams