Linux Multiqueue Networking


Published on

  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Linux Multiqueue Networking

  1. 1. Linux Multiqueue Networking David S. Miller Basic Premise Linux Multiqueue Networking Receive Side Multiqueue Transmit Side David S. Miller Multiqueue Existing Design Red Hat Inc. New Design Implementation Netfilter Workshop, ESIEA, Paris, 2008 Notes
  2. 2. N ETWORKING PACKETS AS CPU W ORK Linux Multiqueue Networking David S. Miller Basic Premise Each packet generates some CPU work Receive Side Multiqueue Both at protocol and application level Transmit Side Faster interfaces can generate more work Multiqueue Sophisticated networking can generate more work Existing Design Firewalling lookups, route lookups, etc. New Design Implementation Notes
  3. 3. CPU E VOLUTION Linux Multiqueue Networking David S. Miller Basic Premise Multiprocessor systems used to be uncommon Receive Side This is changing... significantly Multiqueue Transmit Common systems will soon have 64 cpu threads, or Side Multiqueue more Existing Lots of CPU processing resources Design New Design The trouble is finding ways to use them Implementation Notes
  4. 4. E ND TO E ND P RINCIPLE Linux Multiqueue Networking David S. Miller Basic premise: Communication protocol operations Basic Premise should be defined to occur at the end-points of a Receive Side communications system, or as close as possible to the Multiqueue Transmit resource being controlled Side Multiqueue The network is dumb Existing The end nodes are smart Design New Design Computational complexity is pushed as far to the edges Implementation as possible Notes
  5. 5. E ND - TO - END APPLIED TO “ SYSTEM ” Linux Multiqueue Networking David S. Miller The end-to-end principle extends all the way into your Basic Premise computer Receive Side Multiqueue Each cpu in your system is an end node Transmit The “network” pushes work to your system Side Multiqueue Internally, work should be pushed down further to the Existing Design execution resources New Design With multi-threaded processors like Niagara, this is Implementation Notes more important to get right than ever before
  6. 6. N ETWORK D EVICE Linux Multiqueue Networking David S. Miller Devices must provide facilities to balance the load of Basic incoming network work Premise Receive Side But they must do so without creating new problems Multiqueue (such as reordering) Transmit Side Packet reordering looks like packet loss to TCP Multiqueue Existing Independant “work” should call upon independant cpus Design to process it New Design Implementation PCI MSI-X interrupts Notes Multiqueue network devices, with classification facilities
  7. 7. S OFTWARE S OLUTIONS Linux Multiqueue Networking David S. Miller Basic Premise Not everyone will have multiqueue capable network Receive Side Multiqueue devices or systems. Transmit Side Receive balancing is possible in software Multiqueue Older systems can have their usefulness extended by Existing Design such facilities New Design Implementation Notes
  8. 8. NAPI, “N EW API” Linux Multiqueue Basic abstraction for packet reception under Linux, Networking providing software interrupt mitigation and load David S. Miller balancing. Basic Device signals interrupt when receive packets are Premise available. Receive Side Multiqueue Driver disables chip interrupts, and NAPI context is Transmit queued up for packet processing Side Multiqueue Kernel generic networking processes NAPI context in Existing Design software interrupt context, asking for packets from the New Design driver one at a time Implementation When no more receive packets are pending, the NAPI Notes context is dequeued and the driver re-enabled chip interrupts Under high load with a weak cpu, we may process thousands of packets per hardware interrupt
  9. 9. NAPI L IMITATIONS Linux Multiqueue Networking David S. Miller Basic It was not ready for multi-queue network devices Premise Receive Side One NAPI context exists per network device Multiqueue It assumes a network device only generates one Transmit Side interrupt source which must be disabled during a poll Multiqueue Existing Cannot handle cleanly the case of a one to many Design relationship between network devices and packet event New Design sources Implementation Notes
  10. 10. F IXING NAPI Linux Multiqueue Networking David Implemented by Stephen Hemminger S. Miller Remove NAPI context from network device structure Basic Premise Let device driver define NAPI instances however it likes Receive Side in it’s private data structures Multiqueue Transmit Simple devices will define only one NAPI context Side Multiqueue Multiqueue devices will define a NAPI context for each Existing RX queue Design New Design All NAPI instances for a device execute independantly, Implementation no shared locking, no mutual exclusion Notes Independant cpus process independant NAPI contexts in parallel
  11. 11. O K , WHAT ABOUT NON - MULTIQUEUE HARDWARE ? Linux Multiqueue Networking David S. Miller Remote softirq block I/O layer changes by Jens Axboe Basic Premise Provides a generic facility to execute software interrupt Receive Side work on remote processors Multiqueue We can use this to simulate network devices with Transmit Side hardware multiqueue facilities Multiqueue Existing netif_receive_skb() is changed to hash the flow of a Design packet and use this to choose a remote cpu New Design Implementation We push the packet input processing to the selected Notes cpu
  12. 12. C ONFIGURABILITY Linux Multiqueue Networking David S. Miller Basic Premise The default distribution works well for most people. Receive Side Some people have special needs or want to control Multiqueue how work is distributed on their system Transmit Side Multiqueue Configurability of hardware facilities is variable Existing Some provide lots of control, some provide very little Design New Design We can fully control the software implementation Implementation Notes
  13. 13. I MPETUS Linux Multiqueue Networking David S. Miller Basic Premise Writing NIU driver Receive Side Multiqueue Peter W’s multiqueue hacks Transmit Side TX multiqueue will be pervasive. Multiqueue Robert Olsson’s pending paper on 10GB routing (found Existing Design TX locking to be bottleneck in several situations) New Design Implementation Notes
  14. 14. P RESENTATIONS Linux Multiqueue Networking David S. Miller Basic Tokyo 2008 Premise Berlin 2008 Receive Side Multiqueue Seattle 2008 Transmit Side Portland 2008 Multiqueue Existing Implementation plan changing constantly Design End result was different than all of these designs New Design Implementation Full implementation will be in 2.6.27 Notes
  15. 15. P ICTURE OF TX E NGINE Linux Multiqueue Networking David dev->q S. Miller ueue_ lo ck TX l Basic ock Premise Receive Side Multiqueue dev_queue_xmit() -> QDISC hard_start_xmit Transmit Side Multiqueue set SKB queue mapping Existing Design DRIVER New Design Implementation Notes TXQ TXQ TXQ
  16. 16. D ETAILS TO N OTE Linux Multiqueue Networking David S. Miller Basic Premise Generic networking has no idea about multiple TX Receive Side queues Multiqueue Transmit Parallelization impossible because of single queue and Side Multiqueue TX lock Existing Only single root QDISC is possible, sharing is not Design New Design allowed Implementation Notes
  17. 17. P ICTURE OF D EFAULT C ONFIGURATION Linux Multiqueue Networking David S. Miller Basic qdisc TX lock Premise TXQ ->q.lock Receive Side Multiqueue Transmit Side Multiqueue dev_queue_xmit TXQ qdisc TX lock driver ->q.lock Existing Design New Design Implementation TXQ qdisc TX lock Notes ->q.lock
  18. 18. P ICTURE WITH N ON - TRIVIAL QDISC Linux Multiqueue Networking David S. Miller SKB TX lock Basic Premise TXQ Receive Side Multiqueue qdisc Transmit Side Multiqueue SKB TXQ ->q.lock TX lock driver Existing Design New Design TXQ TX lock Implementation Notes skb
  19. 19. T HE N ETWORK D EVICE Q UEUE Linux Multiqueue Networking David S. Miller Basic Premise Direction agnostic, can represent both RX and TX Receive Side Holds: Multiqueue Backpointer to struct net_device Transmit Side Pointer to ROOT qdisc for this queue, and sleeping Multiqueue qdisc Existing Design TX lock for ->hard_start_xmit() synchronization New Design Flow control state bit (XOFF) Implementation Notes
  20. 20. QDISC C HANGES Linux Multiqueue Networking David S. Miller Basic Holds state bits for qdisc_run() scheduling Premise Has backpointer to referring dev_queue Receive Side Multiqueue Therefore, current configured QDISC root is always Transmit Side qdisc->dev_queue->qdisc_sleeping Multiqueue Existing qdisc->q.lock on root is now used for tree Design synchronization New Design qdisc->list of root qdisc replaces netdev->qdisc_list Implementation Notes
  21. 21. TX Q UEUE S ELECTION Linux Multiqueue Networking David Simple hash function over the packet FLOW S. Miller information (source and destination address, source Basic Premise and destination port) Receive Side Driver or driver layer is allowed to override this queue Multiqueue selection function if the queues of the device are used Transmit Side for something other than flow seperation or prioritization Multiqueue Existing Example: Wireless Design New Design In the future, this function will disappear. Implementation To be replaced with socket hash marking of transmit Notes packets, and RX side hash recording of received packets for forwarding and bridging
  22. 22. I MPLEMENTATION AND S EMANTIC I SSUES Linux Multiqueue Networking David S. Miller Basic Synchronization is difficult. Premise Receive Side People want to use queues for other tasks such as Multiqueue priority scheduling, will be supported by sch_multiq.c Transmit Side module from Intel in 2.6.28 Multiqueue TX multiqueue interactions with the packet scheduler Existing Design and it’s semantics are still not entirely clear New Design Most of the problems are about flow control Implementation Notes