Linux Multiqueue Networking



  1. Linux Multiqueue Networking
       David S. Miller, Red Hat Inc.
       New York City, 2009
       Outline: Background; RX Multiqueue; TX Multiqueue; Application-based and SW Steering; The End
  2. TRENDS
       • More CPUs, either less powerful (high arity) or same (low arity) as existing CPUs
       • Flow counts increasing
       • Networking hardware adjusting to horizontal scaling
       • Single queue model no longer works
       • Routers and firewalls have different needs than servers
  3. CPU DESIGN
       • Traditionally single CPUs or very low count SMP
       • The move to high-arity CPU counts
       • One model: Sun’s Niagara
       • Lower powered CPUs, but many of them
       • Other model: x86 based systems
       • High powered CPUs, but not as high an increase in arity as the Niagara approach, starting with hyperthreading
       • Future: Best of both worlds, high arity and power
  4. END NODES VS. INTERMEDIATE NODES
       • End Nodes: Servers
       • Intermediate Nodes: Routers and Firewalls
       • Intermediate nodes have good flow distribution implicit in their traffic
       • Also, processing a packet occurs purely within the networking stack itself, no application level work
       • End nodes also usually have good flow distribution
       • However, there is the added aspect of application CPU usage
       • Completely stateless flow steering
       • Or, application oriented flow steering
  5. NETWORKING HARDWARE DESIGN
       • Traditionally a single-queue model
       • Limitations of bus technology, e.g. PCI
       • Advent of MSI and MSI-X interrupts
       • RSS based flow hashing
       • Multiple TX and RX queues
       • Stateless flow distribution
       • Extra sophistication: Sun’s Neptune 10G Ethernet
       • TCAMs and more fine-grained flow steering
       • Intel’s IXGBE “Flow Director”
  6. NAPI: “NEW API”
       • Interrupt mitigation scheme designed by Jamal Hadi Salim and Robert Olsson
       • On interrupt, further interrupts are disabled and a software interrupt is scheduled
       • Software interrupt “polls” the driver, which processes RX packets until no more packets are pending or the quota is hit
       • Quota provides DRR (Deficit Round Robin) sharing between links
       • When polling is complete, chip interrupts are re-enabled
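The polling cycle on the NAPI slide can be sketched as a small user-space model. This is only an illustration under stated assumptions, not kernel code: `NapiInstance`, `interrupt`, and `net_rx_softirq` are hypothetical names, and the "hardware ring" is just a Python deque.

```python
from collections import deque


class NapiInstance:
    """Toy model of one NAPI context (hypothetical, not the kernel structure)."""

    def __init__(self, name, weight):
        self.name = name
        self.weight = weight        # per-poll quota
        self.rx_ring = deque()      # packets waiting in the simulated RX ring
        self.irq_enabled = True

    def poll(self, budget):
        """Process at most `budget` packets and return them."""
        done = []
        while self.rx_ring and len(done) < budget:
            done.append(self.rx_ring.popleft())
        return done


def interrupt(napi, poll_list):
    """RX interrupt: mask further interrupts and schedule the poll."""
    if napi.irq_enabled:
        napi.irq_enabled = False
        poll_list.append(napi)


def net_rx_softirq(poll_list):
    """Software interrupt: poll every scheduled instance in turn."""
    log = []
    while poll_list:
        napi = poll_list.popleft()
        done = napi.poll(napi.weight)
        log.append((napi.name, len(done)))
        if len(done) < napi.weight:
            napi.irq_enabled = True     # ring drained: re-enable interrupts
        else:
            poll_list.append(napi)      # quota hit: back of the line
    return log
```

With two instances holding 10 and 3 packets and a quota of 4, the busier link cannot starve the other: the poll order comes out as eth0, eth1, eth0, eth0.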
  7. LIMITATIONS OF NAPI
       • All state embedded literally inside of “struct net_device”
       • Ideally we want some kind of “NAPI instance” for each chip interrupt source
       • But we had no direct way to instantiate such instances structurally
       • Fixes were in order
  8. STEPHEN HEMMINGER TO THE RESCUE
       • Extracted NAPI state into a separate structure
       • Device driver could create as many instances as necessary
       • Multiple RX queues could be represented using multiple NAPI instances
       • And this is exactly what multiqueue drivers do
       • Oh BTW: Nasty hacks...
  9. PACKET SCHEDULER
       • Sits between network stack and device transmit method
       • Supports arbitrary packet classification and an assortment of queueing disciplines
       • Has to lock QDISC and then device TX queue to get a packet to the device
       • SMP unfriendly, and just like NAPI had state embedded in struct net_device
       • Root qdiscs cannot be shared
       • Complicated qdisc and classifier state has “device scope”
       • Luckily the default configuration is a stateless and simple qdisc
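The two-step locking described on the packet scheduler slide (QDISC lock first, then the TX queue lock) can be sketched in user space. This is a rough model under assumptions: `TxQueue` and its fields are hypothetical, the qdisc is a plain FIFO, and "the device" is a Python list. The function name `dev_queue_xmit` is borrowed from the slides purely as a label.

```python
import threading
from collections import deque


class TxQueue:
    """Toy TX queue with its own qdisc and two locks (hypothetical layout)."""

    def __init__(self):
        self.qdisc = deque()                 # simple FIFO queueing discipline
        self.qdisc_lock = threading.Lock()   # protects the qdisc
        self.tx_lock = threading.Lock()      # protects the device TX path
        self.sent = []                       # stands in for the hardware


def dev_queue_xmit(txq, pkt):
    """Enqueue under the qdisc lock, then transmit under the TX lock."""
    with txq.qdisc_lock:
        txq.qdisc.append(pkt)
        out = txq.qdisc.popleft()            # FIFO: oldest packet goes first
    with txq.tx_lock:
        txq.sent.append(out)                 # simulated driver transmit
```

Every sender on every CPU serializes on the same two locks here, which is the SMP-unfriendly part. Giving each TX queue its own `TxQueue` object removes that contention for flows mapped to different queues.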
  10. DRIVER TX METHOD
       • Manages TX queue flow control assuming one queue
       • Need to add queue specifier to flow control APIs
       • But do so without breaking multiqueue-unaware drivers
       • With NAPI we could totally break the API and just fix all the drivers at once
       • Only a relative handful of drivers use NAPI
       • Breaking the flow control API would require changes to roughly 450 drivers
       • So, backward compatible solutions only.
  11. TX QUEUE SELECTION
       • Selected queue stored in SKB
       • Queue selection function is different depending upon packet origin
       • Forwarded packet: Function of RX queue selected by input device
       • Locally generated packet: Use hash value of attached socket
       • Thorny cases: Devices with unequal RX and TX queues
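The selection rule on the TX queue selection slide can be sketched as follows. The `Skb` class, its field names, and the use of a modulo to fold a value into the available TX queues are all assumptions made for illustration; the slides do not say how the unequal RX/TX queue case is actually resolved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Skb:
    """Toy packet buffer (hypothetical fields, not the kernel sk_buff)."""
    rx_queue: Optional[int] = None   # set when the packet was received/forwarded
    sk_hash: Optional[int] = None    # set when a local socket is attached
    queue_mapping: int = 0           # selected TX queue is stored in the packet


def select_tx_queue(skb: Skb, num_tx_queues: int) -> int:
    """Pick a TX queue based on packet origin and record it in the SKB."""
    if skb.rx_queue is not None:
        # Forwarded packet: derive the TX queue from the RX queue.
        # The modulo handles devices with fewer TX than RX queues (assumption).
        queue = skb.rx_queue % num_tx_queues
    elif skb.sk_hash is not None:
        # Locally generated packet: use the attached socket's hash.
        queue = skb.sk_hash % num_tx_queues
    else:
        queue = 0                    # nothing to go on: fall back to queue 0
    skb.queue_mapping = queue
    return queue
```

For example, a forwarded packet that arrived on RX queue 5 of a device with 4 TX queues is mapped to TX queue 1, and every later packet of a given socket keeps landing on the same queue because the socket hash does not change.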
  12. PICTURE OF TX ENGINE
       [Diagram. Labels: dev_queue_xmit(), set SKB queue mapping, QDISC, hard_start_xmit, dev->queue_lock, TX lock, DRIVER, three TXQs]
  13. PICTURE OF DEFAULT CONFIGURATION
       [Diagram. Labels: dev_queue_xmit, three qdiscs each with ->q.lock, three TXQs each with a TX lock, driver]
  14. PICTURE WITH NON-TRIVIAL QDISC
       [Diagram. Labels: three SKBs, one qdisc with ->q.lock, three TXQs each with a TX lock, driver]
  15. MOTIVATION
       • Performance, duh...
       • Many networking devices out there are not multiqueue capable
       • Whilst stateless RX queue hashing is great for forwarding applications...
       • It is decidedly suboptimal for end-nodes.
       • Problem: Figuring out the packet’s “destination” before it’s “too late”
  16. EXAMPLE SCENARIO
       [Figure]
  17. EARLY EFFORTS
       • Influenced by Jens Axboe’s remote block I/O completion experiments
       • Up to 10 percent improvement in benchmarks where usually a 3 percent improvement is something to brag heavily about
       • Generalization of remote software interrupt invocation
       • Counterpart usage implemented for networking
       • Basically SW multiqueue on receive
       • Detrimental for loopback traffic
  18. MORE RECENT WORK
       • Patch posted by Tom Herbert at Google
       • Per-device “packet steering” table, set via sysctl by user
       • When packet steering is enabled, received packets are hashed and the hash indexes into the table
       • Entry found in table is the CPU to steer packets to
       • Packets steered to foreign CPUs using remote SMP calls and a special software interrupt
       • Whole mechanism is also enabled via sysctl
       • If disabled or no valid entry is found in the table, the existing behavior is kept
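The table lookup on this slide can be sketched as below. This is a simplified model and not the posted patch: `flow_hash`, `steer`, the CRC32 hash, and the use of `None` for an invalid entry are all assumptions chosen for illustration.

```python
import zlib


def flow_hash(src, dst, sport, dport):
    """Hash a flow's addresses and ports (CRC32 is a stand-in hash)."""
    key = f"{src}:{sport}->{dst}:{dport}".encode()
    return zlib.crc32(key)


def steer(table, enabled, flow, current_cpu):
    """Return the CPU that should process a received packet.

    table       -- per-device list of CPU numbers; None marks an invalid entry
    enabled     -- models the on/off switch for the whole mechanism
    flow        -- (src, dst, sport, dport) tuple
    current_cpu -- CPU that took the interrupt (the existing behavior)
    """
    if not enabled or not table:
        return current_cpu
    entry = table[flow_hash(*flow) % len(table)]
    if entry is None:
        return current_cpu          # no valid entry: keep existing behavior
    return entry                    # steer to the CPU named by the table
```

Because the hash depends only on the flow, all packets of one flow are steered to the same CPU, which keeps per-flow processing in order. The kernel would then hand the packet over with a remote SMP call, a step this model leaves out.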
  19. ANOTHER IDEA: SW “FLOW DIRECTOR”
       • CPU on which transmits for a flow occur is “remembered”
       • On receive for that flow, remembered CPU is looked up and packet steered to that CPU
       • Problems of space
       • Problems of time
       • Problems of locality
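The "remember on transmit, look up on receive" idea can be sketched with a fixed-size table. Everything here is an assumption for illustration: `FlowTable`, its method names, the CRC32 index, and the choice to let a colliding flow overwrite the previous entry. The flow key is assumed to be direction-independent, so transmit and receive of one connection use the same key.

```python
import zlib


class FlowTable:
    """Toy software flow director with a bounded number of slots."""

    def __init__(self, size):
        self.slots = [None] * size          # each slot holds (flow, cpu) or None

    def _index(self, flow):
        return zlib.crc32(repr(flow).encode()) % len(self.slots)

    def note_transmit(self, flow, cpu):
        """Remember which CPU transmitted for this flow."""
        self.slots[self._index(flow)] = (flow, cpu)

    def lookup_receive(self, flow, default_cpu):
        """Return the remembered CPU, or default_cpu if the flow is unknown."""
        slot = self.slots[self._index(flow)]
        if slot is not None and slot[0] == flow:
            return slot[1]
        return default_cpu
```

The bounded table makes the "problems of space" concrete: with more active flows than slots, a newer flow evicts an older one, and the older flow silently falls back to the default CPU.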
  20. CREDITS
       • Linus Torvalds, for sharing his kernel instead of keeping it to himself
       • Ron Guerin and the rest of the NYLUG folks for letting me present today
       • Stephen Hemminger and Rusty Russell for early RX multiqueue work
       • Jarek Poplawski, Patrick McHardy, Jamal Hadi Salim, and Eric Dumazet for help with TX multiqueue implementation
       • Robert Olsson and Herbert Xu for continuing help throughout all of this
       • Tom Herbert and others at Google for ongoing efforts