X86 hardware for packet processing

Mainly for DCA and VMDq


  1. Hardware/Software for Packet Processing
     Hisaki Ohara
     Thursday, October 11, 2012
  2. Today’s Agenda
     • Basic Requirement
     • DCA (Direct Cache Access)
     • Multiqueue (VMDq and RSS)
  3. Basic Requirement for Packet Processing
     • 14.88 Mpps (packets per second) for 10GbE with minimum-size frames
     • 10G / {8 * (64+8+12)}: 64-byte frame + 8-byte preamble + 12-byte inter-frame gap
     • Processing time: about 67 nsec per packet
     • About 134 cycles on a 2 GHz Xeon
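The arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch: the function name `line_rate_budget` and its parameter names are made up for illustration, and the defaults are the numbers from the slide.

```python
def line_rate_budget(link_bps=10e9, frame_bytes=64, preamble_bytes=8,
                     ifg_bytes=12, cpu_hz=2e9):
    """Return (packets/s, ns per packet, CPU cycles per packet) at line rate."""
    # Every frame occupies frame + preamble + inter-frame gap on the wire.
    wire_bits = 8 * (frame_bytes + preamble_bytes + ifg_bytes)
    pps = link_bps / wire_bits
    ns_per_packet = 1e9 / pps
    cycles_per_packet = cpu_hz / pps
    return pps, ns_per_packet, cycles_per_packet


pps, ns, cycles = line_rate_budget()
print(f"{pps / 1e6:.2f} Mpps, {ns:.1f} ns/packet, {cycles:.0f} cycles/packet")
# prints "14.88 Mpps, 67.2 ns/packet, 134 cycles/packet"
```

Changing `frame_bytes` to 1518 shows how much the per-packet budget relaxes for full-size frames.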
  4. Hardware and Software (diagram)
     • Components: CPU (four cores, Multi Core), Memory, Cache, NIC (Multi Queue)
     • Features shown: DCA, Interrupt Coalescing, MSI-X, Posted Interrupt, DMA, Full APIC Virtualization, RSS, IOMMU, TSO, VMDq, LRO
  5. DCA (Direct Cache Access)
     • Feature: put the data directly into the cache
     • Reduce memory traffic
     • Improve latency
     • VT-c ....
     • Hard to determine which CPU/chipset/NIC/firmware supports the feature
     • First platform: Xeon 5100, 7300
     • Now, Intel Data Direct I/O Technology..
     • TPH (TLP Processing Hints)
     • PCI Express 2.1 Protocol Extensions
     • Steering Tags (8 bits)
  6. recap: DMA [Device Write] (diagram: Cache, Memory Controller, Memory, PCIe RC, NIC)
     Maintain cache coherency! Cache line state: M→E→I→E
     ① Memory write by NIC
     ② Snoop system cache
     ③ If in Modified state, writeback by CPU (transitions to Exclusive state)
     ④ Device write to memory (transitions to Invalid state)
     ⑤ Interrupt
     ⑥ Software reads DMA data (transitions to Exclusive state)
  7. DCA [Device Write] (diagram: Cache, Memory Controller, Memory, PCIe RC, NIC)
     Cache line state: M→E→M
     ① Memory write by NIC
     ② Snoop system cache
     ③ If in Modified state, writeback by CPU (transitions to Exclusive state)
     ④ Device write to cache (transitions to Modified state)
     ⑤ Software reads DMA data
     Keeps the Modified state as much as possible
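The two device-write paths (plain DMA on slide 6, DCA on slide 7) can be replayed as a tiny state trace. This is only a sketch of the transitions listed on the slides, not a model of real coherency hardware; `State` and `device_write_trace` are hypothetical names.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


def device_write_trace(start, use_dca):
    """Trace one cache line across a device write and the software read after it."""
    trace = [start]
    state = start
    # Snoop hit on a Modified line: the CPU writes it back (M -> E).
    if state is State.M:
        state = State.E
        trace.append(state)
    if use_dca:
        # DCA: the device write lands in the cache, line becomes Modified.
        state = State.M
        trace.append(state)
        # The software read then hits in cache; no further transition.
    else:
        # Plain DMA: the device writes memory, the cached copy is invalidated.
        state = State.I
        trace.append(state)
        # The software read misses and refills the line from memory.
        state = State.E
        trace.append(state)
    return trace


for use_dca in (False, True):
    names = "->".join(s.name for s in device_write_trace(State.M, use_dca))
    print("DCA" if use_dca else "DMA", names)
# prints "DMA M->E->I->E" then "DCA M->E->M"
```

The DCA trace never passes through Invalid, which is where the saved memory read comes from.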
  8. MESI protocol: M (Modified), E (Exclusive), S (Shared), I (Invalid)
  9. Workload for DCA
     • A cache line modified by DCA may be evicted (written back) before software reads it
     • Depends on the workload
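A toy model makes the eviction risk concrete. It assumes, purely for illustration, that the NIC pushes a whole burst into a small FIFO-managed region of the cache before software reads anything; real replacement policies and timing are more complicated. `dca_hit_ratio` is a hypothetical helper.

```python
from collections import OrderedDict


def dca_hit_ratio(burst, cache_lines):
    """Fraction of a burst still cached when software finally reads it."""
    if burst <= 0 or cache_lines <= 0:
        raise ValueError("burst and cache_lines must be positive")
    cache = OrderedDict()  # insertion order doubles as FIFO eviction order
    for pkt in range(burst):
        cache[pkt] = True  # DCA write places the packet in the cache
        if len(cache) > cache_lines:
            cache.popitem(last=False)  # oldest line is written back and evicted
    hits = sum(1 for pkt in range(burst) if pkt in cache)
    return hits / burst


print(dca_hit_ratio(burst=8, cache_lines=16))   # prints 1.0
print(dca_hit_ratio(burst=64, cache_lines=16))  # prints 0.25
```

When the burst fits in the cache every read hits; once the burst outgrows it, the earliest packets have already gone back to memory and DCA only cost an extra writeback.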
  10. Virtualization for Networking (diagram)
      • Two VMs, each with a Virtual Driver, Virtual HW and Virtual I/F
      • Virtual Switch: forward Ethernet frame, resource reservation, header inspection
      • Physical Driver
      • Physical NIC
  11. Virtualization for Networking: VMDq (diagram)
      • Two VMs, each with a Virtual Driver, Virtual HW and Virtual I/F
      • Virtual Switch: forward Ethernet frame, resource reservation, header inspection
      • Physical Driver
      • Physical NIC
      • VMDq functions: packet sorting; moving data to the VM; routing packets to the proper CPU for receive
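The "packet sorting" step can be pictured as a lookup on the destination MAC address. The sketch below is software standing in for what the NIC does in hardware; `sort_into_pools`, the MAC addresses, and the fallback to a default pool for unknown addresses are all assumptions made for illustration.

```python
from collections import defaultdict


def sort_into_pools(frames, mac_to_pool, default_pool=0):
    """Group raw Ethernet frames into per-pool receive queues by destination MAC."""
    pools = defaultdict(list)
    for frame in frames:
        dst_mac = frame[:6]  # destination MAC is the first 6 bytes of the frame
        pools[mac_to_pool.get(dst_mac, default_pool)].append(frame)
    return dict(pools)


# Hypothetical locally administered MAC addresses for two VMs.
vm0_mac = bytes.fromhex("020000000001")
vm1_mac = bytes.fromhex("020000000002")
table = {vm0_mac: 0, vm1_mac: 1}

frames = [vm1_mac + b"payload-a", vm0_mac + b"payload-b", vm1_mac + b"payload-c"]
pools = sort_into_pools(frames, table)
print({pool: len(queue) for pool, queue in sorted(pools.items())})
```

With the sorting done before the frames reach the host, the virtual switch no longer has to inspect every header to find the destination VM.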
  12. Multiqueue (VMDq, RSS)
      • When reading the ixgbe source code, the relationship between VMDq, RSS, DCA and multiqueue is not clear (to me)
      • VT-c, again...
      • The terminology mixes feature names and marketing names
      • Let’s clarify with the datasheet
      • Focus only on Intel 82599 (Niantic)
  13. Queues in 82599: Non-Virtualization
      • 128 receive queues
      • 128 transmit queues
      • 16 RSS queues
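How a flow ends up on one of the 16 RSS queues is not spelled out on the slide. The sketch below shows the usual RSS scheme (a Toeplitz hash over the flow tuple, then a lookup in a redirection table). The key, the 128-entry table size, and the round-robin table contents are assumptions for illustration, not values taken from the slides.

```python
import ipaddress


def toeplitz_hash(key, data):
    """32-bit Toeplitz hash: XOR a sliding 32-bit key window for every set input bit."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    if key_bits < len(data) * 8 + 32:
        raise ValueError("key too short for this input")
    result = 0
    for i, byte in enumerate(data):
        for bit in range(8):
            if byte & (0x80 >> bit):
                shift = key_bits - 32 - (i * 8 + bit)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result


RSS_QUEUES = 16                      # from the slide
RETA_SIZE = 128                      # assumed redirection table size
RETA = [i % RSS_QUEUES for i in range(RETA_SIZE)]  # assumed round-robin fill
RSS_KEY = bytes(range(1, 41))        # arbitrary 40-byte key, not a real default


def rss_queue(src_ip, dst_ip, src_port, dst_port):
    """Pick a receive queue for an IPv4 flow (hypothetical helper)."""
    data = (ipaddress.IPv4Address(src_ip).packed
            + ipaddress.IPv4Address(dst_ip).packed
            + src_port.to_bytes(2, "big")
            + dst_port.to_bytes(2, "big"))
    return RETA[toeplitz_hash(RSS_KEY, data) % RETA_SIZE]


print(rss_queue("10.0.0.1", "10.0.0.2", 40000, 80))
```

Because the hash depends only on the flow tuple, all packets of one flow land on the same queue, which keeps per-flow ordering while spreading different flows across cores.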
  14. Queues in 82599: Virtualization (diagram: 128 RX/TX queue pairs grouped into pools)
      • #Pools * #Queue_Pair = 128
      • QP (Queue Pair): one RX queue and one TX queue; 128 queue pairs in total
      • Without RSS: 16 pools x 1 queue, 32 pools x 1 queue, or 64 pools x 1 queue
      • With RSS: 32 pools x 4 RSS, or 64 pools x 2 RSS
      • Example shown: VM0, VM1, VM2, ... VM63 use Pool 0, Pool 1, Pool 2, ... Pool 63, with 2 QPs per pool
      • VMDq L2 Sorter/Classifier Switch
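The pool arithmetic for the RSS configurations can be written down directly. The contiguous layout used here (pool n owns queue pairs n*q through n*q + q - 1) is an assumption for illustration and should be checked against the datasheet; `queue_pair_index` is a hypothetical helper.

```python
TOTAL_QUEUE_PAIRS = 128


def queue_pair_index(pool, rss_index, queues_per_pool):
    """Map (pool, RSS index) to a queue pair, assuming a contiguous layout."""
    if pool < 0 or not 0 <= rss_index < queues_per_pool:
        raise ValueError("pool or RSS index out of range")
    index = pool * queues_per_pool + rss_index
    if index >= TOTAL_QUEUE_PAIRS:
        raise ValueError("pool does not exist in this configuration")
    return index


# The two "With RSS" configurations from the slide.
for pools, per_pool in [(32, 4), (64, 2)]:
    assert pools * per_pool == TOTAL_QUEUE_PAIRS
    first = queue_pair_index(0, 0, per_pool)
    last = queue_pair_index(pools - 1, per_pool - 1, per_pool)
    print(f"{pools} pools x {per_pool} RSS: queue pairs {first}..{last}")
# prints "32 pools x 4 RSS: queue pairs 0..127"
# then   "64 pools x 2 RSS: queue pairs 0..127"
```

In the 64-pool example from the slide, VM63 on Pool 63 would get queue pairs 126 and 127 under this layout.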
  15. VMDq and RSS
      • RSS is not supported in IOV mode (case of 82599)
      • RSS is supported in VMDq mode
      • NetQueue in VMware ESX
