Modern Network/IO
syuu@openbsd.org
•
                          IO



•            Linux



•
    Vyatta           PC
[Figure: packet receive path, bottom to top: HW Intr Handler → input queue → SW Intr Handler → socket queue → Process(Kernel) → Process(User)]
•   NIC

    •   NIC:1GbE→10GbE
    •   CPU:1GHz→3.2GHz       :CPU   1/10

•              CPU

    •   1CPU              →

    •
• NIC
    NIC



•
[Figure: packet receive path, bottom to top: HW Intr Handler → input queue → SW Intr Handler → socket queue → Process(Kernel) → Process(User)]
Interrupt Coalescing
•
•

•
•
•
    •   NIC
              NIC



•
    •
        →NAPI Linux   http://tinyurl.com/LinuxNAPI
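Interrupt coalescing, named on the earlier slide, can be illustrated with a small simulation. This is only a sketch under assumptions: the parameter names `max_frames` and `usecs` are hypothetical, and the model (raise one interrupt when either a frame count or a hold-off time is reached) is a simplification of what real NICs do.

```python
def count_interrupts(arrivals_us, max_frames, usecs):
    """Count interrupts raised for packets arriving at the given times (microseconds)."""
    interrupts = 0
    pending = 0
    first_pending_at = None
    for t in arrivals_us:
        # The hold-off timer expired before this packet arrived: flush the batch.
        if pending and t - first_pending_at >= usecs:
            interrupts += 1
            pending = 0
        if pending == 0:
            first_pending_at = t
        pending += 1
        # The frame-count threshold was reached: raise the interrupt right away.
        if pending >= max_frames:
            interrupts += 1
            pending = 0
    if pending:
        interrupts += 1  # the timer eventually flushes the final batch
    return interrupts


arrivals = [0, 10, 20, 30, 200, 210]
print(count_interrupts(arrivals, max_frames=1, usecs=0))   # no coalescing
print(count_interrupts(arrivals, max_frames=3, usecs=50))  # coalesced
```

Fewer interrupts mean less per-packet CPU overhead, at the cost of added latency for packets that wait in a batch.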
NAPI
[Figure: packet receive path, bottom to top: HW Intr Handler → SW Intr Handler → socket queue → Process(Kernel) → Process(User)]
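The NAPI idea (take one interrupt, then poll with interrupts masked until the receive ring is drained) can be sketched as a toy model. The class `Nic`, the function `run_softirq`, and the budget of 64 are assumptions made for illustration; this is not the kernel's code.

```python
from collections import deque


class Nic:
    """Toy NIC with a receive ring and a maskable receive interrupt."""

    def __init__(self):
        self.rx_ring = deque()
        self.irq_enabled = True
        self.interrupts = 0
        self.polls = 0

    def receive(self, packet, poll_list):
        self.rx_ring.append(packet)
        if self.irq_enabled:
            # HW interrupt handler: mask further interrupts and schedule polling
            # (in Linux this step corresponds to napi_schedule()).
            self.interrupts += 1
            self.irq_enabled = False
            poll_list.append(self)

    def poll(self, budget):
        """Process up to `budget` packets and return how many were handled."""
        self.polls += 1
        done = 0
        while self.rx_ring and done < budget:
            self.rx_ring.popleft()
            done += 1
        return done


def run_softirq(poll_list, budget=64):
    """SW interrupt handler: poll each scheduled NIC until its ring is drained."""
    while poll_list:
        nic = poll_list.popleft()
        if nic.poll(budget) < budget:
            nic.irq_enabled = True   # ring drained: go back to interrupt mode
        else:
            poll_list.append(nic)    # budget used up: keep polling


nic, poll_list = Nic(), deque()
for i in range(150):
    nic.receive(i, poll_list)
run_softirq(poll_list)
print(nic.interrupts, nic.polls)  # 1 interrupt, 3 polls for 150 packets
```

Under load the NIC stays in polling mode, so the interrupt count no longer grows with the packet rate.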
•          NIC                  CPU



    →CPU

•                1Gbps
    Pentium4 2.4GHz CPU   80%

•   CPU
[Figure: packet receive path, bottom to top: HW Intr Handler → SW Intr Handler → socket queue → Process(Kernel) → Process(User)]
TOE (TCP Offload Engine)
•   NIC             TCP/IP



•
    •               TOE
          OS

    •          OS            TOE


                    TOE
TOE (TCP Offload Engine)
• Linux
• Windows                     OS
       http://bit.ly/offload

•           RDMA, iSCSI HBA
•   TCP Checksum Offload
    TCP

•   Large Segment Offload
                           64KB
        NIC MTU

•   Large Receive Offload
    LSO          NIC
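As a reference for what TCP Checksum Offload moves into the NIC, here is a minimal sketch of the 16-bit one's complement Internet checksum that TCP uses. It covers only the summation; building the TCP pseudo-header is left out, and the function name is an assumption.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


data = bytes.fromhex("0001f203f4f5f6f7")
checksum = internet_checksum(data)
print(hex(checksum))                                          # 0x220d
print(internet_checksum(data + checksum.to_bytes(2, "big")))  # 0 when the data is intact
```

Without the offload, the CPU runs this loop over every byte of every segment it sends and receives.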
Linux
•   TCP Checksum Offload
    TCP

•   Large Segment Offload
                           64KB
        NIC MTU

•   Large Receive Offload
    LSO          NIC
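The Large Segment Offload and Large Receive Offload items above can be sketched as a pair of helpers: the first splits one large send buffer into MSS-sized pieces, the second merges received pieces back into one buffer. Header handling is omitted, and the MTU of 1500 and the 40 bytes of IPv4 + TCP headers without options are assumptions.

```python
def segment(payload: bytes, mss: int) -> list:
    """What LSO does in the NIC: cut one large buffer into MSS-sized segments."""
    return [payload[i:i + mss] for i in range(0, len(payload), mss)]


def coalesce(segments: list) -> bytes:
    """What LRO does in the NIC: merge consecutive segments of a flow."""
    return b"".join(segments)


MTU = 1500            # assumed Ethernet MTU
MSS = MTU - 40        # assumed: 20-byte IPv4 header + 20-byte TCP header
payload = bytes(64 * 1024)

segments = segment(payload, MSS)
print(len(segments), len(segments[-1]))   # 45 segments, the last one 1296 bytes
print(coalesce(segments) == payload)      # True
```

With the offload enabled, the OS hands the NIC one 64KB buffer instead of walking the protocol stack 45 times.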
•         NIC
    CPU



•
CPU
[Figure: cpu0 and cpu1 side by side, each with its own receive path, bottom to top: HW Intr Handler → SW Intr Handler → socket queue → Process(Kernel) → Process(User)]
Receive Side Scaling
•

•                     CPU
          CPU

•   CPU

•               CPU
    →
Receive Side Scaling
[Figure: the NIC looks up each packet in a hash → queue table and places it on one of RX Queue #0 to #3, which are served by cpu0 to cpu3]
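The hash → queue lookup in the figure above can be sketched as follows. RSS implementations commonly use a Toeplitz hash over the flow's addresses and ports, followed by an indirection table. The 40-byte key, the 128-entry table, and all function names below are assumptions for illustration.

```python
import socket
import struct

# Example 40-byte hash key (assumed; real drivers program their own key).
RSS_KEY = bytes.fromhex(
    "6d5a56da255b0ec24167253d43a38fb0"
    "d0ca2bcbae7b30b477cb2da38030f20c"
    "6a42b73bbeac01fa"
)


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """XOR together the 32-bit key window at every set bit position of the input."""
    key_bits = len(key) * 8
    key_int = int.from_bytes(key, "big")
    result = 0
    for i in range(len(data) * 8):
        if data[i // 8] & (0x80 >> (i % 8)):
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def flow_tuple(src_ip, dst_ip, src_port, dst_port) -> bytes:
    """Hash input for TCP over IPv4: addresses and ports in network byte order."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!HH", src_port, dst_port))


# Indirection table: spreads hash values over 4 RX queues (one per CPU).
INDIRECTION_TABLE = [i % 4 for i in range(128)]


def select_queue(hash_value: int) -> int:
    return INDIRECTION_TABLE[hash_value % len(INDIRECTION_TABLE)]


h = toeplitz_hash(RSS_KEY, flow_tuple("192.0.2.1", "192.0.2.2", 40000, 80))
print(hex(h), "-> RX queue", select_queue(h))
```

Because the hash depends only on the flow's addresses and ports, every packet of a connection lands on the same queue and CPU, which preserves packet ordering within a flow.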
Receive Side Scaling
•   Microsoft Scalable Network Initiative
              http://bit.ly/ReceiveSideScaling

•   Windows Linux

•
    •   PCI         MSI-X

    •   NIC       RSS
RPS(Linux)
•   RSS         NIC



•         RSS

•                           CPU



•   CPU               CPU

•   RSS
[Figure: RPS across cpu0 to cpu3: a hash → queue table selects backlog #1, #2, or #3, and the backlogs feed the socket queues]
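The same steering can be done in software, as RPS does. The sketch below assumes a CRC32 over the flow tuple as a stand-in hash (Linux uses its own flow hash or the one supplied by the NIC) and scales the 32-bit hash onto the list of allowed CPUs. The names `rps_select_cpu` and `steer` are hypothetical.

```python
import zlib


def rps_select_cpu(flow_hash: int, rps_cpus: list) -> int:
    """Map a 32-bit hash onto the CPU list without a modulo: (hash * n) >> 32."""
    return rps_cpus[(flow_hash * len(rps_cpus)) >> 32]


RPS_CPUS = [1, 2, 3]                       # CPUs allowed to take packets, as in the figure
backlogs = {cpu: [] for cpu in RPS_CPUS}   # one backlog queue per target CPU


def steer(flow: bytes, packet: str) -> int:
    """Run on the CPU that took the interrupt: enqueue the packet on another CPU's backlog."""
    cpu = rps_select_cpu(zlib.crc32(flow), RPS_CPUS)
    backlogs[cpu].append(packet)
    return cpu


first = steer(b"10.0.0.1:40000->10.0.0.2:80", "pkt1")
second = steer(b"10.0.0.1:40000->10.0.0.2:80", "pkt2")
print(first == second)   # True: packets of one flow stay on one CPU
```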
RFS(Linux)

•           CPU
                  RPS



•
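RFS builds on RPS by remembering which CPU the receiving application last ran on and steering the flow there. A minimal sketch, assuming a hypothetical `FlowTable` class with a fixed-size table indexed by the flow hash:

```python
class FlowTable:
    """Remembers, per flow hash, the CPU where the application last read the socket."""

    def __init__(self, size=256):
        self.size = size
        self.cpu_for_flow = [None] * size

    def record(self, flow_hash: int, cpu: int) -> None:
        # Called on the application side, e.g. when it reads from the socket.
        self.cpu_for_flow[flow_hash % self.size] = cpu

    def steer(self, flow_hash: int, rps_cpus: list) -> int:
        # Called on the receive side: prefer the application's CPU,
        # otherwise fall back to plain RPS hashing.
        cpu = self.cpu_for_flow[flow_hash % self.size]
        if cpu is not None:
            return cpu
        return rps_cpus[(flow_hash * len(rps_cpus)) >> 32]


table = FlowTable()
print(table.steer(0x12345678, [1, 2, 3]))   # 1: no entry yet, RPS fallback
table.record(0x12345678, cpu=3)
print(table.steer(0x12345678, [1, 2, 3]))   # 3: follows the application
```

Processing the packet on the CPU where the application runs keeps the socket data in that CPU's cache.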
RPS
CPU
•   Intel     http://bit.ly/IOATJ

•           NIC CPU                 OS



•                        CPU             I/O




•            CPU
TOE
•   TOE



•         CPU
    CPU
    →TCP/IP

•               TCP/IP   CPU
                  TOE
    →
Intel I/O Acceleration Technology
• Intel QuickData Technology
• Direct Cache Access
• Receive Side Scaling
• Large Receive Offload
• Low Latency Interrupts
Intel QuickData Technology

 • NIC     →
               DMA

 • CPU
 •             OS
Intel QuickData Technology
[Figure: packet receive path, bottom to top: HW Intr Handler → SW Intr Handler → socket queue → Process(Kernel) → Process(User)]
Direct Cache Access
•   NIC     DMA
    CPU



•   NIC

•           prefetch
DCA
[Figure: data path between I/O Device, Memory Controller, Memory, and CPU Cache. Arrows: DMA Write, Memory Write, Snoop invalidate, Writeback, Fetch]
DCA
[Figure: data path between I/O Device, Memory Controller, Memory, and CPU Cache. Arrows: DMA Write, Memory Write, Snoop invalidate + hint, Writeback, HW Prefetch]
• Intel VT-c
 • SR-IOV
    •               OS   NIC

 • VMDq
    • VM       IO
[Figure: VM1 and VM2 running on a Hypervisor]
Intel VT-d
PCI Passthrough
[Figure: VM1 and VM2 running on a Hypervisor]
SR-IOV
[Figure: VM1 and VM2 running on a Hypervisor]
VMDq
[Figure: VM1 and VM2 running on a Hypervisor; incoming frames labeled RX1 and RX2 arrive interleaved and are separated into an RX1 queue and an RX2 queue]
•

•

•

•   Intel

Modern Network/IO (イマドキなNetwork/IO)
