I/O Virtualization

2012-08-24
[Slide: introductory notes on VMs and guest OSes; slide text lost in extraction]
Why I/O Virtualization Matters
•  I/O-intensive workloads are sensitive to virtualization overhead
   –  e.g., DB and HPC applications
•  Hardware support for fast I/O virtualization is spreading
   –  PCI passthrough, SR-IOV, ...
•  This talk surveys the major I/O virtualization techniques and how to use them in practice
[Figure: three approaches to VM networking — virtio/vhost through a VMM software switch (Open vSwitch), PCI pass-through via VT-d, and SR-IOV]

VM: Virtual Machine
VMM: Virtual Machine Monitor
SR-IOV: Single Root I/O Virtualization
Agenda
•  I/O virtualization techniques
   –  Paravirtualized I/O: virtio and vhost
   –  PCI passthrough
   –  SR-IOV
•  Hands-on with QEMU/KVM
•  Case study
   –  HPC on virtualized clusters
What Is Machine Virtualization?
•  Virtualize the CPU, memory, and I/O devices so that multiple OSes can share one physical machine
   –  Each guest OS runs as if it owned the hardware

[Figure: several OSes running side by side on virtualized hardware]
Virtual Machines
•  A VM provides a virtual hardware interface on which a guest OS runs
•  VM technology dates back to the 1960s
   –  1972: IBM VM/370
   –  1973: ACM Workshop on Virtual Computer Systems

[Figure: multiple guest OSes, each in its own VM, running on a single VMM]
Virtualizing Intel x86
•  Software-only approaches
   –  Full virtualization with binary translation: VMWare (1999)
      •  The x86 ISA did not meet the Popek & Goldberg virtualization requirements
   –  Paravirtualization: Xen (2003) modifies the guest OS to cooperate with the VMM
•  Hardware support
   –  Intel VT and AMD-V (2006)
   –  Writing a VMM became much easier!
   –  New VMMs followed: KVM (2006), BitVisor (2009), BHyVe (2011)
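Whether a given machine has these extensions can be read off the CPU flags; a quick sanity check on Linux (standard commands, output depends on the CPU):

$ egrep -o 'vmx|svm' /proc/cpuinfo | sort -u    # vmx = Intel VT-x, svm = AMD-V; no output means no hardware support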
Intel VT (Virtualization Technology)
•  CPU virtualization
   –  VT-x for IA-32 and Intel 64
   –  VT-i for Itanium
•  I/O virtualization
   –  VT-d (Virtualization Technology for Directed I/O)
   –  VT-c (Virtualization Technology for Connectivity)
      •  VMDq, IOAT, SR-IOV support
•  AMD offers equivalent extensions (AMD-V)

VMDq: Virtual Machine Device Queues
IOAT: I/O Acceleration Technology
KVM: Kernel-based Virtual Machine
•  Relies on Intel VT / AMD-V rather than the ring aliasing technique used by (pre-VT) Xen
•  KVM virtualizes the CPU and memory; QEMU provides the BIOS and device emulation
   –  The host runs in VMX root mode, the guest OS in VMX non-root mode

[Figure: QEMU (device emulation, memory management) in Ring 3 and the KVM module inside the Linux kernel (Ring 0) on the VMX root side; the guest OS kernel on the VMX non-root side; transitions via VM Entry / VM Exit through the VMCS]
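On a Linux host you can confirm that the KVM modules are loaded and that userspace can reach them (standard commands; the vendor module is kvm_intel or kvm_amd):

$ lsmod | grep kvm     # expect kvm plus kvm_intel or kvm_amd
$ ls -l /dev/kvm       # the device node QEMU opens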
CPU Scheduling: Xen vs. KVM
•  Xen: VMs are domains (Dom0, DomU); each guest's VCPUs are scheduled onto physical CPUs by the Xen hypervisor's domain scheduler
•  KVM: each VM is a QEMU process; VCPUs are threads scheduled by the ordinary Linux process scheduler

[Figure: side-by-side stacks — Xen hypervisor with its domain scheduler vs. Linux/KVM with the process scheduler, each mapping VCPUs to physical CPUs]
Memory Virtualization
•  Native OS: the MMU (via CR3) translates virtual addresses (VA) to physical addresses (PA) through page tables
•  Virtualized: the guest OS translates guest-virtual (GVA) to guest-physical (GPA) addresses, and the VMM must further translate GPA to host-physical (HPA) addresses

[Figure: address translation on bare metal (VA→PA via MMU/CR3 and page tables) vs. under a VMM (GVA→GPA→HPA)]
Three Ways to Translate Guest Addresses
•  PVM (paravirtualized): the guest OS cooperates with the VMM and uses page tables that map GVA directly to HPA
•  HVM with shadow page tables (SPT): the guest keeps GVA→GPA page tables; the VMM maintains shadow GVA→HPA tables that the MMU actually uses
•  HVM with EPT: the MMU itself walks both the guest's GVA→GPA tables and the VMM's GPA→HPA tables

[Figure: PVM, HVM+SPT, and EPT translation stacks (MMU, CR3, page tables)]
Intel Extended Page Tables (EPT)
•  The guest OS's page tables (rooted at CR3) translate GVA to GPA
•  The VMM's EPT (rooted at the EPTP) translates GPA to HPA
•  On a TLB miss, the hardware page walker performs both walks and caches the resulting GVA→HPA translation in the TLB
•  The tables are multi-level (3 levels, or 4 on Intel x64), so a miss triggers an expensive two-dimensional page walk

TLB: Translation Look-aside Buffer
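Support for EPT (and VPID) shows up as CPU flags, and the kvm_intel module reports whether it actually uses EPT (the sysfs path assumes kvm_intel is loaded):

$ egrep -o 'ept|vpid' /proc/cpuinfo | sort -u
$ cat /sys/module/kvm_intel/parameters/ept     # Y if KVM uses EPT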
I/O Virtualization
I/O Basics
•  Port-mapped I/O (PIO): IN/OUT instructions
•  Memory-mapped I/O (MMIO)
•  DMA (Direct Memory Access) plus interrupts

[Figure: PIO (CPU issues IN/OUT to the device) vs. DMA (1. CPU programs the DMA engine, 2. device DMAs data to memory, 3. device raises an interrupt, 4. CPU acknowledges with EOI)]

EOI: End Of Interrupt
PCI Interrupt Delivery
•  Interrupt signaling
   –  Legacy INTx: 4 shared interrupt lines, routed through the IOAPIC
   –  MSI/MSI-X (Message Signaled Interrupts): the device signals the interrupt as a DMA write toward the Local APIC
•  The OS registers handlers in the IDT (Interrupt Descriptor Table)
•  The handler acknowledges the interrupt with an EOI

[Figure: PCI device → IOAPIC (INTx) or direct MSI → CPU (Local APIC), with the EOI flowing back]
Identifying PCI Devices
•  A PCI function is identified by its BDF (Bus/Device/Function) number
   –  One physical device may expose multiple functions:
      •  A dual-port GbE NIC exposes one function per port (see the two BCM5716 entries below)
      •  An SR-IOV NIC additionally exposes VFs

$ lspci -tv
... snip ...
 -[0000:00]-+-00.0  Intel Corporation 5500 I/O Hub to ESI Port
             +-01.0-[01]--+-00.0  Broadcom Corporation NetXtreme II BCM5716 Gigabit Ethernet
             |            \-00.1  Broadcom Corporation NetXtreme II BCM5716 Gigabit Ethernet
             +-03.0-[05]--
             +-07.0-[06]----00.0  Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connect
             +-09.0-[03]--
... snip ...
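The BDF is what later commands take as an argument; `lspci -n -s` additionally prints the numeric vendor:device ID pair that the pci_stub binding on a later slide needs (standard lspci options; the BDF matches the 82599EB above):

$ lspci -n -s 06:00.0
06:00.0 0200: 8086:10fb (rev 01)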
Approaches to VM I/O
•  Full (device) emulation
   –  The VMM emulates a real device in software
      •  QEMU emulates ne2000, rtl8139, e1000, ...
•  Paravirtualized I/O
   –  Xen split driver model
   –  virtio and vhost
   –  VMWare VMXNET3
•  Direct assignment (VMM-bypass I/O)
   –  PCI passthrough
   –  SR-IOV
VM Network I/O Data Paths

[Figure: three configurations side by side — (1) I/O emulation / paravirtualization: guest driver → VMM vSwitch → physical driver → NIC; (2) PCI passthrough: the guest's physical driver drives the NIC directly; (3) SR-IOV: per-VM physical (VF) drivers → NIC with an embedded switch (VEB)]
Edge Virtual Bridging (IEEE 802.1Qbg)
•  Where should traffic between VMs be switched?
•  Three models:
   (a) Software VEB: the VMM's vSwitch forwards between VNICs
   (b) Hardware VEB: the NIC's embedded switch forwards (as in SR-IOV)
   (c) VEPA / VN-Tag: all traffic hairpins through the external switch

VEB: Virtual Ethernet Bridging     VEPA: Virtual Ethernet Port Aggregator
I/O Emulation Path (e1000 on QEMU/KVM)
•  The guest OS runs its unmodified e1000 driver
•  Every device register access traps to QEMU, causing frequent VM Exits

[Figure: guest e1000 driver (VMX non-root) → VM Exit → QEMU e1000 emulation (Ring 3) → copy → tap → vSwitch → physical driver (Linux kernel/KVM, Ring 0)]
virtio
•  Paravirtualized I/O that reduces the number of VM Exits
•  Guest and host exchange I/O buffers through shared-memory rings (virtio_ring)

[Figure: guest virtio_net driver → virtio_ring → QEMU virtio_net backend (Ring 3) → tap → vSwitch → physical driver]
vhost
•  vhost_net moves the virtio backend from QEMU into the host kernel, so packets no longer detour through the userspace tap handling in QEMU
•  Often combined with macvlan/macvtap instead of a bridge + tap

[Figure: guest virtio_net → vhost_net (host kernel) → macvtap/macvlan → physical driver]
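With QEMU/KVM, vhost is enabled per tap netdev; a minimal sketch (the ifname and MAC are placeholders, the options themselves are standard QEMU flags):

$ kvm -netdev tap,id=net0,ifname=tap0,vhost=on,script=no,downscript=no \
      -device virtio-net-pci,netdev=net0,mac=00:16:3e:1d:ff:01 ...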
PCI Passthrough
•  The guest runs the real device's driver and drives the hardware directly
   –  DMA goes straight to guest memory, remapped by VT-d, without VMM involvement
   –  Interrupts (and EOIs) still cause VM Exits and are injected by the VMM

[Figure: guest physical driver (VMX non-root) exchanges buffers with the device via VT-d DMA remapping; interrupt delivery and EOI still pass through the host (Ring 0)]
[Figure: interrupt path under passthrough — the NIC interrupt arrives at the host, causes a VM Exit (guest state saved in the VMCS), the VMM injects a virtual interrupt, and the guest resumes via VM Entry; DMA bypasses the VMM through the IOMMU]

VMCS: Virtual Machine Control Structure
Intel VT-d: Hardware Support for I/O Virtualization
•  Lets a guest OS access a device directly, without VMM intervention, yet safely
   –  A device assigned to one VM must not be able to touch another VM's (or the VMM's) memory
•  VT-d provides two facilities:
   –  DMA remapping (IOMMU)
   –  Interrupt remapping
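Whether the platform's VT-d (DMAR) tables were found and the kernel enabled the IOMMU can be checked in the boot log (exact message wording varies across kernel versions):

$ dmesg | grep -i -e DMAR -e IOMMU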
VT-d: DMA remapping
•  Without an IOMMU, a DMA programmed by the guest could target arbitrary host memory: not acceptable
•  With VT-d, each device's DMA addresses are translated and checked per VM
   –  The IOMMU does for device DMA what the MMU+EPT does for CPU memory accesses

[Figure: the IOMMU sits between I/O devices and memory, analogous to the MMU+EPT between the CPU and memory]
How DMA Remapping Works (from the Intel VT-d specification)
•  The requester ID (source-id) of a DMA transaction is the device's BDF: Bus # (8 bits), Device # (5 bits), Function # (3 bits); for PCI Express devices it is carried in the transaction-layer header
•  The bus number indexes the root-entry table (4KB, 256 entries); each present root entry points to a context-entry table, and the device/function number selects the context entry, which maps the device to a domain and that domain's translation structures
•  The domain's multi-level page tables then translate the DMA address (e.g., a 3-level structure with 4KB pages, or a 2-level structure with 2MB superpages); a not-present root entry or a page-table entry without the needed Read/Write permission blocks the DMA and raises a translation fault

[Figures from the VT-d spec: Figure 3-6 "Requester Identifier Format"; Figure 3-7 "Device to Domain Mapping Structures"; Figure 3-8 "Example Multi-level Page Table" (DMA remapping page walk)]
VT-d: Interrupt remapping
•  An MSI is just a DMA write request carrying the destination CPU ID, so a passthrough device could otherwise interrupt arbitrary CPUs
•  VT-d validates and remaps MSI/MSI-X write requests through the Interrupt Remapping Table (IRT)
   –  The VMM programs the IRT; the hardware rewrites the destination on delivery
Reducing Interrupt Overhead: ELI
•  "ELI: Bare-Metal Performance for I/O Virtualization", A. Gordon, et al., ASPLOS 2012
   –  Ordinary passthrough still exits to the host on every physical interrupt and on every guest interrupt completion (EOI)
   –  ELI (Exit-Less Interrupts) delivers interrupts of the assigned device directly to the guest through a shadow IDT, exiting only for non-assigned interrupts; with x2APIC, guest EOIs also avoid exits
   –  netperf, Apache, and memcached reach 97-100% of bare-metal performance

[Figures from the paper: Figure 1 "Exits during interrupt handling" (baseline vs. ELI delivery vs. ELI delivery & completion vs. bare metal); Figure 2 "ELI interrupt delivery flow"]
PCI-SIG I/O Virtualization
•  I/O virtualization standards for PCIe Gen2
   –  SR-IOV (Single Root-I/O Virtualization)
      •  One device presents multiple virtual functions to the VMs on a single host
      •  Widely implemented, especially by NICs
   –  MR-IOV (Multi Root-I/O Virtualization)
      •  Shares one device among multiple hosts
      •  Few implementations so far
      •  NEC ExpEther is a related technology
•  Major VMMs support SR-IOV
   –  KVM, Xen, VMWare, Hyper-V
   –  On Linux, device assignment is also moving to VFIO
SR-IOV NIC Architecture
•  One physical NIC presents multiple virtual NICs (vNICs) that can be assigned to VMs
   –  vNIC = VF (Virtual Function)

[Figure: VM1-VM3 each attached to a Virtual Function (with its own RX/TX queues), bypassing the VMM; the NIC's L2 classifier/sorter demultiplexes traffic onto the shared MAC/PHY]
SR-IOV NIC: PF and VF
•  Physical Function (PF)
   –  The full-featured function, driven by the VMM (host)
•  Virtual Function (VF)
   –  A lightweight function assigned to a VM; the guest OS runs a VF driver
   –  VFs are created and managed through the PF (PF driver)
   –  The VF count is device-dependent (e.g., 8 per port on the Intel 82576; the configuration space allows up to 256 functions)

[Figure: the guest's VF driver bound to VFn0's config space; the host's PF driver owns PFn0 and its VFn0, VFn1, VFn2, ... on the physical NIC]
Connecting VMs to the Host Network (software approaches)
1.  Bridge + tap: attach each VM's tap device to a software bridge
    –  The standard setup
    –  Switching options:
       •  The Linux bridge
       •  Open vSwitch
2.  macvlan/macvtap: a MAC-address-based, bridge-less alternative to tap (a minimal setup sketch follows below)
    –  Lower overhead than a bridge

[Figure: (1) VMM with tap0/tap1 on a bridge over eth0 vs. (2) macvlan0/macvlan1 stacked directly on eth0]
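A minimal macvtap setup with iproute2 looks like the following (device names are placeholders; `type macvtap mode bridge` is the standard syntax). QEMU then opens the matching /dev/tapN character device:

# ip link add link eth0 name macvtap0 type macvtap mode bridge
# ip link set macvtap0 up
# ls /dev/tap$(cat /sys/class/net/macvtap0/ifindex)     # the fd QEMU opens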
Open vSwitch
•  A multilayer software switch for Linux
   –  Can replace the Linux bridge
   –  OvS offers much richer features: VLANs, QoS, flow-level control
•  Supports OpenFlow
•  Widely deployed
   –  Merged into Linux kernel 3.3
   –  Also used inside hardware switches (e.g., Pica8 Pronto)

http://openvswitch.org/
Example: Isolating VMs with VLANs
•  Isolate each VM's traffic with VLANs, with no VLAN configuration inside the guest OS
•  Assign one VLAN ID per VM by tagging its tap port

# ovs-vsctl add-br br0
# ovs-vsctl add-port br0 tap0 tag=101
# ovs-vsctl add-port br0 tap1 tag=102
# ovs-vsctl add-port br0 eth0

[Figure: VM1 (tap0, VLAN ID 101) and VM2 (tap1, VLAN ID 102) on vSwitch br0, trunked over eth0; conceptually tap0 <-> br0_101 <-> eth0.101]
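The resulting configuration can be inspected with standard ovs-vsctl commands:

# ovs-vsctl show                        # bridges, ports, and their tags
# ovs-vsctl list port tap0 | grep tag   # confirm the access VLAN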
Example: QoS (1) Ingress Policing
•  Rate limiting builds on the Linux traffic-control machinery (Qdisc)
•  Two directions: ingress policing and egress shaping

ingress policing (limit what a VM may send into the switch):
# ovs-vsctl set Interface tap0 ingress_policing_rate=10000
# ovs-vsctl set Interface tap0 ingress_policing_burst=1000

rate: 10000 kbps = 10 Mbps; burst: 1000 kb

[Figure: policing applied at tap0 on vSwitch br0; shaping applied at eth0]
Example: QoS (2) Egress Shaping
•  Shaping uses Linux Qdisc classes (HTB or HFSC)

egress shaping (per-queue min/max rates on eth0):
# ovs-vsctl -- set port eth0 qos=@newqos \
  -- --id=@newqos create qos type=linux-htb other-config:max-rate=40000000 queues=0=@q0,1=@q1 \
  -- --id=@q0 create queue other-config:min-rate=10000000 other-config:max-rate=10000000 \
  -- --id=@q1 create queue other-config:min-rate=20000000 other-config:max-rate=20000000

# ovs-ofctl add-flow br0 "in_port=3 idle_timeout=0 actions=enqueue:1:1"

Supported qdisc types: HTB and HFSC
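Which queue a flow actually hits can be verified with standard ovs-ofctl commands (bridge and queue numbers follow the example above):

# ovs-ofctl dump-flows br0     # shows the enqueue rule and its packet counters
# ovs-ofctl queue-stats br0    # per-queue tx counters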
Hands-on: QEMU/KVM
Test Environment
•  Linux host
•  QEMU/KVM
   –  QEMU launched directly from the command line (needed for PCI hotplug here)
   –  libvirt / Virt-manager can be used instead
•  Open vSwitch 1.6.1
•  PCI passthrough & SR-IOV NICs tested:
   –  Intel Gigabit ET dual port server adapter [SR-IOV]
   –  Intel Ethernet Converged Network Adapter X520-LR1 [SR-IOV]
   –  Mellanox ConnectX-2 QDR InfiniBand HCA
   –  Broadcom on-board GbE NIC (BCM5709)
   –  Brocade BR1741M-k 10 Gigabit Converged HCA
Launching a VM with QEMU/KVM

VM configuration — CPU: 2 (CPU model: host) / Memory: 2GB / Network: virtio_net / Storage: virtio_blk

#!/bin/sh
sudo /usr/bin/kvm \
	-cpu host \
	-smp 2 \
	-m 2000 \
	-net nic,model=virtio,macaddr=00:16:3e:1d:ff:01 \
	-net tap,ifname=tap0,script=/etc/ovs-ifup,downscript=/etc/ovs-ifdown \
	-monitor telnet::5963,server,nowait \
	-serial telnet::5964,server,nowait \
	-daemonize \
	-nographic \
	-drive file=/work/kvm/vm01.img,if=virtio \
	$@
QEMU/KVM: tap up/down Scripts for Open vSwitch
$ cat /etc/ovs-ifup
#!/bin/sh
switch='br0'
/sbin/ip link set mtu 9000 dev $1 up
/opt/bin/ovs-vsctl add-port ${switch} $1

$ cat /etc/ovs-ifdown
#!/bin/sh
switch='br0'
/sbin/ip link set $1 down
/opt/bin/ovs-vsctl del-port ${switch} $1

QEMU/KVM runs these scripts when it creates/destroys the tap device; note they call ovs-vsctl rather than brctl, since the bridge here is an Open vSwitch bridge.
PCI Passthrough: Steps
1.  Enable Intel VT and VT-d in the BIOS
2.  Enable VT-d in Linux
   –  Add intel_iommu=on to the kernel boot parameters (see the sketch below)
3.  Detach the PCI device from its host driver
4.  Assign the device to the guest OS
5.  Use it from the guest OS

Reference: "How to assign devices with VT-d in KVM,"
http://www.linux-kvm.org/page/How_to_assign_devices_with_VT-d_in_KVM
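A minimal sketch of step 2 on a GRUB 2 system (file path and tooling are the Debian/Ubuntu defaults; adjust for your distribution):

# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on"
# then regenerate the config and reboot:
# update-grub && reboot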
PCI Passthrough: Binding the Device
•  Identify the target PCI device's BDF and its vendor/device ID
•  Detach it from the host OS driver and bind it to pci_stub (verification shown below):

 # echo "8086 10fb" > /sys/bus/pci/drivers/pci-stub/new_id
 # echo "0000:06:00.0" > /sys/bus/pci/devices/0000:06:00.0/driver/unbind
 # echo "0000:06:00.0" > /sys/bus/pci/drivers/pci-stub/bind

•  Assign at VM startup (QEMU command line):
   –  -device pci-assign,host=06:00.0
•  Or hot-plug at runtime (QEMU monitor):
   –  device_add pci-assign,host=06:00.0,id=vf0
   –  device_del vf0
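Whether the rebinding took effect is visible in lspci (the -k option is standard; the BDF matches the example above):

$ lspci -k -s 06:00.0
06:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection
	Kernel driver in use: pci-stub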
Creating SR-IOV VFs
•  Pass the max_vfs parameter when loading the SR-IOV-capable driver
•  The host OS then enumerates the VFs as ordinary PCI functions

 # modprobe -r ixgbe
 # modprobe ixgbe max_vfs=8

$ lspci -tv
... snip ...
 -[0000:00]-+-00.0  Intel Corporation 5500 I/O Hub to ESI Port
             +-01.0-[01]--+-00.0  Broadcom Corporation NetXtreme II BCM5716 Gigabit Ethernet
             |            \-00.1  Broadcom Corporation NetXtreme II BCM5716 Gigabit Ethernet
             +-03.0-[05]--
             +-07.0-[06]----00.0  Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connect   <- Physical Function (PF)
             |            +-10.0  Intel Corporation 82599 Ethernet Controller Virtual Function
             |            +-10.2  Intel Corporation 82599 Ethernet Controller Virtual Function
             |            +-10.4  Intel Corporation 82599 Ethernet Controller Virtual Function
             |            +-10.6  Intel Corporation 82599 Ethernet Controller Virtual Function   <- Virtual Functions (VF)
             |            +-11.0  Intel Corporation 82599 Ethernet Controller Virtual Function
             |            +-11.2  Intel Corporation 82599 Ethernet Controller Virtual Function
             |            +-11.4  Intel Corporation 82599 Ethernet Controller Virtual Function
             |            \-11.6  Intel Corporation 82599 Ethernet Controller Virtual Function
             +-09.0-[03]--
... snip ...
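To make the VF count persist across reboots, the module option can go into modprobe.d (standard path and syntax; driver and count follow the example above):

# echo "options ixgbe max_vfs=8" > /etc/modprobe.d/ixgbe-sriov.conf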
Assigning an SR-IOV VF
•  A VF is assigned exactly like any other PCI device
•  Detach it from the host OS and bind it to pci_stub (use the VF's own vendor:device ID — 8086:10ed for the 82599 VF, not the PF's 10fb):

 # echo "8086 10ed" > /sys/bus/pci/drivers/pci-stub/new_id
 # echo "0000:06:10.0" > /sys/bus/pci/devices/0000:06:10.0/driver/unbind
 # echo "0000:06:10.0" > /sys/bus/pci/drivers/pci-stub/bind

•  Assign at VM startup (QEMU command line):
   –  -device pci-assign,host=06:10.0
•  Or hot-plug at runtime (QEMU monitor):
   –  device_add pci-assign,host=06:10.0,id=vf0
   –  device_del vf0
SR-IOV VF Seen from the Guest OS
•  The guest OS sees the VF as an ordinary PCI NIC (with MSI-X interrupts)

$ lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:02.0 VGA compatible controller: Cirrus Logic GD 5446
00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
00:04.0 SCSI storage controller: Red Hat, Inc Virtio block device
00:05.0 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)

$ cat /proc/interrupts
            CPU0      CPU1
...snip...
 29:     114941     114133   PCI-MSI-edge      eth1-rx-0
 30:      77616      78385   PCI-MSI-edge      eth1-tx-0
 31:          5          5   PCI-MSI-edge      eth1:mbx
SR-IOV VF Rate Limiting
•  Per-VF TX rate limits are enforced by the NIC itself
•  Configured from the host OS with ip link:

 # ip link set dev eth5 vf 0 rate 200
 # ip link set dev eth5 vf 1 rate 400
 # ip link show dev eth5
 42: eth5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
     link/ether 00:1b:21:81:55:3e brd ff:ff:ff:ff:ff:ff
     vf 0 MAC 00:16:3e:1d:ee:01, tx rate 200 (Mbps), spoof checking on
     vf 1 MAC 00:16:3e:1d:ee:02, tx rate 400 (Mbps), spoof checking on

(cf. IPSJ SIG Technical Report 2010-OS-117(13), in Japanese)
SR-IOV Configuration TIPS
•  Set a VF's MAC address:
   # ip link set dev eth5 vf 0 mac 00:16:3e:1d:ee:01
•  Set a VF's port VLAN ID:
   # ip link set dev eth5 vf 0 vlan 101
•  These examples use Intel 82576 (GbE) and 82599/X540 (10GbE) NICs; other SR-IOV NICs offer similar controls
   –  http://www.intel.com/content/www/us/en/ethernet-controllers/ethernet-controllers.html
Live Migration vs. Device Assignment
•  A VM with an assigned (passthrough) device cannot be live-migrated as-is
•  Workaround: bond a paravirtual NIC with the passthrough NIC (a guest-side sketch follows below)
   –  The guest bonds a virtio NIC with the assigned NIC in active-standby mode
   –  Before migration, hot-remove the assigned NIC and fail over to virtio; after migration, hot-add a VF on the destination and fail back
•  With an SR-IOV NIC, one physical NIC can back both the VF and the virtio (PV) path
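A minimal guest-side sketch of the active-backup bond (interface names follow the next slide's figure: eth0 = virtio, eth1 = VF; the options are standard bonding module parameters):

# inside the guest
# modprobe bonding mode=active-backup miimon=100 primary=eth1
# ip link set bond0 up
# echo +eth0 > /sys/class/net/bond0/bonding/slaves
# echo +eth1 > /sys/class/net/bond0/bonding/slaves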
SR-IOV + Live Migration: Initial State

[Figure: Guest OS with bond0 over eth0 (virtio) and eth1 (igbvf); the virtio path goes through tap0 and bridge br0 over the host's eth0 (igb) on the source host; the destination host has the same SR-IOV NIC setup]
SR-IOV + Live Migration: Detach the VF

(qemu) device_del vf0

[Figure: the VF is hot-removed from the guest; traffic fails over to the virtio path (tap0/br0/eth0) on the source host]
SR-IOV + Live Migration: Migrate

On the source host:      (qemu) migrate -d tcp:x.x.x.x:y
On the destination:      $ qemu -incoming tcp:0:y ...

[Figure: the guest, now carrying only the virtio NIC in bond0, moves from the source host to the destination host]
SR-IOV + Live Migration: Re-attach a VF

(qemu) device_add pci-assign,host=05:10.0,id=vf0

[Figure: a local VF on the destination host is hot-added; bond0 fails back from eth0 (virtio) to eth1 (igbvf)]
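Putting the last three slides together, the whole cycle looks roughly like this from the QEMU monitor (the BDF, the id vf0, and the tcp endpoint are the examples used above):

# source host, QEMU monitor
(qemu) device_del vf0                                # 1. detach the VF; bond fails over to virtio
(qemu) migrate -d tcp:x.x.x.x:y                      # 2. live-migrate over the virtio path

# destination host, started beforehand with: qemu -incoming tcp:0:y ...
(qemu) device_add pci-assign,host=05:10.0,id=vf0     # 3. attach a local VF; bond fails back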
Demo: Migrating a VM Under a Running MPI Job

[Figure: rank 1 runs in the guest (bond0 over eth0/virtio and eth1/igbvf) on host 192.168.0.1 and is migrated to host 192.168.0.2, while rank 0 runs on 192.168.0.3 with a plain NIC; all hosts are on 192.168.0.0/24]
SymVirt
•  Goal: live migration and checkpointing for VMs using VMM-bypass I/O
   –  e.g., InfiniBand
•  Approach: guest OS and VMM cooperate — SymVirt (Symbiotic Virtualization)
   –  Detach PCI devices at coordination points
   –  Migrate the VM, then re-attach
•  Use cases
   –  SymCR: checkpoint/restart
   –  SymPFT: proactive fault tolerance — on a failure prediction, the cloud scheduler migrates VMs off the suspect node and re-allocates

[Figure: a cloud scheduler allocates VMs over shared global storage (VM images); on "Failure prediction", VMs are migrated and re-allocated]
SymVirt: How It Works
•  SymVirt coordinator
   –  Cooperates with the guest OS / MPI runtime to reach global consistency
      •  In-flight communication is quiesced before the VMM acts on the VM
•  SymVirt controller/agent
   –  Drives detach, migration, and re-attach from the host side

[Figure: the application reaches a SymVirt wait/signal point (guest OS mode); the coordinator confirms, the controller/agent performs detach → migration → re-attach (VMM mode), and the coordinator confirms linkup before resuming]

R. Takano, et al., "Cooperative VM Migration for a Virtualized HPC Cluster with VMM-Bypass I/O devices", 8th IEEE e-Science 2012
Case Study: HPC Cloud
Background: HPC at AIST
•  AIST Super Cluster (2004): #19 on the TOP500
•  AIST Green Cloud (2010) and AIST Super Cloud (2011)
   –  About 1/10 the scale, refreshed on a 1-2 year cycle
   –  Interoperation with HPCI and Amazon EC2
•  Question: can cloud-style IT infrastructure serve HPC workloads?
Why Virtualize?
•  Flexibility of virtualized infrastructure
•  Consolidation of diverse workloads, from DB to HPC

Top 3 reasons for adopting virtualization (IDC, 2011):
1.
2.
3.
Evaluation on HPC Clusters (e.g., ASC)
AIST Green Cloud (AGC): a 16-node HPC cluster (1 VM per node)

Compute node: Dell PowerEdge M610
  CPU:        Intel quad-core Xeon E5540/2.53GHz x2
  Chipset:    Intel 5520
  Memory:     48 GB DDR3
  InfiniBand: Mellanox ConnectX (MT26428)
Blade switch:
  InfiniBand: Mellanox M3601Q (QDR 16 ports)

Host machine environment:
  OS:           Debian 6.0.1
  Linux kernel: 2.6.32-5-amd64
  KVM:          0.12.50
  Compiler:     gcc/gfortran 4.4.5
  MPI:          Open MPI 1.4.2
VM environment:
  VCPU:   8
  Memory: 45 GB
MPI Point-to-Point Bandwidth
•  Measured with qperf; higher is better
•  Bare metal reaches 3.2 GB/s; KVM with PCI passthrough reaches 2.4 GB/s

[Figure: bandwidth (MB/sec) vs. message size (1 byte to 1 GB) for Bare Metal and KVM, log-log axes]
NPB BT-MZ: Performance and Parallel Efficiency
•  Higher is better
•  Degradation of parallel efficiency (PE): KVM 2%, Amazon EC2 Cluster Compute Instances (CCI) 14%

[Figure: performance (Gop/s total) and parallel efficiency (%) vs. number of nodes (1-16) for Bare Metal, KVM, and Amazon EC2]
Bloss: A Hybrid MPI + OpenMP Application
•  Coarse-grained MPI communication around a linear solver (requires ~10GB of memory) and an eigenvector calculation: rank 0 broadcasts 760 MB, reduces 1 GB, broadcasts 1 GB, and gathers 350 MB
•  Degradation of PE: KVM 8%, EC2 CCI 22%

[Figure: parallel efficiency (%) vs. number of nodes (1-16) for Bare Metal, KVM, Amazon EC2, and Ideal; higher is better]
Storage I/O on VMWare ESXi
•  Server: Dell PowerEdge T410
   –  CPU: Intel hexa-core Xeon X5650, single socket
   –  Memory: 6GB DDR3-1333
   –  HBA: QLogic QLE2460 (single-port 4Gbps Fibre Channel)
•  Storage: IBM DS3400 FC SAN array
•  VMM: VMWare ESXi 5.0
•  Guest OS: Windows Server 2008 R2
   –  8 vCPU, 3840 MB memory
•  Benchmark: IOMeter 2006.07.27 (http://www.iometer.org/)

[Figure: the T410 connected to the DS3400 over Fibre Channel, with out-of-band management over Ethernet]

Three configurations compared:
  Bare Metal Machine (BMM):  Windows → NTFS → volume manager → disk class driver → Storport/FC HBA driver → LUN
  Raw Device Mapping (RDM):  Windows in a VM → ... → Storport/SCSI driver → VMKernel FC HBA driver → LUN
  VMDirectPath I/O (FPT):    Windows in a VM → ... → Storport/FC HBA driver (direct to the HBA) → LUN
ESXi: RDM vs. PCI Passthrough
•  Two ways to attach FC SAN storage to a VM: RDM and PCI passthrough (VMDirectPath, FPT)
   –  RDM virtualizes at the SCSI level; the VMKernel's FC HBA driver handles the FC transport, so the guest needs no HBA-specific driver
   –  FPT assigns the PCI HBA itself, so the guest (Linux or Windows) runs the native HBA driver
•  Both come close to BMM performance for storage I/O
•  PCI passthrough for HPC is studied in:
   –  "InfiniBand PCI passthrough … HPC …" (in Japanese), SACSIS2011, pp.109-116, May 2011
   –  "HPC …" (in Japanese), IPSJ SIG Technical Report ACS37, May 2012
•  Remaining issues with direct assignment
   –  PCI passthrough pins VM memory
   –  Live migration with SR-IOV requires coordination
      •  e.g., the bonding scheme or SymVirt shown earlier
Outlook
•  HPC clouds are becoming practical
   –  Virtualization overhead is shrinking to an acceptable level
Related Project: Yabusame
•  Quick (postcopy) live migration for QEMU/KVM
   –  Relocates a VM almost instantly
   –  http://grivon.apgrid.org/quick-kvm-migration
Summary
•  I/O performance is central to practical virtualization
•  We surveyed the main I/O virtualization techniques:
   –  I/O emulation
   –  Paravirtualized I/O: virtio and vhost
   –  Direct assignment: PCI passthrough and SR-IOV
•  VMM-bypass I/O and VM management features such as live migration can be reconciled!
   –  e.g., SymVirt, BitVisor
Bish Bash Bosh & Co Bish Bash Bosh & Co
Bish Bash Bosh & Co
 
100Gbpsソフトウェアルータの実現可能性に関する論文
100Gbpsソフトウェアルータの実現可能性に関する論文100Gbpsソフトウェアルータの実現可能性に関する論文
100Gbpsソフトウェアルータの実現可能性に関する論文
 
User-space Network Processing
User-space Network ProcessingUser-space Network Processing
User-space Network Processing
 
xv6のコンテキストスイッチを読む
xv6のコンテキストスイッチを読むxv6のコンテキストスイッチを読む
xv6のコンテキストスイッチを読む
 
デバドラを書いてみよう!
デバドラを書いてみよう!デバドラを書いてみよう!
デバドラを書いてみよう!
 
Disruptive IP Networking with Intel DPDK on Linux
Disruptive IP Networking with Intel DPDK on LinuxDisruptive IP Networking with Intel DPDK on Linux
Disruptive IP Networking with Intel DPDK on Linux
 
x86とコンテキストスイッチ
x86とコンテキストスイッチx86とコンテキストスイッチ
x86とコンテキストスイッチ
 
エンジニアなら知っておきたい「仮想マシン」のしくみ (BPStudy38)
エンジニアなら知っておきたい「仮想マシン」のしくみ (BPStudy38)エンジニアなら知っておきたい「仮想マシン」のしくみ (BPStudy38)
エンジニアなら知っておきたい「仮想マシン」のしくみ (BPStudy38)
 
クラウド環境におけるキャッシュメモリQoS制御の評価
クラウド環境におけるキャッシュメモリQoS制御の評価クラウド環境におけるキャッシュメモリQoS制御の評価
クラウド環境におけるキャッシュメモリQoS制御の評価
 
DPDKを拡張してみた話し
DPDKを拡張してみた話しDPDKを拡張してみた話し
DPDKを拡張してみた話し
 
Interrupts
InterruptsInterrupts
Interrupts
 

Similar to I/O仮想化最前線〜ネットワークI/Oを中心に〜

Toward a practical “HPC Cloud”: Performance tuning of a virtualized HPC cluster
Toward a practical “HPC Cloud”: Performance tuning of a virtualized HPC clusterToward a practical “HPC Cloud”: Performance tuning of a virtualized HPC cluster
Toward a practical “HPC Cloud”: Performance tuning of a virtualized HPC clusterRyousei Takano
 
Virtualization Technology Overview
Virtualization Technology OverviewVirtualization Technology Overview
Virtualization Technology OverviewOpenCity Community
 
Cooperative VM Migration for a virtualized HPC Cluster with VMM-bypass I/O de...
Cooperative VM Migration for a virtualized HPC Cluster with VMM-bypass I/O de...Cooperative VM Migration for a virtualized HPC Cluster with VMM-bypass I/O de...
Cooperative VM Migration for a virtualized HPC Cluster with VMM-bypass I/O de...Ryousei Takano
 
virtualization tutorial at ACM bangalore Compute 2009
virtualization tutorial at ACM bangalore Compute 2009virtualization tutorial at ACM bangalore Compute 2009
virtualization tutorial at ACM bangalore Compute 2009ACMBangalore
 
Hyper V R2 Deep Dive
Hyper V R2 Deep DiveHyper V R2 Deep Dive
Hyper V R2 Deep DiveAidan Finn
 
ARMvisor @ Linux Symposium 2012
ARMvisor @ Linux Symposium 2012ARMvisor @ Linux Symposium 2012
ARMvisor @ Linux Symposium 2012Peter Chang
 
Hardware supports for Virtualization
Hardware supports for VirtualizationHardware supports for Virtualization
Hardware supports for VirtualizationYoonje Choi
 
Toward a practical “HPC Cloud”: Performance tuning of a virtualized HPC cluster
Toward a practical “HPC Cloud”: Performance tuning of a virtualized HPC clusterToward a practical “HPC Cloud”: Performance tuning of a virtualized HPC cluster
Toward a practical “HPC Cloud”: Performance tuning of a virtualized HPC clusterRyousei Takano
 
ARMvisor, more details
ARMvisor, more detailsARMvisor, more details
ARMvisor, more detailsPeter Chang
 
Realtime scheduling for virtual machines in SKT
Realtime scheduling for virtual machines in SKTRealtime scheduling for virtual machines in SKT
Realtime scheduling for virtual machines in SKTThe Linux Foundation
 
Virtualization Primer for Java Developers
Virtualization Primer for Java DevelopersVirtualization Primer for Java Developers
Virtualization Primer for Java DevelopersRichard McDougall
 
Hyper V - Minasi Forum 2009
Hyper V - Minasi Forum 2009Hyper V - Minasi Forum 2009
Hyper V - Minasi Forum 2009Aidan Finn
 
Aidan Finn Hyper V The Future Of Infrastructure
Aidan Finn   Hyper V   The Future Of InfrastructureAidan Finn   Hyper V   The Future Of Infrastructure
Aidan Finn Hyper V The Future Of InfrastructureNathan Winters
 
2virtualizationtechnologyoverview 13540659831745-phpapp02-121127193019-phpapp01
2virtualizationtechnologyoverview 13540659831745-phpapp02-121127193019-phpapp012virtualizationtechnologyoverview 13540659831745-phpapp02-121127193019-phpapp01
2virtualizationtechnologyoverview 13540659831745-phpapp02-121127193019-phpapp01Vietnam Open Infrastructure User Group
 

Similar to I/O仮想化最前線〜ネットワークI/Oを中心に〜 (20)

Toward a practical “HPC Cloud”: Performance tuning of a virtualized HPC cluster
Toward a practical “HPC Cloud”: Performance tuning of a virtualized HPC clusterToward a practical “HPC Cloud”: Performance tuning of a virtualized HPC cluster
Toward a practical “HPC Cloud”: Performance tuning of a virtualized HPC cluster
 
Virtualization Technology Overview
Virtualization Technology OverviewVirtualization Technology Overview
Virtualization Technology Overview
 
Cooperative VM Migration for a virtualized HPC Cluster with VMM-bypass I/O de...
Cooperative VM Migration for a virtualized HPC Cluster with VMM-bypass I/O de...Cooperative VM Migration for a virtualized HPC Cluster with VMM-bypass I/O de...
Cooperative VM Migration for a virtualized HPC Cluster with VMM-bypass I/O de...
 
virtualization tutorial at ACM bangalore Compute 2009
virtualization tutorial at ACM bangalore Compute 2009virtualization tutorial at ACM bangalore Compute 2009
virtualization tutorial at ACM bangalore Compute 2009
 
Hyper V R2 Deep Dive
Hyper V R2 Deep DiveHyper V R2 Deep Dive
Hyper V R2 Deep Dive
 
ARMvisor @ Linux Symposium 2012
ARMvisor @ Linux Symposium 2012ARMvisor @ Linux Symposium 2012
ARMvisor @ Linux Symposium 2012
 
Hardware supports for Virtualization
Hardware supports for VirtualizationHardware supports for Virtualization
Hardware supports for Virtualization
 
Toward a practical “HPC Cloud”: Performance tuning of a virtualized HPC cluster
Toward a practical “HPC Cloud”: Performance tuning of a virtualized HPC clusterToward a practical “HPC Cloud”: Performance tuning of a virtualized HPC cluster
Toward a practical “HPC Cloud”: Performance tuning of a virtualized HPC cluster
 
ARMvisor, more details
ARMvisor, more detailsARMvisor, more details
ARMvisor, more details
 
XS Boston 2008 SR-IOV
XS Boston 2008 SR-IOVXS Boston 2008 SR-IOV
XS Boston 2008 SR-IOV
 
Realtime scheduling for virtual machines in SKT
Realtime scheduling for virtual machines in SKTRealtime scheduling for virtual machines in SKT
Realtime scheduling for virtual machines in SKT
 
Virtualization Primer for Java Developers
Virtualization Primer for Java DevelopersVirtualization Primer for Java Developers
Virtualization Primer for Java Developers
 
XS Japan 2008 BitVisor English
XS Japan 2008 BitVisor EnglishXS Japan 2008 BitVisor English
XS Japan 2008 BitVisor English
 
Hyper V - Minasi Forum 2009
Hyper V - Minasi Forum 2009Hyper V - Minasi Forum 2009
Hyper V - Minasi Forum 2009
 
XS Boston 2008 Self IO Emulation
XS Boston 2008 Self IO EmulationXS Boston 2008 Self IO Emulation
XS Boston 2008 Self IO Emulation
 
Aidan Finn Hyper V The Future Of Infrastructure
Aidan Finn   Hyper V   The Future Of InfrastructureAidan Finn   Hyper V   The Future Of Infrastructure
Aidan Finn Hyper V The Future Of Infrastructure
 
2virtualizationtechnologyoverview 13540659831745-phpapp02-121127193019-phpapp01
2virtualizationtechnologyoverview 13540659831745-phpapp02-121127193019-phpapp012virtualizationtechnologyoverview 13540659831745-phpapp02-121127193019-phpapp01
2virtualizationtechnologyoverview 13540659831745-phpapp02-121127193019-phpapp01
 
The kvm virtualization way
The kvm virtualization wayThe kvm virtualization way
The kvm virtualization way
 
Nakajima hvm-be final
Nakajima hvm-be finalNakajima hvm-be final
Nakajima hvm-be final
 
Graphics virtualization
Graphics virtualizationGraphics virtualization
Graphics virtualization
 

More from Ryousei Takano

Error Permissive Computing
Error Permissive ComputingError Permissive Computing
Error Permissive ComputingRyousei Takano
 
Opportunities of ML-based data analytics in ABCI
Opportunities of ML-based data analytics in ABCIOpportunities of ML-based data analytics in ABCI
Opportunities of ML-based data analytics in ABCIRyousei Takano
 
ABCI: An Open Innovation Platform for Advancing AI Research and Deployment
ABCI: An Open Innovation Platform for Advancing AI Research and DeploymentABCI: An Open Innovation Platform for Advancing AI Research and Deployment
ABCI: An Open Innovation Platform for Advancing AI Research and DeploymentRyousei Takano
 
USENIX NSDI 2016 (Session: Resource Sharing)
USENIX NSDI 2016 (Session: Resource Sharing)USENIX NSDI 2016 (Session: Resource Sharing)
USENIX NSDI 2016 (Session: Resource Sharing)Ryousei Takano
 
Flow-centric Computing - A Datacenter Architecture in the Post Moore Era
Flow-centric Computing - A Datacenter Architecture in the Post Moore EraFlow-centric Computing - A Datacenter Architecture in the Post Moore Era
Flow-centric Computing - A Datacenter Architecture in the Post Moore EraRyousei Takano
 
A Look Inside Google’s Data Center Networks
A Look Inside Google’s Data Center NetworksA Look Inside Google’s Data Center Networks
A Look Inside Google’s Data Center NetworksRyousei Takano
 
クラウド時代の半導体メモリー技術
クラウド時代の半導体メモリー技術クラウド時代の半導体メモリー技術
クラウド時代の半導体メモリー技術Ryousei Takano
 
AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...Ryousei Takano
 
IEEE CloudCom 2014参加報告
IEEE CloudCom 2014参加報告IEEE CloudCom 2014参加報告
IEEE CloudCom 2014参加報告Ryousei Takano
 
Expectations for optical network from the viewpoint of system software research
Expectations for optical network from the viewpoint of system software researchExpectations for optical network from the viewpoint of system software research
Expectations for optical network from the viewpoint of system software researchRyousei Takano
 
Exploring the Performance Impact of Virtualization on an HPC Cloud
Exploring the Performance Impact of Virtualization on an HPC CloudExploring the Performance Impact of Virtualization on an HPC Cloud
Exploring the Performance Impact of Virtualization on an HPC CloudRyousei Takano
 
不揮発メモリとOS研究にまつわる何か
不揮発メモリとOS研究にまつわる何か不揮発メモリとOS研究にまつわる何か
不揮発メモリとOS研究にまつわる何かRyousei Takano
 
High-resolution Timer-based Packet Pacing Mechanism on the Linux Operating Sy...
High-resolution Timer-based Packet Pacing Mechanism on the Linux Operating Sy...High-resolution Timer-based Packet Pacing Mechanism on the Linux Operating Sy...
High-resolution Timer-based Packet Pacing Mechanism on the Linux Operating Sy...Ryousei Takano
 
クラウドの垣根を超えた高性能計算に向けて~AIST Super Green Cloudでの試み~
クラウドの垣根を超えた高性能計算に向けて~AIST Super Green Cloudでの試み~クラウドの垣根を超えた高性能計算に向けて~AIST Super Green Cloudでの試み~
クラウドの垣根を超えた高性能計算に向けて~AIST Super Green Cloudでの試み~Ryousei Takano
 
From Rack scale computers to Warehouse scale computers
From Rack scale computers to Warehouse scale computersFrom Rack scale computers to Warehouse scale computers
From Rack scale computers to Warehouse scale computersRyousei Takano
 
高性能かつスケールアウト可能なHPCクラウド AIST Super Green Cloud
高性能かつスケールアウト可能なHPCクラウド AIST Super Green Cloud高性能かつスケールアウト可能なHPCクラウド AIST Super Green Cloud
高性能かつスケールアウト可能なHPCクラウド AIST Super Green CloudRyousei Takano
 
Iris: Inter-cloud Resource Integration System for Elastic Cloud Data Center
Iris: Inter-cloud Resource Integration System for Elastic Cloud Data CenterIris: Inter-cloud Resource Integration System for Elastic Cloud Data Center
Iris: Inter-cloud Resource Integration System for Elastic Cloud Data CenterRyousei Takano
 
A Scalable and Distributed Electrical Power Monitoring System Utilizing Cloud...
A Scalable and Distributed Electrical Power Monitoring System Utilizing Cloud...A Scalable and Distributed Electrical Power Monitoring System Utilizing Cloud...
A Scalable and Distributed Electrical Power Monitoring System Utilizing Cloud...Ryousei Takano
 

More from Ryousei Takano (20)

Error Permissive Computing
Error Permissive ComputingError Permissive Computing
Error Permissive Computing
 
Opportunities of ML-based data analytics in ABCI
Opportunities of ML-based data analytics in ABCIOpportunities of ML-based data analytics in ABCI
Opportunities of ML-based data analytics in ABCI
 
ABCI: An Open Innovation Platform for Advancing AI Research and Deployment
ABCI: An Open Innovation Platform for Advancing AI Research and DeploymentABCI: An Open Innovation Platform for Advancing AI Research and Deployment
ABCI: An Open Innovation Platform for Advancing AI Research and Deployment
 
ABCI Data Center
ABCI Data CenterABCI Data Center
ABCI Data Center
 
USENIX NSDI 2016 (Session: Resource Sharing)
USENIX NSDI 2016 (Session: Resource Sharing)USENIX NSDI 2016 (Session: Resource Sharing)
USENIX NSDI 2016 (Session: Resource Sharing)
 
Flow-centric Computing - A Datacenter Architecture in the Post Moore Era
Flow-centric Computing - A Datacenter Architecture in the Post Moore EraFlow-centric Computing - A Datacenter Architecture in the Post Moore Era
Flow-centric Computing - A Datacenter Architecture in the Post Moore Era
 
A Look Inside Google’s Data Center Networks
A Look Inside Google’s Data Center NetworksA Look Inside Google’s Data Center Networks
A Look Inside Google’s Data Center Networks
 
クラウド時代の半導体メモリー技術
クラウド時代の半導体メモリー技術クラウド時代の半導体メモリー技術
クラウド時代の半導体メモリー技術
 
AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...
 
IEEE CloudCom 2014参加報告
IEEE CloudCom 2014参加報告IEEE CloudCom 2014参加報告
IEEE CloudCom 2014参加報告
 
Expectations for optical network from the viewpoint of system software research
Expectations for optical network from the viewpoint of system software researchExpectations for optical network from the viewpoint of system software research
Expectations for optical network from the viewpoint of system software research
 
Exploring the Performance Impact of Virtualization on an HPC Cloud
Exploring the Performance Impact of Virtualization on an HPC CloudExploring the Performance Impact of Virtualization on an HPC Cloud
Exploring the Performance Impact of Virtualization on an HPC Cloud
 
不揮発メモリとOS研究にまつわる何か
不揮発メモリとOS研究にまつわる何か不揮発メモリとOS研究にまつわる何か
不揮発メモリとOS研究にまつわる何か
 
High-resolution Timer-based Packet Pacing Mechanism on the Linux Operating Sy...
High-resolution Timer-based Packet Pacing Mechanism on the Linux Operating Sy...High-resolution Timer-based Packet Pacing Mechanism on the Linux Operating Sy...
High-resolution Timer-based Packet Pacing Mechanism on the Linux Operating Sy...
 
クラウドの垣根を超えた高性能計算に向けて~AIST Super Green Cloudでの試み~
クラウドの垣根を超えた高性能計算に向けて~AIST Super Green Cloudでの試み~クラウドの垣根を超えた高性能計算に向けて~AIST Super Green Cloudでの試み~
クラウドの垣根を超えた高性能計算に向けて~AIST Super Green Cloudでの試み~
 
From Rack scale computers to Warehouse scale computers
From Rack scale computers to Warehouse scale computersFrom Rack scale computers to Warehouse scale computers
From Rack scale computers to Warehouse scale computers
 
高性能かつスケールアウト可能なHPCクラウド AIST Super Green Cloud
高性能かつスケールアウト可能なHPCクラウド AIST Super Green Cloud高性能かつスケールアウト可能なHPCクラウド AIST Super Green Cloud
高性能かつスケールアウト可能なHPCクラウド AIST Super Green Cloud
 
Iris: Inter-cloud Resource Integration System for Elastic Cloud Data Center
Iris: Inter-cloud Resource Integration System for Elastic Cloud Data CenterIris: Inter-cloud Resource Integration System for Elastic Cloud Data Center
Iris: Inter-cloud Resource Integration System for Elastic Cloud Data Center
 
IEEE/ACM SC2013報告
IEEE/ACM SC2013報告IEEE/ACM SC2013報告
IEEE/ACM SC2013報告
 
A Scalable and Distributed Electrical Power Monitoring System Utilizing Cloud...
A Scalable and Distributed Electrical Power Monitoring System Utilizing Cloud...A Scalable and Distributed Electrical Power Monitoring System Utilizing Cloud...
A Scalable and Distributed Electrical Power Monitoring System Utilizing Cloud...
 

Recently uploaded

Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 

Recently uploaded (20)

Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 

I/O仮想化最前線〜ネットワークI/Oを中心に〜

• 13. Memory virtualization (diagram): natively, the OS maps virtual to physical addresses (VA to PA) through page tables that the MMU walks from CR3. With a guest, there are two layers: the guest OS maps guest-virtual to guest-physical (GVA to GPA), and the VMM maps guest-physical to host-physical (GPA to HPA).
• 14. Three ways to handle the two layers (diagram): PVM, where the paravirtualized guest cooperates with the VMM so the hardware MMU effectively holds GVA-to-HPA mappings; HVM with shadow page tables (SPT), where the guest keeps GVA-to-GPA tables and the VMM maintains matching GVA-to-HPA shadow tables behind its back; and HVM with EPT, where the hardware walks the guest's GVA-to-GPA tables and the VMM's GPA-to-HPA tables itself.
• 15. Intel Extended Page Table (EPT): on a TLB miss, the hardware page walker uses both the guest page tables (rooted at the guest CR3, GVA to GPA) and the EPT (rooted at the EPTP, GPA to HPA), and the TLB caches the combined GVA-to-HPA translation. Intel x64 uses 4-level page tables. TLB: Translation Look-aside Buffer
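A worked cost note (my addition; standard nested-paging arithmetic, not a figure from the slide): with 4-level guest tables and 4-level EPT, each of the guest's page-table references is a GPA that itself needs an EPT walk, so a worst-case TLB miss costs up to (4+1) × (4+1) − 1 = 24 memory references, versus 4 natively. This is why EPT leans heavily on the TLB and paging-structure caches.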
• 16. Section: I/O virtualization.
• 17. Device I/O basics: programmed I/O (PIO) via IN/OUT instructions, DMA (Direct Memory Access) for bulk transfers, and interrupts for completion. Typical receive flow (diagram): 1. the driver programs a DMA transfer, 2. the device DMAs data to memory, 3. the device raises an interrupt to the CPU, 4. the handler signals EOI. EOI: End Of Interrupt
• 18. PCI interrupts: legacy INTx (4 shared interrupt pins, INTA-INTD, routed via the IOAPIC to the CPU's Local APIC) and MSI/MSI-X (Message Signaled Interrupts, delivered as DMA writes). The OS dispatches interrupts through the IDT (Interrupt Descriptor Table) and acknowledges with EOI; the VMM must emulate this entire path, including MSI.
• 19. PCI device naming: every function is identified by a BDF (Bus:Device.Function) number. One card can carry several functions: a dual-port NIC is one device with two functions, and SR-IOV VFs likewise show up as separate functions.
$ lspci -tv
... snip ...
-[0000:00]-+-00.0  Intel Corporation 5500 I/O Hub to ESI Port
           +-01.0-[01]--+-00.0  Broadcom Corporation NetXtreme II BCM5716 Gigabit Ethernet    <- dual-port GbE
           |            \-00.1  Broadcom Corporation NetXtreme II BCM5716 Gigabit Ethernet
           +-03.0-[05]--
           +-07.0-[06]----00.0  Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connect
           +-09.0-[03]--
... snip ...
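To look at a single function once its BDF is known (a minimal example, not from the slide; the BDF below matches the 82599EB in the tree above):
$ lspci -s 06:00.0 -vv        # verbose details for bus 06, device 00, function 0
$ lspci -n -s 06:00.0         # numeric vendor:device ID (e.g., 8086:10fb)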
• 20. VM network I/O approaches:
– Full emulation: QEMU models real NICs (ne2000, rtl8139, e1000).
– Paravirtualization: Xen's split driver model, virtio and vhost, VMware VMXNET3.
– Direct assignment (VMM-bypass I/O): PCI passthrough and SR-IOV, enabled by VT-d.
• 21. Data-path comparison (diagram): I/O emulation (guest driver, VMM vSwitch, host physical driver, NIC), PCI passthrough (the guest's physical driver drives the NIC directly), and SR-IOV (per-VM guest drivers talk to VFs on a NIC with an internal switch, a VEB).
• 22. Edge Virtual Bridging (IEEE 802.1Qbg): where VM-to-VM frames get switched. (a) Software VEB: the vSwitch inside the VMM; (b) Hardware VEB: a switch inside the NIC; (c) VEPA / VN-Tag: frames hairpin through the external switch. VEB: Virtual Ethernet Bridging; VEPA: Virtual Ethernet Port Aggregator
• 23. I/O emulation (diagram: VMX non-root guest vs. VMX root host): the guest runs an unmodified e1000 driver; each device register access causes a VM Exit into QEMU's e1000 model, which copies the data to a tap device and the host vSwitch, then out through the physical driver. Frequent exits and copies make this the slowest path.
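For reference, the emulated model is chosen with a flag (a sketch in the QEMU 0.12-era syntax that slide 43 uses; tap0 is illustrative):
$ kvm -net nic,model=e1000 -net tap,ifname=tap0,script=no ...
Swapping model=e1000 for model=virtio selects the paravirtual device of the next slide.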
• 24. virtio: a paravirtual NIC that cuts down VM Exits. The guest's virtio_net driver and QEMU's backend exchange buffers through shared virtio_ring queues, so I/O is batched instead of trapping on every register access; data still flows through QEMU, tap, and the vSwitch.
• 25. vhost: moves the virtio backend from QEMU into the host kernel (vhost_net), removing the detour through QEMU and tap; combined with macvlan/macvtap, packets go guest virtio_net, vhost_net, macvtap, physical driver without touching userspace (sketch below).
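A minimal sketch of the vhost + macvtap wiring (interface names and the MAC address are illustrative; vhost=on needs the vhost_net module loaded):
# ip link add link eth0 name macvtap0 type macvtap mode bridge
# N=$(cat /sys/class/net/macvtap0/ifindex)     # the matching device node is /dev/tap$N
# kvm -net nic,model=virtio,macaddr=00:16:3e:1d:ff:02 -net tap,fd=3,vhost=on 3<>/dev/tap$N ...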
• 26. PCI passthrough and SR-IOV: unlike virtio/vhost, the guest's own physical driver drives the device, and DMA lands directly in guest memory via VT-d, taking the VMM off the data path; the VMM still mediates interrupts and EOI (diagram: guest driver to hardware through VT-d DMA remapping).
• 27. Control structures (diagram): VM Exit and VM Entry transitions between guest and VMM are governed by the VMCS; device DMA reaches VM memory through the IOMMU. VMCS: Virtual Machine Control Structure
• 28. Intel VT-d: hardware support for giving the OS inside a VM direct I/O access without letting it compromise the VMM or other guests. Two mechanisms: DMA remapping (an IOMMU) and interrupt remapping.
• 29. VT-d DMA remapping: the addresses a guest programs into a device are guest-physical, so unchecked DMA could hit arbitrary host memory and direct assignment would be unsafe. The IOMMU translates and permission-checks device DMA exactly as the MMU+EPT does for CPU accesses.
• 30. DMA remapping structures (a garbled excerpt of the Intel VT-d specification; the recoverable points): the source-id of a DMA request is its PCI requester-id, a 16-bit Bus(8)/Device(5)/Function(3) number. The bus number indexes a 256-entry root-entry table; each root entry points to a 256-entry context-entry table, whose entry maps that device function to a domain and the domain's multi-level page tables (the spec's figures show a 3-level table with 4KB pages and a 2-level table with 2MB super pages). A page-table entry without the needed Read/Write permission blocks the request.
• 31. VT-d interrupt remapping: MSI/MSI-X interrupts are just DMA write requests carrying a destination ID, so an assigned device could otherwise signal any vector on any CPU. VT-d validates each MSI write against the interrupt remapping table (IRT), which the VMM programs.
• 32. "ELI: Bare-Metal Performance for I/O Virtualization", A. Gordon, et al. (Technion), ASPLOS 2012: even with device assignment, interrupt delivery and completion still force guest/host exits. ELI (Exit-Less Interrupt) delivers the assigned device's physical interrupts directly to the guest through a shadow IDT, keeping non-assigned interrupts in the host, and lets the guest complete them without exiting; netperf, Apache, and memcached reach 97-100% of bare-metal (BMM) performance. (Figures: exits during interrupt handling, and the ELI delivery flow.)
• 33. PCI-SIG I/O Virtualization (PCIe Gen2 era): SR-IOV (Single Root-I/O Virtualization) shares one device among the VMs of a single host, with NICs as the main application; MR-IOV (Multi Root-I/O Virtualization) shares a device among multiple hosts but has seen little uptake (NEC ExpEther is a related approach). VMM support for SR-IOV: KVM, Xen, VMware, Hyper-V; on Linux, VFIO is the emerging framework.
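As a pointer, binding a device to the then-new VFIO framework mirrors the pci_stub flow shown on slides 46-48 (a sketch under that assumption, not something these slides tested; the 8086:10ed ID, the 82599 VF, is my example):
# modprobe vfio-pci
# echo "8086 10ed" > /sys/bus/pci/drivers/vfio-pci/new_id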
• 34. An SR-IOV NIC presents one physical NIC as multiple vNICs, one per VM (vNIC = VF, Virtual Function); an on-chip L2 classifier/sorter steers RX/TX traffic to per-VF queues above the shared MAC/PHY (diagram).
• 35. Physical Function (PF) and Virtual Function (VF): the PF is the full-featured function managed by the VMM's PF driver; VFs are lightweight functions assigned to VMs, where the guest OS runs a VF driver, and they rely on the PF for device-wide configuration. Example scale: the Intel 82576 exposes 8 VFs per port (ARI allows up to 256 functions on one device). (Diagram: PF/VF config spaces and drivers.)
• 36. Two ways to plumb a paravirtual NIC into the host network: 1. a software bridge plus tap devices (Linux bridge or Open vSwitch; see the sketch below), 2. macvlan/macvtap, which demultiplexes by MAC address directly on the physical interface with no bridge (diagram: tap0/tap1 on a bridge vs. macvlan0/macvlan1 stacked on eth0).
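A minimal sketch of option 1 with Open vSwitch (br0 and tap0 are illustrative; slide 44 automates the same attach/detach with ovs-ifup/ovs-ifdown):
# ovs-vsctl add-br br0
# ovs-vsctl add-port br0 eth0
# ip tuntap add dev tap0 mode tap
# ovs-vsctl add-port br0 tap0
Option 2 is a single command per guest NIC, as in the macvtap sketch after slide 25.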
• 37. Open vSwitch: a multilayer software switch for Linux. OvS speaks OpenFlow and offers far richer switching features than the Linux bridge; it was merged into Linux kernel 3.3 and also powers hardware switches such as the Pica8 Pronto. http://openvswitch.org/
• 38. Per-VM VLANs (diagram: tap0 on VLAN 101, tap1 on VLAN 102): tag each VM's port on the vSwitch, so the guest OS needs no VLAN configuration and one command isolates each VM.
# ovs-vsctl add-br br0
# ovs-vsctl add-port br0 tap0 tag=101
# ovs-vsctl add-port br0 tap1 tag=102
# ovs-vsctl add-port br0 eth0
Tagged traffic behaves as if tap0 <-> br0_101 <-> eth0.101.
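To confirm the tags took effect (my addition, not on the slide):
# ovs-vsctl show        # each Port entry lists its tag, e.g. "tag: 101" under tap0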
• 39. QoS in OvS builds on the Linux Qdisc framework: ingress policing on a port and egress shaping on the uplink. Ingress policing at 10 Mbps (rate in kbps, burst in kb):
# ovs-vsctl set Interface tap0 ingress_policing_rate=10000
# ovs-vsctl set Interface tap0 ingress_policing_burst=1000
• 40. Egress shaping: create a QoS object with HTB queues and steer flows into a queue (linux-htb and linux-hfsc are supported):
# ovs-vsctl -- set port eth0 qos=@newqos \
  -- --id=@newqos create qos type=linux-htb other-config:max-rate=40000000 queues=0=@q0,1=@q1 \
  -- --id=@q0 create queue other-config:min-rate=10000000 other-config:max-rate=10000000 \
  -- --id=@q1 create queue other-config:min-rate=20000000 other-config:max-rate=20000000
# ovs-ofctl add-flow br0 "in_port=3,idle_timeout=0,actions=enqueue:1:1"
• 41. Section: setup examples with QEMU/KVM.
• 42. Test setup: Linux hosts, QEMU/KVM driven directly from the command line (including PCI hot-add) rather than via libvirt/Virt-manager; Open vSwitch 1.6.1. PCI passthrough & SR-IOV hardware tried:
– Intel Gigabit ET dual-port server adapter [SR-IOV]: OK
– Intel Ethernet Converged Network Adapter X520-LR1 [SR-IOV]: OK
– Mellanox ConnectX-2 QDR InfiniBand HCA: OK
– Broadcom on-board GbE NIC (BCM5709): NG
– Brocade BR1741M-k 10 Gigabit Converged HCA: NG
• 43. Launching a VM with QEMU/KVM. VM spec: 2 VCPUs (CPU model: host), 2 GB memory, virtio_net networking, virtio_blk storage.
#!/bin/sh
sudo /usr/bin/kvm \
  -cpu host -smp 2 \
  -m 2000 \
  -net nic,model=virtio,macaddr=00:16:3e:1d:ff:01 \
  -net tap,ifname=tap0,script=/etc/ovs-ifup,downscript=/etc/ovs-ifdown \
  -monitor telnet::5963,server,nowait \
  -serial telnet::5964,server,nowait \
  -daemonize -nographic \
  -drive file=/work/kvm/vm01.img,if=virtio \
  "$@"
• 44. tap hook scripts for Open vSwitch (QEMU calls these instead of the default brctl helpers):
$ cat /etc/ovs-ifup
#!/bin/sh
switch='br0'
/sbin/ip link set mtu 9000 dev $1 up
/opt/bin/ovs-vsctl add-port ${switch} $1

$ cat /etc/ovs-ifdown
#!/bin/sh
switch='br0'
/sbin/ip link set $1 down
/opt/bin/ovs-vsctl del-port ${switch} $1
• 45. PCI passthrough, step by step: 1. enable Intel VT and VT-d in the BIOS; 2. enable VT-d in Linux by booting with intel_iommu=on (sketch below); 3. detach the PCI device from its host driver; 4. assign it to the guest; 5. use it from the guest OS. Reference: "How to assign devices with VT-d in KVM," http://www.linux-kvm.org/page/How_to_assign_devices_with_VT-d_in_KVM
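A minimal sketch of step 2 on a GRUB 2, Debian-style host (file path and checks are my assumptions, not from the slide):
# in /etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"
# apply and verify after reboot:
# update-grub && reboot
# dmesg | grep -i -e DMAR -e IOMMU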
• 46. Detach and assign, identified by vendor:device ID and BDF: bind the device to pci_stub so the host driver lets go.
# echo "8086 10fb" > /sys/bus/pci/drivers/pci-stub/new_id
# echo "0000:06:00.0" > /sys/bus/pci/devices/0000:06:00.0/driver/unbind
# echo "0000:06:00.0" > /sys/bus/pci/drivers/pci-stub/bind
Assign at QEMU startup: -device pci-assign,host=06:00.0
Or hot-plug from the QEMU monitor: device_add pci-assign,host=06:00.0,id=vf0 and device_del vf0
• 47. Creating SR-IOV VFs: reload the PF driver with the max_vfs parameter.
# modprobe -r ixgbe
# modprobe ixgbe max_vfs=8
The VFs then appear as ordinary PCI functions:
$ lspci -tv
... snip ...
-[0000:00]-+-00.0  Intel Corporation 5500 I/O Hub to ESI Port
           +-01.0-[01]--+-00.0  Broadcom Corporation NetXtreme II BCM5716 Gigabit Ethernet
           |            \-00.1  Broadcom Corporation NetXtreme II BCM5716 Gigabit Ethernet
           +-03.0-[05]--
           +-07.0-[06]----00.0  Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connect    <- Physical Function (PF)
           |            +-10.0  Intel Corporation 82599 Ethernet Controller Virtual Function
           |            +-10.2  Intel Corporation 82599 Ethernet Controller Virtual Function
           |            +-10.4  Intel Corporation 82599 Ethernet Controller Virtual Function
           |            +-10.6  Intel Corporation 82599 Ethernet Controller Virtual Function    <- Virtual Functions (VF)
           |            +-11.0  Intel Corporation 82599 Ethernet Controller Virtual Function
           |            +-11.2  Intel Corporation 82599 Ethernet Controller Virtual Function
           |            +-11.4  Intel Corporation 82599 Ethernet Controller Virtual Function
           |            \-11.6  Intel Corporation 82599 Ethernet Controller Virtual Function
           +-09.0-[03]--
... snip ...
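To recreate the VFs at every boot (a common convention, not from the slide), pin the module option:
# in /etc/modprobe.d/ixgbe.conf:
options ixgbe max_vfs=8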
• 48. Assigning a VF uses the same pci_stub flow as a whole device; note that new_id must carry the VF's own device ID (8086:10ed for the 82599 VF, not the PF's 10fb):
# echo "8086 10ed" > /sys/bus/pci/drivers/pci-stub/new_id
# echo "0000:06:10.0" > /sys/bus/pci/devices/0000:06:10.0/driver/unbind
# echo "0000:06:10.0" > /sys/bus/pci/drivers/pci-stub/bind
Startup: -device pci-assign,host=06:10.0
Monitor hot-plug: device_add pci-assign,host=06:10.0,id=vf0 and device_del vf0
• 49. Inside the guest OS, the VF appears as an ordinary PCI NIC with MSI-X vectors:
$ lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:02.0 VGA compatible controller: Cirrus Logic GD 5446
00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
00:04.0 SCSI storage controller: Red Hat, Inc Virtio block device
00:05.0 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)

$ cat /proc/interrupts
        CPU0   CPU1
...snip...
29:   114941 114133  PCI-MSI-edge  eth1-rx-0
30:    77616  78385  PCI-MSI-edge  eth1-tx-0
31:        5      5  PCI-MSI-edge  eth1:mbx
• 50. Per-VF transmit rate limiting, set from the host (enforced by the PF driver, so the guest cannot override it):
# ip link set dev eth5 vf 0 rate 200
# ip link set dev eth5 vf 1 rate 400
# ip link show dev eth5
42: eth5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
    link/ether 00:1b:21:81:55:3e brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 00:16:3e:1d:ee:01, tx rate 200 (Mbps), spoof checking on
    vf 1 MAC 00:16:3e:1d:ee:02, tx rate 400 (Mbps), spoof checking on
(Slide cites 2010-OS-117 (13).)
• 51. SR-IOV tips: set a VF's MAC address from the host:
# ip link set dev eth5 vf 0 mac 00:16:3e:1d:ee:01
Pin a VF to a VLAN ID:
# ip link set dev eth5 vf 0 vlan 101
Intel's 82576 (GbE) and 82599/X540 (10GbE) NICs support these controls; see http://www.intel.com/content/www/us/en/ethernet-controllers/ethernet-controllers.html
• 52. Live migration vs. device assignment: a VM holding an assigned PCI device cannot be live-migrated as-is. Workaround: PCI hot-plug plus bonding. Give the guest the passthrough NIC (an SR-IOV VF) and a virtio NIC as an active-standby bond, detach the VF before migration so traffic fails over to the virtio (PV) path, and re-attach a VF afterwards; the guest keeps one working NIC throughout (bond sketch below; the sequence follows on slides 53-56).
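A guest-side sketch of the active-standby bond (Debian ifenslave style, matching the Debian hosts used here; interface roles follow the diagrams on slides 53-56, and the address is illustrative):
# /etc/network/interfaces in the guest
auto bond0
iface bond0 inet static
    address 192.168.0.2
    netmask 255.255.255.0
    bond-slaves eth0 eth1       # eth0 = virtio (migratable), eth1 = VF (igbvf)
    bond-mode active-backup
    bond-primary eth1           # prefer the fast VF while it is attached
    bond-miimon 100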
• 53. Migration sequence, step 1 (diagram): the guest bonds eth0 (virtio, attached via tap0 to br0 on the host) with eth1 (igbvf VF); both source and destination hosts have an SR-IOV NIC driven by igb on br0.
• 54. Step 2 (diagram): detach the VF from the QEMU monitor; the bond fails over to the virtio slave.
(qemu) device_del vf0
• 55. Step 3 (diagram): start the destination QEMU with an -incoming socket, then migrate.
$ qemu -incoming tcp:0:y ...
(qemu) migrate -d tcp:x.x.x.x:y
• 56. Step 4 (diagram): on the destination, hot-add a VF again and the bond switches back to the fast path.
(qemu) device_add pci-assign,host=05:10.0,id=vf0
• 57. Demo layout (diagram): an MPI job keeps running across the migration, with rank 0 on a host (192.168.0.1) and rank 1 inside the guest behind bond0; all endpoints sit on 192.168.0.0/24 (192.168.0.1-3).
• 58. SymVirt (Symbiotic Virtualization): guest and VMM cooperate so that VMs using VMM-bypass I/O (InfiniBand, PCI passthrough) can still be migrated or checkpointed. A cloud scheduler drives VM allocation and re-allocation, e.g., on failure prediction; components include SymCR (checkpoint/restart via VM migration) and SymPFT, with VM images on global storage (diagram).
• 59. SymVirt mechanism (diagram): the SymVirt coordinator uses the guest-side MPI runtime to reach global consistency across the VMs, then waits and signals the VMM-side controller/agent, which detaches devices, migrates, confirms linkup, and re-attaches. R. Takano, et al., "Cooperative VM Migration for a Virtualized HPC Cluster with VMM-Bypass I/O devices", 8th IEEE e-Science 2012.
• 60. Section: HPC clouds.
• 61. Context at AIST: the AIST Super Cluster (2004, #19 on the TOP500), followed by the AIST Green Cloud (2010) and the AIST Super Cloud (2011); the slide also cites HPCI and Amazon EC2 and the figures 1/10 and 1-2 (surrounding text lost in extraction).
• 62. Motivation: an IDC 2011 survey of the top-3 concerns, with DB and HPC named as demanding workloads (slide text otherwise lost in extraction).
• 63. e.g., ASC (slide content otherwise lost in extraction).
• 64. Evaluation platform: AIST Green Cloud (AGC), a 16-node HPC cluster running 1 VM per node.
Compute node: Dell PowerEdge M610; CPU: Intel quad-core Xeon E5540/2.53GHz x2; chipset: Intel 5520; memory: 48 GB DDR3; InfiniBand: Mellanox ConnectX (MT26428); blade switch: Mellanox M3601Q (QDR, 16 ports).
Host environment: Debian 6.0.1, Linux kernel 2.6.32-5-amd64, KVM 0.12.50, gcc/gfortran 4.4.5, Open MPI 1.4.2.
VM: 8 VCPUs, 45 GB memory.
• 65. MPI point-to-point bandwidth, measured with qperf (log-log plot: Bandwidth [MB/sec] vs. message size [byte], 1 B to 1 GB; higher is better): bare metal peaks at 3.2 GB/s, KVM with PCI passthrough at 2.4 GB/s.
• 66. NPB BT-MZ scaling (plot: Performance [Gop/s total] and Parallel efficiency [%] vs. number of nodes, 1-16; higher is better) for Bare Metal, KVM, and Amazon EC2 Cluster Compute Instances (CCI). Degradation of parallel efficiency: KVM 2%, EC2 CCI 14%.
• 67. Bloss, a hybrid MPI+OpenMP application (linear solver requiring 10 GB of memory plus eigenvector calculation, with coarse-grained MPI communication: Bcast 760 MB, Reduce 1 GB, Bcast 1 GB, Gather 350 MB). Plot: Parallel efficiency [%] vs. number of nodes (1-16; higher is better) for Bare Metal, KVM, Amazon EC2, and Ideal. Degradation of PE: KVM 8%, EC2 CCI 22%.
• 68. Storage I/O evaluation on VMware ESXi. Host: Dell PowerEdge T410 (Intel hexa-core Xeon X5650, single socket; 6 GB DDR3-1333; QLogic QLE2460 single-port 4Gbps Fibre Channel HBA) attached to an IBM DS3400 FC SAN; VMM: VMware ESXi 5.0, with Ethernet for out-of-band management. Guest: Windows Server 2008 R2, 8 vCPUs, 3840 MB memory. Benchmark: IOMeter 2006.07.27 (http://www.iometer.org/)
• 69. Three configurations (diagram of the Windows storage stack from NTFS through the volume manager and disk class driver down to the LUN): Bare Metal Machine (BMM) using a Storport/FC HBA driver; Raw Device Mapping (RDM), where the VM uses a Storport/SCSI driver and the VMkernel supplies the FC HBA driver; and VMDirectPath I/O (FPT), where the VM drives the FC HBA directly.
• 70. (Results figure; slide text lost in extraction.)
• 71. ESXi observations: for FC SAN access, RDM virtualizes at the VMM's SCSI layer while PCI passthrough (VMDirectPath) hands the HBA itself to the guest; guest OS coverage differs between Linux and Windows on ESXi, and performance comes close to BMM (slide text partially lost in extraction).
• 72. Related publications (Japanese titles lost in extraction): a SACSIS2011 paper on building HPC clusters with InfiniBand PCI passthrough (pp. 109-116, May 2011), and a paper on HPC clouds in IPSJ Trans. ACS 37 (May 2012); the closing bullets contrasting PCI passthrough and SR-IOV per VM are lost.
• 73. (Slide text lost in extraction; topic: HPC.)
• 74. Yabusame: fast (postcopy) live migration for QEMU/KVM. http://grivon.apgrid.org/quick-kvm-migration
• 75. Summary: network I/O virtualization spans I/O emulation, paravirtual virtio/vhost, and direct assignment via PCI passthrough and SR-IOV, trading flexibility against performance; guest-VMM cooperation (SymVirt, BitVisor) points the way past those trade-offs.