Windows Server 2012 Hyper-V
Networking Evolved

Didier Van Hoye

Technical Architect – FGIA


Microsoft MVP & MEET Member




 http://workinghardinit.wordpress.com
 @workinghardinit
What We’ll Discuss

Windows Server 2012 Networking
 Changed & Improved features
 New features
 Relationship to Hyper-V
Why We’ll Discuss This

We face many network challenges
 Keep systems & services running
  High to continuous availability
  High reliability & reducing complexity
  Security, multitenancy, extensibility
 Cannot keep throwing money at it (CAPEX)
  Network virtualization, QOS, bandwidth management in box
  Performance (latency, throughput, scalability)
  Leverage existing hardware
 Control operational cost (OPEX) → reduce complexity
Eternal Challenge = Balanced Design
[Diagram: balancing cost, availability, capacity and performance across CPU, memory, network and storage]
Network Bottlenecks
 In the host networking stack
 In the NICs
 In the switches

[Diagram: blade chassis (PowerEdge M1000e) with its servers, NICs and switch ports, showing where the bottlenecks sit]
Socket, NUMA, Core, K-Group

 Processor: One physical processor, which can consist of one or more NUMA nodes.
  Today a physical processor ≈ a socket, with multiple cores.
 Non-uniform memory architecture (NUMA) node: A set of logical processors and cache
  that are close to one another.
 Core: One processing unit, which can consist of one or more logical processors.
 Logical processor (LP): One logical computing engine from the perspective of the
  operating system, application or driver. In effect, a logical processor is a thread
  (think hyper-threading).
 Kernel Group (K-Group): A set of up to 64 logical processors.
Advanced Network Features (1)

 Receive Side Scaling (RSS)
 Receive Segment Coalescing (RSC)
 Dynamic Virtual Machine Queuing (DVMQ)
 Single Root I/O Virtualization (SR-IOV)
 NIC TEAMING
 RDMA/Multichannel support for virtual machines on SMB3.0
Receive Side Scaling (RSS)

 Windows Server 2012 scales RSS to the next generation of
    servers & workloads
   Spreads interrupts across all available CPUs
   Even for those very large scale hosts
   RSS now works across K-Groups
   RSS is now NUMA-aware to optimize performance
   Now load balances UDP traffic across CPUs
   40% to 100% more throughput (backups, file copies, web)
[Diagram: incoming packets spread by an RSS NIC with 8 queues across NUMA nodes 0-3]

RSS improves scalability on multiple processors / NUMA nodes by distributing
TCP/UDP receive traffic across the cores in different nodes / K-Groups.
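
A minimal PowerShell sketch of how RSS can be inspected and tuned on Windows Server 2012; the adapter name "SLOT 2" and the processor numbers are placeholders, not values from this deck:

  # Show RSS capability, NUMA node placement and the processor set in use.
  Get-NetAdapterRss -Name "SLOT 2"

  # Enable RSS and constrain it to a processor range (e.g. keep CPU 0 free).
  Enable-NetAdapterRss -Name "SLOT 2"
  Set-NetAdapterRss -Name "SLOT 2" -BaseProcessorNumber 2 -MaxProcessors 8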
Receive Segment Coalescing (RSC)

 Coalesces packets in the NIC so the stack processes fewer headers
 Multiple packets belonging to a connection are coalesced by the NIC
  into a larger packet (max of 64 KB) and processed within a single interrupt
 10 - 20% improvement in throughput & CPU workload → offloaded to the NIC
 Enabled by default on all 10 Gbps adapters
Receive Segment Coalescing

[Diagram: incoming packets coalesced into a larger buffer by a NIC with RSC]

RSC helps by coalescing multiple inbound packets into a larger buffer or "packet",
which reduces per-packet CPU cost as fewer headers need to be processed.
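
A minimal sketch of checking and toggling RSC per adapter with the in-box cmdlets; "SLOT 2" is a placeholder adapter name:

  # Show whether RSC is enabled and operational for IPv4/IPv6 on this adapter.
  Get-NetAdapterRsc -Name "SLOT 2"

  # Enable (or disable) RSC on the adapter.
  Enable-NetAdapterRsc -Name "SLOT 2"
  # Disable-NetAdapterRsc -Name "SLOT 2"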
Dynamic Virtual Machine Queue (DVMQ)

VMQ is to virtualization what RSS is to native workloads.
It ensures that routing, filtering, etc. are done by the NIC in queues and that
the interrupts for those queues are not all handled by one processor (CPU 0).
Most inbox 10 Gbps Ethernet adapters support this.
Enabled by default.

[Diagram: network I/O path without VMQ vs. with VMQ]
Dynamic Virtual Machine Queue (DVMQ)

[Diagram: root partition CPUs 0-3 and the physical NIC under No VMQ, Static VMQ and Dynamic VMQ]

Adaptive → optimal performance across changing workloads
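
A minimal sketch of how to verify VMQ on a host; "SLOT 3" is a placeholder adapter name:

  # Show VMQ capability and settings for the adapter bound to the virtual switch.
  Get-NetAdapterVmq -Name "SLOT 3"

  # List the queues currently allocated to virtual machines.
  Get-NetAdapterVmqQueue -Name "SLOT 3"

  # VMQ is on by default on capable adapters; it can be toggled per adapter.
  Enable-NetAdapterVmq -Name "SLOT 3"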
Single-Root I/O Virtualization (SR-IOV)

 Reduces CPU utilization for processing network traffic
 Reduces latency path
 Increases throughput
 Requires:
   Chipset: Interrupt & DMA remapping
   BIOS support
   CPU: hardware virtualization, EPT or NPT

[Diagram: network I/O path without SR-IOV (Hyper-V switch doing routing, VLAN, filtering
and data copy over VMBus) vs. with SR-IOV (a Virtual Function of the physical NIC mapped
directly into the virtual machine)]
SR-IOV Enabling & Live Migration
 Turn on IOV
   Enable IOV (VM NIC property)
   Virtual Function (VF) is "assigned"
   "NIC" automatically created
   Traffic flows through the VF; the software path is not used
 Live Migration
   Switch back to the software path
   Remove the VF from the VM
   Migrate as normal
 Post migration
   Reassign the Virtual Function (assuming resources are available)
 The VM keeps connectivity even if:
   The switch is not in IOV mode
   An IOV physical NIC is not present
   The NIC vendor is different
   The NIC firmware is different

[Diagram: the VM's network stack moving between the Virtual Function and the software
NIC/switch (IOV mode) before, during and after live migration]
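
A minimal sketch of enabling SR-IOV end to end, assuming an SR-IOV capable adapter "SLOT 4" and a VM called "Web01" (both placeholder names):

  # The switch must be created with IOV enabled; it cannot be turned on afterwards.
  New-VMSwitch -Name "IOV-Switch" -NetAdapterName "SLOT 4" -EnableIov $true

  # Give the VM's NIC an IOV weight > 0 so a Virtual Function is assigned when available.
  Set-VMNetworkAdapter -VMName "Web01" -IovWeight 50

  # Check the adapter; if no VF is available the VM silently uses the software path.
  Get-VMNetworkAdapter -VMName "Web01" | Select-Object VMName, IovWeight, Status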
NIC TEAMING

 Customers are dealing with way too many issues.
 NIC vendors would like to get rid of supporting this.
 Microsoft needs this to be competitive & complete the
  solution stack + reduce support issues.
NIC Teaming
                                                                             Hyper-V Extensible Switch
 Teaming modes:
   Switch dependent
   Switch independent
 Load balancing modes:
   Address hash
   Hyper-V port
 Hashing modes:
   4-tuple
   2-tuple
   MAC address
 Active/Active & Active/Standby
 Vendor agnostic

[Diagram: LBFO architecture. The admin GUI and WMI drive the LBFO configuration DLL,
which controls the kernel-mode IM MUX (protocol edge, virtual miniport, team ports 1-3,
frame distribution/aggregation, failure detection, control protocol) over NIC 1-3
towards the network switch and the Hyper-V Extensible Switch]
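
A minimal sketch of creating a team with the in-box LBFO cmdlets; "NIC1"/"NIC2" are placeholder adapter names and the modes shown are just one sensible combination for a Hyper-V host:

  # Switch-independent teaming with Hyper-V port load balancing.
  New-NetLbfoTeam -Name "Team1" -TeamMembers "NIC1","NIC2" -TeamingMode SwitchIndependent -LoadBalancingAlgorithm HyperVPort

  # Inspect the team and its members.
  Get-NetLbfoTeam
  Get-NetLbfoTeamMember -Team "Team1"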
NIC TEAMING (LBFO)
[Diagram: parent NIC teaming puts the LBFO team underneath a single Hyper-V virtual
switch, so SR-IOV is not exposed to the guest; guest NIC teaming teams two virtual NICs
inside a Windows Server 2012 guest, each connected to its own Hyper-V virtual switch on
an SR-IOV NIC]
SMB Direct (SMB over RDMA)
What
 Addresses congestion in the network stack by offloading the stack to the network adapter
Advantages
 Scalable, fast and efficient storage access
 High throughput, low latency & minimal CPU utilization
 Load balancing, automatic failover & bandwidth aggregation via SMB Multichannel
Scenarios
 High performance remote file access for application servers like Hyper-V, SQL Server,
  IIS and HPC
 Used by File Server and Cluster Shared Volumes (CSV) for storage communications within
  a cluster
Required hardware
 RDMA-capable network interface (R-NIC)
 Three types: iWARP, RoCE & InfiniBand

[Diagram: SMB client and SMB server applications talking through the kernel SMB
client/server over R-NICs on networks with RDMA support, down to NTFS/SCSI/disk on the
server side]
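
A minimal sketch of verifying that SMB Direct can kick in; "RDMA1" is a placeholder adapter name:

  # Confirm the adapter exposes RDMA and that it is enabled.
  Get-NetAdapterRdma
  Enable-NetAdapterRdma -Name "RDMA1"

  # On the file server, confirm SMB sees the interface as RDMA capable.
  Get-SmbServerNetworkInterface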
SMB Multichannel

 Multiple connections per SMB session
 Full Throughput
  Bandwidth aggregation with multiple NICs
  Multiple CPU cores engaged when using Receive Side Scaling (RSS)

 Automatic Failover
  SMB Multichannel implements end-to-end failure detection
  Leverages NIC teaming if present, but does not require it

 Automatic Configuration
  SMB detects and uses multiple network paths
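
SMB Multichannel needs no setup; a minimal sketch of how to confirm it is active:

  # Multichannel is enabled by default on client and server.
  Get-SmbClientConfiguration | Select-Object EnableMultiChannel
  Get-SmbServerConfiguration | Select-Object EnableMultiChannel

  # Show the network interfaces actually used per SMB connection.
  Get-SmbMultichannelConnection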
SMB Multichannel Single NIC Port
 1 session, without Multichannel
   No failover
   Can't use full 10 Gbps: only one TCP/IP connection, only one CPU core engaged
 1 session, with Multichannel
   No failover
   Full 10 Gbps available: multiple TCP/IP connections, and Receive Side Scaling (RSS)
    helps distribute the load across CPU cores

[Diagram: SMB client and server with a single 10GbE RSS NIC each; without Multichannel
one CPU core does all the work, with Multichannel the load is spread across cores 1-4]
SMB Multichannel Multiple NIC Ports
 1 session, without Multichannel
   No automatic failover
   Can't use full bandwidth: only one NIC engaged, only one CPU core engaged
 1 session, with Multichannel
   Automatic NIC failover
   Combined NIC bandwidth available: multiple NICs engaged, multiple CPU cores engaged

[Diagram: SMB clients and servers with two 10GbE RSS NICs each; without Multichannel a
single NIC and core carry the session, with Multichannel both NICs and multiple cores do]
SMB Multichannel & NIC Teaming
 1 session, NIC Teaming without Multichannel
   Automatic NIC failover
   Can't use full bandwidth: only one NIC engaged, only one CPU core engaged
 1 session, NIC Teaming with Multichannel
   Automatic NIC failover (faster with NIC Teaming)
   Combined NIC bandwidth available: multiple NICs engaged, multiple CPU cores engaged

[Diagram: SMB clients and servers with teamed 10GbE and teamed 1GbE NIC pairs;
Multichannel engages all team members instead of a single NIC]
SMB Direct & Multichannel
 1 session, without Multichannel
   No automatic failover
   Can't use full bandwidth: only one NIC engaged, RDMA capability not used
 1 session, with Multichannel
   Automatic NIC failover
   Combined NIC bandwidth available: multiple NICs engaged, multiple RDMA connections

[Diagram: SMB clients and servers with dual 54GbIB InfiniBand R-NICs or dual 10GbE
R-NICs; Multichannel uses both R-NICs and multiple RDMA connections per session]
SMB Multichannel Auto Configuration
       Auto configuration looks at NIC type/speed => Same NICs are
        used for RDMA/Multichannel (doesn’t mix 10Gbps/1Gbps,
        RDMA/non-RDMA)
       Let the algorithms work before you decide to intervene
       Choose adapters wisely for their function


[Diagram: four client/server pairs with mixed NICs (10GbE + 1GbE, 10GbE R-NIC + 32GbIB
R-NIC, 10GbE R-NIC + 1GbE, 1GbE + wireless); Multichannel automatically picks the best
matching interfaces and ignores the slower or non-matching ones]
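
Normally you let the auto configuration do its job; if you do need to intervene, a minimal sketch of constraining Multichannel to chosen interfaces ("FS01" and the interface aliases are placeholders):

  # Restrict SMB Multichannel towards server FS01 to two specific interfaces.
  New-SmbMultichannelConstraint -ServerName "FS01" -InterfaceAlias "RDMA1","RDMA2"

  # Review the constraints later.
  Get-SmbMultichannelConstraint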
Networking Features Cheat Sheet



[Table: which feature improves which metric – Large Send Offload (LSO), Receive Segment
Coalescing (RSC), Receive Side Scaling (RSS), Virtual Machine Queues (VMQ), Remote DMA
(RDMA) and Single Root I/O Virtualization (SR-IOV) mapped against lower latency, higher
scalability, higher throughput and lower path length]
Advanced Network Features (2)
 Consistent Device Naming
 DCTCP/DCB/QOS
 DHCP Guard/Router Guard/Port Mirroring
 Port ACLs
 IPSEC Task Offload for Virtual Machines (IPsecTOv2)
 Network virtualization & Extensible Switch
Consistent Device Naming
DCTCP Requires Less Buffer Memory




 1 Gbps flow controlled by TCP: needs 400 to 600 KB of buffer memory; TCP sawtooth visible
 1 Gbps flow controlled by DCTCP: requires 30 KB of buffer memory; smooth
Datacenter TCP (DCTCP)
 W2K12 deals with network congestion by reacting to
 the degree & not merely the presence of congestion.
 DCTCP aims to achieve low latency, high burst tolerance and
 high throughput, with small buffer switches.
 Requires Explicit Congestion Notification (ECN, RFC 3168)
 capable switches.
 Algorithm enabled when it makes sense
 (low round trip times, i.e. in the data center).
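
A minimal sketch (Windows Server 2012) of checking and selecting the DCTCP congestion provider; whether you should change it depends on your switches supporting ECN:

  # Show which congestion provider each built-in TCP setting template uses.
  Get-NetTCPSetting | Select-Object SettingName, CongestionProvider

  # Point a custom template at DCTCP (traffic is mapped to templates via transport filters).
  Set-NetTCPSetting -SettingName DatacenterCustom -CongestionProvider DCTCP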
Datacenter TCP (DCTCP)




http://www.flickr.com/photos/srgblog/414839326
Datacenter TCP (DCTCP)


 Running out of buffer in a switch gets you into stop/go hell: a boatload of
  green, orange & red lights along your way
 Big buffers mitigate this but are very expensive

http://www.flickr.com/photos/mwichary/3321222807/                    http://www.flickr.com/photos/bexross/2636921208/
Datacenter TCP (DCTCP)


         You want to be in a green wave




http://www.flickr.com/photos/highwaysagency/6281302040/
http://www.telegraph.co.uk/motoring/news/5149151/Motorists-to-be-given-green-traffic-lights-if-they-stick-to-speed-limit.html

Windows Server 2012 & ECN provide network traffic control by default.
Data Center Bridging (DCB)

 Prevents congestion in NIC & network by reserving
 bandwidth for particular traffic types

 Windows Server 2012 provides support & control for DCB and tags
 packets by traffic type

 Provides lossless transport for mission critical workloads
DCB is like a car pool lane …




              http://www.flickr.com/photos/philopp/7332438786/
DCB Requirements

1.   Enhanced Transmission Selection (IEEE 802.1Qaz)

2.   Priority Flow Control (IEEE 802.1Qbb)

3.   (Optional) Data Center Bridging Exchange protocol

4.   (Not required) Congestion Notification (IEEE 802.1Qau)
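
A minimal sketch of a DCB configuration for SMB Direct traffic, assuming DCB-capable NICs and switches; the priority (3), bandwidth share and adapter name are placeholder choices:

  # Install the DCB feature and tag SMB Direct traffic (port 445) with priority 3.
  Install-WindowsFeature Data-Center-Bridging
  New-NetQosPolicy "SMB" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3

  # Enable Priority Flow Control for that priority and reserve bandwidth via ETS.
  Enable-NetQosFlowControl -Priority 3
  New-NetQosTrafficClass "SMB" -Priority 3 -BandwidthPercentage 50 -Algorithm ETS

  # Apply DCB on the adapter.
  Enable-NetAdapterQos -Name "RDMA1"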
Hyper-V QoS beyond the VM

[Diagram: Management OS traffic (live migration, storage, management) and VMs 1..n
sharing one Hyper-V virtual switch on top of an LBFO team of two 10 GbE physical NICs]

Manage the network bandwidth with a Maximum (value) and/or a Minimum (value or weight).
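
A minimal sketch of such a converged setup with weight-based minimum bandwidth; the switch, team and vNIC names plus the weights are placeholders:

  # Create the virtual switch on the team with weight-based minimum bandwidth.
  New-VMSwitch "ConvergedSwitch" -NetAdapterName "Team1" -MinimumBandwidthMode Weight -AllowManagementOS $false

  # Add management OS vNICs and give each a relative weight.
  Add-VMNetworkAdapter -ManagementOS -Name "LiveMigration" -SwitchName "ConvergedSwitch"
  Set-VMNetworkAdapter -ManagementOS -Name "LiveMigration" -MinimumBandwidthWeight 40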
Hyper-V QoS beyond the VM
Default Flow per Virtual Switch

 Customers may group a number of VMs that each don't have a minimum bandwidth
  weight. They are bucketed into a default flow, which has a minimum weight
  allocation. This prevents starvation.

[Diagram: VM1 and VM2 without a weight share the default flow while the "Gold Tenant"
VM has a minimum weight of 10, all on a 1 Gbps Hyper-V Extensible Switch]
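
A minimal sketch of setting the default flow's reservation on a weight-mode switch ("ConvergedSwitch" and the weight are placeholders):

  # Every VM NIC without its own weight shares this default flow reservation.
  Set-VMSwitch "ConvergedSwitch" -DefaultFlowMinimumBandwidthWeight 10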
Maximum Bandwidth for Tenants
 One common customer pain point: WAN links are expensive
 Cap VM throughput to the Internet to avoid bill shock

[Diagram: a Unified Remote Access Gateway VM on the Hyper-V Extensible Switch; traffic
towards the Internet is capped (<100Mb) while intranet traffic is unlimited (∞)]
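
A minimal sketch of capping a VM NIC; the VM name and the 100 Mbps figure are placeholders (the value is in bits per second):

  # Cap the tenant's Internet-facing NIC at roughly 100 Mbps.
  Set-VMNetworkAdapter -VMName "Tenant1" -MaximumBandwidth 100000000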
Bandwidth Network Management

 Manage the network bandwidth with a Maximum and a Minimum value
 SLAs for hosted virtual machines
 Control per VM and not per host
DHCP & Router Guard, Port Mirroring
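
A minimal sketch of turning these protections on per VM NIC; "Web01" and "IDS01" are placeholder VM names:

  # Drop rogue DHCP offers and router advertisements coming from this VM,
  # and mirror its traffic to another port for inspection.
  Set-VMNetworkAdapter -VMName "Web01" -DhcpGuard On -RouterGuard On -PortMirroring Source
  Set-VMNetworkAdapter -VMName "IDS01" -PortMirroring Destination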
IPsec Task Offload

 IPsec is CPU intensive => Offload to NIC
 In demand due to compliance (SOX, HIPAA, etc.)
 IPsec is required for secure operations
 Only available to host/parent workloads in W2K8R2
     Now extended to virtual machines
     Managed by the Hyper-V switch
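
A minimal sketch of letting a VM offload IPsec security associations to the physical NIC; "Web01" and the SA count are placeholders:

  # Allow up to 512 offloaded security associations for this VM's NIC (0 disables offload).
  Set-VMNetworkAdapter -VMName "Web01" -IPsecOffloadMaximumSecurityAssociation 512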
Port ACL
 Allow/Deny/Counter
 MAC, IPv4 or IPv6 addresses
 Wildcards allowed in IP addresses
 Note: counters are implemented as ACLs
   Count packets to an address/range
   Read via WMI/PowerShell
   Counters are tied into the resource metering you can do for charge/show back, planning etc.

ACLs are the basic building blocks of virtual switch security functions.
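
A minimal sketch of metering and filtering with port ACLs; "Web01" and the address ranges are placeholders:

  # Meter all traffic to/from 10.0.0.0/8 and block inbound traffic from one subnet.
  Add-VMNetworkAdapterAcl -VMName "Web01" -RemoteIPAddress 10.0.0.0/8 -Direction Both -Action Meter
  Add-VMNetworkAdapterAcl -VMName "Web01" -RemoteIPAddress 192.168.66.0/24 -Direction Inbound -Action Deny

  # Read the ACLs (and meter values) back.
  Get-VMNetworkAdapterAcl -VMName "Web01"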
Questions & Answers


          http://workinghardinit.wordpress.com
                             @workinghardinit
