Socket, NUMA, Core, K-Group
– Processor: One physical processor, which can consist of one or more NUMA nodes. Today a physical processor ≈ a socket, with multiple cores.
– Non-uniform memory architecture (NUMA) node: A set of logical processors and cache that are close to one another.
– Core: One processing unit, which can consist of one or more logical processors.
– Logical processor (LP): One logical computing engine from the perspective of the operating system, application or driver. In effect, a logical processor is a thread (think hyper-threading).
– Kernel Group (K-Group): A set of up to 64 logical processors.
Advanced Network Features (1)
– Receive Side Scaling (RSS)
– Receive Segment Coalescing (RSC)
– Dynamic Virtual Machine Queuing (DVMQ)
– Single Root I/O Virtualization (SR-IOV)
– NIC Teaming
– RDMA/Multichannel support for virtual machines on SMB 3.0
Receive Side Scaling (RSS)
– Windows Server 2012 scales RSS to the next generation of servers & workloads
– Spreads interrupts across all available CPUs, even on very large-scale hosts
– RSS now works across K-Groups
– RSS is NUMA-aware to optimize performance
– Now load balances UDP traffic across CPUs
– 40% to 100% more throughput (backups, file copies, web)
[Diagram: RSS NIC with 8 queues distributing incoming packets across NUMA nodes 0–3]
RSS improves scalability on multiple processors / NUMA nodes by distributing TCP/UDP receive traffic across the cores in different nodes / K-Groups.
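The queue-selection idea behind RSS can be sketched as a flow-to-queue hash. This is a toy stand-in for the NIC's real Toeplitz-based hash: the CRC32 hash, queue count and function name are illustrative assumptions, but the property shown is the real one — all packets of a flow land on the same queue/CPU, while different flows spread out.

```python
# Toy sketch of RSS queue selection (NOT Windows' Toeplitz implementation):
# hash each packet's 4-tuple so a flow always maps to one receive queue,
# while distinct flows spread across all queues/CPUs.
import zlib

NUM_QUEUES = 8  # e.g. the 8-queue NIC in the diagram above

def rss_queue(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> int:
    """Map a TCP/UDP flow to a receive queue via a hash of its 4-tuple."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % NUM_QUEUES

# Packets of the same flow stay on one queue (preserves in-order processing):
q1 = rss_queue("10.0.0.1", 50000, "10.0.0.2", 445)
q2 = rss_queue("10.0.0.1", 50000, "10.0.0.2", 445)
assert q1 == q2
```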
Receive Segment Coalescing (RSC)
– Coalesces packets in the NIC so the stack processes fewer headers
– Multiple packets belonging to a connection are coalesced by the NIC into a larger packet (max of 64 K) and processed within a single interrupt
– 10–20% improvement in throughput & CPU workload (offloaded to the NIC)
– Enabled by default on all 10 Gbps interfaces
[Diagram: NIC with RSC coalescing incoming packets into a larger buffer]
RSC helps by coalescing multiple inbound packets into a larger buffer or “packet”, which reduces per-packet CPU cost because fewer headers need to be processed.
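The coalescing step above can be sketched in a few lines. This is a simplified model, not the NIC firmware's logic: connection keys and payload sizes are illustrative, and only the 64 KB cap comes from the slide.

```python
# Sketch of receive segment coalescing: consecutive payloads of the same
# connection are merged into one larger buffer (capped at 64 KB per the
# slide) so the stack handles one header instead of many.
MAX_COALESCED = 64 * 1024  # 64 KB cap per coalesced "packet"

def coalesce(packets):
    """packets: list of (connection_id, payload_bytes).
    Returns a list of (connection_id, coalesced_payload)."""
    out = []
    for conn, payload in packets:
        if out and out[-1][0] == conn and len(out[-1][1]) + len(payload) <= MAX_COALESCED:
            out[-1] = (conn, out[-1][1] + payload)   # merge into current buffer
        else:
            out.append((conn, payload))              # start a new buffer
    return out
```

Three back-to-back segments of connection "a" become one 3000-byte buffer, while connection "b" starts a fresh buffer.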
Dynamic Virtual Machine Queue (DVMQ)
– VMQ is to virtualization what RSS is to native workloads.
– It ensures that routing, filtering, etc. are done by the NIC in queues, and that the interrupts for those queues are not all handled by a single processor (CPU 0).
– Most inbox 10 Gbps Ethernet adapters support this.
– Enabled by default.
[Diagram: Network I/O path without VMQ vs. with VMQ]
Dynamic Virtual Machine Queue (DVMQ)
[Diagram: Root partition with CPUs 0–3 over a physical NIC, in three configurations: no VMQ, static VMQ, and dynamic VMQ]
Dynamic VMQ adapts for optimal performance across changing workloads.
Single-Root I/O Virtualization (SR-IOV)
– Reduces CPU utilization for processing network traffic
– Reduces latency path
– Increases throughput
– Requires:
  • Chipset: interrupt & DMA remapping
  • BIOS support
  • CPU: hardware virtualization, EPT or NPT
[Diagram: Network I/O path without SR-IOV (root partition's Hyper-V switch does routing, VLAN filtering and data copy over VMBus) vs. with SR-IOV (a virtual function in the VM talks directly to the physical NIC)]
SR-IOV Enabling & Live Migration
– Turn on IOV (VM NIC property): the virtual function (VF) is “assigned”, assuming resources are available; a “NIC” is automatically created; traffic flows through the VF and the software path is not used.
– Live migration: switch back to the software path, remove the VF from the VM, and migrate as normal. The VM keeps connectivity even if the target switch is not in IOV mode or no IOV physical NIC is present.
– Post migration: reassign a virtual function — the target can have a different NIC vendor or different NIC firmware.
[Diagram: VM network stack failing over between a software NIC on the software switch and a virtual function on the SR-IOV physical NIC]
NIC Teaming
– Customers are dealing with way too many issues.
– NIC vendors would like to get rid of supporting this.
– Microsoft needs this to be competitive & complete the solution stack, and to reduce support issues.
NIC Teaming
– Teaming modes:
  • Switch dependent
  • Switch independent
– Load balancing:
  • Address hash
  • Hyper-V port
– Hashing modes:
  • 4-tuple
  • 2-tuple
  • MAC address
– Active/Active & Active/Standby
– Vendor agnostic
[Diagram: LBFO architecture — the admin GUI and WMI/PowerShell configure the LBFO provider via a configuration DLL and IOCTLs; in kernel mode the LBFO IM MUX driver exposes virtual miniports over NICs 1–3 and handles frame distribution/aggregation, failure detection and the control protocol; integrates with the Hyper-V extensible switch and the network switch]
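The three hashing modes listed above can be sketched as follows. This is a toy illustration of what each mode feeds into the hash — real LBFO hashing is NIC/driver-specific, and the CRC32 hash and field names here are assumptions:

```python
# Sketch of the three LBFO hashing modes: the chosen fields of an outbound
# frame are hashed to pick a team member. 4-tuple spreads finest; 2-tuple
# is used when ports aren't visible (e.g. non-TCP/UDP); MAC is coarsest.
import zlib

def pick_member(frame: dict, mode: str, team_size: int) -> int:
    """frame keys: src_mac, src_ip, dst_ip, src_port, dst_port."""
    if mode == "4-tuple":
        key = (frame["src_ip"], frame["dst_ip"], frame["src_port"], frame["dst_port"])
    elif mode == "2-tuple":
        key = (frame["src_ip"], frame["dst_ip"])
    elif mode == "mac":
        key = (frame["src_mac"],)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return zlib.crc32(repr(key).encode()) % team_size
```

Note the trade-off: under "2-tuple" or "mac", two flows that differ only in their ports hash to the same team member, so they cannot use more than one NIC's bandwidth.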
NIC Teaming (LBFO)
[Diagram: Parent NIC teaming — the LBFO teamed NIC sits under the Hyper-V virtual switch, so SR-IOV is not exposed to the VM (guest running any OS). Guest NIC teaming — each SR-IOV NIC gets its own Hyper-V virtual switch and the LBFO team is built inside the guest (running Windows Server 2012), preserving SR-IOV.]
NIC Teaming & QoS
• NIC Teaming, Hyper-V switch, QoS and actual performance | Part 1 – Theory
• NIC Teaming, Hyper-V switch, QoS and actual performance | Part 2 – Preparing the lab
• NIC Teaming, Hyper-V switch, QoS and actual performance | Part 3 – Performance
• NIC Teaming, Hyper-V switch, QoS and actual performance | Part 4 – Traffic classes
SMB Direct (SMB over RDMA)
What
• Addresses congestion in the network stack by offloading the stack to the network adapter
Advantages
• Scalable, fast and efficient storage access
• High throughput, low latency & minimal CPU utilization
• Load balancing, automatic failover & bandwidth aggregation via SMB Multichannel
Scenarios
• High-performance remote file access for application servers like Hyper-V, SQL Server, IIS and HPC
• Used by File Server and Cluster Shared Volumes (CSV) for storage communications within a cluster
Required hardware
• RDMA-capable network interface (R-NIC)
• Three types: iWARP, RoCE & InfiniBand
[Diagram: SMB client application in user mode, SMB client and SMB server in kernel mode, communicating over R-NICs through networks with RDMA support; the server side shows NTFS, SCSI and disk]
SMB Multichannel: multiple connections per SMB session
Full throughput
• Bandwidth aggregation with multiple NICs
• Multiple CPU cores engaged when using Receive Side Scaling (RSS)
Automatic failover
• SMB Multichannel implements end-to-end failure detection
• Leverages NIC teaming if present, but does not require it
Automatic configuration
• SMB detects and uses multiple network paths
SMB Multichannel, Single NIC Port
1 session, without Multichannel:
• No failover
• Can’t use full 10 Gbps
• Only one TCP/IP connection
• Only one CPU core engaged
1 session, with Multichannel:
• No failover
• Full 10 Gbps available
• Multiple TCP/IP connections
• Receive Side Scaling (RSS) helps distribute load across CPU cores
[Diagram: SMB client and SMB server, each with one RSS-capable 10GbE NIC connected through a 10GbE switch; per-core CPU utilization shown for cores 1–4]
SMB Multichannel, Multiple NIC Ports
1 session, without Multichannel:
• No automatic failover
• Can’t use full bandwidth
• Only one NIC engaged
• Only one CPU core engaged
1 session, with Multichannel:
• Automatic NIC failover
• Combined NIC bandwidth available
• Multiple NICs engaged
• Multiple CPU cores engaged
[Diagram: SMB clients and servers, each with two RSS-capable 10GbE NICs connected through two 10GbE switches]
SMB Multichannel & NIC Teaming
1 session, NIC Teaming without Multichannel:
• Automatic NIC failover
• Can’t use full bandwidth
• Only one NIC engaged
• Only one CPU core engaged
1 session, NIC Teaming with Multichannel:
• Automatic NIC failover (faster with NIC Teaming)
• Combined NIC bandwidth available
• Multiple NICs engaged
• Multiple CPU cores engaged
[Diagram: SMB clients and servers with teamed NIC pairs (10GbE and 1GbE) connected through matching switches, with RSS and NIC Teaming on both ends]
SMB Direct & Multichannel
1 session, without Multichannel:
• No automatic failover
• Can’t use full bandwidth
• Only one NIC engaged
• RDMA capability not used
1 session, with Multichannel:
• Automatic NIC failover
• Combined NIC bandwidth available
• Multiple NICs engaged
• Multiple RDMA connections
[Diagram: SMB clients and servers, each with two R-NICs (54Gb InfiniBand or 10GbE) connected through redundant switches]
SMB Multichannel Auto Configuration
– Auto configuration looks at NIC type/speed => same NICs are used for RDMA/Multichannel (doesn’t mix 10 Gbps/1 Gbps, RDMA/non-RDMA)
– Let the algorithms work before you decide to intervene
– Choose adapters wisely for their function
[Diagram: four client/server pairs showing that SMB picks the matched, most capable path — 10GbE over 1GbE; the 32Gb InfiniBand R-NIC over plain 10GbE; 10GbE with RSS over 1GbE and wireless]
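The selection behavior described above can be sketched as a capability ranking. The exact Windows algorithm is not specified on the slide; the ordering below (RDMA beats RSS beats raw speed) and the field names are assumptions made for illustration — the point is only that identical, most-capable interfaces are used together and slower or less capable ones are left out:

```python
# Sketch of SMB Multichannel interface selection: group interfaces by
# capability/speed and use only the best matching set — never mixing
# RDMA with non-RDMA, or 10 Gbps with 1 Gbps, as the slide states.
def pick_interfaces(nics):
    """nics: list of dicts with 'name', 'speed' (bps), 'rdma', 'rss'.
    Returns the subset of identical, most-capable interfaces."""
    best = max(nics, key=lambda n: (n["rdma"], n["rss"], n["speed"]))
    return [n for n in nics
            if (n["rdma"], n["rss"], n["speed"]) ==
               (best["rdma"], best["rss"], best["speed"])]
```

With an R-NIC and a plain 10GbE NIC present, only the R-NIC is used; with two identical 10GbE NICs, both are used and their bandwidth aggregated.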
Networking Features Cheat Sheet
[Table: Large Send Offload (LSO), Receive Segment Coalescing (RSC), Receive Side Scaling (RSS), Virtual Machine Queues (VMQ), Remote DMA (RDMA) and Single Root I/O Virtualization (SR-IOV), compared on the metrics lower latency, higher scalability, higher throughput and lower path length]
DCTCP Requires Less Buffer Memory
– 1 Gbps flow controlled by TCP: needs 400 to 600 KB of switch buffer memory; the TCP sawtooth is visible
– 1 Gbps flow controlled by DCTCP: requires 30 KB of memory; the queue stays smooth
Datacenter TCP (DCTCP)
– W2K12 deals with network congestion by reacting to the degree, not merely the presence, of congestion.
– DCTCP aims to achieve low latency, high burst tolerance and high throughput with small-buffer switches.
– Requires Explicit Congestion Notification (ECN, RFC 3168) capable switches.
– The algorithm is enabled when it makes sense (low round-trip times, i.e. in the data center).
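"Reacting to the degree of congestion" has a concrete form in the published DCTCP algorithm: the sender tracks the fraction of ECN-marked ACKs and cuts its window in proportion to it, instead of halving on any loss as classic TCP does. The sketch below follows that published update rule; it is not Windows' actual implementation, and the gain constant is the paper's suggested value:

```python
# Sketch of the DCTCP congestion-control update: estimate alpha, the
# smoothed fraction of ECN-marked packets, and shrink cwnd by alpha/2
# instead of 1/2 — a gentle cut under mild congestion, a TCP-like
# halving only when nearly every packet is marked.
G = 1 / 16  # smoothing gain for the marked-fraction estimate

def dctcp_update(cwnd: float, alpha: float, acked: int, marked: int):
    """One round-trip's update; returns (new_cwnd, new_alpha)."""
    f = marked / acked if acked else 0.0        # fraction of marked packets
    alpha = (1 - G) * alpha + G * f             # EWMA of congestion extent
    if marked:
        cwnd = max(1.0, cwnd * (1 - alpha / 2))  # proportional decrease
    else:
        cwnd += 1.0                              # standard additive increase
    return cwnd, alpha
```

This proportional reaction is what lets DCTCP keep switch queues short and smooth, which is why it needs so much less buffer memory than the TCP sawtooth shown on the previous slide.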
Datacenter TCP (DCTCP)
Running out of buffer in a switch gets you into stop/go hell: a boatload of green, orange & red lights along your way. Big buffers mitigate this, but are very expensive.
(Photos: http://www.flickr.com/photos/mwichary/3321222807/ and http://www.flickr.com/photos/bexross/2636921208/)
Datacenter TCP (DCTCP)
You want to be in a green wave. Windows Server 2012 & ECN provide network traffic control by default.
(Photos: http://www.flickr.com/photos/highwaysagency/6281302040/ and http://www.telegraph.co.uk/motoring/news/5149151/Motorists-to-be-given-green-traffic-lights-if-they-stick-to-speed-limit.html)
Data Center Bridging (DCB)
– Prevents congestion in NIC & network by reserving bandwidth for particular traffic types
– Windows Server 2012 provides support & control for DCB; tags packets by traffic type
– Provides lossless transport for mission-critical workloads
DCB is like a car pool lane …
(Photo: http://www.flickr.com/photos/philopp/7332438786/)
DCB Requirements
1. Enhanced Transmission Selection (IEEE 802.1Qaz)
2. Priority Flow Control (IEEE 802.1Qbb)
3. (Optional) Data Center Bridging Exchange protocol
4. (Not required) Congestion Notification (IEEE 802.1Qau)
Hyper-V QoS Beyond the VM
Manage the network bandwidth with a maximum (value) and/or a minimum (value or weight).
[Diagram: Management OS and VMs 1–n sharing a Hyper-V virtual switch over an LBFO teamed NIC (2 × 10 GbE physical NICs), with traffic classes for live migration, storage and management]
Hyper-V QoS Beyond the VM
http://www.hyper-v.nu/archives/hvredevoort/2012/06/building-a-converged-fabric-with-windows-server-2012-powershell/
Default Flow per Virtual Switch
Customers may group a number of VMs that each don’t have a minimum bandwidth. They will be bucketized into a default flow, which has a minimum weight allocation. This is to prevent starvation.
[Diagram: Gold-tenant VM1 and VM2 without assigned weights fall into the default flow (minimum weight 10) on a 1 Gbps Hyper-V extensible switch]
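The minimum-weight allocation described above can be sketched as a weighted share of link capacity. The flow names and weight values below are illustrative (only the default flow's weight of 10 and the 1 Gbps link appear on the slide); the arithmetic is the standard weighted-fair-share formula:

```python
# Sketch of minimum-bandwidth weights: when the link is congested, each
# flow is guaranteed at least capacity * weight / total_weight. VMs with
# no weight of their own share the default flow's allocation, so they
# can't be starved.
def min_bandwidth(capacity_mbps: float, weights: dict) -> dict:
    """Return each flow's guaranteed minimum share of the link, in Mbps."""
    total = sum(weights.values())
    return {flow: capacity_mbps * w / total for flow, w in weights.items()}

# Default flow (weight 10) beside hypothetical storage (40) and live
# migration (20) classes on a 1 Gbps switch:
shares = min_bandwidth(1000, {"default": 10, "storage": 40, "livemig": 20})
```

Here the unweighted VMs collectively keep at least 1000 × 10/70 ≈ 143 Mbps; any capacity a class doesn't use remains available to the others, since these are minimums, not caps.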
Maximum Bandwidth for Tenants
One common customer pain point: WAN links are expensive. Cap VM throughput to the Internet to avoid bill shock.
[Diagram: Unified Remote Access Gateway on the Hyper-V extensible switch, with Internet traffic capped at <100 Mb and intranet traffic unlimited (∞)]
Bandwidth Network Management
• Manage the network bandwidth with a maximum and a minimum value
• SLAs for hosted virtual machines
• Control per VM and not per host
IPsec Task Offload
– IPsec is CPU intensive => offload to the NIC
– In demand due to compliance (SOX, HIPAA, etc.)
– IPsec is required & needed for secure operations
– Only available to host/parent workloads in W2K8R2; now extended to virtual machines
– Managed by the Hyper-V switch
Port ACL
– Allow/Deny/Counter for MAC, IPv4 or IPv6 addresses
– Wildcards allowed in IP addresses
– ACLs are the basic building blocks of virtual switch security functions
Note: Counters are implemented as ACLs
• Counts packets to an address/range
• Read via WMI/PowerShell
• Counters tie into the resource metering you can do for charge/show-back, planning etc.
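The Allow/Deny/Counter model above can be sketched in a few lines. The first-match-wins semantics and the glob-style wildcard matcher are assumptions made for this illustration, not the Hyper-V switch's documented evaluation order; the counter-as-ACL behavior is what the slide describes:

```python
# Sketch of port-ACL evaluation: each entry pairs an IP pattern (with
# wildcards, as the slide allows) and an action. A "count" entry lets the
# packet through but increments a counter — the building block the slide
# says resource metering is made of.
from fnmatch import fnmatch

def apply_acl(acl, counters, src_ip: str) -> bool:
    """acl: list of (pattern, action), action in {'allow', 'deny', 'count'}.
    Returns True if the packet is allowed; unmatched packets are allowed."""
    for pattern, action in acl:
        if fnmatch(src_ip, pattern):
            if action == "count":
                counters[pattern] = counters.get(pattern, 0) + 1
                return True                 # counted, then forwarded
            return action == "allow"
    return True
```

For example, `[("10.0.1.*", "count"), ("10.0.*", "deny")]` counts traffic from the 10.0.1.x range while dropping the rest of 10.0.x.x — the counter values can then feed charge/show-back reports.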