Socket, NUMA, Core, K-Group
- Processor: one physical processor, which can consist of one or more NUMA nodes. Today a physical processor ≈ a socket, with multiple cores.
- Non-uniform memory architecture (NUMA) node: a set of logical processors and cache that are close to one another.
- Core: one processing unit, which can consist of one or more logical processors.
- Logical processor (LP): one logical computing engine from the perspective of the operating system, application or driver. In effect, a logical processor is a thread (think hyper-threading).
- Kernel Group (K-Group): a set of up to 64 logical processors.
Advanced Network Features (1)
- Receive Side Scaling (RSS)
- Receive Segment Coalescing (RSC)
- Dynamic Virtual Machine Queuing (DVMQ)
- Single Root I/O Virtualization (SR-IOV)
- NIC Teaming
- RDMA/Multichannel support for virtual machines on SMB 3.0
Receive Side Scaling (RSS)
- Windows Server 2012 scales RSS to the next generation of servers & workloads
- Spreads interrupts across all available CPUs, even on very large hosts
- RSS now works across K-Groups
- RSS is NUMA-aware to optimize performance
- Now load-balances UDP traffic across CPUs
- 40% to 100% more throughput (backups, file copies, web)
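Conceptually, RSS hashes each flow's 4-tuple and uses an indirection table to pick the CPU that services the queue. The toy Python sketch below illustrates that idea only; real NICs use a Toeplitz hash with a secret key, and the function and table here are illustrative, not a Windows API.

```python
import hashlib

def rss_cpu_for_packet(src_ip, src_port, dst_ip, dst_port, indirection_table):
    """Pick a CPU for a flow by hashing its 4-tuple (a stand-in for the
    Toeplitz hash real RSS NICs use)."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    h = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
    return indirection_table[h % len(indirection_table)]

# Hypothetical indirection table spreading queues over CPUs in two NUMA nodes.
table = [0, 1, 2, 3, 8, 9, 10, 11]
cpu = rss_cpu_for_packet("10.0.0.1", 5000, "10.0.0.2", 445, table)
# The same flow always hashes to the same CPU, preserving in-order delivery.
assert cpu == rss_cpu_for_packet("10.0.0.1", 5000, "10.0.0.2", 445, table)
```

The per-flow stickiness is the key property: load spreads across cores while each connection stays on one core's cache.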
[Diagram: an RSS NIC with 8 queues distributing incoming packets across NUMA nodes 0-3]
RSS improves scalability on multiple processors / NUMA nodes by distributing TCP/UDP receive traffic across the cores in different nodes / K-Groups.
Receive Segment Coalescing (RSC)
- Coalesces packets in the NIC so the stack processes fewer headers
- Multiple packets belonging to the same connection are coalesced by the NIC into a larger packet (max of 64 KB) and processed within a single interrupt
- 10-20% improvement in throughput & CPU workload
- Offloaded to the NIC; enabled by default on all 10 Gbps adapters
[Diagram: a NIC with RSC coalescing incoming packets into a larger buffer]
RSC helps by coalescing multiple inbound packets into a larger buffer or "packet", which reduces per-packet CPU cost because fewer headers need to be processed.
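The coalescing rule above can be sketched in a few lines: merge consecutive segments of the same connection until the 64 KB cap, leaving other traffic untouched. This is a simplified Python illustration of the idea, not the NIC's actual logic.

```python
def coalesce(packets, max_size=64 * 1024):
    """Merge consecutive same-connection segments into larger buffers,
    capped at 64 KB, so the stack processes fewer headers."""
    coalesced = []
    for conn, payload in packets:
        last = coalesced[-1] if coalesced else None
        if last and last[0] == conn and len(last[1]) + len(payload) <= max_size:
            coalesced[-1] = (conn, last[1] + payload)   # grow the buffer
        else:
            coalesced.append((conn, payload))           # start a new one
    return coalesced

# Two back-to-back segments of connection "A" become one 2920-byte buffer.
pkts = [("A", b"x" * 1460), ("A", b"y" * 1460), ("B", b"z" * 1460)]
merged = coalesce(pkts)
assert len(merged) == 2
```

Three arriving packets cost the stack only two header-processing passes here; at 10 Gbps line rate that per-packet saving is where the 10-20% CPU win comes from.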
Dynamic Virtual Machine Queue (DVMQ)
- VMQ is to virtualization what RSS is to native workloads
- It ensures that routing, filtering, etc. are done by the NIC in queues, and that the interrupts for those queues are not all handled by one processor (CPU 0)
- Most inbox 10 Gbps Ethernet adapters support this; enabled by default
[Diagram: network I/O path with VMQ vs. without VMQ]
[Diagram: three root partitions with CPUs 0-3 above a physical NIC, comparing No VMQ, Static VMQ and Dynamic VMQ]
Dynamic VMQ adapts queue-to-CPU affinity for optimal performance across changing workloads.
Single Root I/O Virtualization (SR-IOV)
- Reduces CPU utilization for processing network traffic
- Reduces the latency of the network path
- Increases throughput
Requires:
- Chipset: interrupt & DMA remapping
- BIOS support
- CPU: hardware virtualization, EPT or NPT
[Diagram: network I/O path without SR-IOV (Hyper-V switch handles routing, VLAN filtering and data copy over VMBus) vs. with SR-IOV (a virtual function in the VM talks directly to the physical NIC)]
SR-IOV Enabling & Live Migration
Turn on IOV:
- Enable IOV (VM NIC property)
- Virtual function (VF) is "assigned", assuming resources are available
- "NIC" automatically created; traffic flows through the VF, the software path is not used
Live migration:
- Switch back to the software path and remove the VF from the VM
- Migrate as normal; the VM keeps connectivity even if the target switch is not in IOV mode, the IOV physical NIC is not present, the NIC vendor is different, or the NIC firmware is different
Post migration:
- Reassign a virtual function
[Diagram: VM network stack moving between a software switch path and an SR-IOV virtual-function path on two hosts]
NIC Teaming
- Customers are dealing with way too many issues
- NIC vendors would like to stop supporting proprietary teaming
- Microsoft needs this to be competitive, to complete the solution stack, and to reduce support issues
NIC Teaming
- Vendor agnostic; works with the Hyper-V Extensible Switch
- Teaming modes: switch dependent, switch independent
- Load balancing: address hash
- Hashing modes: 4-tuple, 2-tuple, MAC address
- Active/Active & Active/Standby
[Diagram: LBFO admin GUI and WMI drive the LBFO provider and configuration DLL via IOCTL in user mode; in kernel mode an IM MUX binds virtual miniports and the Hyper-Port protocol edge to NICs 1-3 in front of the network switch, handling frame distribution/aggregation, failure detection and control-protocol implementation]
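The address-hash load-balancing mode named above can be sketched simply: hash the chosen tuple of a flow and map it onto one team member. This toy Python version only illustrates the distribution idea; the function and NIC names are made up, and the real MUX also handles failover and rebalancing.

```python
def team_nic(flow, members, mode="4-tuple"):
    """Map a flow onto one team member by hashing it.
    flow = (src_ip, src_port, dst_ip, dst_port)."""
    if mode == "4-tuple":
        key = flow
    else:                       # "2-tuple": IP addresses only
        key = (flow[0], flow[2])
    return members[hash(key) % len(members)]

flow = ("10.0.0.1", 5000, "10.0.0.2", 445)
nics = ["NIC 1", "NIC 2", "NIC 3"]
# A given flow always lands on the same NIC, so its packets stay in order,
# while different flows spread across the team.
assert team_nic(flow, nics) == team_nic(flow, nics)
```

Hashing per flow rather than per packet is the design choice that makes teaming safe for TCP: aggregate bandwidth scales with flows, never by reordering a single flow.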
NIC Teaming (LBFO)
[Diagram: Parent NIC teaming — an LBFO-teamed NIC under the Hyper-V virtual switch, with SR-IOV not exposed to the guest (any OS). Guest NIC teaming — a VM running Windows Server 2012 teams two SR-IOV NICs exposed through two Hyper-V virtual switches]
SMB Direct (SMB over RDMA)
What: addresses congestion in the network stack by offloading the stack to the network adapter.
Advantages:
- Scalable, fast and efficient storage access
- High throughput, low latency & minimal CPU utilization
- Load balancing, automatic failover & bandwidth aggregation via SMB Multichannel
Scenarios:
- High-performance remote file access for application servers like Hyper-V, SQL Server, IIS and HPC
- Used by the File Server and Cluster Shared Volumes (CSV) for storage communications within a cluster
Required hardware: RDMA-capable network interface (R-NIC); three types: iWARP, RoCE & InfiniBand.
[Diagram: SMB client and server stacks (application, user/kernel boundary, SMB client/server, NTFS, SCSI) connected through RDMA-capable networks via R-NICs to disk]
SMB Multichannel
Full throughput:
- Multiple connections per SMB session
- Bandwidth aggregation with multiple NICs
- Multiple CPU cores engaged when using Receive Side Scaling (RSS)
Automatic failover:
- SMB Multichannel implements end-to-end failure detection
- Leverages NIC teaming if present, but does not require it
Automatic configuration:
- SMB detects and uses multiple network paths
SMB Multichannel, Single NIC Port
1 session without Multichannel: no failover; can't use the full 10 Gbps; only one TCP/IP connection; only one CPU core engaged.
1 session with Multichannel: still no failover (one port); full 10 Gbps available; multiple TCP/IP connections; Receive Side Scaling (RSS) helps distribute load across CPU cores.
[Diagram: SMB client and server, each with one RSS-capable 10 GbE NIC, showing per-core CPU utilization across cores 1-4]
SMB Multichannel, Multiple NIC Ports
1 session without Multichannel: no automatic failover; can't use the full bandwidth; only one NIC engaged; only one CPU core engaged.
1 session with Multichannel: automatic NIC failover; combined NIC bandwidth available; multiple NICs engaged; multiple CPU cores engaged.
[Diagram: two SMB client/server pairs, each with dual RSS-capable 10 GbE NICs and switches]
SMB Multichannel & NIC Teaming
1 session, NIC Teaming without Multichannel: automatic NIC failover; can't use the full bandwidth; only one NIC engaged; only one CPU core engaged.
1 session, NIC Teaming with Multichannel: automatic NIC failover (faster with NIC Teaming); combined NIC bandwidth available; multiple NICs engaged; multiple CPU cores engaged.
[Diagram: SMB client/server pairs with teamed dual 10 GbE and dual 1 GbE RSS-capable NICs and switches]
SMB Direct & Multichannel
1 session without Multichannel: no automatic failover; can't use the full bandwidth; only one NIC engaged; RDMA capability not used.
1 session with Multichannel: automatic NIC failover; combined NIC bandwidth available; multiple NICs engaged; multiple RDMA connections.
[Diagram: SMB client/server pairs with dual 54 Gb InfiniBand R-NICs and dual 10 GbE R-NICs and switches]
SMB Multichannel Auto Configuration
- Auto configuration looks at NIC type/speed: only equivalent NICs are used together for RDMA/Multichannel (it doesn't mix 10 Gbps with 1 Gbps, or RDMA with non-RDMA)
- Let the algorithms work before you decide to intervene
- Choose adapters wisely for their function
[Diagram: four SMB client/server pairs — 10 GbE + 1 GbE uses only the 10 GbE RSS path; 10 GbE + 32 Gb IB R-NICs use RDMA; dual 1 GbE NICs aggregate; 1 GbE + wireless uses only the wired path]
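The selection rules above can be approximated in a few lines: drop wireless paths, prefer RDMA-capable NICs, then keep only the fastest remaining speed tier. This is a hedged Python sketch of those rules, not the actual SMB client algorithm; the field names are invented for the illustration.

```python
def pick_interfaces(nics):
    """Pick the set of NICs Multichannel would bind together:
    prefer RDMA, then the top speed tier; never mix tiers."""
    usable = [n for n in nics if not n.get("wireless")]
    rdma = [n for n in usable if n.get("rdma")]
    pool = rdma if rdma else usable
    top_speed = max(n["speed"] for n in pool)
    return [n for n in pool if n["speed"] == top_speed]

# A host with two 10 GbE NICs and one 1 GbE NIC: only the matched
# 10 GbE pair is aggregated; the 1 GbE NIC is left out.
nics = [
    {"name": "10GbE-A", "speed": 10_000, "rdma": False},
    {"name": "10GbE-B", "speed": 10_000, "rdma": False},
    {"name": "1GbE",    "speed": 1_000,  "rdma": False},
]
assert [n["name"] for n in pick_interfaces(nics)] == ["10GbE-A", "10GbE-B"]
```

Mixing tiers would drag every channel down to the slowest link's pace, which is why the real implementation also refuses to do it.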
Networking Features Cheat Sheet
Metrics: lower latency, higher scalability, higher throughput, lower path length.
Features compared: Large Send Offload (LSO), Receive Segment Coalescing (RSC), Receive Side Scaling (RSS), Virtual Machine Queues (VMQ), Remote DMA (RDMA), Single Root I/O Virtualization (SR-IOV).
[Table: which feature improves which metric — per-cell values not recoverable from the slide]
Advanced Network Features (2)
- Consistent Device Naming
- DCTCP/DCB/QoS
- DHCP Guard / Router Guard / Port Mirroring
- Port ACLs
- IPsec Task Offload for virtual machines (IPsecTOv2)
- Network virtualization & Extensible Switch
DCTCP Requires Less Buffer Memory
- 1 Gbps flow controlled by TCP: needs 400 to 600 KB of switch buffer memory; TCP sawtooth visible
- 1 Gbps flow controlled by DCTCP: requires 30 KB of memory; smooth
Datacenter TCP (DCTCP)
- Windows Server 2012 deals with network congestion by reacting to the degree of congestion, not merely its presence
- DCTCP aims to achieve low latency, high burst tolerance and high throughput with small-buffer switches
- Requires Explicit Congestion Notification (ECN, RFC 3168) capable switches
- The algorithm is enabled when it makes sense (low round-trip times, i.e. in the data center)
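"Reacting to the degree of congestion" is precisely the DCTCP control law: estimate the fraction F of ECN-marked bytes per window, smooth it into a running estimate alpha, and cut the congestion window by alpha/2 rather than halving it. A minimal Python sketch of one update round (g = 1/16 is the gain the DCTCP paper suggests):

```python
def dctcp_update(alpha, acked_bytes, marked_bytes, cwnd, g=1 / 16):
    """One DCTCP round: F = fraction of ECN-marked bytes,
    alpha <- (1-g)*alpha + g*F, and cwnd <- cwnd*(1 - alpha/2)."""
    F = marked_bytes / acked_bytes if acked_bytes else 0.0
    alpha = (1 - g) * alpha + g * F
    if marked_bytes:
        cwnd = cwnd * (1 - alpha / 2)   # mild cut for mild congestion
    return alpha, cwnd

# Light congestion (10% of bytes marked) trims cwnd only slightly,
# where classic TCP would halve it on any congestion signal.
alpha, cwnd = dctcp_update(alpha=0.0, acked_bytes=100_000,
                           marked_bytes=10_000, cwnd=100.0)
```

Because the cut is proportional to how congested the path actually is, queues stay short and smooth, which is why DCTCP gets by with ~30 KB of switch buffer where TCP's sawtooth needs hundreds.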
Datacenter TCP (DCTCP)
Running out of buffer in a switch gets you into stop-and-go hell: a boatload of green, orange & red lights along your way. Big buffers mitigate this, but are very expensive.
(Photos: http://www.flickr.com/photos/mwichary/3321222807/, http://www.flickr.com/photos/bexross/2636921208/)
Datacenter TCP (DCTCP)
You want to be in a green wave. Windows Server 2012 & ECN provide this kind of network traffic control by default.
(Photos: http://www.flickr.com/photos/highwaysagency/6281302040/, http://www.telegraph.co.uk/motoring/news/5149151/Motorists-to-be-given-green-traffic-lights-if-they-stick-to-speed-limit.html)
Data Center Bridging (DCB)
- Prevents congestion in the NIC & network by reserving bandwidth for particular traffic types
- Windows Server 2012 provides support & control for DCB and tags packets by traffic type
- Provides lossless transport for mission-critical workloads
DCB is like a car pool lane … http://www.flickr.com/photos/philopp/7332438786/
DCB Requirements
1. Enhanced Transmission Selection (IEEE 802.1Qaz)
2. Priority Flow Control (IEEE 802.1Qbb)
3. (Optional) Data Center Bridging Exchange protocol
4. (Not required) Congestion Notification (IEEE 802.1Qau)
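Enhanced Transmission Selection, the first requirement above, divides link bandwidth among traffic classes by weight, with idle classes lending their share to active ones. The Python sketch below is a conceptual illustration of that sharing rule, not the 802.1Qaz scheduler itself; the class names and weights are invented.

```python
def ets_allocate(link_bw, classes):
    """Split link bandwidth by ETS-style weights; an idle class's
    share is redistributed to the active classes."""
    active = {name: w for name, (w, busy) in classes.items() if busy}
    total = sum(active.values())
    return {name: link_bw * w / total for name, w in active.items()}

# Hypothetical classes on a 10 GbE link (values in Mbps):
# while "other" (weight 20) is idle, storage and live migration
# split the link 50:30 over the remaining weight of 80.
classes = {"storage": (50, True), "live-migration": (30, True), "other": (20, False)}
alloc = ets_allocate(10_000, classes)
assert alloc["storage"] == 6250.0
```

This is reservation, not hard partitioning: each class is guaranteed its weight under contention but may use more when others are quiet.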
Hyper-V QoS Beyond the VM
Manage the network bandwidth with a maximum (value) and/or a minimum (value or weight).
[Diagram: the management OS (live migration, storage, management traffic) and VMs 1-n sharing a Hyper-V virtual switch over an LBFO-teamed pair of 10 GbE physical NICs]
Default Flow per Virtual Switch
Customers may group a number of VMs that each don't have a minimum bandwidth assigned. These are bucketed into a default flow, which has a minimum weight allocation. This prevents starvation.
[Diagram: Gold-tenant VM1 and VM2 with unassigned weights sharing a 1 Gbps Hyper-V Extensible Switch; the default flow has weight 10]
Maximum Bandwidth for Tenants
One common customer pain point: WAN links are expensive. Cap VM throughput to the Internet (e.g. <100 Mb) to avoid bill shock.
[Diagram: Hyper-V Extensible Switch behind a Unified Remote Access Gateway; Internet traffic capped at <100 Mb, intranet traffic uncapped (∞)]
Network Bandwidth Management
- Manage the network bandwidth with a maximum and a minimum value
- SLAs for hosted virtual machines
- Control per VM, not per host
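The minimum/maximum pair works like this: minimum weights decide each VM's guaranteed slice of the link under contention, and the maximum caps what it may ever use. A simplified Python sketch of that arithmetic (real Hyper-V QoS also redistributes a capped VM's unused share; the VM names and numbers are illustrative):

```python
def share_bandwidth(link_bw, vms):
    """Divide link bandwidth by minimum-bandwidth weight, then clamp
    each VM at its own maximum cap (None = uncapped)."""
    total_weight = sum(vm["weight"] for vm in vms)
    shares = {}
    for vm in vms:
        share = link_bw * vm["weight"] / total_weight
        cap = vm.get("max")
        shares[vm["name"]] = min(share, cap) if cap else share
    return shares

# On a hypothetical 10 Gbps (10,000 Mbps) link: "gold" (weight 3) is
# guaranteed 7500 Mbps; "bronze" earns 2500 but is capped at 2000.
vms = [{"name": "gold", "weight": 3, "max": None},
       {"name": "bronze", "weight": 1, "max": 2000}]
assert share_bandwidth(10_000, vms) == {"gold": 7500.0, "bronze": 2000.0}
```

Weights rather than absolute values are what make the SLA portable: the same policy holds whether the team underneath is 2 or 20 Gbps.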
IPsec Task Offload
- IPsec is CPU intensive => offload it to the NIC
- In demand due to compliance (SOX, HIPAA, etc.); IPsec is required for secure operations
- Only available to host/parent workloads in Windows Server 2008 R2
- Now extended to virtual machines, managed by the Hyper-V switch
Port ACLs
- Allow / Deny / Counter
- Match on MAC, IPv4 or IPv6 addresses; wildcards allowed in IP addresses
- Note: counters are implemented as ACLs; they count packets to an address/range and are read via WMI/PowerShell
- Counters tie into the resource metering you can do for chargeback/showback, planning, etc.
- ACLs are the basic building blocks of virtual switch security functions
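The three rule types above compose naturally: counter rules observe and keep evaluating, while allow/deny rules terminate the match. The toy Python class below sketches that behavior, with IP wildcards modeled as CIDR prefixes; it is an illustration of the concept, not the Hyper-V implementation (which, among other differences, may apply a different default action).

```python
import ipaddress

class PortAcl:
    """Sketch of a port ACL: allow/deny/counter rules matched on
    IP ranges, with per-range packet counters."""
    def __init__(self):
        self.rules = []          # list of (network, action)
        self.counters = {}       # network -> packets counted

    def add(self, cidr, action):
        self.rules.append((ipaddress.ip_network(cidr), action))

    def check(self, src_ip):
        ip = ipaddress.ip_address(src_ip)
        for net, action in self.rules:
            if ip in net:
                if action == "counter":
                    self.counters[net] = self.counters.get(net, 0) + 1
                    continue     # counters observe; keep evaluating
                return action
        return "deny"            # assumed default-deny for this sketch

acl = PortAcl()
acl.add("10.0.0.0/24", "counter")   # meter the tenant subnet
acl.add("10.0.0.0/24", "allow")     # ...and let its traffic through
assert acl.check("10.0.0.5") == "allow"
```

Reading `acl.counters` afterwards is the sketch's analogue of pulling the metering data via WMI/PowerShell for chargeback.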