7. Socket, NUMA, Core, K-Group
– Processor: One physical processor, which can consist of one or more
NUMA nodes. Today a physical processor ≈ a socket, with multiple cores.
– Non-uniform memory architecture (NUMA) node:
A set of logical processors and cache that are close to
one another.
– Core: One processing unit, which can consist of one or
more logical processors.
– Logical processor (LP): One logical computing engine
from the perspective of the operating system,
application or driver. In effect, a logical processor is a
thread (think hyper-threading).
– Kernel Group: A set of up to 64 logical processors.
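A minimal sketch of how these terms map to what Windows actually reports, assuming only local WMI access and the standard Win32 classes:
# One Win32_Processor instance per socket; NumberOfLogicalProcessors greater
# than NumberOfCores implies hyper-threading.
Get-WmiObject Win32_Processor |
    Select-Object SocketDesignation, NumberOfCores, NumberOfLogicalProcessors
# Total logical processors visible on the host.
(Get-WmiObject Win32_ComputerSystem).NumberOfLogicalProcessors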
8. Advanced Network Features (1)
Receive Side Scaling (RSS)
Receive Segment Coalescing (RSC)
Dynamic Virtual Machine Queuing (DVMQ)
Single Root I/O Virtualization (SR-IOV)
NIC TEAMING
RDMA/Multichannel support for virtual machines on SMB3.0
9. Receive Side Scaling (RSS)
– Windows Server 2012 scales RSS to the next generation of
servers & workloads
– Spreads interrupts across all available CPUs, even on very
large hosts
– RSS now works across K-Groups
– RSS is now NUMA-aware to optimize performance
– Now load balances UDP traffic across CPUs
– 40% to 100% more throughput (backups, file copies, web)
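A minimal sketch of checking and tuning RSS from PowerShell; the adapter name "NIC1" and the processor values are placeholders:
# Shows RSS state plus the processor and NUMA node usage for the adapter.
Get-NetAdapterRss -Name "NIC1"
Enable-NetAdapterRss -Name "NIC1"
# Optionally restrict which processors the RSS queues may land on.
Set-NetAdapterRss -Name "NIC1" -BaseProcessorNumber 2 -MaxProcessors 8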
10. [Diagram: incoming packets spread by an RSS NIC with 8 queues across NUMA nodes 0–3]
RSS improves scalability on multiple processors / NUMA nodes by distributing
TCP/UDP receive traffic across the cores in different nodes / K-Groups.
11. Receive Segment Coalescing (RSC)
– Coalesces packets in the NIC so the stack processes
fewer headers
– Multiple packets belonging to a connection are coalesced by
the NIC into a larger packet (max of 64 KB) and processed
within a single interrupt
– 10–20% improvement in throughput & CPU workload
– Offload to NIC
– Enabled by default on all 10 Gbps adapters
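A minimal sketch for verifying and toggling RSC per adapter; "NIC1" is a placeholder name:
# Reports IPv4/IPv6 coalescing state for the adapter.
Get-NetAdapterRsc -Name "NIC1"
Enable-NetAdapterRsc -Name "NIC1"
# Disable temporarily, e.g. while measuring its impact.
Disable-NetAdapterRsc -Name "NIC1"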
12. Receive Segment Coalescing
[Diagram: incoming packets coalesced into a larger buffer by a NIC with RSC]
RSC helps by coalescing multiple inbound packets into a larger buffer or
"packet", which reduces per-packet CPU cost as fewer headers need to be
processed.
13. Dynamic Virtual Machine Queue (DVMQ)
VMQ is to virtualization what RSS is to native workloads.
It ensures that routing, filtering etc. are done by the NIC in per-VM queues and
that the interrupts for those queues are not all handled by one processor (CPU 0).
Most inbox 10 Gbps Ethernet adapters support this.
Enabled by default. (Configuration sketch below.)
[Diagram: network I/O path without VMQ vs. network I/O path with VMQ]
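A minimal sketch (adapter and VM names are placeholders) for checking VMQ on the physical NIC and weighting it on a VM's virtual NIC:
Get-NetAdapterVmq -Name "NIC1"
Enable-NetAdapterVmq -Name "NIC1"
# VmqWeight 0 disables VMQ for this vNIC; 1-100 expresses relative preference.
Set-VMNetworkAdapter -VMName "VM01" -VmqWeight 100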
14. Dynamic Virtual Machine Queue (DVMQ)
[Diagram: three views of a root partition with CPUs 0–3 over a physical NIC, comparing No VMQ, Static VMQ and Dynamic VMQ]
Dynamic VMQ: adaptive, optimal performance across changing workloads.
15. Single-Root I/O Virtualization (SR-IOV)
– Reduces CPU utilization for processing network traffic
– Reduces latency along the network path
– Increases throughput
– Requires:
• Chipset: interrupt & DMA remapping
• BIOS support
• CPU: hardware virtualization, EPT or NPT
(Enabling sketch below.)
[Diagram: network I/O path without SR-IOV (virtual NIC over VMBus, with routing, VLAN filtering and data copy in the root partition's Hyper-V switch) vs. with SR-IOV (a virtual function on the physical NIC mapped straight into the virtual machine)]
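A minimal sketch (switch, adapter and VM names are placeholders) for enabling SR-IOV end to end: an IOV-capable virtual switch, a virtual-function weight on the VM's vNIC and a host-side capability check:
New-VMSwitch -Name "IovSwitch" -NetAdapterName "NIC1" -EnableIov $true
Set-VMNetworkAdapter -VMName "VM01" -IovWeight 100
# Verify host support and the VF assignment.
Get-VMHost | Select-Object IovSupport, IovSupportReasons
Get-VMNetworkAdapter -VMName "VM01" | Select-Object Name, IovWeight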
16. SR-IOV Enabling & Live Migration
Turn On IOV:
• Enable IOV (VM NIC property)
• A Virtual Function is "assigned" and a "NIC" is automatically created in the VM
• Traffic flows through the VF; the software path is not used
Live Migration:
• Switch back to the software path
• Remove the VF from the VM
• Migrate as normal
Post Migration:
• Reassign a Virtual Function, assuming resources are available
The VM keeps connectivity even if the target switch is not in IOV mode, no IOV
physical NIC is present, or the NIC vendor or firmware differs.
[Diagram: VM network stack with its "NIC" backed by a Virtual Function on an SR-IOV physical NIC through a software switch in IOV mode, falling back to the software NIC path during migration]
17. NIC TEAMING
– Customers are dealing with way too many issues.
– NIC vendors would like to stop having to support this themselves.
– Microsoft needs this to be competitive, to complete the solution stack
and to reduce support issues.
18. NIC Teaming
– Teaming modes:
• Switch dependent
• Switch independent
– Load balancing:
• Address Hash
• Hyper-V Port
– Hashing modes:
• 4-tuple
• 2-tuple
• MAC address
– Active/Active & Active/Standby
– Vendor agnostic
(Teaming sketch below.)
[Diagram: LBFO architecture — LBFO Admin GUI and WMI in user mode; the LBFO provider, configuration DLL (IOCTL) and IM MUX in kernel mode handling frame distribution/aggregation, failure detection and the control protocol; a virtual miniport and protocol edge above NIC 1–3, feeding the network switch and the Hyper-V Extensible Switch]
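A minimal sketch (team and member names are placeholders) of creating a switch-independent team with Hyper-V Port load balancing, entirely in-box and vendor agnostic:
New-NetLbfoTeam -Name "Team1" -TeamMembers "NIC1","NIC2" -TeamingMode SwitchIndependent -LoadBalancingAlgorithm HyperVPort
Get-NetLbfoTeam -Name "Team1"
# Active/Standby: park one member as a hot spare.
Set-NetLbfoTeamMember -Name "NIC2" -AdministrativeMode Standby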
19. NIC TEAMING (LBFO)
[Diagram: Parent NIC Teaming — a VM (guest running any OS) on a Hyper-V virtual switch over an LBFO teamed NIC built from SR-IOV NICs; SR-IOV is not exposed to the guest. Guest NIC Teaming — a VM (guest running Windows Server 2012) teams inside the guest across two Hyper-V virtual switches, each on its own SR-IOV NIC.]
20. NIC Teaming & QoS
• NIC Teaming, Hyper-V switch, QoS and actual performance |
part 1 – Theory
• NIC Teaming, Hyper-V switch, QoS and actual performance |
part 2 – Preparing the lab
• NIC Teaming, Hyper-V switch, QoS and actual performance |
part 3 – Performance
• NIC Teaming, Hyper-V switch, QoS and actual performance |
part 4 – Traffic classes
21. SMB Direct (SMB over RDMA)
What
• Addresses congestion in the network stack by offloading the stack to the
network adapter
Advantages
• Scalable, fast and efficient storage access
• High throughput, low latency & minimal CPU utilization
• Load balancing, automatic failover & bandwidth aggregation via SMB
Multichannel
Scenarios
• High-performance remote file access for application servers like Hyper-V,
SQL Server, IIS and HPC
• Used by File Server and Cluster Shared Volumes (CSV) for storage
communications within a cluster
Required hardware
• RDMA-capable network interface (R-NIC)
• Three types: iWARP, RoCE & InfiniBand
(Verification sketch below.)
[Diagram: SMB client and SMB server connected through RDMA-capable networks; on the client, application → SMB client (user/kernel) → network w/ RDMA support → R-NIC; on the server, R-NIC → SMB server → NTFS/SCSI → disk]
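A minimal sketch for confirming the prerequisites on a host: the NIC must report RDMA capability, and the SMB client should list RDMA-capable interfaces and connections:
Get-NetAdapterRdma
Get-SmbClientNetworkInterface
# Shows the active per-session connections and their RSS/RDMA capability.
Get-SmbMultichannelConnection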
22. SMB Multichannel
Multiple connections per SMB session
Full Throughput
• Bandwidth aggregation with multiple NICs
• Multiple CPUs cores engaged when using Receive Side Scaling (RSS)
Automatic Failover
• SMB Multichannel implements end-to-end failure detection
• Leverages NIC teaming if present, but does not require it
Automatic Configuration
• SMB detects and uses multiple network paths
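A minimal sketch of inspecting Multichannel and, for troubleshooting, toggling it; it is on by default on both client and server:
Get-SmbMultichannelConnection
Set-SmbClientConfiguration -EnableMultiChannel $true
Set-SmbServerConfiguration -EnableMultiChannel $true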
23. SMB Multichannel Single NIC Port
1 session, without Multichannel:
• No failover
• Can't use full 10 Gbps
• Only one TCP/IP connection
• Only one CPU core engaged
1 session, with Multichannel:
• No failover
• Full 10 Gbps available
• Multiple TCP/IP connections
• Receive Side Scaling (RSS) helps distribute load across CPU cores
[Diagram: SMB client and SMB server, each with one 10GbE RSS-capable NIC through a 10GbE switch; CPU utilization per core shown for cores 1–4]
24. SMB Multichannel Multiple NIC Ports
1 session, without Multichannel:
• No automatic failover
• Can't use full bandwidth
• Only one NIC engaged
• Only one CPU core engaged
1 session, with Multichannel:
• Automatic NIC failover
• Combined NIC bandwidth available
• Multiple NICs engaged
• Multiple CPU cores engaged
[Diagram: SMB clients and servers with two 10GbE RSS-capable NICs each, connected through two 10GbE switches]
25. SMB Multichannel & NIC Teaming
1 session, NIC Teaming without MC:
• Automatic NIC failover
• Can't use full bandwidth
• Only one NIC engaged
• Only one CPU core engaged
1 session, NIC Teaming with MC:
• Automatic NIC failover (faster with NIC Teaming)
• Combined NIC bandwidth available
• Multiple NICs engaged
• Multiple CPU cores engaged
[Diagram: SMB clients and servers with NIC Teaming over two 10GbE or two 1GbE RSS-capable NICs, connected through matching switches]
26. SMB Direct & Multichannel
1 session, without Multichannel:
• No automatic failover
• Can't use full bandwidth
• Only one NIC engaged
• RDMA capability not used
1 session, with Multichannel:
• Automatic NIC failover
• Combined NIC bandwidth available
• Multiple NICs engaged
• Multiple RDMA connections
[Diagram: SMB clients and servers with two R-NICs each (54Gb InfiniBand or 10GbE), connected through matching switches]
27. SMB Multichannel Auto Configuration
– Auto configuration looks at NIC type/speed => Same NICs are used for
RDMA/Multichannel (doesn’t mix 10Gbps/1Gbps, RDMA/non-RDMA)
– Let the algorithms work before you decide to intervene
– Choose adapters wisely for their function
[Diagram: four SMB client/server pairs with mixed adapters (10GbE, 1GbE, RDMA-capable 10GbE, 32Gb InfiniBand, wireless), illustrating which NICs auto configuration selects for Multichannel]
28. Networking Features Cheat Sheet
[Table: metrics — Lower Latency, Higher Scalability, Higher Throughput, Lower Path Length — mapped against Large Send Offload (LSO), Receive Segment Coalescing (RSC), Receive Side Scaling (RSS), Virtual Machine Queues (VMQ), Remote DMA (RDMA) and Single Root I/O Virtualization (SR-IOV)]
29. Advanced Network Features (2)
Consistent Device Naming
DCTCP/DCB/QoS
DHCP Guard/Router Guard/Port Mirroring
Port ACLs
IPsec Task Offload for Virtual Machines (IPsecTOv2)
Network virtualization & Extensible Switch
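A minimal sketch (VM names are placeholders) of three of the per-vNIC features listed above: DHCP Guard, Router Guard and Port Mirroring:
# Block rogue DHCP offers and router advertisements coming out of a tenant VM.
Set-VMNetworkAdapter -VMName "Tenant01" -DhcpGuard On -RouterGuard On
# Mirror the tenant's traffic to a monitoring VM on the same virtual switch.
Set-VMNetworkAdapter -VMName "Tenant01" -PortMirroring Source
Set-VMNetworkAdapter -VMName "Monitor01" -PortMirroring Destination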
32. DCTCP Requires Less Buffer Memory
1 Gbps flow controlled by TCP:
• Needs 400 to 600 KB of buffer memory
• TCP sawtooth visible
1 Gbps flow controlled by DCTCP:
• Requires 30 KB of buffer memory
• Smooth
33. Datacenter TCP (DCTCP)
– W2K12 deals with network congestion by reacting to
the degree & not merely the presence of congestion.
– DCTCP aims to achieve low latency, high burst tolerance and
high throughput, with small buffer switches.
– Requires Explicit Congestion Notification (ECN, RFC 3168)
capable switches.
– Algorithm enabled when it makes sense
(low round trip times, i.e. in the data center).
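A minimal sketch of selecting the DCTCP congestion provider on Windows Server 2012, assuming the ECN-capable switches and low-RTT data-center scenario described above:
Set-NetTCPSetting -SettingName DatacenterCustom -CongestionProvider DCTCP
# Confirm which templates use which congestion provider.
Get-NetTCPSetting | Select-Object SettingName, CongestionProvider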
34. Datacenter TCP (DCTCP)
Running out of buffer in a switch gets you into stop/go hell: a boatload of
green, orange & red lights along your way.
Big buffers mitigate this, but they are very expensive.
http://www.flickr.com/photos/mwichary/3321222807/ http://www.flickr.com/photos/bexross/2636921208/
35. Datacenter TCP (DCTCP)
You want to be in a green wave.
Windows Server 2012 & ECN provide network traffic control by default.
http://www.flickr.com/photos/highwaysagency/6281302040/
http://www.telegraph.co.uk/motoring/news/5149151/Motorists-to-be-given-green-traffic-lights-if-they-stick-to-speed-limit.html
36. Data Center Bridging (DCB)
– Prevents congestion in NIC & network by reserving
bandwidth for particular traffic types
– Windows Server 2012 provides support & control for DCB and tags
packets by traffic type
– Provides lossless transport for mission-critical workloads
37. DCB is like a car pool lane …
http://www.flickr.com/photos/philopp/7332438786/
38. DCB Requirements
1. Enhanced Transmission Selection (IEEE 802.1Qaz)
2. Priority Flow Control (IEEE 802.1Qbb)
3. (Optional) Data Center Bridging Exchange protocol
4. (Not required) Congestion Notification (IEEE 802.1Qau)
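A minimal sketch of the in-box DCB cmdlets; priority 3, port 445 (SMB Direct) and the 40% reservation are example values, and the adapter name is a placeholder:
Install-WindowsFeature Data-Center-Bridging
# Tag SMB Direct traffic and give it a lossless ETS traffic class.
New-NetQosPolicy "SMB" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3
New-NetQosTrafficClass "SMB" -Priority 3 -BandwidthPercentage 40 -Algorithm ETS
Enable-NetQosFlowControl -Priority 3
Enable-NetAdapterQos -Name "NIC1"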
39. Hyper-V QoS beyond the VM
Manage the network bandwidth with a Maximum (value) and/or a
Minimum (value or weight). (Converged configuration sketch below.)
[Diagram: management OS traffic (live migration, storage, management) and VM 1…VM n sharing a Hyper-V virtual switch on an LBFO teamed NIC of two 10 GbE physical NICs]
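A minimal sketch (switch, team and weight values are examples) of the converged setup pictured above: a weight-based virtual switch on the teamed NIC with management-OS vNICs for live migration and storage:
New-VMSwitch -Name "ConvergedSwitch" -NetAdapterName "Team1" -MinimumBandwidthMode Weight -AllowManagementOS $false
Add-VMNetworkAdapter -ManagementOS -Name "LiveMigration" -SwitchName "ConvergedSwitch"
Add-VMNetworkAdapter -ManagementOS -Name "Storage" -SwitchName "ConvergedSwitch"
# Relative minimum weights instead of absolute reservations.
Set-VMNetworkAdapter -ManagementOS -Name "LiveMigration" -MinimumBandwidthWeight 30
Set-VMNetworkAdapter -ManagementOS -Name "Storage" -MinimumBandwidthWeight 40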
40. Hyper-V QoS beyond the VM
http://www.hyper-v.nu/archives/hvredevoort/2012/06/building-a-converged-fabric-with-windows-server-2012-powershell/
41. Default Flow per Virtual Switch
Customers may group a number of VMs that don't each have a minimum
bandwidth assignment. These are bucketized into a default flow, which has
its own minimum weight allocation. This prevents starvation.
[Diagram: two VMs with no weight ("?") and a Gold Tenant VM with weight 10 sharing a 1 Gbps Hyper-V Extensible Switch]
42. Maximum Bandwidth for Tenants
One common customer pain point: WAN links are expensive.
Cap VM throughput to the Internet to avoid bill shock. (Sketch below.)
[Diagram: Hyper-V Extensible Switch behind a Unified Remote Access Gateway, with Internet-bound traffic capped to <100 Mb and Intranet traffic unlimited]
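A minimal sketch (VM name and cap are examples) of capping a tenant vNIC; MaximumBandwidth is specified in bits per second, so the 100MB literal here means roughly 100 Mbps:
Set-VMNetworkAdapter -VMName "Tenant01" -MaximumBandwidth 100MB
Get-VMNetworkAdapter -VMName "Tenant01"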
43. Network Bandwidth Management
• Manage the network bandwidth with a Maximum and a Minimum value
• SLAs for hosted virtual machines
• Control per VM and not per host
45. IPsec Task Offload
– IPsec is CPU intensive => Offload to NIC
– In demand due to compliance (SOX, HIPAA, etc.)
– IPsec is required for secure operations
– Only available to host/parent workloads in W2K8R2
– Now extended to virtual machines
– Managed by the Hyper-V switch
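A minimal sketch (VM name and SA budget are examples) of extending IPsec task offload to a VM by granting its vNIC a number of offloaded security associations:
Set-VMNetworkAdapter -VMName "VM01" -IPsecOffloadMaximumSecurityAssociation 512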
46. Port ACL
Allow/Deny/Counter
MAC, IPv4 or IPv6 addresses
Wildcards allowed in IP addresses
ACLs are the basic building blocks of virtual switch security functions
Note: Counters are implemented as ACLs
• Counts packets to address/range
• Read via WMI/PowerShell
• Counters are tied into the resource metering you can do for chargeback/showback, planning etc.
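A minimal sketch (VM name and address ranges are examples) of the three ACL actions on a vNIC: Allow, Deny and Meter:
Add-VMNetworkAdapterAcl -VMName "VM01" -RemoteIPAddress 10.0.0.0/8 -Direction Both -Action Allow
Add-VMNetworkAdapterAcl -VMName "VM01" -RemoteIPAddress 0.0.0.0/0 -Direction Both -Action Deny
# Meter traffic to a specific range; counters are read back per ACL entry.
Add-VMNetworkAdapterAcl -VMName "VM01" -RemoteIPAddress 192.168.1.0/24 -Direction Outbound -Action Meter
Get-VMNetworkAdapterAcl -VMName "VM01"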