Extreme Performance Series: Network Speed Ahead
Lenin Singaravelu, VMware
Haoqiang Zheng, VMware
VSVC5596 / #VSVC5596

Agenda
- Networking Performance in vSphere 5.5
  • Network Processing Deep Dive
  • Performance Improvements in vSphere 5.5
  • Tuning for Extreme Workloads
- Virtualizing Extremely Latency Sensitive Applications
- Available Resources
- Extreme Performance Series Sessions

vSphere Networking Architecture: A Simplified View
[Diagram] Bottom to top: PNIC driver; virtualization layer (vSwitch, vDS, VXLAN, NetIOC, teaming, DVFilter, ...); device emulation layer; VMs attach through vNICs, alongside the vmknic (VMkernel TCP/IP stack used for NFS/iSCSI, vMotion, management, and FT), SR-IOV VFs, DirectPath, and DirectPath with vMotion pass-through.
Transmit Processing for a VM
- One transmit thread per VM, executing all parts of the stack
  • The transmit thread can also execute the receive path for the destination VM
- Wakeup of the transmit thread: two mechanisms
  • Immediate, forcible wakeup by the VM (low delay, high CPU overhead)
  • Opportunistic wakeup by other threads or when the VM halts (potentially higher delay, low CPU overhead)
[Diagram] Device emulation converts ring entries to packets; the network virtualization layer handles switching, encapsulation, and teaming; packets are converted back to ring entries for the PNIC driver or delivered to the destination VM. Wakeups arrive via VMKcall or opportunistically.

Receive Processing for a VM
- One thread per device
- NetQueue-enabled devices: one thread per NetQueue
  • Each NetQueue processes traffic for one or more MAC addresses (vNICs)
  • NetQueue-to-vNIC mapping is determined by unicast throughput and FCFS
- vNICs can share queues
  • Low throughput, too many vNICs, or queue type mismatch (LRO queue vs. non-LRO vNIC)
[Diagram] A vNIC is served either by a dedicated NetQueue or by a queue shared with other vNICs.

Advanced Performance Monitoring Using net-stats
- net-stats: single-host network performance monitoring tool, available since vSphere 5.1
- Runs on the ESXi console: net-stats -h for help, net-stats -A to monitor all ports (example below)
  • Measure packet rates and drops at various layers (vNIC backend, vSwitch, PNIC) in a single place
  • Identify VMkernel threads for transmit and receive processing
  • Break down the CPU cost of networking into interrupt, receive, vCPU, and transmit threads
  • PNIC stats: NetQueue allocation information, interrupt rate
  • vNIC stats: coalescing and RSS information
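A minimal net-stats invocation from the ESXi shell, as a sketch. Only -h and -A appear on the slide; the interval (-i) and iteration (-n) flags below are assumptions about the tool's options and should be checked against net-stats -h.

  # Show built-in help (from the slide)
  net-stats -h

  # Monitor all ports on the host (from the slide); -i (seconds per
  # sample) and -n (number of samples) are assumed options
  net-stats -A -i 10 -n 6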
vSphere 5.1 Networking Performance Summary
- TCP performance
  • 10GbE line rate to/from a 1-vCPU VM to an external host with TSO and LRO
  • Up to 26.3 Gbps between 2x 1-vCPU Linux VMs on the same host
  • Able to scale or maintain throughput even at 8x PCPU overcommit (64 VMs on an 8-core, HT-enabled machine)
- UDP performance
  • 0.7+ million PPS (MPPS) with a 1-vCPU VM, rising to 2.5+ MPPS with more VMs (over a single 10GbE link)
  • Low loss rates at very high throughput might require tuning vNIC ring and socket buffer sizes
- Latency
  • 35+ us for ping -i 0.001 in the 1-vCPU, 1-VM case over 10GbE
  • Can increase to hundreds of microseconds under contention
- Note: microbenchmark performance is highly dependent on CPU clock speed and the size of the last-level cache (LLC)

Agenda
- Networking Performance in vSphere 5.5
  • Network Processing Deep Dive
  • Performance Improvements in vSphere 5.5
  • Tuning for Extreme Workloads
- Virtualizing Extremely Latency Sensitive Applications
- Available Resources
- Extreme Performance Series Sessions
What's New in vSphere 5.5 Performance
- 80 Gbps on a single host
- Support for 40 Gbps NICs
- vmknic IPv6 optimizations
- Reduced CPU cycles/byte
  • vSphere native drivers
  • VXLAN offloads
  • Dynamic queue balancing
- Experimental packet capture framework
- Latency-sensitivity feature
80 Gbps(1) on a Single Host
- Setup: 8 PNICs over 8 vSwitches; 16 Linux VMs running Apache; 4x Intel E5-4650 @ 2.7 GHz (32 cores, 64 threads); IXIA XT80-V2 traffic generator
- HTTP GET of a 1 MB file: 75+ Gbps, 4.1 million PPS
- HTTP POST of a 1 MB file: 75+ Gbps, 7.3 million PPS(2)
(1) Why stop at 80 Gbps? vSphere allows a maximum of 8x 10GbE PNICs.
(2) Software LRO is less aggressive than TSO in aggregating packets.

40 Gbps NIC Support
- Inbox support for Mellanox ConnectX-3 40 Gbps over Ethernet
- Max throughput to a 1-vCPU VM
  • 14.1 Gbps receive
  • 36.2 Gbps transmit
- Max throughput to a single VM
  • 23.6 Gbps receive, with RSS enabled in vNIC and PNIC
- Max throughput to a single host
  • 37.3 Gbps receive
- Setup: RHEL 6.3 + VMXNET3, 2x Intel Xeon E5-2667 @ 2.90 GHz, Mellanox MT27500, netperf TCP_STREAM workload
vmknic IPv6 Enhancements
- TCP checksum offload for transmit and receive
- Software large receive offload (LRO) for TCP over IPv6
- Zero-copy receives between the vSwitch and the TCP/IP stack
- Result (workload dirtying 48 GB of RAM; Intel Xeon E5-2667 @ 2.9 GHz; 4x 10GbE links): 34.5 Gbps IPv4 vs. 32.5 Gbps IPv6
Reduced CPU Cycles/Byte
- Changed the NetQueue allocation model for some PNICs from throughput-based to CPU-usage-based
  • Fewer NetQueues used for low-traffic workloads
- TSO and checksum offload for VXLAN on some PNICs
- Native vSphere drivers for Emulex PNICs
  • Eliminate the vmklinux layer from device drivers
  • 10% - 35% lower CPU cycles/byte in the VMkernel
Packet Capture Framework
- New experimental packet capture framework in vSphere 5.5 (usage sketch below)
  • Designed for capture at moderate packet rates
  • Captures packets at one or more layers of the vSphere network stack
  • The --trace option timestamps each packet as it passes through key points of the stack
- Useful in identifying sources of packet drops and latency
  • e.g., compare the UplinkRcv and Vmxnet3Rx capture points to check for packet drops by a firewall
  • e.g., with --trace enabled, the difference between the timestamps at Vmxnet3Tx and UplinkSnd shows whether NetIOC delayed a packet
- Capture points along the stack: Vmxnet3Tx -> EtherswitchDispatch -> EtherswitchOutput -> UplinkSnd on transmit; UplinkRcv -> EtherswitchDispatch -> EtherswitchOutput -> Vmxnet3Rx on receive
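The slide does not name the CLI; in the shipping release this framework is exposed as pktcap-uw in the ESXi shell. The invocations below are a sketch under that assumption, and the exact flags should be verified with pktcap-uw -h.

  # Capture received packets at the uplink (vmnic0) into a pcap file
  pktcap-uw --uplink vmnic0 -o /tmp/vmnic0-rx.pcap

  # Timestamp packets at key points of the stack (the --trace option
  # from the slide) for latency analysis
  pktcap-uw --trace -o /tmp/trace.pcap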
Agenda
- Networking Performance in vSphere 5.5
  • Network Processing Deep Dive
  • Performance Improvements in vSphere 5.5
  • Tuning for Extreme Workloads
- Virtualizing Extremely Latency Sensitive Applications
- Available Resources
- Extreme Performance Series Sessions

Improve Receive Throughput to a Single VM
- The single receive thread can become a bottleneck at high packet rates (> 1 million PPS or > 15 Gbps)
- Use the VMXNET3 virtual device and enable RSS inside the guest
- Enable RSS in the physical NIC (only available on some PNICs)
- Add ethernetX.pnicFeatures = "4" to the VM's configuration parameters (see the sketch below)
- Side effect: increased CPU cycles/byte
- Result: receive throughput on a 40G PNIC improves from 14.1 Gbps to 23.6 Gbps
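A sketch of the per-VM change for the first vNIC (ethernet0). The option name and value come from the slide; applying it through the VM's .vmx file or Advanced Configuration Parameters while the VM is powered off is the assumed workflow, and guest-side RSS enablement remains OS/driver specific.

  # .vmx entry (or Advanced Configuration Parameter) for vNIC ethernet0,
  # asking the vNIC to use multiple PNIC queues (RSS) for its traffic
  ethernet0.pnicFeatures = "4"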
Improve Transmit Throughput with Multiple vNICs
- Some applications use multiple vNICs for very high throughput
- The common transmit thread shared by all vNICs can become a bottleneck
- Add ethernetX.ctxPerDev = "1" to the VM's configuration parameters to give each vNIC its own transmit thread (see the sketch below)
- Side effect: increased CPU cycles/byte
- Result: UDP transmit rate improves from 0.9 MPPS to 1.41 MPPS
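A sketch for a hypothetical VM with two vNICs; the option comes from the slide, and editing the .vmx (or Advanced Configuration Parameters) while the VM is powered off is the assumed workflow.

  # One dedicated transmit thread per vNIC
  ethernet0.ctxPerDev = "1"
  ethernet1.ctxPerDev = "1"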
Achieve Higher Consolidation Ratios
- Switch vNIC coalescing to "static": ethernetX.coalescingScheme = "static" (see the sketch below)
  • Reduces interrupts to the VM and vmkcalls from the VM for networking traffic
  • Fewer interruptions => more efficient processing => more requests processed at lower cost
- Disable vNIC RSS in the guest for multi-vCPU VMs
  • At low throughput and low vCPU utilization, RSS only adds overhead
- Side effect: potentially higher latency for some requests and some workloads
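A sketch of both tunings. The coalescing option is quoted on the slide; the guest-side command is one assumed way to collapse a Linux VMXNET3 vNIC to a single queue (effectively disabling RSS), and the exact method is OS/driver specific.

  # .vmx entry: static interrupt coalescing for vNIC ethernet0
  ethernet0.coalescingScheme = "static"

  # Inside a Linux guest: reduce the vNIC to one queue (assumed method)
  ethtool -L eth0 combined 1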
Achieving Tens of Microseconds Latency

The Realm of Virtualization Grows
- Web services, app services, e-mail, desktops, databases, and Tier-1 apps are already routinely virtualized; soft real-time apps, HPC, and high-frequency trading have been marked off-limits
- Highly latency-sensitive applications
  • Low latency/jitter requirement (10 us - 100 us)
  • Normally considered to be "non-virtualizable"

The Latency-Sensitivity Feature in vSphere 5.5
- Minimize virtualization overhead
- Achieve near bare-metal performance
[Diagram] With latency sensitivity enabled, the hypervisor layer between the VM and the physical hardware is thinned.
Ping Latency Test
- Median latency: 18 us native to native; 35 us default VM to native; 20 us latency-sensitive VM to native
- 99.99% latency (jitter metric): 32 us native to native; 557 us default VM to native; 46 us latency-sensitive VM to native
- The latency-sensitivity feature improves the jitter metric by more than 10x over the default VM
Agenda
- Network Performance in vSphere 5.5
- Virtualizing Extremely Latency Sensitive Applications
  • Sources of Latency and Jitter
  • Latency Sensitivity Feature
  • Performance
  • Best Practices
- Available Resources
- Extreme Performance Series Sessions

Maximize CPU Reservation

Maximize Memory Reservation
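These two slides correspond to giving the VM full CPU and memory reservations, normally done through Edit Settings > Resources. As a hedged sketch, the .vmx equivalents below are shown for a hypothetical 2-vCPU, 4 GB VM on 2.5 GHz cores; the option names and the MHz/MB units are assumptions, not taken from the slides.

  # Assumed .vmx equivalents of full reservations for a 2-vCPU, 4 GB VM
  sched.cpu.min = "5000"   # CPU reservation in MHz (2 x 2500 MHz)
  sched.mem.min = "4096"   # memory reservation in MB (= configured memory)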
Ping Latency Test (Default VM vs. Native)
- Median latency: 18 us native to native; 35 us default VM to native
- 99.99% latency (jitter metric): 32 us native to native; 557 us default VM to native

Sources of the Latency/Jitter
- CPU contention
- CPU scheduling overhead
- Networking stack overhead (PNIC -> network virtualization -> device emulation -> vNIC -> VM)

System View from the CPU Scheduler's Perspective
- The scheduler runs more than just the vCPUs of VMs: per-VM contexts (vcpu-0, vcpu-1, MKS, I/O), user threads (hostd, ssh), and system threads (memory manager, I/O)
- CPU contention can occasionally occur even on an under-committed system
- Some system threads run at higher priority

Causes of Scheduling-Related Execution Delay
- Between wakeup, start of execution, and finish, a context's time breaks down into phases A-E:
  • A: ready time, waiting for other contexts to finish
  • B: scheduling overhead and world-switch overhead
  • C: actual execution time
  • D: overlap time (caused by interrupts, etc.)
  • E: HT, power management, and cache-related efficiency loss

Setting Latency-Sensitivity to HIGH
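The setting is applied per VM, through the Latency Sensitivity drop-down under VM Options in the vSphere Web Client. As a hedged sketch, the advanced configuration entry below is the assumed equivalent; the slide itself only shows the UI.

  # Assumed .vmx / Advanced Configuration equivalent of
  # "Latency Sensitivity: High" (requires full reservations)
  sched.cpu.latencySensitivity = "high"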
Reduce CPU Contention Using Exclusive Affinity
[Diagram] With exclusive affinity, each vCPU of the latency-sensitive VM is given its own PCPU; the ready-time (A) and scheduling-overhead (B) phases largely disappear, while I/O and interrupt contexts run on the remaining PCPUs.
Reduce CPU Contention Using Exclusive Affinity (II)
- What about other contexts?
  • They share the remaining cores without exclusive affinity
  • They may be contended, and may cause jitter for the latency-sensitive VM
Use DERatio to Monitor CPU Contention

Side Effects of Exclusive Affinity
- There's no such thing as a free lunch
- CPU cycles may be wasted
  • The CPU will NOT be used by other contexts when the vCPU is idle
- Exclusive affinity is only applied when:
  • The VM's latency sensitivity is HIGH
  • The VM has enough CPU allocation
Latency/Jitter from the Networking Stack
- The virtual networking path (PNIC -> network virtualization -> device emulation -> vNIC -> VM) executes more code than bare metal
- Context switches add scheduling delays and variance from coalescing
- Large receive offload (LRO) modifies TCP ACK behavior
- Mitigations (see the sketch below):
  • Disable vNIC coalescing
  • Disable LRO for vNICs
  • Use a pass-through device for networking traffic
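A hedged sketch of the first two mitigations. The "disabled" value for the coalescing scheme and the guest-side ethtool command are assumptions about the usual way to apply what the slide recommends.

  # .vmx entry: turn off interrupt coalescing for vNIC ethernet0
  ethernet0.coalescingScheme = "disabled"

  # Inside a Linux guest: disable LRO on the vNIC (OS/driver specific)
  ethtool -K eth0 lro off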
Pass-Through Devices
- SR-IOV or DirectPath I/O gives a VM direct access to the device
- Bypasses the virtual networking stack, reducing CPU cost and latency
- Pass-through NICs negate many benefits of virtualization
  • Only some versions of DirectPath I/O allow sharing of devices and vMotion
  • SR-IOV allows sharing of devices, but does not support vMotion
  • No support for NetIOC, FT, memory overcommit, HA, ...
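For reference, a hedged sketch of requesting SR-IOV virtual functions for an Intel 10GbE PNIC from the ESXi shell; the module name, parameter, and VF count are driver-specific assumptions (not from the slide), and a host reboot plus per-VM VF assignment in the Web Client follow.

  # Assumed example: request 8 VFs on each of two ixgbe ports,
  # then reboot the host and assign VFs to VMs in the Web Client
  esxcli system module parameters set -m ixgbe -p "max_vfs=8,8"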
Agenda
- Network Performance in vSphere 5.5
- Virtualizing Extremely Latency Sensitive Applications
  • Sources of Latency and Jitter
  • Latency Sensitivity Feature
  • Performance
  • Best Practices
- Available Resources
- Extreme Performance Series Sessions

Performance of the Latency Sensitivity Feature
- Single 2-vCPU VM vs. native, RHEL 6.2, RTT from 'ping -i 0.001'
- Intel Xeon E5-2640 @ 2.50 GHz, Intel 82599EB PNIC
- Median reduced by 15 us over Default and 6 us over SR-IOV
  • 99.99th percentile lower than 50 us
- Performance gap to native is between 2 us and 10 us

Performance with Multiple VMs
- 4x 2-vCPU VMs on a 12-core host, same ping workload
- 4-VM performance very similar to that of 1 VM
- Median reduced by 20 us over Default and 6 us over SR-IOV
- 99.99th percentile ~75 us: 400+ us better than Default, 50 us better than SR-IOV

Extreme Performance with SolarFlare PNIC
- 1 VM with a Solarflare SFN6122F-R7 PNIC; native with the same PNIC
  • Netperf TCP_RR workload
  • OpenOnload® enabled for netperf and netserver
- Median RTT of 6 us
- 99.99th percentile <= 25 us

Latency Sensitivity Feature Caveats
- Designed for applications with latency requirements on the order of a few tens of microseconds
- Exclusive affinity reduces flexibility for the CPU scheduler
  • Even if a vCPU is idle, other threads cannot use its PCPU
- Last-level CPU cache sharing is not addressed
  • Performance can be impacted by a competing VM with a big memory footprint
- Sharing of storage and other resources (load balancers, firewalls, ...) is not addressed
- Pass-through NICs are not compatible with many virtualization features
- Automatic VMXNET3 tunings are not suited for high packet rates
  • LRO and vNIC coalescing ON might be better
Latency Sensitivity Feature Best Practices
- Hardware power management
  • Disable C-states
  • Set the BIOS policy to "High Performance"
- Use SR-IOV if possible
- Use full CPU and memory reservations
- Under-commit resources for best performance
- Ensure that the Demand-Entitlement Ratio (DERatio) of the VM is lower than 100
Summary
- Good out-of-box network performance with vSphere 5.5
  • Supports 40 Gbps NICs; capable of saturating 80 Gbps on a single host
- Supports extreme throughput to a single VM
  • Tunables to increase parallelism can nearly double receive/transmit throughput
- The latency-sensitivity feature enables extremely latency-sensitive workloads
  • Median latency < 20 us and 99.99% latency < 50 us are possible with vSphere
  • Only a few microseconds higher than native
- Only extreme workloads require tuning
  • Be aware of the tradeoffs before tuning
Latency Sensitive Feature Docs
- Deploying Extremely Latency-Sensitive Applications in vSphere 5.5
  http://www.vmware.com/files/pdf/techpaper/latency-sensitive-perf-vsphere55.pdf
- Best Practices for Performance Tuning of Latency-Sensitive Workloads in vSphere VMs
  http://www.vmware.com/files/pdf/techpaper/VMW-Tuning-Latency-Sensitive-Workloads.pdf
- Network IO Latency on vSphere 5
  http://www.vmware.com/files/pdf/techpaper/network-io-latency-perf-vsphere5.pdf

Network Performance Docs
- VXLAN Performance on vSphere 5.1
  http://www.vmware.com/files/pdf/techpaper/VMware-vSphere-VXLAN-Perf.pdf
- Performance and Use Cases of VMware DirectPath I/O for Networking
  http://blogs.vmware.com/performance/2010/12/performance-and-use-cases-of-vmware-directpath-io-for-networking.html
- VMware Network I/O Control: Architecture, Performance and Best Practices
  http://www.vmware.com/files/pdf/techpaper/VMW_Netioc_BestPractices.pdf
- Multicast Performance on vSphere 5.0
  http://blogs.vmware.com/performance/2011/08/multicast-performance-on-vsphere-50.html
Performance Community Resources
- Performance Technology Pages
  http://www.vmware.com/technical-resources/performance/resources.html
- Technical Marketing Blog
  http://blogs.vmware.com/vsphere/performance/
- Performance Engineering Blog, VROOM!
  http://blogs.vmware.com/performance
- Performance Community Forum
  http://communities.vmware.com/community/vmtn/general/performance
- Virtualizing Business Critical Applications
  http://www.vmware.com/solutions/business-critical-apps/

Performance Technical Resources
- Performance Technical Papers
  http://www.vmware.com/resources/techresources/cat/91,96
- Performance Best Practices
  http://www.vmware.com/pdf/Perf_Best_Practices_vSphere4.0.pdf
  http://www.vmware.com/pdf/Perf_Best_Practices_vSphere4.1.pdf
  http://www.vmware.com/pdf/Perf_Best_Practices_vSphere5.0.pdf
  http://www.vmware.com/pdf/Perf_Best_Practices_vSphere5.1.pdf
- Troubleshooting Performance Related Problems in vSphere Environments
  http://communities.vmware.com/docs/DOC-14905 (vSphere 4.1)
  http://communities.vmware.com/docs/DOC-19166 (vSphere 5)
  http://communities.vmware.com/docs/DOC-23094 (vSphere 5.x with vCOps)
Extreme Performance Series Sessions
- Extreme Performance Series:
  • vCenter of the Universe - Session #VSVC5234
  • Monster Virtual Machines - Session #VSVC4811
  • Network Speed Ahead - Session #VSVC5596
  • Storage in a Flash - Session #VSVC5603
  • Silent Killer: How Latency Destroys Performance... And What to Do About It - Session #VSVC5187
- Big Data: Virtualized SAP HANA Performance, Scalability and Practices - Session #VAPP5591
- Hands-on Labs: HOL-SDC-1304 - Optimize vSphere Performance (includes vFRC)

Other VMware Activities Related to This Session
- HOL: HOL-SDC-1304, vSphere Performance Optimization
- Group Discussions: VSVC1001-GD, Performance with Mark Achtemichuk
THANK YOU

Extreme Performance Series: Network Speed Ahead
Lenin Singaravelu, VMware
Haoqiang Zheng, VMware
VSVC5596 / #VSVC5596
Networking Performance Goals
- Tune vSphere for best out-of-box performance for a wide range of applications
  • Extreme applications might require modified settings
- Line-rate throughput on the latest devices
- Near-native throughput, latency, and CPU cycles/byte

Design Choices for Higher Performance
- Asynchronous transmit and receive paths for most network stack consumers
- Ability to use multiple VMkernel I/O threads per VM or PNIC
  • Sacrifice some efficiency to drive higher throughput
- Interrupt coalescing at PNIC and vNIC to reduce CPU cycles/byte
  • Coalescing introduces variance in packet processing cost and latency
- Packet aggregation in software and hardware (TSO, LRO)
  • Aggregation may hurt latency-sensitive workloads
- Co-locate I/O threads and vCPUs on the same last-level cache to improve efficiency