VMworld 2016: vSphere 6.x Host Resource Deep Dive



  1. vSphere 6.x Host Resource Deep Dive
     Frank Denneman / Niels Hagoort
     INF8430 #INF8430
  2. Agenda
     • Compute
     • Storage
     • Network
     • Q&A
  3. Introduction
     Niels Hagoort (www.cloudfix.nl)
     • Independent Architect
     • VMware VCDX #212
     • VMware vExpert (NSX)
     Frank Denneman (www.frankdenneman.nl)
     • Enjoying Summer 2016
     • VMware VCDX #29
     • VMware vExpert
  4. Compute (NUMA, NUMA, NUMA)
  5. Insights In Virtual Data Centers
  6. Modern dual-socket CPU servers are Non-Uniform Memory Access (NUMA) systems
  7. Local and Remote Memory
  8. NUMA Focus Points
     • Caching Snoop modes
     • DIMM configuration
     • Size the VM to match the CPU topology
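A quick way to see the NUMA layout ESXi detected on a host is the ESXi shell; a minimal sketch, assuming a stock ESXi 6.x install:

     [root@ESXi01:~] esxcli hardware memory get       # reports Physical Memory and NUMA Node Count
     [root@ESXi01:~] esxcli hardware cpu global get   # reports CPU Packages, CPU Cores, CPU Threads

If the NUMA Node Count is greater than 1, the host is a NUMA system and the local-versus-remote memory distinction of the previous slides applies.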
  9. CPU Cache (the forgotten hero)
  10. CPU Architecture
  11. Caching Snoop Modes
  12. DIMM Configuration (and why 384 GB is not an optimal configuration)
  13. Memory Constructs
  14. 3-DPC: 384 GB, 2400 MHz DIMMs
  15. DIMMs Per Channel
  16. 2-DPC: 384 GB, 2400 MHz DIMMs
  17. Current Sweet Spot: 512 GB
  18. Right-Size Your VM (Alignment equals consistent performance)
  19. ESXi NUMA Focus Points
     • CPU scheduler allocates core or HT cycles
     • NUMA scheduler handles initial placement (IP) + load balancing (LB)
     • vCPU configuration impacts IP & LB
  20. Scheduling constructs
  21. 12 vCPUs On a 20-Core System
  22. Align To CPU Topology
     • Resize the vCPU configuration to match the core count
     • Use numa.vcpu.preferHT
     • Use cores per socket (CORRECTLY)
     • Attend INF8089 at 5 PM in this room
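As a sketch of how the preferHT and cores-per-socket knobs land in a VM's configuration, take the 12-vCPU VM on the 20-core (2 x 10) host from slide 21; the values below are illustrative per-VM advanced settings, not a general recommendation:

     numa.vcpu.preferHT = "TRUE"      # count HT threads as capacity, so all 12 vCPUs fit one NUMA node
     cpuid.coresPerSocket = "12"      # present the 12 vCPUs as a single virtual socket

Both can be added under Edit Settings > VM Options > Advanced > Configuration Parameters.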
  23. Prefer HT + 12 Cores Per Socket
  24. Storage (How far away is your data?)
  25. The Importance of Access Latency

     Location of operands   CPU cycles   Perspective
     CPU Register           1            Brain (nanosecond)
     L1/L3 cache            10           End of this room
     Local Memory           100          Entrance of building
     Disk                   10^6         New York

  26. Every Layer = CPU Cycles & Latency
  27. Industry Moves Toward NVMe
     • SSD bandwidth capabilities exceed current controller bandwidth
     • Protocol inefficiencies are a dominant contributor to access time
     • NVMe is architected from the ground up for non-volatile memory
  28. I/O Queue Per CPU
  29. Driver Stack
  30. Not All Drivers Are Created Equal
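Which driver a device actually loaded is visible from the ESXi shell; a minimal sketch (the module name nvme below assumes the 6.x inbox NVMe driver, adjust for your hardware):

     [root@ESXi01:~] esxcli storage core adapter list   # vmhba adapters and the driver each one uses
     [root@ESXi01:~] esxcli network nic list            # vmnics and their driver modules
     [root@ESXi01:~] esxcli system module get -m nvme   # version details of a loaded module

Compare the reported driver and version against the VMware HCL entry for the device.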
  31. Network
  32. pNIC considerations for VXLAN performance
  33. VXLAN
     • Additional layer of packet processing
     • Consumes CPU cycles for each packet for encapsulation/de-capsulation
     • Some of the offload capabilities of the NIC cannot be used (TCP-based)
     • VXLAN offloading! (TSO / CSO)
  34. (Screenshot-only slide: a three-step numbered list whose text was not captured)
  35. [root@ESXi02:~] vmkload_mod -s bnx2x
     vmkload_mod module information
     input file: /usr/lib/vmware/vmkmod/bnx2x
     Version: Version 1.78.80.v60.12, Build: 2494585, Interface: 9.2 Built on: Feb 5 2015
     Build Type: release
     License: GPL
     Name-space: com.broadcom.bnx2x#9.2.3.0
     Required name-spaces:
        com.broadcom.cnic_register#9.2.3.0
        com.vmware.driverAPI#9.2.3.0
        com.vmware.vmkapi#v2_3_0_0
     Parameters:
        skb_mpool_max: int
           Maximum attainable private socket buffer memory pool size for the driver.
        skb_mpool_initial: int
           Driver's minimum private socket buffer memory pool size.
        heap_max: int
           Maximum attainable heap size for the driver.
        heap_initial: int
           Initial heap size allocated for the driver.
        disable_feat_preemptible: int
           For debug purposes, disable FEAT_PREEMPTIBLE when set to value of 1
        disable_rss_dyn: int
           For debug purposes, disable RSS_DYN feature when set to value of 1
        disable_fw_dmp: int
           For debug purposes, disable firmware dump feature when set to value of 1
        enable_vxlan_ofld: int
           Allow vxlan TSO/CSO offload support. [Default is disabled, 1: enable vxlan offload, 0: disable vxlan offload]
        debug_unhide_nics: int
           Force the exposure of the vmnic interface for debugging purposes [Default is to hide the nics]. In SRIOV mode expose the PF
        enable_default_queue_filters: int
           Allow filters on the default queue. [Default is disabled for non-NPAR mode, enabled by default on NPAR mode]
        multi_rx_filters: int
           Define the number of RX filters per NetQueue: (allowed values: -1 to Max # of RX filters per NetQueue, -1: use the default number of RX filters; 0: disable use of multiple RX filters; 1..Max #: force the number of RX filters to use for NetQueue)
        ........
  36. [root@ESXi01:~] esxcli system module parameters list -m bnx2x
     Name                          Type  Value  Description
     ----------------------------  ----  -----  -----------
     RSS                           int          Control the number of queues in an RSS pool. Max 4.
     autogreeen                    uint         Set autoGrEEEn (0:HW default; 1:force on; 2:force off)
     debug                         uint         Default debug msglevel
     debug_unhide_nics             int          Force the exposure of the vmnic interface for debugging purposes [Default is to hide the nics]. In SRIOV mode expose the PF
     disable_feat_preemptible      int          For debug purposes, disable FEAT_PREEMPTIBLE when set to value of 1
     disable_fw_dmp                int          For debug purposes, disable firmware dump feature when set to value of 1
     disable_iscsi_ooo             uint         Disable iSCSI OOO support
     disable_rss_dyn               int          For debug purposes, disable RSS_DYN feature when set to value of 1
     disable_tpa                   uint         Disable the TPA (LRO) feature
     dropless_fc                   uint         Pause on exhausted host ring
     eee                                        set EEE Tx LPI timer with this value; 0: HW default
     enable_default_queue_filters  int          Allow filters on the default queue. [Default is disabled for non-NPAR mode, enabled by default on NPAR mode]
     enable_vxlan_ofld             int          Allow vxlan TSO/CSO offload support. [Default is disabled, 1: enable vxlan offload, 0: disable vxlan offload]
     gre_tunnel_mode               uint         Set GRE tunnel mode: 0 - NO_GRE_TUNNEL; 1 - NVGRE_TUNNEL; 2 - L2GRE_TUNNEL; 3 - IPGRE_TUNNEL
     gre_tunnel_rss                uint         Set GRE tunnel RSS mode: 0 - GRE_OUTER_HEADERS_RSS; 1 - GRE_INNER_HEADERS_RSS; 2 - NVGRE_KEY_ENTROPY_RSS
     heap_initial                  int          Initial heap size allocated for the driver.
     heap_max                      int          Maximum attainable heap size for the driver.
     int_mode                      uint         Force interrupt mode other than MSI-X (1 INT#x; 2 MSI)
     max_agg_size_param            uint         max aggregation size
     mrrs                          int          Force Max Read Req Size (0..3) (for debug)
     multi_rx_filters              int          Define the number of RX filters per NetQueue: (allowed values: -1 to Max # of RX filters per NetQueue, -1: use the default number of RX filters; 0: disable use of multiple RX filters; 1..Max #: force the number of RX filters to use for NetQueue)
     native_eee                    uint
     num_queues                    uint         Set number of queues (default is as a number of CPUs)
     num_rss_pools                 int          Control the existence of a RSS pool. When 0, RSS pool is disabled. When 1, there will be a RSS pool (given that RSS > 0).
     ........
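The listing above is where enable_vxlan_ofld shows up for this bnx2x NIC. A hedged example of switching it on persistently; the parameter name comes from the listing, the esxcli syntax is standard, and the change takes effect after a reboot:

     [root@ESXi01:~] esxcli system module parameters set -m bnx2x -p "enable_vxlan_ofld=1"
     [root@ESXi01:~] esxcli system module parameters list -m bnx2x | grep vxlan   # verify after reboot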
  37. Driver Summary
     • Check the supported features of your pNIC
     • Check the HCL for supported features in the driver module
     • Check the driver module; does it require you to enable features?
     • Other async (vendor) driver available?
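For the HCL check you need the exact driver and firmware versions; one way to collect them, assuming vmnic0 is the NIC in question:

     [root@ESXi01:~] esxcli network nic get -n vmnic0   # driver name plus driver and firmware versions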
  38. RSS & NetQueue
     • NIC support required (RSS / VMDq)
     • VMDq is the hardware feature, NetQueue is the feature baked into vSphere
     • RSS & NetQueue are similar in basic functionality
     • RSS uses hashes based on IP/TCP port/MAC
     • NetQueue uses MAC filters
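NetQueue is enabled by default; if you want to confirm that (or test without it), the VMkernel setting can be inspected from the shell. A sketch, assuming netNetqueueEnabled is the kernel option name on your build:

     [root@ESXi01:~] esxcli system settings kernel list | grep netNetqueueEnabled
     [root@ESXi01:~] esxcli system settings kernel set --setting="netNetqueueEnabled" --value="false"   # disable (reboot required)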
  39. Without RSS for VXLAN (1 thread per pNIC)
  40. RSS enabled (>1 thread per pNIC)
  41. How to enable RSS (Intel)
     1. Unload module: esxcfg-module -u ixgbe
     2. Enable inbox: vmkload_mod ixgbe RSS="4,4"
        Enable async: vmkload_mod ixgbe RSS="1,1"
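vmkload_mod changes last only until the next driver load; to keep the RSS setting across reboots you can set it as a module parameter instead. A sketch for the inbox ixgbe driver; take the value string from the vendor documentation for your driver:

     [root@ESXi01:~] esxcli system module parameters set -m ixgbe -p "RSS=4,4"
     [root@ESXi01:~] esxcli system module parameters list -m ixgbe | grep RSS   # confirm after reboot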
  42. Receive throughput with VXLAN using 10GbE
  43. Intel examples:

     Intel Ethernet products          RSS for VXLAN technology
     Intel Ethernet X520/540 series   Scale RSS on VXLAN Outer UDP information
     Intel Ethernet X710 series       Scale RSS on VXLAN Inner or Outer header information

     The X710 series is better at balancing traffic across queues, and therefore across CPU threads.

  44. "What is the maximum performance of the vSphere (D)vSwitch?"
  45. Network IO CPU consumption
     • By default, one transmit (Tx) thread per VM
     • By default, one receive (Netpoll) thread per pNIC
     • Transmit (Tx) and receive (Netpoll) threads consume CPU cycles
     • Each additional thread provides capacity (1 thread = 1 core)
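The thread-level views on the following slides come from the host's network stats; on ESXi 6.x you can produce similar JSON output yourself with the net-stats utility. A hedged example; flag sets vary per build, so consult net-stats -h first:

     [root@ESXi01:~] net-stats -A -t vW   # all ports, including the per-worldlet (thread) stats shown next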
  46. Netpoll Thread
     %SYS is ±100% during the test while the pNIC receives. (This is the Netpoll thread.)
  47. NetQueue Scaling
     {"name": "vmnic0", "switch": "DvsPortset-0", "id": 33554435, "mac": "38:ea:a7:36:78:8c", "rxmode": 0, "uplink": "true",
      "txpps": 247, "txmbps": 9.4, "txsize": 4753, "txeps": 0.00, "rxpps": 624291, "rxmbps": 479.9, "rxsize": 96, "rxeps": 0.00,
      "wdt": [
        {"used": 0.00, "ready": 0.00, "wait": 41.12, "runct": 0, "remoteactct": 0, "migct": 0, "overrunct": 0, "afftype": "pcpu", "affval": 39, "name": "242.vmnic0-netpoll-10"},
        {"used": 0.00, "ready": 0.00, "wait": 41.12, "runct": 0, "remoteactct": 0, "migct": 0, "overrunct": 0, "afftype": "pcpu", "affval": 39, "name": "243.vmnic0-netpoll-11"},
        {"used": 82.56, "ready": 0.49, "wait": 16.95, "runct": 8118, "remoteactct": 1, "migct": 9, "overrunct": 33, "afftype": "pcpu", "affval": 45, "name": "244.vmnic0-netpoll-12"},
        {"used": 18.71, "ready": 0.75, "wait": 80.54, "runct": 6494, "remoteactct": 0, "migct": 0, "overrunct": 0, "afftype": "vcpu", "affval": 19302041, "name": "245.vmnic0-netpoll-13"},
        {"used": 55.64, "ready": 0.55, "wait": 43.81, "runct": 7491, "remoteactct": 0, "migct": 4, "overrunct": 5, "afftype": "vcpu", "affval": 19299346, "name": "246.vmnic0-netpoll-14"},
        {"used": 0.14, "ready": 0.10, "wait": 99.48, "runct": 197, "remoteactct": 6, "migct": 6, "overrunct": 0, "afftype": "vcpu", "affval": 19290577, "name": "247.vmnic0-netpoll-15"},
        {"used": 0.00, "ready": 0.00, "wait": 0.00, "runct": 0, "remoteactct": 0, "migct": 0, "overrunct": 0, "afftype": "pcpu", "affval": 45, "name": "1242.vmnic0-0-tx"},
        {"used": 0.00, "ready": 0.00, "wait": 0.00, "runct": 0, "remoteactct": 0, "migct": 0, "overrunct": 0, "afftype": "pcpu", "affval": 22, "name": "1243.vmnic0-1-tx"},
        {"used": 0.00, "ready": 0.00, "wait": 0.00, "runct": 0, "remoteactct": 0, "migct": 0, "overrunct": 0, "afftype": "pcpu", "affval": 24, "name": "1244.vmnic0-2-tx"},
        {"used": 0.00, "ready": 0.00, "wait": 0.00, "runct": 0, "remoteactct": 0, "migct": 0, "overrunct": 0, "afftype": "pcpu", "affval": 39, "name": "1245.vmnic0-3-tx"}
      ],
     3 Netpoll threads are used (3 worldlets).
  48. Tx Thread
     PKTGEN is polling, consuming near 100% CPU.
     %SYS = ±100%. This is the Tx thread.
  49. Additional Tx Thread
     • VMXNET3 is required!
     • Example for vNIC2: ethernet2.ctxPerDev = "1"
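Put together, a hedged .vmx fragment for the example vNIC2 from this slide (ethernet2 is just the slide's example index):

     ethernet2.virtualDev = "vmxnet3"   # the extra Tx thread requires VMXNET3
     ethernet2.ctxPerDev = "1"          # give this vNIC its own transmit context (thread)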
  50. Additional Tx thread
     %SYS = ±200%. CPU threads in same NUMA node as VM.
     {"name": "pktgen_load_test21.eth0", "switch": "DvsPortset-0", "id": 33554619, "mac": "00:50:56:87:10:52", "rxmode": 0, "uplink": "false",
      "txpps": 689401, "txmbps": 529.5, "txsize": 96, "txeps": 0.00, "rxpps": 609159, "rxmbps": 467.8, "rxsize": 96, "rxeps": 54.09,
      "wdt": [
        {"used": 99.81, "ready": 0.19, "wait": 0.00, "runct": 1176, "remoteactct": 0, "migct": 12, "overrunct": 1176, "afftype": "vcpu", "affval": 15691696, "name": "323.NetWdt-Async-15691696"},
        {"used": 99.85, "ready": 0.15, "wait": 0.00, "runct": 2652, "remoteactct": 0, "migct": 12, "overrunct": 12, "afftype": "vcpu", "affval": 15691696, "name": "324.NetWorldlet-Async-33554619"}
      ],
     2 worldlets
  51. Summary
     • Transmit (Tx) and receive (Netpoll) threads can be scaled!
     • Take the extra CPU cycles for network IO into account!
  52. Q&A
  53. Keep an eye out for our upcoming book! @frankdenneman @NHagoort
  54. @frankdenneman @NHagoort
