Align To CPU Topology
• Resize vCPU configuration to match the physical core count of a NUMA node
• Use numa.vcpu.preferHT
• Use cores per socket (CORRECTLY)
• Attend INF8089 at 5 PM in this room
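A minimal sketch of the sizing logic behind these bullets, assuming one NUMA node per physical socket. The rule of thumb (fit the VM in one NUMA node if possible, otherwise split it evenly over the fewest nodes) and the names HostTopology and suggest_layout are hypothetical illustrations, not a VMware algorithm.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class HostTopology:
    sockets: int               # physical CPU packages (assumed: one NUMA node each)
    cores_per_socket: int      # physical cores per package
    threads_per_core: int = 2  # 2 with Hyper-Threading enabled


def suggest_layout(host: HostTopology, vcpus: int, prefer_ht: bool = False) -> tuple[int, int]:
    """Return (virtual_sockets, cores_per_virtual_socket) for a VM.

    Hypothetical rule of thumb: keep the VM inside one NUMA node when it fits,
    otherwise split it evenly over the fewest NUMA nodes that can hold it.
    """
    if vcpus < 1:
        raise ValueError("vcpus must be at least 1")
    node_capacity = host.cores_per_socket
    if prefer_ht:
        # numa.vcpu.preferHT: count hyperthreads when sizing the NUMA client
        node_capacity *= host.threads_per_core
    if vcpus <= node_capacity:
        return 1, vcpus
    nodes = math.ceil(vcpus / host.cores_per_socket)
    if nodes > host.sockets:
        raise ValueError("more vCPUs than physical cores on the host")
    if vcpus % nodes:
        raise ValueError("vCPU count does not divide evenly over NUMA nodes")
    return nodes, vcpus // nodes


if __name__ == "__main__":
    host = HostTopology(sockets=2, cores_per_socket=10)  # hypothetical 2 x 10-core host
    print(suggest_layout(host, 16))                  # spans two NUMA nodes
    print(suggest_layout(host, 16, prefer_ht=True))  # fits one node using hyperthreads
```

On a hypothetical 2 x 10-core host, a 16-vCPU VM becomes 2 sockets x 8 cores, mirroring the physical NUMA layout; with preferHT it stays in one node as 1 x 16.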
The Importance of Access Latency
Location of operands   CPU cycles   Perspective
CPU register           1            Brain (nanosecond)
L1/L3 cache            10           End of this room
Local memory           100          Entrance of building
Disk                   10^6         New York
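To turn the table's cycle counts into wall-clock time, divide by the clock rate: at f GHz one cycle lasts 1/f ns. The 2 GHz clock below is a hypothetical value; the cycle counts are the table's order-of-magnitude figures.

```python
# Order-of-magnitude cycle counts from the access-latency table
ACCESS_CYCLES = {
    "CPU register": 1,
    "L1/L3 cache": 10,
    "Local memory": 100,
    "Disk": 10**6,
}


def cycles_to_ns(cycles: int, clock_ghz: float) -> float:
    """At f GHz a cycle lasts 1/f nanoseconds."""
    return cycles / clock_ghz


def relative_cost(location: str, baseline: str = "Local memory") -> float:
    """How many times slower `location` is than `baseline`."""
    return ACCESS_CYCLES[location] / ACCESS_CYCLES[baseline]


if __name__ == "__main__":
    clock = 2.0  # hypothetical 2 GHz core
    for loc, cyc in ACCESS_CYCLES.items():
        print(f"{loc:<13} {cyc:>9,} cycles = {cycles_to_ns(cyc, clock):>12,.1f} ns")
```

At 2 GHz a disk access costs 500,000 ns (0.5 ms), 10,000 times a local-memory access.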
Industry Moves Toward NVMe
• SSD bandwidth capabilities exceed current controller bandwidth
• Protocol inefficiencies are the dominant contributor to access time
• NVMe is architected from the ground up for non-volatile memory
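One concrete protocol limit is concurrency. By Little's law, commands in flight = IOPS x latency. AHCI (SATA with NCQ) offers one queue of 32 command slots, while the NVMe specification allows up to 64K I/O queues of up to 64K entries each. The device figures below (1,000,000 IOPS at 100 µs) are hypothetical.

```python
AHCI_QUEUES, AHCI_QUEUE_DEPTH = 1, 32                       # SATA/AHCI with NCQ
NVME_MAX_IO_QUEUES, NVME_MAX_QUEUE_DEPTH = 65_535, 65_536   # NVMe specification limits


def max_outstanding(queues: int, depth: int) -> int:
    """Upper bound on queue entries the host can have outstanding."""
    return queues * depth


def commands_in_flight_needed(target_iops: float, latency_s: float) -> float:
    """Little's law: concurrency = throughput x latency."""
    return target_iops * latency_s


if __name__ == "__main__":
    need = commands_in_flight_needed(1_000_000, 100e-6)  # hypothetical device figures
    ahci = max_outstanding(AHCI_QUEUES, AHCI_QUEUE_DEPTH)
    print(f"need ~{need:.0f} commands in flight; AHCI allows {ahci}")
```

Such a device needs about 100 commands in flight, which AHCI's 32 slots cannot supply no matter how fast the media is.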
VXLAN
• Additional layer of packet processing
• Consumes CPU cycles on every packet for encapsulation/decapsulation
• Some TCP-based NIC offload capabilities cannot be used on encapsulated traffic
• VXLAN offloading! (TSO / CSO: TCP Segmentation Offload / Checksum Offload)
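The per-packet work comes from the extra headers. This sketch builds the 8-byte VXLAN header as laid out in RFC 7348 and counts the bytes added to each inner frame on an IPv4 underlay without an outer VLAN tag. It only models the header, not a real datapath; without TSO/CSO the host CPU also has to segment and checksum the inner TCP traffic itself.

```python
import struct

VXLAN_UDP_PORT = 4789         # IANA-assigned UDP destination port for VXLAN
VXLAN_FLAG_VALID_VNI = 0x08   # "I" flag: the VNI field is valid

ETH_HDR, IPV4_HDR, UDP_HDR, VXLAN_HDR = 14, 20, 8, 8  # bytes


def vxlan_header(vni: int) -> bytes:
    """Flags (8 bits), reserved (24), VNI (24), reserved (8)."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!B3xI", VXLAN_FLAG_VALID_VNI, vni << 8)


def parse_vni(header: bytes) -> int:
    flags, word = struct.unpack("!B3xI", header[:8])
    if not flags & VXLAN_FLAG_VALID_VNI:
        raise ValueError("I flag not set")
    return word >> 8


def encapsulation_overhead() -> int:
    """Outer Ethernet + outer IPv4 + UDP + VXLAN added to every inner frame."""
    return ETH_HDR + IPV4_HDR + UDP_HDR + VXLAN_HDR


def required_underlay_mtu(inner_mtu: int = 1500) -> int:
    """Outer IP packet = IPv4 + UDP + VXLAN + the whole inner Ethernet frame."""
    return IPV4_HDR + UDP_HDR + VXLAN_HDR + ETH_HDR + inner_mtu
```

Each frame grows by 50 bytes, so a 1500-byte guest MTU needs at least a 1550-byte underlay MTU, which is why a transport MTU of 1600 is commonly recommended.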
[root@ESXi02:~] vmkload_mod -s bnx2x
vmkload_mod module information
input file: /usr/lib/vmware/vmkmod/bnx2x
Version: Version 1.78.80.v60.12, Build: 2494585, Interface: 9.2 Built on: Feb 5 2015
Build Type: release
License: GPL
Name-space: com.broadcom.bnx2x#9.2.3.0
Required name-spaces:
com.broadcom.cnic_register#9.2.3.0
com.vmware.driverAPI#9.2.3.0
com.vmware.vmkapi#v2_3_0_0
Parameters:
skb_mpool_max: int
Maximum attainable private socket buffer memory pool size for the driver.
skb_mpool_initial: int
Driver's minimum private socket buffer memory pool size.
heap_max: int
Maximum attainable heap size for the driver.
heap_initial: int
Initial heap size allocated for the driver.
disable_feat_preemptible: int
For debug purposes, disable FEAT_PREEMPTIBLE when set to value of 1
disable_rss_dyn: int
For debug purposes, disable RSS_DYN feature when set to value of 1
disable_fw_dmp: int
For debug purposes, disable firmware dump feature when set to value of 1
enable_vxlan_ofld: int
Allow vxlan TSO/CSO offload support.[Default is disabled, 1: enable vxlan offload, 0: disable vxlan offload]
debug_unhide_nics: int
Force the exposure of the vmnic interface for debugging purposes[Default is to hide the nics]1. In SRIOV mode expose the PF
enable_default_queue_filters: int
Allow filters on the default queue. [Default is disabled for non-NPAR mode, enabled by default on NPAR mode]
multi_rx_filters: int
Define the number of RX filters per NetQueue: (allowed values: -1 to Max # of RX filters per NetQueue, -1:
use the default number of RX filters; 0: Disable use of multiple RX filters; 1..Max # the number of RX filters
per NetQueue: will force the number of RX filters to use for NetQueue
........
[root@ESXi01:~] esxcli system module parameters list -m bnx2x
Name Type Value Description
---------------------------- ---- ----- ------------------------------------------------------------------------------------------------------
RSS int Control the number of queues in an RSS pool. Max 4.
autogreeen uint Set autoGrEEEn (0:HW default; 1:force on; 2:force off)
debug uint Default debug msglevel
debug_unhide_nics int Force the exposure of the vmnic interface for debugging purposes[Default is to hide the nics]1. In SRIOV mode expose the PF
disable_feat_preemptible int For debug purposes, disable FEAT_PREEMPTIBLE when set to value of 1
disable_fw_dmp int For debug purposes, disable firmware dump feature when set to value of 1
disable_iscsi_ooo uint Disable iSCSI OOO support
disable_rss_dyn int For debug purposes, disable RSS_DYN feature when set to value of 1
disable_tpa uint Disable the TPA (LRO) feature
dropless_fc uint Pause on exhausted host ring
eee set EEE Tx LPI timer with this value; 0: HW default
enable_default_queue_filters int Allow filters on the default queue. [Default is disabled for non-NPAR mode, enabled by default on NPAR mode]
enable_vxlan_ofld int Allow vxlan TSO/CSO offload support.[Default is disabled, 1: enable vxlan offload, 0: disable vxlan offload]
gre_tunnel_mode uint Set GRE tunnel mode: 0 - NO_GRE_TUNNEL; 1 - NVGRE_TUNNEL; 2 - L2GRE_TUNNEL; 3 - IPGRE_TUNNEL
gre_tunnel_rss uint Set GRE tunnel RSS mode: 0 - GRE_OUTER_HEADERS_RSS; 1 - GRE_INNER_HEADERS_RSS; 2 - NVGRE_KEY_ENTROPY_RSS
heap_initial int Initial heap size allocated for the driver.
heap_max int Maximum attainable heap size for the driver.
int_mode uint Force interrupt mode other than MSI-X (1 INT#x; 2 MSI)
max_agg_size_param uint max aggregation size
mrrs int Force Max Read Req Size (0..3) (for debug)
multi_rx_filters int Define the number of RX filters per NetQueue: (allowed values: -1 to Max # of RX filters per NetQueue, -1: use the default number of RX filters; 0: Disable use of multiple RX filters; 1..Max # the number of RX filters per NetQueue: will force the number of RX filters to use for NetQueue
native_eee uint
num_queues uint Set number of queues (default is as a number of CPUs)
num_rss_pools int Control the existence of a RSS pool. When 0,RSS pool is disabled. When 1, there will bea RSS pool (given that RSS > 0).
........
Driver Summary
• Check the supported features of your pNIC
• Check the HCL for supported features in the driver module
• Check the driver module: does it require you to enable features?
• Is another async (vendor) driver available?
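Checking whether a feature is actually enabled means reading the Value column of `esxcli system module parameters list -m bnx2x`; an empty Value means the driver default applies (disabled, for enable_vxlan_ofld). The feature itself is turned on with `esxcli system module parameters set -m bnx2x -p "enable_vxlan_ofld=1"`, and module parameter changes generally take effect after a host reboot. The parser below is a hypothetical helper: it takes column boundaries from the dashed separator line so an empty Value column survives, and it assumes unwrapped rows.

```python
import re


def parse_module_params(text: str) -> dict[str, dict[str, str]]:
    """Parse `esxcli system module parameters list` table output into {name: row}."""
    lines = text.splitlines()
    sep_idx = next(i for i, line in enumerate(lines) if line.startswith("-"))
    spans = [(m.start(), m.end()) for m in re.finditer(r"-+", lines[sep_idx])]

    def cells(line: str) -> list[str]:
        # Fixed-width columns; the last one (Description) runs to end of line
        out = [line[start:end].strip() for start, end in spans[:-1]]
        out.append(line[spans[-1][0]:].strip())
        return out

    names = cells(lines[sep_idx - 1])  # header row: Name, Type, Value, Description
    params = {}
    for line in lines[sep_idx + 1:]:
        stripped = line.strip()
        if not stripped or set(stripped) <= {".", "-", " "}:
            continue  # skip blanks, stray separators, elision markers like "........"
        row = dict(zip(names, cells(line)))
        params[row["Name"]] = row
    return params


def vxlan_offload_enabled(params: dict[str, dict[str, str]]) -> bool:
    """True only if enable_vxlan_ofld is explicitly 1 (empty = driver default, disabled)."""
    return params.get("enable_vxlan_ofld", {}).get("Value", "") == "1"
```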
RSS & NetQueue
• NIC support required (RSS / VMDq)
• VMDq is the hardware feature; NetQueue is the corresponding feature built into vSphere
• RSS and NetQueue are similar in basic functionality
• RSS uses hashes based on IP/TCP port/MAC
• NetQueue uses MAC filters
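The difference in queue selection can be modelled in a few lines. This is a simplified sketch, not any NIC's or vSphere's code: the RSS side hashes the IPv4/TCP 4-tuple with the Toeplitz hash commonly used by RSS NICs and looks the result up in an indirection table; the NetQueue side is a plain destination-MAC filter table. The key is the sample 40-byte key from Microsoft's RSS verification examples; the 128-entry table and the NetQueueFilters class are hypothetical.

```python
import socket
import struct

# Sample 40-byte RSS key (Microsoft RSS verification examples); real NICs may differ
RSS_KEY = bytes.fromhex(
    "6d5a56da255b0ec24167253d43a38fb0"
    "d0ca2bcbae7b30b477cb2da38030f20c"
    "6a42b73bbeac01fa"
)


def toeplitz_hash(key: bytes, data: bytes) -> int:
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    if len(data) * 8 > key_bits - 32:
        raise ValueError("key too short for input")
    result = 0
    for i in range(len(data) * 8):
        if data[i // 8] & (0x80 >> (i % 8)):
            # For each set input bit, XOR in the 32-bit key window starting at bit i
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def rss_queue(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
              num_queues: int, table_size: int = 128) -> int:
    data = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!HH", src_port, dst_port))
    h = toeplitz_hash(RSS_KEY, data)
    # Indirection table: low bits of the hash pick an entry, the entry names a queue
    indirection = [i % num_queues for i in range(table_size)]
    return indirection[h % table_size]


class NetQueueFilters:
    """Hypothetical model: one MAC filter per VM, unmatched traffic to queue 0."""

    def __init__(self) -> None:
        self._filters: dict[str, int] = {}

    def add_filter(self, mac: str, queue: int) -> None:
        self._filters[mac.lower()] = queue

    def queue_for(self, dst_mac: str) -> int:
        return self._filters.get(dst_mac.lower(), 0)  # 0 = default queue
```

RSS can spread the flows of a single VM over several queues, whereas a MAC filter pins all traffic for one VM to the queue that owns its MAC.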
Intel examples:
Intel Ethernet product           RSS for VXLAN technology
Intel Ethernet X520/540 series   Scale RSS on VXLAN outer UDP information
Intel Ethernet X710 series       Scale RSS on VXLAN inner or outer header information
X710 series: better at balancing traffic over queues, and therefore over CPU threads
“What is the maximum performance of the vSphere (D)vSwitch?”
• By default, one transmit (Tx) thread per VM
• By default, one receive (Netpoll) thread per pNIC
• Transmit (Tx) and receive (Netpoll) threads
consume CPU cycles
• Each additional thread provides capacity
(1 thread = 1 core)
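Because one thread can use at most one core, the per-packet cycle budget sets the ceiling. This back-of-the-envelope model counts packets per second at a given line rate (1500-byte MTU plus 38 bytes of Ethernet header, FCS, preamble and inter-frame gap on the wire) and the threads needed for a given per-packet cost. The 3 GHz clock and the cycles-per-packet figures are hypothetical; measure real costs rather than trusting these numbers.

```python
import math

# Ethernet header 14 + FCS 4 + preamble/SFD 8 + inter-frame gap 12 (bytes on the wire)
ETH_WIRE_OVERHEAD = 14 + 4 + 8 + 12


def packets_per_second(line_rate_bps: float, mtu: int = 1500) -> float:
    return line_rate_bps / ((mtu + ETH_WIRE_OVERHEAD) * 8)


def cycle_budget_per_packet(clock_hz: float, line_rate_bps: float, mtu: int = 1500) -> float:
    """Cycles one thread (one core) may spend per packet to keep up with line rate."""
    return clock_hz / packets_per_second(line_rate_bps, mtu)


def threads_needed(line_rate_bps: float, cycles_per_packet: float,
                   clock_hz: float, mtu: int = 1500) -> int:
    """Threads (at most one core each) required to process line-rate traffic."""
    load = packets_per_second(line_rate_bps, mtu) * cycles_per_packet
    return math.ceil(load / clock_hz)
```

At 10 Gbit/s with full-size frames a single 3 GHz core has roughly 3,700 cycles per packet; if processing costs more than that, one Netpoll thread saturates its core (%SYS near 100%) before the pNIC reaches line rate.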
Network IO CPU consumption
Netpoll Thread
%SYS is ±100% during the test while the pNIC receives (this is the NETPOLL thread).