- 1. DOAG 2020 │ ©2020 VMware, Inc.
ESXi Performance
Principles
DOAG Edition
Valentin Bondzio
Sr. Staff TSE / GSS Premier Services
2020-01-23
- 3. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 3
Brief Intro
@VMware since 2009
Global Support Services / Premier Services
Focus on Resource Management, Performance and Windows Internals
Originally from Berlin, living in Ireland since 2007
And most importantly …
- 9. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc.
Agenda
• CPU Scheduling and Usage Accounting: the “basics”
• “Power Management”: the Good, the Better and the Ugly
• ESXi Memory Management: more “basics”
• Local resource distribution: what else is running on ESXi
• CPU Topology Abstraction: CPU Socket != NUMA node
• +I/O stuff
• +vMotion
• +Backup
- 10. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 10
CPU Scheduler Overview
Resource guarantees and weighting (shares) on a per-VM or “Resource Pool” level
- 13. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 13
CPU Scheduler Overview
What does the scheduler do?
Dispatch VMs (i.e. their “worlds”) to honor CPU settings (Local)
• For fairness: select the VM with the least (consumed CPU time / fair share)
• For priority: run latency-sensitive VMs (high) before anyone else
[Diagram: vCPUs queued on an HT / core; an I/O-related vCPU runs first]
- 18. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 18
CPU Scheduler Overview
What does the scheduler do?
Place the worlds / threads on physical CPUs (Global)
• To balance load across physical execution contexts (PCPUs)
• To preserve cache state, minimize migration cost
• To avoid contention from hardware (HT, LLC, etc.) and sibling vCPUs (from the same VM)
• To keep VMs or threads that communicate frequently close to each other
[Diagram: two LLC domains of four cores (two HTs each) with VM worlds spread across them]
- 19. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 19
CPU Scheduler Overview
How does that look?
10:10:29am up 2 days 48 min, 674 worlds, 1 VMs, 2 vCPUs; CPU load average: 0.02, 0.01, 0.01
PCPU USED(%): 0.3 0.1 0.0 0.3 0.2 0.1 0.0 0.0 0.0 0.2 50 50 4.1 0.1 0.1 0.0 0.0 0.0 0.1 0.0 0.0 0.1 0.0 0.0 AVG: 4.4
PCPU UTIL(%): 0.5 0.1 0.1 0.6 0.2 0.2 0.0 0.2 0.0 0.3 100 100 4.2 0.2 0.1 0.1 0.0 0.0 0.1 0.0 0.0 0.2 0.1 0.1 AVG: 8.6
CORE UTIL(%): 0.6 0.7 0.4 0.9 0.3 100 4.3 0.2 0.0 0.1 0.4 0.7 AVG: 9.1
ID GID NAME NWLD %USED %RUN %SYS %WAIT %VMWAIT %RDY %IDLE %OVRLP
96337 148153 vmx 1 0.02 0.01 0.02 61.82 - 37.86 0.00 0.00
96339 148153 NetWorld-VM-96338 1 0.00 0.00 0.00 99.68 - 0.00 0.00 0.00
96340 148153 NUMASchedRemapEpochInitial 1 0.00 0.00 0.00 99.68 - 0.00 0.00 0.00
96341 148153 vmast.96338 1 0.03 0.05 0.00 99.63 - 0.00 0.00 0.00
96343 148153 vmx-vthread-6 1 0.00 0.00 0.00 99.68 - 0.00 0.00 0.00
96344 148153 vmx-mks:Debian86 1 0.00 0.00 0.00 61.55 - 38.13 0.00 0.00
96345 148153 vmx-svga:Debian86 1 0.00 0.00 0.00 99.68 - 0.00 0.00 0.00
96346 148153 vmx-vcpu-0:Debian86 1 62.35 99.68 0.00 0.00 0.00 0.00 0.00 0.05
96348 148153 vmx-vcpu-1:Debian86 1 62.36 99.67 0.00 0.00 0.00 0.01 0.00 0.05
96347 148153 PVSCSI-96338:0 1 0.00 0.00 0.00 99.68 - 0.00 0.00 0.00
96350 148153 vmx-vthread-7:Debian86 1 0.00 0.00 0.00 99.68 - 0.00 0.00 0.00
- 22. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 22
CPU Usage Accounting
What states are there
Not Running
Running
- 23. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 23
CPU Usage Accounting
What states are there
Idle
(descheduled)
Running Ready
- 24. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 24
CPU Usage Accounting
In an ideal world
Idle
(descheduled)
Running
Ready
Usage
- 28. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 28
CPU Usage Accounting
What is charged against the VM
Running: Usage (adjusted for Overlap, HT busy, Frequency, …)
Idle (descheduled): %SYS, %VMWAIT, %WAIT
Ready, i.e. “stolen time”: %RDY, %CSTP, %MLMTD
- 31. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 31
CPU Usage Accounting
Stolen time aka “%LAT_C”
%LAT_C captures the gap between “ideal” execution (demand) and “current” execution.
• “Ideal”: unlimited dedicated cores running at nominal processor frequency
Sources of Compute Latency:
• VM resource contention: check %RDY and %CSTP
• Power management (P-State): frequency throttling
• Hardware contention: HTs are in use
[Diagram: Demand vs. Ideal vs. Current execution; %LAT_C is the Ideal-to-Current gap]
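The gap described above can be written out numerically. A minimal sketch of the idea (illustrative only; the actual esxtop bookkeeping aggregates several counters and is more involved):

```python
# Illustrative model of %LAT_C: the gap between "ideal" execution
# (demand on unlimited dedicated cores at nominal frequency) and
# what the vCPU actually got delivered in the same interval.

def lat_c_pct(demand_pct: float, delivered_pct: float) -> float:
    """Compute latency as the undelivered share of demand, in percent."""
    return max(demand_pct - delivered_pct, 0.0)

# A vCPU that wanted to run 90% of the interval but only got 70%
# (e.g. due to %RDY, %CSTP, frequency throttling or HT contention):
print(lat_c_pct(90.0, 70.0))  # -> 20.0
```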
- 32. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 32
Does enabling HT “spawn” a less capable “logical core”?
Intel® Hyper-Threading Technology
Cores and Threads
- 36. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 36
Intel® Hyper-Threading Technology
Cores and Threads
Does enabling HT “spawn” a less capable “logical core”?
Maybe two slightly less capable “logical” cores?
[Diagram: a “physical” core with an attached “logical” core, vs. the same core exposed as “logical” core0 and “logical” core1]
- 39. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 39
Intel® Hyper-Threading Technology
Individual throughput reduction, aggregated throughput increase at high load
[Chart: each thread alone delivers 100; with both HTs busy, aggregate throughput is ~125]
- 43. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 43
Intel® Hyper-Threading Technology on ESXi
Throughput reduction is accounted for in USED
[Chart: both HTs at 100 UTIL each, ~125 aggregate throughput]
2 x 50 + 12.5 = 62.5
HTEfficiencyShift – Default: 2
HT is more efficient than no-HT by:
1: 50 %
2: 25 %
3: 12.5 %
4: 6.25 %
5: 3.125 %
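One way to read the arithmetic above, as a sketch. The assumption (my reading of the slide, not ESXi's actual code) is that a fully busy hyper-thread is charged half a core plus an efficiency credit of 2^-HTEfficiencyShift:

```python
# Sketch of the HT charging rule from the slide: with both
# hyper-threads of a core busy, each thread is charged half a core
# plus an efficiency credit of 2^-HTEfficiencyShift (default 2 -> 25%).

def used_pct(util_pct: float, sibling_busy: bool, shift: int = 2) -> float:
    if not sibling_busy:
        return util_pct              # whole core available: charge full time
    ht_bonus = 2.0 ** -shift         # shift=2 -> HT is "25% more efficient"
    return util_pct * 0.5 * (1.0 + ht_bonus)

print(used_pct(100.0, sibling_busy=True))   # -> 62.5 (the 50 + 12.5 above)
print(used_pct(100.0, sibling_busy=False))  # -> 100.0
```

This also matches the earlier esxtop sample, where each vCPU world showed ~99.7 %RUN but only ~62.35 %USED while sharing a core.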
- 48. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 48
Power Management
Umbrella term for:
• P-States
• Deep C-States
Options aka: Power Regulator, CPU Power Management, EIST
- 49. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 49
Power Management refresher …
P-State = voltage / frequency operating point
C-State = idle state; running (C0) or varying degrees of “stuff turned off” (C1–Cn)
[Chart: frequency per P-state, from P0 (Turbo Boost) and P1 (nominal frequency) down through P2–P13; idle states C0 and C1–Cn]
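In table form, using the P-state-to-frequency row that appears later in the deck's esxtop output for a 2.4 GHz part (the labeling of P0 as the Turbo Boost "state" and P1 as nominal follows the slide above):

```python
# P-state ladder as reported by esxtop's "PSTATE MHZ:" row for a
# 2.4 GHz CPU: P0 (2401 MHz) stands in for Turbo Boost, P1 (2400 MHz)
# is the nominal frequency, P13 (1200 MHz) is the lowest point.
PSTATE_MHZ = [2401, 2400, 2300, 2200, 2100, 2000, 1900,
              1800, 1700, 1600, 1500, 1400, 1300, 1200]

for i, mhz in enumerate(PSTATE_MHZ):
    tag = " (Turbo Boost)" if i == 0 else " (nominal)" if i == 1 else ""
    print(f"P{i}: {mhz} MHz{tag}")
```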
- 58. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 58
ESXi Power Management Policy
Only affects what’s presented from the BIOS
- 61. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 61
Power Management refresher …
Who controls what? → allow control / use
[Diagram: the BIOS allows control of deep C-States and P-States on the CPU; ESXi uses HLT / C1–Cn and P-States; the VM / guest uses HLT / C1]
- 69. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 69
ESXi Power Management Policy
Only affects what’s presented from the BIOS (DELL terminology)
System Profile → "Performance Per Watt (DAPC)"
"Performance Per Watt (OS)"
"Performance"
"Dense Configuration"
"Custom"
CPU Power Management → "System DBPM (DAPC)" (P-States)
"OS DBPM" (P-States)
"Maximum Performance" (P-States)
C States → "Enabled" (C-States)
"Disabled" (C-States)
- 73. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 73
Which BIOS policy am I running on?
Most likely “Dynamic”
Very likely “Performance”
- 74. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 74
Most likely “Dynamic”
Which BIOS policy am I running on?
4:30:58pm up 2 min, 1276 worlds, 0 VMs, 0 vCPUs; CPU load average: 0.02, 0.00, 0.00
Power Usage: 94W, Power Cap: N/A
PSTATE MHZ:
CPU %USED %UTIL %C0 %C1 %C2 %A/MPERF
0 0.3 0.7 1 23 76 50.0
1 0.0 0.0 0 0 100 50.1
2 0.1 0.2 0 6 94 50.0
3 0.0 0.0 0 0 100 50.1
4 5.2 10.4 10 5 85 50.0
5 0.0 0.0 0 5 95 51.0
6 0.0 0.1 0 3 97 50.0
7 0.0 0.0 0 0 100 50.0
8 0.1 0.4 0 16 84 50.0
9 0.0 0.0 0 0 100 50.0
10 0.0 0.0 0 0 100 50.0
(…)
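%A/MPERF is the ratio of the APERF and MPERF MSRs in percent, i.e. the average effective frequency relative to nominal. A quick sketch of what the ~50 above implies, assuming the 2400 MHz nominal frequency seen in the “Custom” output further down:

```python
# The APERF/MPERF MSR ratio tells you the average effective frequency
# relative to nominal; esxtop shows it as %A/MPERF.

def effective_mhz(nominal_mhz: float, a_mperf_pct: float) -> float:
    return nominal_mhz * a_mperf_pct / 100.0

print(effective_mhz(2400, 50.0))   # "Dynamic": parked at a low P-state (~1200 MHz)
print(effective_mhz(2400, 108.3))  # "Performance": Turbo Boost above nominal (~2599 MHz)
```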
- 76. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 76
Most likely “Performance”
Which BIOS policy am I running on?
4:38:51pm up 1 min, 1276 worlds, 0 VMs, 0 vCPUs; CPU load average: 0.02, 0.00, 0.00
Power Usage: 142W, Power Cap: N/A
PSTATE MHZ:
CPU %USED %UTIL %C0 %C1 %A/MPERF
0 0.0 0.1 0 100 108.3
1 0.1 0.1 0 100 108.4
2 0.1 0.1 0 100 108.3
3 0.0 0.1 0 100 108.4
4 0.0 0.0 0 100 108.3
5 18.0 16.7 17 83 108.3
6 0.0 0.1 0 100 108.4
7 0.2 0.2 0 100 108.3
8 0.0 0.0 0 100 108.3
9 0.1 0.2 0 100 108.3
10 0.0 0.1 0 100 108.3
(…)
- 78. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 78
Most likely “Custom”
Which BIOS policy am I running on?
5:09:53pm up 6 min, 827 worlds, 0 VMs, 0 vCPUs; CPU load average: 0.01, 0.01, 0.00
Power Usage: 107W, Power Cap: N/A
PSTATE MHZ: 2401 2400 2300 2200 2100 2000 1900 1800 1700 1600 1500 1400 1300 1200
CPU %USED %UTIL %C0 %C1 %C2 %P0 %P1 %P2 %P3 %P4 %P5 %P6 %P7 %P8 %P9 %P10 %P11 %P12 %P13 %A/MPERF
0 0.2 0.4 0 16 83 62 0 0 0 0 0 0 0 0 0 0 0 0 38 75.2
1 0.0 0.0 0 0 100 0 0 0 0 0 0 0 0 0 0 0 0 0 100 59.3
2 0.0 0.1 0 5 95 15 0 0 0 0 0 0 0 0 0 0 0 0 85 57.9
3 0.0 0.0 0 1 98 38 0 0 0 0 0 0 0 0 0 0 0 0 62 61.5
4 0.0 0.0 0 4 96 5 0 0 0 0 0 0 0 0 0 0 0 0 95 52.0
5 0.0 0.0 0 0 100 0 0 0 0 0 0 0 0 0 0 0 0 0 100 50.3
6 0.1 0.1 0 1 99 7 0 0 0 0 0 0 0 0 0 0 0 0 93 67.7
7 0.1 0.1 0 0 100 99 0 0 0 0 0 0 0 0 0 0 0 0 1 77.7
8 0.0 0.0 0 0 100 10 0 0 0 0 0 0 0 0 0 0 0 0 90 50.8
9 0.0 0.1 0 0 100 0 0 0 0 0 0 0 0 0 0 0 0 0 100 51.6
10 0.0 0.0 0 3 97 8 0 0 0 0 0 0 0 0 0 0 0 0 92 54.0
(…)
- 82. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 82
The magic of Turbo Boost
Dynamic, supported overclocking
[Chart: frequency vs. C-state depth; as sibling cores drop from C0 through C1 into C6, the remaining active cores can climb from P1 through higher Turbo Boost bins (TB1–TB7)]
- 86. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 86
Power Policy “playfield”
Bad → Good → Optimal*
• Bad: BIOS “Dynamic” pre Haswell
• Good: BIOS “Dynamic” on Haswell+
• Good: BIOS “Maximum / High Performance”, same* as Custom BIOS + High Performance ESXi policy (with the exception of C1E)
• Optimal*: Custom BIOS + Custom or Balanced ESXi policy
* a few workloads fare better with more deterministic performance
- 90. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 90
Power Policy “playfield”
Custom done right!
[Chart: Custom BIOS + ESXi Balanced policy plotted against the BIOS “Dynamic” and “Performance” profiles]
- 95. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 95
Frequently Asked Questions
Power Management Trivia
“Why doesn’t the frequency I see in Task Manager change?”
• Possibility 1: You are looking at the brand string
• Possibility 2: You are looking in the right place (but the guest OS has no way of knowing)
• Base frequency should be: CPUID.(EAX=16h):EAX[15:0]
– But it seems Windows is getting that from SMBIOS
# grep cpuid ./WinTest.vmx
cpuid.16.eax = "----------------0100011100011000"
cpuid.coresPerSocket = "6"
cpuid.brandstring = "VMware (R) SuperSecretCPU (R) @ 18.2 GHz"
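Decoding the vmx line above shows where the made-up 18.2 GHz comes from: the low 16 bits of CPUID leaf 16h's EAX carry the base frequency in MHz. A small sketch (the '-' characters in the vmx mask pass the host's bits through; here we simply treat them as zero):

```python
# cpuid.16.eax from the .vmx above: the low 16 bits are the CPU's
# base frequency in MHz (CPUID.(EAX=16h):EAX[15:0]).
mask = "----------------0100011100011000"

# '-' means "pass through the host's bit"; treat as 0 for decoding.
bits = mask.replace("-", "0")
base_mhz = int(bits, 2) & 0xFFFF

print(base_mhz)         # -> 18200
print(base_mhz / 1000)  # -> 18.2 (the GHz shown in the brand string)
```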
- 98. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 98
Frequently Asked Questions
Power Management Trivia
“I turned off all C-States, why is it still showing C1 in esxtop?”
• You can’t turn off C1; you can only disable different levels of deep C-States (C2+)
4:38:51pm up 1 min, 1276 worlds, 0 VMs, 0 vCPUs; CPU load average: 0.02, 0.00, 0.00
Power Usage: 142W, Power Cap: N/A
PSTATE MHZ:
CPU %USED %UTIL %C0 %C1 %A/MPERF
0 0.0 0.1 0 100 108.3
1 0.1 0.1 0 100 108.4
2 0.1 0.1 0 100 108.3
3 0.0 0.1 0 100 108.4
4 0.0 0.0 0 100 108.3
5 18.0 16.7 17 83 108.3
6 0.0 0.1 0 100 108.4
7 0.2 0.2 0 100 108.3
8 0.0 0.0 0 100 108.3
(…)
- 102. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 102
Frequently Asked Questions
Power Management Trivia
“I won’t have any issues if I have everything set to High Performance in the BIOS, right?”
• No, besides possibly:
– PSU redundancy issues
– Power capping
– Temperature
– Firmware bugs
• And definitely …
– No ability to control P-States / deep C-States
– No maximum Turbo Boost frequencies …
http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e5-v3-spec-update.pdf
- 106. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 106
Frequently Asked Questions
Power Management Trivia
“I can clearly see C2 in perfmon on Windows, why are you lying to me?”
• This is either a perfmon bug or a choice to represent an “enlightened” idle feature
– “Intelligent Timer Tick Distribution (ITTD)”
– needs Windows 2012 R2 / vHW 11
– disable via “monitor.disable_guest_idle_msr = true”
• you really shouldn’t have to, ever …
- 107. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 107
What runs where and when
The high-level picture
[Diagram: a CPU shared between VMK, the VMM and the guest OS / APPs]
What runs where and when
Mostly Direct Exec
CPU
OS / APPs
- 109. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 109
What runs where and when
Mostly Direct Exec
PCPU
vCPU
(…)
0xffffffff810a99d0 <+416>: test %eax,%eax
0xffffffff810a99d2 <+418>: je 0xffffffff810a9932 <cpu_startup_entry+258>
0xffffffff810a99d8 <+424>: callq 0xffffffff810c6ed0 <rcu_irq_enter>
0xffffffff810a99dd <+429>: mov 0x82740c(%rip),%r13
0xffffffff810a99e4 <+436>: test %r13,%r13
0xffffffff810a99e7 <+439>: je 0xffffffff810a9a07 <cpu_startup_entry+471>
0xffffffff810a99e9 <+441>: mov 0x0(%r13),%rax
0xffffffff810a99ed <+445>: nop
0xffffffff810a99f0 <+448>: mov 0x8(%r13),%rdi
0xffffffff810a99f4 <+452>: add $0x10,%r13
0xffffffff810a99f8 <+456>: xor %esi,%esi
0xffffffff810a99fa <+458>: mov %ebp,%edx
0xffffffff810a99fc <+460>: callq *%rax
0xffffffff810a99fe <+462>: mov 0x0(%r13),%rax
0xffffffff810a9a02 <+466>: test %rax,%rax
0xffffffff810a9a05 <+469>: jne 0xffffffff810a99f0 <cpu_startup_entry+448>
0xffffffff810a9a07 <+471>: callq 0xffffffff810c6e40 <rcu_irq_exit>
0xffffffff810a9a0c <+476>: jmpq 0xffffffff810a9932 <cpu_startup_entry+258>
0xffffffff810a9a11 <+481>: nopl 0x0(%rax)
0xffffffff810a9a18 <+488>: mov %gs:0xa0e4,%eax
0xffffffff810a9a20 <+496>: mov %eax,%eax
0xffffffff810a9a22 <+498>: bt %rax,(%rbx)
(…)
- 110. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 110
What runs where and when
What about Idle?
CPU
vCPU
(…)
0xffffffff81052c20 <+0>: sti
0xffffffff81052c21 <+1>: hlt
*loud screeching sound*
- 111. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 111
What runs where and when
VMM traps on the privileged instruction and puts (with VMK) the vCPU to “sleep”
CPU
VMM
(…)
0xffffffff81052c20 <+0>: sti
0xffffffff81052c21 <+1>: hlt
*tells VMK to deschedule*
- 112. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 112
What runs where and when
The scheduler decides what next to run
CPU
VMK
- 113. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 113
What runs where and when
E.g. a vCPU / world that is ready to run
CPU
other vCPU
- 114. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 114
What runs where and when
ESXi’s _own_ idle thread
CPU
C1-Cn
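The walk-through above (guest executes HLT → VMM traps → VMK deschedules the vCPU → the PCPU runs another ready world or ESXi's own idle thread) can be sketched as a toy decision function; the names are illustrative, not vmkernel APIs:

```python
# Toy model of the HLT path: guest executes HLT -> VMM traps ->
# VMK deschedules the vCPU -> PCPU runs another ready world or idles.

def on_guest_hlt(run_queue):
    """Return what the PCPU does after a vCPU idles via HLT."""
    # The VMM traps the privileged HLT and tells VMK to deschedule the vCPU.
    if run_queue:                    # another world is ready to run
        return f"run {run_queue.pop(0)}"
    return "idle world -> C1-Cn"     # ESXi's own idle thread halts the PCPU

print(on_guest_hlt(["other vCPU"]))  # -> run other vCPU
print(on_guest_hlt([]))              # -> idle world -> C1-Cn
```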
- 115. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 115
Manage host physical memory to abstract physical memory away from guest.
Allow memory over-commitment to provide an illusion of virtual DRAM to the guest.
Hide transient host memory pressure from applications.
Memory Management Overview
Goals and Objectives
Host Physical Memory Guest Memory
ESXi
- 117. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 117
Virtual Memory
Process 0
Process 1
Process 2
Process 3
Process n
- 118. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 118
Virtual Memory
From the process’ point of
view, it provides:
• Contiguous address space
• Isolation / Security
Process 0
Process 1
Process 2
Process 3
Process n
256 TB
256 TB
256 TB
256 TB
256 TB
- 119. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 119
Virtual Memory
From the process’ point of
view, it provides:
• Contiguous address space
• Isolation / Security
Virtual Memory abstracts
Process 0
Process 1
Process 2
Process 3
Process n
Magic
256 TB
256 TB
256 TB
256 TB
256 TB
- 120. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 120
Virtual Memory
From the process’ point of
view, it provides:
• Contiguous address space
• Isolation / Security
Virtual Memory abstracts
• It provides the possibility to
overcommit …
Process 0
Process 1
Process 2
Process 3
Process n
Magic
256 TB
256 TB
256 TB
256 TB
256 TB
- 121. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 121
Virtual Memory
From the process’ point of
view, it provides:
• Contiguous address space
• Isolation / Security
Virtual Memory abstracts
• It provides the possibility to
overcommit …
The process is unaware what
is backing the virtual address
• Physical Memory
• Swap File
Process 0
Process 1
Process 2
Process 3
Process n
Magic
256 TB
256 TB
256 TB
256 TB
256 TB
64 TB
256 TB
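This demand-paging trick can be poked at from any OS. A minimal sketch in plain Python (not ESXi code), assuming a lazily backed anonymous mapping as on Linux:

```python
import mmap

# Reserve 64 MiB of anonymous virtual address space. The OS grants the
# addresses immediately, but physical pages are only allocated when a
# page is first touched (demand paging) - the basis of overcommitment.
SIZE = 64 * 1024 * 1024
region = mmap.mmap(-1, SIZE)  # -1: anonymous, not file-backed

region[0] = 1          # first touch: only now is this page backed by RAM
assert region[0] == 1  # the untouched remainder costs (almost) nothing
region.close()
```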
- 123. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 123
Virtual Physical Memory
VM 0
VM 1
VM 2
VM 3
VM n
Abstraction …
- 124. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 124
Virtual Physical Memory
From the VM's point of
it provides:
• Contiguous address space
• Isolation / Security
VM 0
VM 1
VM 2
VM 3
VM n
6 TB
6 TB
6 TB
6 TB
6 TB
Abstraction …
- 125. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 125
Virtual Physical Memory
From the VM's point of
it provides:
• Contiguous address space
• Isolation / Security
Virt. Physical Mem. abstracts
VM 0
VM 1
VM 2
VM 3
VM n
Magic
6 TB
6 TB
6 TB
6 TB
6 TB
Abstraction …
- 126. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 126
Virtual Physical Memory
From the VM's point of
it provides:
• Contiguous address space
• Isolation / Security
Virt. Physical Mem. abstracts
• It provides the possibility to
overcommit …
VM 0
VM 1
VM 2
VM 3
VM n
Magic
6 TB
6 TB
6 TB
6 TB
6 TB
Abstraction …
- 127. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 127
Virtual Physical Memory
From the VM's point of
it provides:
• Contiguous address space
• Isolation / Security
Virt. Physical Mem. abstracts
• It provides the possibility to
overcommit …
The VM is unaware what is
backing the physical address
• Physical Memory
• Swap File
VM 0
VM 1
VM 2
VM 3
VM n
Magic
6 TB
6 TB
6 TB
6 TB
6 TB
16 TB
*** TB
Abstraction …
- 128. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 128
Virtual Physical Memory
From the VM's point of
it provides:
• Contiguous address space
• Isolation / Security
Virt. Physical Mem. abstracts
• It provides the possibility to
overcommit …
The VM is unaware what is
backing the physical address
• Physical Memory
• Swap File
• Or COW, ZIP, BLN
VM 0
VM 1
VM 2
VM 3
VM n
Magic
6 TB
6 TB
6 TB
6 TB
6 TB
16 TB
*** TB
*** TB
Abstraction …
*** TB
- 129. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 129
Virtual Physical Memory
From the VM's point of
it provides:
• Contiguous address space
• Isolation / Security
Virt. Physical Mem. abstracts
• It provides the possibility to
overcommit …
The VM is unaware what is
backing the physical address
• Physical Memory
• Swap File
• Or COW, ZIP, BLN
VM 0
VM 1
VM 2
VM 3
VM n
Magic
6 TB
6 TB
6 TB
6 TB
6 TB
16 TB
*** TB
*** TB
Abstraction …
*** TB
*** TB
*
- 130. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 130
Understanding VM memory usage on ESXi
Memory Management Overview
How to Hide Memory Pressure?
- 131. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 131
Understanding VM memory usage on ESXi
Memory Management Overview
How to Hide Memory Pressure?
Total Memory Size
- 132. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 132
Understanding VM memory usage on ESXi
Memory Management Overview
How to Hide Memory Pressure?
Total Memory Size
Allocated Memory
Free Memory
- 133. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 133
Understanding VM memory usage on ESXi
Memory Management Overview
How to Hide Memory Pressure?
Total Memory Size
Allocated Memory
Free Memory
Active Memory
Idle Memory
- 134. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 134
Understanding VM memory usage on ESXi
Reclaim memory from a VM if it is using more than it is entitled to.
• Entitlement depends on configuration (reservation / shares / limit).
• Techniques to reclaim memory from VMs include:
– Page sharing > Ballooning > Compression > Host swapping
– Breaks host large pages
Memory Management Overview
How to Hide Memory Pressure?
Total Memory Size
Allocated Memory
Free Memory
Active Memory
Idle Memory
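The reclamation order above amounts to a fallback chain. A sketch in Python; the thresholds are purely illustrative (ESXi's real minfree states differ), only the ordering matters: cheapest, least intrusive technique first.

```python
# Illustrative only: the real ESXi free-memory states and thresholds
# differ; the point is the ordering of reclamation techniques.
RECLAMATION_ORDER = ["page sharing", "ballooning", "compression", "host swapping"]

def techniques_for(free_pct: float) -> list[str]:
    """Which reclamation techniques kick in at a given free-memory level."""
    if free_pct > 20:          # plenty of headroom: opportunistic sharing only
        return RECLAMATION_ORDER[:1]
    if free_pct > 10:          # mild pressure: ask the guest to give pages back
        return RECLAMATION_ORDER[:2]
    if free_pct > 5:           # real pressure: compress before swapping
        return RECLAMATION_ORDER[:3]
    return RECLAMATION_ORDER   # last resort: hypervisor-level swap

assert techniques_for(50) == ["page sharing"]
assert techniques_for(2)[-1] == "host swapping"
```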
- 138. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 138
Active Memory
Not the same as guest stats!
!=
- 139. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 139
Active Memory
ESXi VM level heuristic
• Weighted, moving average
• OS / VMTools independent
• “Memory Sampling”
aka Touched
- 140. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 140
Active Memory
ESXi VM level heuristic
• Weighted, moving average
• OS / VMTools independent
• “Memory Sampling”
Un-maps 100 random pages
over the entire VM's mapped
address space
aka Touched
VM mapped memory
4 KB
100 x
4 KB 4 KB 4 KB 4 KB 4 KB 4 KB …
- 141. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 141
Active Memory
ESXi VM level heuristic
• Weighted, moving average
• OS / VMTools independent
• “Memory Sampling”
Un-maps 100 random pages
over the entire VM's mapped
address space
Monitors R/W for a minute
(access traps to the VMM)
aka Touched
VM mapped memory
4 KB
100 x
4 KB 4 KB 4 KB 4 KB 4 KB 4 KB …
/ min
- 142. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 142
Active Memory
ESXi VM level heuristic
• Weighted, moving average
• OS / VMTools independent
• “Memory Sampling”
Un-maps 100 random pages
over the entire VM's mapped
address space
Monitors R/W for a minute
(access traps to the VMM)
aka Touched
VM mapped memory
4 KB
100 x
4 KB 4 KB 4 KB 4 KB 4 KB 4 KB …
/ min
Read
Read Write
- 143. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 143
Active Memory
ESXi VM level heuristic
• Weighted, moving average
• OS / VMTools independent
• “Memory Sampling”
Un-maps 100 random pages
over the entire VM's mapped
address space
Monitors R/W for a minute
(access traps to the VMM)
After one minute, re-maps all
remaining pages, starts again
aka Touched
VM mapped memory
4 KB
100 x
4 KB 4 KB 4 KB 4 KB 4 KB 4 KB …
/ min
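The sampling loop above boils down to: probe a small random subset of mapped pages, count which get touched, smooth the per-minute estimates. A toy simulation (illustrative page counts and smoothing weights, not ESXi's actual implementation):

```python
import random

def sample_active(touched_pages: set[int], total_pages: int, samples: int = 100) -> float:
    """Estimate the active fraction by probing random pages, like ESXi's
    per-minute memory sampling (un-map N pages, count which get accessed)."""
    probe = random.sample(range(total_pages), samples)
    hits = sum(1 for p in probe if p in touched_pages)
    return hits / samples

random.seed(42)
TOTAL = 1_000_000                                     # pages mapped by the VM
touched = set(random.sample(range(TOTAL), 250_000))   # guest touches 25%

est = sample_active(touched, TOTAL)
# A weighted moving average smooths the noisy per-minute estimates.
avg = 0.0
for _ in range(10):
    avg = 0.75 * avg + 0.25 * sample_active(touched, TOTAL)

assert abs(est - 0.25) < 0.2    # a single 100-page sample is noisy
assert abs(avg - 0.25) < 0.1    # the average converges on the true 25%
print(f"single sample: {est:.2f}, smoothed: {avg:.2f}")
```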
- 158. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 158
Active Memory
Guest's working set tends to be between active and consumed
consumed
active guest WS
- 159. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 159
Active Memory
Guest WS might over-report (greedy app)
active guest WS
- 160. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 160
Active Memory
But guest WS will not underreport
consumed
active
guest WS
- 161. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 161
Active Memory
Not the be-all and end-all of guest workload estimation
- 162. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 162
Hierarchical Resource Groups
From an ESXi perspective
host The host owns all resources
- 163. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 163
Hierarchical Resource Groups
From an ESXi perspective
host
system vim iofilters user
The host owns all resources
Those are distributed by
hierarchical resource groups
- 164. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 164
Hierarchical Resource Groups
From an ESXi perspective
host
system vim iofilters user
The host owns all resources
Those are distributed by
hierarchical resource groups
minfree kernel helper ft drivers vmotion …
- 165. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 165
Hierarchical Resource Groups
From an ESXi perspective
host
system vim iofilters user
The host owns all resources
Those are distributed by
hierarchical resource groups
minfree kernel helper ft drivers vmotion …
vmkboot CpuSched Init …
- 166. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 166
Hierarchical Resource Groups
From an ESXi perspective
host
system vim iofilters user
The host owns all resources
Those are distributed by
hierarchical resource groups
Consumers can demand
(request) resources
minfree kernel helper ft drivers vmotion …
vmkboot CpuSched Init …
- 167. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 167
Hierarchical Resource Groups
From an ESXi perspective
host
system vim iofilters user
vCenter shows the sum of all
user resources as:
Total Reservation Capacity
- 168. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 168
Hierarchical Resource Groups
From an ESXi perspective
host
system vim iofilters user
vCenter shows the sum of all
user resources as:
Total Reservation Capacity
Global Resource Pools are
then distributed back to
hosts into Local RPs
• Based on VMs demand
…
pool4
pool3
pool2
pool1
- 169. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 169
Hierarchical Resource Groups
From an ESXi perspective
host
system vim iofilters user
vCenter shows the sum of all
user resources as:
Total Reservation Capacity
Global Resource Pools are
then distributed back to
hosts into Local RPs
• Based on VMs demand
…
vm.vmid
vm.vmid
vm.vmid
…
pool4
pool3
pool2
pool1
- 170. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 170
Hierarchical Resource Groups
From an ESXi perspective
user Local Resource Groups are
created and incrementally
numbered when clients are
instantiated:
• VM starts / vMotions etc.
• Based on VMs demand
…
vm.vmid
vm.vmid
…
pool430
pool231
pool15
pool1
- 171. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 171
Hierarchical Resource Groups
From an ESXi perspective
user Local Resource Groups are
created and incrementally
numbered when clients are
instantiated:
• VM starts / vMotions etc.
• Based on VMs demand
The local hierarchy is equal
to the global one
• Check for VM / LRG siblings
…
vm.vmid
vm.vmid
…
pool430
pool231
pool15
pool1
vm.vmid pool321
vm.vmid vm.vmid …
- 172. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 172
Hierarchical Resource Groups
From an ESXi perspective
user Local Resource Groups are
created and incrementally
numbered when clients are
instantiated:
• VM starts / vMotions etc.
• Based on VMs demand
The local hierarchy is equal
to the global one
• Check for VM / LRG siblings
VM groups have multiple leaf
consumers
• vmid is local, not global
…
vm.vmid
vm.vmid
…
pool430
pool231
pool15
pool1
vm.vmid pool321
vm.vmid vm.vmid …
vmm uw ...
- 173. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 173
cpu.resv Reservation
cpu.limit Limit
cpu.shares Shares
cpu.resvLimit Expandable*
mem.resv Reservation
mem.limit Limit
mem.shares Shares
mem.resvLimit Expandable*
Memory
CPU
Hierarchical Resource Groups
Both Memory and CPU resources
host
system vim iofilters user
…
vm.vmid
vm.vmid
vm.vmid
…
pool4
pool3
pool2
pool1
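A hedged sketch of how reservation, shares and limit could combine when a parent group divvies capacity among its children. This is not the real ESXi admission-control algorithm (which, for one, redistributes excess cut off by limits); it only illustrates the three controls interacting:

```python
def distribute(capacity_mhz: float, children: dict[str, dict]) -> dict[str, float]:
    """Split a parent group's capacity among children: honor reservations
    first, divide the remainder by shares, cap each child at its limit.
    A sketch, not the actual ESXi divvying algorithm."""
    alloc = {name: c.get("resv", 0.0) for name, c in children.items()}
    rest = capacity_mhz - sum(alloc.values())
    total_shares = sum(c.get("shares", 0) for c in children.values())
    for name, c in children.items():
        extra = rest * c.get("shares", 0) / total_shares
        alloc[name] = min(alloc[name] + extra, c.get("limit", float("inf")))
    return alloc

pools = {
    "pool1": {"resv": 2000, "shares": 2000},
    "pool2": {"shares": 1000, "limit": 3000},
    "pool3": {"shares": 1000},
}
got = distribute(10_000, pools)
assert sum(got.values()) <= 10_000
assert got["pool1"] > got["pool3"]   # reservation plus more shares wins
print(got)
```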
- 174. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 174
ESXi CLI (via SSH)
… for CPU … for Memory … for comparison
Tools
sched-stats memstats esxtop
- 175. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 175
Tools
cmdline for local groups (no VMs)
sched-stats
- 176. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 176
Tools
cmdline for local groups (no VMs)
# sched-stats -t groups | awk 'NR == 1
sched-stats
- 177. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 177
Tools
cmdline for local groups (no VMs)
# sched-stats -t groups | awk 'NR == 1
|| $2 ~ /^(vm.|pool)[0-9]+/
sched-stats
- 178. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 178
Tools
cmdline for local groups (no VMs)
# sched-stats -t groups | awk 'NR == 1
|| $2 ~ /^(vm.|pool)[0-9]+/
|| /^ +[0-4] /
sched-stats
- 179. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 179
Tools
cmdline for local groups (no VMs)
# sched-stats -t groups | awk 'NR == 1
|| $2 ~ /^(vm.|pool)[0-9]+/
|| /^ +[0-4] /
{printf ("%-10s%-12s%-9s%-6s%-6s%-6s%-9s%-6s%-9s%-9s%-10s\n"
,$1, $2, $3, $6, $8, $9, $10, $11, $12, $13, $14)}'
sched-stats
- 180. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 180
Tools
cmdline for local groups (no VMs)
# sched-stats -t groups | awk 'NR == 1
|| $2 ~ /^(vm.|pool)[0-9]+/
|| /^ +[0-4] /
{printf ("%-10s%-12s%-9s%-6s%-6s%-6s%-9s%-6s%-9s%-9s%-10s\n"
,$1, $2, $3, $6, $8, $9, $10, $11, $12, $13, $14)}'
vmgid name pgid vsmps amin amax minLimit units ashares resvMHz availMHz
0 host 0 933 1600 1600 1600 pct 4096000 5232 33168
1 system 0 659 10 -1 -1 pct 500 288 33168
2 vim 0 271 4944 -1 -1 mhz 500 4344 33768
3 iofilters 0 3 0 -1 -1 pct 1000 0 33168
4 user 0 0 0 -1 -1 pct 9000 0 33168
sched-stats
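The awk one-liner keeps the header row, the five well-known groups (gid 0-4) and any vm.&lt;id&gt; / pool&lt;id&gt; rows. The same filter in Python, run against the sample output from the slide:

```python
import re

# Sample sched-stats output pasted from the slide above.
SAMPLE = """\
vmgid name pgid vsmps amin amax minLimit units ashares resvMHz availMHz
0 host 0 933 1600 1600 1600 pct 4096000 5232 33168
1 system 0 659 10 -1 -1 pct 500 288 33168
2 vim 0 271 4944 -1 -1 mhz 500 4344 33768
3 iofilters 0 3 0 -1 -1 pct 1000 0 33168
4 user 0 0 0 -1 -1 pct 9000 0 33168
"""

def filter_groups(text: str) -> list[list[str]]:
    """Keep header, gids 0-4, and vm.<id>/pool<id> groups - the awk filter."""
    rows = []
    for i, line in enumerate(text.splitlines()):
        fields = line.split()
        if i == 0 or re.match(r"^(vm\.|pool)[0-9]+", fields[1]) \
           or fields[0] in {"0", "1", "2", "3", "4"}:
            rows.append(fields)
    return rows

rows = filter_groups(SAMPLE)
assert rows[0][0] == "vmgid"
assert [r[1] for r in rows[1:]] == ["host", "system", "vim", "iofilters", "user"]
```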
- 184. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 184
Tools
cmdline for local groups (with VMs)
# memstats -r group-stats
-g0 -l2
-s gid:name:min:max::conResv:availResv
-u mb
| sed -n '/^-\+/,/.*\n/p'
---------------------------------------------------------------------------------
gid name min max conResv availResv
---------------------------------------------------------------------------------
0 host 97823 97823 28917 68907
1 system 20024 -1 20008 68923
2 vim 0 -1 3378 68907
3 iofilters 0 -1 25 68907
4 user 0 -1 5490 68907
---------------------------------------------------------------------------------
memstats
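The sed range just clips the table between the dashed separators; parsing it into records is equally short. A sketch using the sample output above:

```python
# Sample memstats output pasted from the slide above.
SAMPLE = """\
---------------------------------------------------------------------------------
gid name min max conResv availResv
---------------------------------------------------------------------------------
0 host 97823 97823 28917 68907
1 system 20024 -1 20008 68923
2 vim 0 -1 3378 68907
3 iofilters 0 -1 25 68907
4 user 0 -1 5490 68907
---------------------------------------------------------------------------------
"""

def parse_table(text: str) -> list[dict[str, str]]:
    """Drop dashed separators, read the header, zip each row into a dict."""
    lines = [l for l in text.splitlines() if l and set(l) != {"-"}]
    header = lines[0].split()
    return [dict(zip(header, l.split())) for l in lines[1:]]

table = parse_table(SAMPLE)
assert table[0]["name"] == "host"
assert int(table[4]["conResv"]) == 5490   # 'user' group: reserved by VMs
```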
- 189. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 189
DIMMs
Socket / Package
(N)UMA
+ terminology
0
- 190. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 190
DIMMs
Socket / Package
NUMA node
(N)UMA
+ terminology
0
- 191. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 191
DIMMs
Socket / Package
NUMA node
(N)UMA
+ terminology
0
1
- 192. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 192
DIMMs
Socket / Package
NUMA node
Socket != NUMA node
(N)UMA
+ terminology
0
2
1
- 193. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 193
DIMMs
Socket / Package
NUMA node
Socket != NUMA node
(N)UMA
+ terminology
0
2
1
- 194. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 194
DIMMs
Socket / Package
NUMA node
Socket != NUMA node
LLC / DIE
(N)UMA
+ terminology
0
2
1
- 195. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 195
DIMMs
Socket / Package
NUMA node
Socket != NUMA node
LLC / DIE
(CoD, SNC / Zen1/2)
(N)UMA
+ terminology
0
2
1
- 196. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 196
Importance of Memory Access Latency
Jim Gray’s Storage Latency Analogy (slightly adapted)
- 197. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 197
You want to calculate a + b and the operands are in:
Importance of Memory Access Latency
Jim Gray’s Storage Latency Analogy (slightly adapted)
- 198. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 198
You want to calculate a + b and the operands are in:
Importance of Memory Access Latency
Jim Gray’s Storage Latency Analogy (slightly adapted)
your head
- 199. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 199
You want to calculate a + b and the operands are in:
Importance of Memory Access Latency
Jim Gray’s Storage Latency Analogy (slightly adapted)
your head
=
register / 1 cycle
- 200. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 200
You want to calculate a + b and the operands are in:
Importance of Memory Access Latency
Jim Gray’s Storage Latency Analogy (slightly adapted)
your head
=
register / 1 cycle
this room
- 201. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 201
You want to calculate a + b and the operands are in:
Importance of Memory Access Latency
Jim Gray’s Storage Latency Analogy (slightly adapted)
your head
=
register / 1 cycle
this room
=
L1-L2 / 10 cycles
- 202. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 202
You want to calculate a + b and the operands are in:
Importance of Memory Access Latency
Jim Gray’s Storage Latency Analogy (slightly adapted)
your head
=
register / 1 cycle
this room
=
L1-L2 / 10 cycles
this building
- 203. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 203
You want to calculate a + b and the operands are in:
Importance of Memory Access Latency
Jim Gray’s Storage Latency Analogy (slightly adapted)
your head
=
register / 1 cycle
this room
=
L1-L2 / 10 cycles
this building
=
DRAM / 100 cycles
- 204. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 204
You want to calculate a + b and the operands are in:
Importance of Memory Access Latency
Jim Gray’s Storage Latency Analogy (slightly adapted)
your head
=
register / 1 cycle
this room
=
L1-L2 / 10 cycles
this building
=
DRAM / 100 cycles
Finland + Algeria
- 205. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 205
You want to calculate a + b and the operands are in:
Importance of Memory Access Latency
Jim Gray’s Storage Latency Analogy (slightly adapted)
your head
=
register / 1 cycle
this room
=
L1-L2 / 10 cycles
this building
=
DRAM / 100 cycles
Finland + Algeria
=
Disk / 10^6 cycles
- 206. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 206
Importance of Memory Access Latency
Numbers based on Intel i7-3770 @ 3.4 GHz
- 209. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 209
Importance of Memory Access Latency
Numbers based on Intel i7-3770 @ 3.4 GHz
access size cycles ns
L3 / Last Level Cache
core
0
core
1
core
2
core
3
L1 L1 L1 L1
L2 L2 L2 L2
IMC QPI
DRAM
- 210. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 210
Importance of Memory Access Latency
Numbers based on Intel i7-3770 @ 3.4 GHz
access size cycles ns
L1 32 KB 4-5 1.5
L3 / Last Level Cache
core
0
core
1
core
2
core
3
L1 L1 L1 L1
L2 L2 L2 L2
IMC QPI
DRAM
- 211. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 211
Importance of Memory Access Latency
Numbers based on Intel i7-3770 @ 3.4 GHz
access size cycles ns
L1 32 KB 4-5 1.5
L2 256 KB 12 4
L3 / Last Level Cache
core
0
core
1
core
2
core
3
L1 L1 L1 L1
L2 L2 L2 L2
IMC QPI
DRAM
- 212. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 212
Importance of Memory Access Latency
Numbers based on Intel i7-3770 @ 3.4 GHz
access size cycles ns
L1 32 KB 4-5 1.5
L2 256 KB 12 4
L3 8 MB 30 10
L3 / Last Level Cache
core
0
core
1
core
2
core
3
L1 L1 L1 L1
L2 L2 L2 L2
IMC QPI
DRAM
- 214. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 214
Importance of Memory Access Latency
Numbers based on Intel i7-3770 @ 3.4 GHz
access size cycles ns
L1 32 KB 4-5 1.5
L2 256 KB 12 4
L3 8 MB 30 10
DRAM GBs 30+ 66*
L3 / Last Level Cache
core
0
core
1
core
2
core
3
L1 L1 L1 L1
L2 L2 L2 L2
IMC QPI
DRAM
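The ns column follows directly from the cycle counts at 3.4 GHz (one cycle is roughly 0.29 ns), which is easy to sanity-check:

```python
# ns = cycles / clock_GHz. At 3.4 GHz, the 12-cycle L2 access lands at
# ~3.5 ns (the table rounds to 4), the 30-cycle L3 at ~9 ns (table: 10).
CLOCK_GHZ = 3.4

def cycles_to_ns(cycles: float) -> float:
    return cycles / CLOCK_GHZ

assert round(cycles_to_ns(12), 1) == 3.5    # L2
assert round(cycles_to_ns(30), 1) == 8.8    # L3
# DRAM goes the other way: a 66 ns access is ~224 cycles of stalled CPU.
assert round(66 * CLOCK_GHZ) == 224
```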
- 215. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 215
N(UMA)
All sockets share the FSB to
the Northbridge and hence
the bandwidth
• NB also known as “Memory
Controller Hub” or MCH
Uniform memory access
latency between every CPU
and every DIMM
Von Neumann Bottleneck
getting worse with faster
CPUs / more RAM
Pre-Opteron/Nehalem
1 2
NB
0 3
- 218. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 218
0 1
3 2
NUMA
Every NUMA node has its
own Integrated Memory
Controller (IMC)
• Some AMDs (Bulldozer and
newer) have two nodes per
socket / package
Remote access has to go
over the interconnect and
remote CPU’s IMC
• This adds additional latency
making local and remote
access Non-Uniform
Post-Opteron/Nehalem
- 222. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 222
0 1
3 2
NUMA
2 QPI / IC
CPU
/ns
0 1 2 3
0 72 291 323 294
1 296 72 293 315
2 319 296 71 296
3 290 325 300 71
local adjacent “routed”
- 223. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 223
CPU
/ns
0 1 2 3
0 136 194 198 201
1 194 135 194 196
2 201 194 135 200
3 202 197 198 135
0 1
3 2
NUMA
3 QPI / IC
local adjacent
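From such a matrix the remote-access penalty falls out directly; using the 3-interconnect numbers above:

```python
# The 3-interconnect latency matrix from the slide (ns per access).
LAT = [
    [136, 194, 198, 201],
    [194, 135, 194, 196],
    [201, 194, 135, 200],
    [202, 197, 198, 135],
]

# Diagonal entries are local accesses, off-diagonal entries are remote.
local = sum(LAT[i][i] for i in range(4)) / 4
remote = sum(LAT[i][j] for i in range(4) for j in range(4) if i != j) / 12
penalty = remote / local

assert 1.4 < penalty < 1.5   # remote access costs ~45% more on this box
print(f"local {local:.0f} ns, remote {remote:.0f} ns, penalty {penalty:.2f}x")
```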
- 224. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 224
0 1
3 2
NUMA
Basic Migration Types
NUMA clients (vCPUs +
memory) are kept local to a
home node
Balance migrations re-assign
the home node, memory
follows vCPUs!
Locality migrations set home
node to where the most
memory resides
- 231. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 231
NUMA migration incurs significant cost.
• All pages need to be remapped, i.e. %localMemory initially drops to 0% and slowly recovers.
• Copying memory pages across NUMA boundaries costs memory bandwidth.
NUMA Scheduler Consideration
Local Contention vs Remote Access
- 232. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 232
[Chart: Memory Locality & NUMA-migrations (with NUMA Migration): %Local-Mem and #Migrations over time (30 sec units)]
[Chart: Memory Locality & NUMA-migrations (No NUMA Migration): %Local and #Migrations over time (30 sec units)]
- 233. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 233
We had good(ish) reasons
vNUMA auto-sizing history
(…) 2007 2008 2009 2010 2011 2012 2013 2014 (…)
ESX 4.0 ESX 4.1 ESXi 5.0 ESXi 5.1 ESXi 5.5 ESXi 6.0
ESX 3.5
- 234. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 234
We had good(ish) reasons
vNUMA auto-sizing history
(…) 2007 2008 2009 2010 2011 2012 2013 2014 (…)
My starting date @ VMware
ESX 4.0 ESX 4.1 ESXi 5.0 ESXi 5.1 ESXi 5.5 ESXi 6.0
ESX 3.5
- 235. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 235
We had good(ish) reasons
vNUMA auto-sizing history
(…) 2007 2008 2009 2010 2011 2012 2013 2014 (…)
cpuid.coresPerSocket
My starting date @ VMware
ESX 4.0 ESX 4.1 ESXi 5.0 ESXi 5.1 ESXi 5.5 ESXi 6.0
ESX 3.5
- 236. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 236
CPS in GUI & supported
We had good(ish) reasons
vNUMA auto-sizing history
(…) 2007 2008 2009 2010 2011 2012 2013 2014 (…)
cpuid.coresPerSocket
My starting date @ VMware
ESX 4.0 ESX 4.1 ESXi 5.0 ESXi 5.1 ESXi 5.5 ESXi 6.0
ESX 3.5
- 237. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 237
Max vSMP 8
CPS in GUI & supported
We had good(ish) reasons
vNUMA auto-sizing history
(…) 2007 2008 2009 2010 2011 2012 2013 2014 (…)
cpuid.coresPerSocket
My starting date @ VMware
ESX 4.0 ESX 4.1 ESXi 5.0 ESXi 5.1 ESXi 5.5 ESXi 6.0
ESX 3.5
- 238. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 238
Max vSMP 8
CPS in GUI & supported
We had good(ish) reasons
vNUMA auto-sizing history
(…) 2007 2008 2009 2010 2011 2012 2013 2014 (…)
cpuid.coresPerSocket vNUMA
My starting date @ VMware
ESX 4.0 ESX 4.1 ESXi 5.0 ESXi 5.1 ESXi 5.5 ESXi 6.0
ESX 3.5
- 239. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 239
Max vSMP 32
Max vSMP 8
CPS in GUI & supported
We had good(ish) reasons
vNUMA auto-sizing history
(…) 2007 2008 2009 2010 2011 2012 2013 2014 (…)
cpuid.coresPerSocket vNUMA
My starting date @ VMware
ESX 4.0 ESX 4.1 ESXi 5.0 ESXi 5.1 ESXi 5.5 ESXi 6.0
ESX 3.5
- 240. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 240
numa.vcpu.min = 9
Max vSMP 32
Max vSMP 8
CPS in GUI & supported
We had good(ish) reasons
vNUMA auto-sizing history
(…) 2007 2008 2009 2010 2011 2012 2013 2014 (…)
cpuid.coresPerSocket vNUMA
My starting date @ VMware
ESX 4.0 ESX 4.1 ESXi 5.0 ESXi 5.1 ESXi 5.5 ESXi 6.0
ESX 3.5
- 241. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 241
numa.vcpu.min = 9
Max vSMP 32
Max vSMP 8
CPS in GUI & supported
We had good(ish) reasons
vNUMA auto-sizing history
(…) 2007 2008 2009 2010 2011 2012 2013 2014 (…)
cpuid.coresPerSocket vNUMA
My starting date @ VMware
ESX 4.0 ESX 4.1 ESXi 5.0 ESXi 5.1 ESXi 5.5 ESXi 6.0
ESX 3.5
cpuid.coresPerSocket → numa.vcpu.maxPerVirtualNode
- 242. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 242
VPD doesn’t affect ESXi sched.
PPD does define ESXi NUMA sched.
• AKA NUMA client
Doesn’t influence ESXi sched.
Might influence Guest / App sched.
CPU Topology
vNUMA Topology
Two levels of abstraction
Virtual and Physical Proximity Domains
VPD
PPD
CPS
- 243. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 243
VPD doesn’t affect ESXi sched.
PPD does define ESXi NUMA sched.
• AKA NUMA client
Doesn’t influence ESXi sched.
Might influence Guest / App sched.
CPU Topology
vNUMA Topology
Two levels of abstraction
Virtual and Physical Proximity Domains
VPD
PPD
C
PPD
VPD
C C C C C
- 244. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 244
VPD doesn’t affect ESXi sched.
PPD does define ESXi NUMA sched.
• AKA NUMA client
Doesn’t influence ESXi sched.
Might influence Guest / App sched.
CPU Topology
vNUMA Topology
Two levels of abstraction
Virtual and Physical Proximity Domains
VPD
PPD
CPS
PPD
- 245. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 245
VPD doesn’t affect ESXi sched.
PPD does define ESXi NUMA sched.
• AKA NUMA client
Doesn’t influence ESXi sched.
Might influence Guest / App sched.
CPU Topology
vNUMA Topology
Two levels of abstraction
Virtual and Physical Proximity Domains
VPD
PPD
CPS
PPD
VPD
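How the auto-sizing inputs interact can be sketched roughly. The helper below is hypothetical (real ESXi also considers host topology, coresPerSocket and several overrides), but it captures the basic intuition: below numa.vcpu.min there is no vNUMA, above it the VM is cut into ceil(vCPUs / max-per-node) NUMA clients.

```python
import math

# Hypothetical helper, not the actual ESXi algorithm: split a VM into
# NUMA clients (PPDs) from numa.vcpu.min and numa.vcpu.maxPerVirtualNode.
def numa_clients(vcpus: int, max_per_node: int, vcpu_min: int = 9) -> int:
    if vcpus < vcpu_min:        # small VMs stay UMA: a single client
        return 1
    return math.ceil(vcpus / max_per_node)

assert numa_clients(8, 10) == 1     # below numa.vcpu.min: no vNUMA
assert numa_clients(16, 10) == 2    # 16 vCPUs on 10-core nodes: 2 clients
assert numa_clients(24, 12) == 2
```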
- 247. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 247
Running Compute Intensive Benchmark
Case Study: Project Pacific
https://blogs.vmware.com/performance/2019/10/how-does-project-pacific-deliver-8-better-
performance-than-bare-metal.html
- 248. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 248
Running Compute Intensive Benchmark
Case Study: Project Pacific
43.5% local memory access
on native Linux
https://blogs.vmware.com/performance/2019/10/how-does-project-pacific-deliver-8-better-
performance-than-bare-metal.html
- 249. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 249
Running Compute Intensive Benchmark
Case Study: Project Pacific
43.5% local memory access
on native Linux
99.2% local memory
access on Pacific Cluster
https://blogs.vmware.com/performance/2019/10/how-does-project-pacific-deliver-8-better-
performance-than-bare-metal.html
- 251. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc.
vSphere 6.0 achieves Line Rate throughput on a 40GigE NIC
Throughput ↑ from 20.5 to 35.5 Gbps
CPU Used ↓ from 36 to 13 % (per Gbps)
Herculean Network IO
- 257. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 257
Virtual NIC coalescing - recap
Trading CPU Cycles for Lower Latency
By default, vSphere tunes for lower CPU usage by batching I/O operations
• By default, that is also the case for the RX and TX path on vNICs (here vmxnet3)
• When disabled:
– Every packet received interrupts immediately
– Every packet will be issued immediately
[Diagram: with coalescing disabled, packets 1, 2, 3, … are each issued and interrupted individually instead of in batches]
Network
- 261. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 261
Possible Latency Optimizations
Network latency optimization on the VM level
Disable LRO (Large Receive Offload)
• Host wide: “Net.Vmxnet3SwLRO = false”
• Small packets are no longer concatenated into larger ones
Disable (vNIC) coalescing
• VMX option: “ethernetX.coalescingScheme = disabled”
• Issue TX immediately and immediately interrupt on RX
Disable dynamic queueing
• NetQueue feature: load-balances and combines less-used queues
• Disabling guarantees a single queue for the VM
Network
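Taken together, the first two toggles above would land in the host's advanced settings and in the VM's .vmx file roughly as follows. This is a sketch based only on the option names quoted on this slide; the adapter index (ethernet0) is a placeholder, and both changes trade CPU cycles for latency, so measure before and after applying them.

```ini
; --- VM's .vmx file ---
; Disable RX/TX interrupt coalescing on the first vmxnet3 vNIC
; (replace ethernet0 with the index of the latency-sensitive vNIC)
ethernet0.coalescingScheme = "disabled"

; --- Host-wide advanced setting (not part of the .vmx) ---
; Disable software LRO for vmxnet3, so small packets are no longer
; concatenated into larger ones before delivery to the guest
; Net.Vmxnet3SwLRO = false
```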
- 262. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc.
Network – Recommendations
Use vmxnet3 Guest Network Driver
Very efficient and required for maximum performance
Evaluate Disabling Interrupt Coalescing
Default mechanism may induce small amounts of latency in favor of throughput
It’s a 10Gb+ World
1Gb saturation is real; more bandwidth is required today, especially in light of vSAN and Monster-VM vMotion
Use Latency Sensitivity High ‘Cautiously’
While it can reduce latency and jitter in the 10us use case, it comes at a cost with core reservations, etc.
Requires FULL CPU and MEM reservation – or it won’t work and won’t tell you
- 263. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc.
Herculean Storage IO
• More than 1 million IOPS from 1 VM
Hypervisor: vSphere 5.1
Server: HP DL380 Gen8
CPU: 2 x Intel Xeon E5-2690, HT disabled
Memory: 256GB
HBAs: 5 x QLE2562
Storage: 2 x Violin Memory 6616 Flash Arrays
VM: Windows Server 2008 R2, 8 vCPUs and 48GB.
Iometer Config: 4K IO size w/ 16 workers
Reference: http://blogs.vmware.com/performance/2012/08/1millioniops-on-1vm.html
- 266. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc.
Bare-metal to virtual TPC-C* gap then and now(ish)
Gap then: -30%; now: -10%
* Non-compliant, fair-use implementation of the workload on Oracle 12c. Not comparable to official results.
- 267. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc.
Scaling out vs. up on the same host to amortize overhead
TPC-E on native HP ProLiant DL385 G8: 1416.37 tpsE
TPC-VMS on virtualized HP ProLiant DL385 G8, 3 VMs: 470.31 / 468.11 / 457.55 tpsE
http://blogs.vmware.com/vsphere/2013/09/worlds-first-tpc-vms-benchmark-result.html
http://www.tpc.org/4064 / http://www.tpc.org/5201
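A quick back-of-the-envelope check on the numbers above shows how little throughput is lost by scaling out three VMs on one host (scores taken from the slide; the percentage rounding is mine):

```python
# Aggregate throughput of the three TPC-VMS VMs vs. the bare-metal
# TPC-E run on the same server model (scores from the slide).
virtual_tpse = [470.31, 468.11, 457.55]   # the three VMs' scores
bare_metal_tpse = 1416.37                 # native result

aggregate = sum(virtual_tpse)             # 1395.97 tpsE
efficiency = aggregate / bare_metal_tpse  # ~0.986, i.e. ~98.6% of bare metal
```

In other words, the virtualization overhead, amortized across three VMs, is under 2% in aggregate.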
- 271. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc.
The Problem - with Database Logs
Storage I/O latencies are higher in virtual
Usually not a noticeable problem for Data IO
• Long (5+ ms) latency on HDDs
• Random I/O, many threads banging on the same spindle(s)
• Even some SSDs are ~1ms
Not OK for Redo Log access
• Short (<<1ms) latency
• Sequential I/O, single-threaded, write-only
• Typically a write-back cache in the HBA or the array
• Check the Top 5 wait events in Oracle AWR or equivalent database health reports
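To see why redo-log latency matters so much more than data-file latency: a single-threaded, sequential log writer can commit at most once per write round-trip, so every microsecond of added latency directly caps throughput. A minimal model (the latency figures below are illustrative, not from the deck):

```python
def max_commits_per_sec(write_latency_ms: float) -> float:
    """Upper bound on commit rate for one synchronous, sequential log
    writer: each commit must wait for the previous log write to finish."""
    return 1000.0 / write_latency_ms

# A 0.2 ms log device allows up to ~5000 commits/s from a single writer;
# at 1 ms, the same writer is capped at ~1000 commits/s.
fast = max_commits_per_sec(0.2)
slow = max_commits_per_sec(1.0)
```

Random data I/O spread over many threads hides latency through parallelism; the single log stream cannot, which is why a few hundred microseconds of virtualization overhead show up in the database's commit wait events first.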
- 275. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc.
The Solution - Trade CPU Cycles for Lower Latency
By default, vSphere tunes for lower CPU usage by batching I/O operations
But when sensing low IOPS, vSphere stops batching and switches to low latency mode
• For lowest latency, put the log device on a vSCSI adapter by itself
• Batching and coalescing happen on a per-vSCSI-bus, not per-device(!), basis
• Explicit tuning can prove more effective though
- 279. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc.
The Solution - Trade CPU Cycles for Lower Latency
Explicit workaround on the issuing path:
• Default is asynchronous request passing from the vSCSI adapter to the VMkernel
– But it dynamically adjusts for the low-IOPS case
• To explicitly force immediate initiation of I/O operations (sync):
– scsiNNN.reqCallThreshold = “1”
Explicit workaround on the completion path:
• Default is coalescing of virtual interrupts
– vSphere automatically suspends interrupt coalescing for low-IOPS workloads
• Or explicitly disable virtual interrupt coalescing:
– For PVSCSI: scsiNNN.intrCoalescing = “False”
– For other vHBAs: scsiNNN.ic = “False”
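In .vmx terms, the two explicit workarounds above would look roughly like this. A sketch using only the option names quoted on the slide; scsi1 is a placeholder for the index of the dedicated log-device adapter, and both settings burn extra CPU per I/O, so apply them only to adapters that actually need the latency:

```ini
; --- VM's .vmx file ---
; Issuing path: force immediate (synchronous) initiation of I/O
; from the vSCSI adapter into the VMkernel
scsi1.reqCallThreshold = "1"

; Completion path: disable virtual interrupt coalescing
scsi1.intrCoalescing = "False"   ; for PVSCSI adapters
; scsi1.ic = "False"             ; for other vHBA types
```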
- 280. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc.
VMFS is on par with or faster than RDM (differences within approx. 1%)
Reference: http://www.vmware.com/techpapers/2017/sql-server-vsphere65-perf.html
Myth Revisited: RDM versus VMFS
- 281. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc.
Storage – Recommendations
Use Multiple vSCSI Adapters
Allows for more queues and I/Os in flight
Use the pvscsi vSCSI Adapter
More efficient I/Os per cycle
Don’t Use RDMs
Unless needed for shared-disk clustering; no longer a performance advantage
VMware Snapshots Should Be ‘Temporary’
Despite constant performance improvements, snapshots should not live forever (Co-Stop, synchronous)
Leverage Your Storage OEM’s Integration Guide
They provide necessary guidance around items like multi-pathing
- 290. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 290
vMotion Workflow
1. Create VM on Destination
2. Copy Memory
3. Quiesce VM on Source
4. Transfer Device State
5. Resume VM on Destination
6. Power Off VM on Source
(Source ESXi Host and Destination ESXi Host, connected by the vMotion Network, with a shared Datastore)
Execution switchover time of 1 sec
- 295. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 295
Iterative Memory Pre-Copy
Memory Copy: Source VM Memory → Destination VM Memory
Phase 0: Copy the VM’s 40GB of memory, trace pages. As we send that memory, the VM dirties 10GB
Phase 1: Retransmit the dirtied 10GB. In the process, the VM dirties another 3GB
Phase 2: Send the 3GB. While that transfer is happening, the VM dirties 1GB
Phase 3: Send the remaining 1GB
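The phases above can be sketched as a toy convergence loop. Assuming, as a simplification, that the guest dirties a fixed fraction of whatever was just sent (the slide's 40 → 10 → 3 → 1 GB sequence implies a roughly 25% dirty ratio), the loop either shrinks the remainder below what fits into the switchover window or never converges:

```python
def precopy_passes(memory_gb, dirty_ratio, switchover_gb=1.0, max_passes=10):
    """Toy model of vMotion iterative pre-copy: each pass retransmits the
    pages dirtied during the previous pass, until the remainder is small
    enough to send during the ~1 s switchover."""
    passes = []
    to_send = memory_gb
    for _ in range(max_passes):
        passes.append(to_send)
        dirtied = to_send * dirty_ratio
        if dirtied <= switchover_gb:
            passes.append(dirtied)  # final remainder, sent while quiesced
            return passes
        to_send = dirtied
    return passes  # never converged: VM dirties memory faster than we copy

# 40 GB VM, 25% re-dirtied per pass -> passes of 40, 10, 2.5 and 0.625 GB
```

When the dirty rate approaches the transfer rate, the loop stops shrinking; that is the regime where mechanisms like SDPS (slowing the guest down) become necessary for Monster VMs.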
- 296. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 296
vMotion of Oracle RAC
It’s been working for a while …
- 297. 297
Confidential │ ©2018 VMware, Inc.
Common Issues for Monster VMs (pre 6.5*)
• Trace cost
• LP remap
• Preallocated memory
• RDTSC cost
• (SDPS)
- 304. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 304
How to troubleshoot any issue
No matter how complicated
1. Identify a related system or component that your team is not responsible for
2. Hypothesize that the issue is with that component
3. Assign the issue to the responsible team
4. When proven wrong, go to 1.
- 305. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 305
Perfectly valid methods to “troubleshoot” or “tune” (/s)
• Tuning guide for a completely different system
• Some advanced option found on a blog
• Vaguely fitting KB
• etc.
- 311. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 311
The biggest enemy: the "XY Problem"
1. I have problem X
2. I think it is because of Y
3. I have problem Y
4. Help me solve problem Y
5. Hey! I still have a problem
tl;dr: don’t jump to conclusions
- 314. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 314
Where to use caution: believing anybody
“Trust, but verify.“*
* From the Russian proverb:
"Доверяй, но проверяй"
{Doveryai, no proveryai}
- 317. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 317
Where to use caution: comparing hosts, past and present, etc.
• Don’t assume newer == better
• Identify all differences
- 322. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 322
Where to use caution: relying on Traffic Light Dashboards alone
All metrics green? → All good then! (false negative)
Some metrics red? → Something must be broken! (false positive)
- 324. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 324
Where to use caution: working through a list of known issues
Very good to start with!
• Don’t spend more than half an hour
Can be from different perspectives
• Application
• Resources, e.g.:
– CPU contention
– Memory pressure
– Disk latency
– Etc.
- 328. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 328
Apply different methodologies as needed, e.g. directionally
Top → Down: drill down from the application / its metrics
• app specific / difficult to "profile" the whole path
Bottom → Up: investigate from the resource point of view
• easy to run into false positives / not all resources evenly covered
Recommendation: Bottom-Up checklist first
- 329. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 329
Ask questions (good ones, preferably)
• What makes you think there is a performance issue?
• Has it ever performed well?
• What has changed since?
• Can it be quantified?
• What else is affected?
• What is the timing?
• Is it reproducible?
• etc.
- 331. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 331
Take notes along the way (seriously)
"Remember kids, the only difference between science and screwing around is writing it down."
- 332. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 332
Provide an exact timeline
Part of notetaking, but often forgotten:
• 2017-11-28 23:00 UTC: Upgrade
• 2017-11-29 07:00 UTC: Issue first noticed
• 2017-11-29 > 23:59 UTC: Tried everything under the sun and wrote down nothing
• 2017-11-30 08:00: Called GSS
- 333. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 333
Be accurate and universal
https://xkcd.com/1179/
- 334. 334
DOAG 2020 │ ©2020 VMware, Inc.
SR examples
“The case of the unexplained …”
- 336. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 336
Example 1 – Oracle DB performance
Tales from GSS
Initial SR description:
• Oracle DB on virtual 64-bit W2K8 three times slower than physical
• on 32-bit W2K8 and 32/64-bit RHEL5, only 5% slower than physical
• benchmarked with a production-equivalent test script
Troubleshooting in support:
• checked logs for errors
• basics like power management, limits, etc.
• researched whether similar issues have been reported
- 341. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 341
Example 1 – Oracle DB performance
Tales from GSS
Reproducing in-house:
• the customer provided two pre-configured VMs
• during the initial run, the 64-bit VM performed worse by a factor of 3
• after automating benchmark start and result collection, the factor dropped to 1.6 on average
- 344. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. 344
Murphy's law strikes:
Example 1 – Oracle DB performance
Tales from GSS