This document provides an agenda and overview of topics relating to Intel processors and PowerShell scripts. The agenda includes discussions of NUMA, cluster-on-die, hyper-threading, CPU oversubscription, Intel processor models including Haswell and Broadwell, and PowerShell health check and documentation scripts. Key points covered are how NUMA affects memory access times, benefits of Intel's tick-tock development cycle, and recommendations around hyper-threading and avoiding CPU oversubscription until production workloads can be analyzed. Resources for health check scripts from Sacha Thomet and Carl Webster are also referenced.
3. Intel Processors
◦NUMA - Why is it SO important?
◦Cluster-On-Die – The Snoop Mode or Snoop Dogg.
◦To Hyper-thread or not - that is the question.
◦Oversubscription of vCPU to pCPU – you better
pause and think about it before your hosts do!
4. NUMA
◦NUMA is Non-Uniform Memory Access
◦Intel brought it to the mainstream circa 2008 with
the 1st-generation Core i-series Nehalem processors.
5. Non-Uniform? Is that like casual dress
Friday for RAM?
◦Memory access times are NOT uniform and depend
on the location of the memory and the node from
which it is accessed.
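The point above can be sketched as a toy model. The latency figures below are purely illustrative placeholders (not measurements from any real system); the only claim taken from the slide is that access time depends on where the memory lives relative to the accessing node.

```python
# Toy illustration of Non-Uniform Memory Access.
# LOCAL_NS / REMOTE_NS are made-up illustrative numbers, chosen only to
# show that remote access costs more than local access.

LOCAL_NS = 80    # thread touches memory attached to its own NUMA node
REMOTE_NS = 130  # thread touches memory attached to another node

def access_latency_ns(thread_node, memory_node):
    """Latency depends on where the memory lives relative to the thread."""
    return LOCAL_NS if thread_node == memory_node else REMOTE_NS

print(access_latency_ns(0, 0))  # local access
print(access_latency_ns(0, 1))  # remote access: noticeably slower
```

The same non-uniformity is why NUMA-aware placement (keeping a VM's vCPUs and memory on the same node) matters for HPC-style workloads.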
7. Intel Haswell and now Broadwell
Processors
◦Nehalem & Sandy Bridge processors were a great
leap forward.
◦Ivy Bridge is still found in many systems today.
◦The Haswell and now Broadwell families are
outstanding technology when building a High
Performance Computing (HPC) platform.
10. Snoop Mode Performance
Taken from two White Papers:
1) FUJITSU Server PRIMERGY: Memory Performance of Xeon E5-2600 v4 (Broadwell-EP) based Systems
2) FUJITSU Server PRIMERGY: Memory Performance of Xeon E5-2600 v3 (Haswell-EP) based Systems
14. Hyper-threading
◦ It was designed to increase parallelism in a compute environment
that is I/O-bound (non-CPU intensive).
◦ When enabled hyper-threading presents twice the number of
logical cores to the Operating System.
◦ Allows for the parallel execution of multiple threads on the same
physical core.
◦ But each physical core contains only a single set of
execution resources.
◦ So two threads scheduled on the same physical core are
effectively sharing those execution units and clock cycles.
◦ Not only this, but the two threads contend for the shared
resources, so they can stall one another rather than
doubling throughput.
15. Hyper-threading – Continued…
◦One logical core represents the physical core and the
other represents its hyper-threaded twin. This twin runs
at approximately 30% of the performance of the physical
one.
◦The CPU Scheduler of any modern Operating System
(and Hypervisor) is hyper-threading aware.
◦Application Vendors can query the Operating System to
return the real physical cores and ensure their threads
are prioritised to these where possible.
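The numbers from the two hyper-threading slides can be put together in a small sketch. Only the 2x logical-core count and the ~30% twin-performance figure come from the slides; the variable names and the 12-core example are illustrative.

```python
# Toy model of what hyper-threading presents vs. what it delivers.
# The ~30% figure for the hyper-threaded twin is taken from the slide;
# the 12-core socket is just an example.

physical_cores = 12
logical_cores = physical_cores * 2           # what the OS sees with HT enabled
effective_capacity = physical_cores * 1.30   # ~30% uplift per core, not a doubling

print(logical_cores)                   # 24
print(round(effective_capacity, 1))    # 15.6
```

In other words, enabling HT doubles the cores the scheduler sees but yields roughly 1.3x the physical throughput, which is why sizing formulas weight HT cores well below 1.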
18. Over-subscription
◦Having multiple VMs that, when combined, exceed
the number of physical cores means that you’re
overcommitting the CPU resources.
◦Citrix says the CPU over-subscription sweet spot
is likely somewhere between 1.5x and 2x.
◦I believe you should start at no more than 1.5x.
19. Over-subscription – Continued…
I much prefer to follow a formula from Andy Morgan.
◦ Each physical core = 1
◦ Each HT core = 0.25
◦ Reserve at least one core for hypervisor
◦ Don't overcommit until you're running your production workload and can measure the
impact. The minute you introduce overcommit, you open yourself up to periods of
instability and can no longer guarantee that workloads won't affect one another.
((number of physical cores x 1) + (hyper-threaded cores x 0.25) - 1) / vCPUs per VM
So on a 12 core/socket system with HT, for a XenApp workload with 6 vCPUs per VM:
((12 x 1) + (12 x 0.25) - 1) / 6 = 14 / 6 ≈ 2.3
= at most 2 – 3 XenApp VMs per socket
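The sizing formula above is easy to wrap in a few lines of code. This is a minimal sketch of Andy Morgan's formula as quoted on the slide; the weights (1.0 per physical core, 0.25 per HT core, one core reserved for the hypervisor) come from the slide, while the function and parameter names are my own.

```python
# Sketch of the per-socket sizing formula from the slide:
# ((physical cores x 1) + (HT cores x 0.25) - hypervisor reserve) / vCPUs per VM

def vms_per_socket(physical_cores, ht_cores, vcpus_per_vm, hypervisor_reserve=1):
    """Return how many VMs of a given vCPU size fit on one socket."""
    capacity = physical_cores * 1.0 + ht_cores * 0.25 - hypervisor_reserve
    return capacity / vcpus_per_vm

# The worked example: 12 cores/socket with HT, 6 vCPUs per XenApp VM.
print(round(vms_per_socket(12, 12, 6), 2))  # 14/6 ≈ 2.33
```

Rounding the result down gives the conservative per-socket VM count; treat anything above it as entering overcommit territory.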
20. Summary
◦Understand your workloads and usage patterns.
◦Apply the appropriate CPU architecture and
features:
◦NUMA
◦COD
◦Hyper-threading
21. PowerShell Health Check &
Documentation Scripts
◦Carl Webster’s documentation scripts
◦Sacha Thomet’s health check scripts
◦My health check scripts