CPU Optimizations in the CERN Cloud - February 2016
2. CPU optimizations in the CERN Cloud
Ops Midcycle - High Performance Computing with OpenStack - Manchester, 2016
Belmiro Moreira
belmiro.moreira@cern.ch @belmiromoreira
Arne Wiebalck
Tim Bell
Sean Crosby (Univ. of Melbourne)
Ulrich Schwickerath
6. OpenStack at CERN by numbers
~ 5500 Compute Nodes (~140k cores)
• ~ 5300 KVM
• ~ 200 Hyper-V
~ 2800 Images ( ~ 44 TB in use)
~ 2000 Volumes ( ~ 800 TB allocated)
~ 2200 Users
~ 2500 Projects
> 17000 VMs running
Chart: Number of VMs created (green) and VMs deleted (red) every 30 minutes
7. The “20% overhead” problem
• When running the batch system on top of the cloud infrastructure, we reach the limit on the total number of hosts that LSF can manage
• On our full-node batch VMs we noticed that the HS06 rating was ~20% lower than on the underlying host
• Smaller VMs behaved much better: ~8% overhead (sum of simultaneous HS06 runs on 4x 8-core VMs on a 32-core host)
8. HS06 on virtual batch workers
HWDB HS06   VM size (cores)   Per-VM HS06   Total HS06   Overhead
357±16      4x 8              82.3±11       329          7.8%
            2x 16             150±5         300          16%
            1x 32             284±11        284          20.4%
Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
9. Testing Optimizations – KSM off
• ATLAS T0 batch VMs show an IOwait of 20-30%
• Compute nodes started to swap, even when leaving 2 GB of memory for the OS
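For reference, KSM is controlled through the standard Linux sysfs interface; a minimal sketch of how it can be toggled on a KVM compute node (run as root):

  echo 0 > /sys/kernel/mm/ksm/run        # 0 = stop scanning (keep merged pages), 1 = run, 2 = stop and unmerge all pages
  cat /sys/kernel/mm/ksm/pages_shared    # number of host pages currently deduplicated by KSM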
10. Optimization by numbers – EPT off
Before (EPT on):
HWDB HS06   VM size (cores)   Per-VM HS06   Total HS06   Overhead
357±16      4x 8              82.3±11       329          7.8%
            2x 16             150±5         300          16%
            1x 32             284±11        284          20.4%

After (EPT off):
HWDB HS06   VM size (cores)   Per-VM HS06   Total HS06   Overhead   Overhead reduction
357±16      4x 8              87±11         348          2.5%       68%
            2x 16             163.5±1       327          8.4%       52%
            1x 32             311±1         311          12.9%      37%
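EPT is a parameter of the kvm_intel kernel module; a minimal sketch of how it can be switched off on a compute node (the config file name is illustrative, and reloading the module requires all VMs on the host to be shut down first):

  cat /sys/module/kvm_intel/parameters/ept            # 'Y' while EPT is enabled
  echo "options kvm_intel ept=0" > /etc/modprobe.d/kvm-intel.conf
  modprobe -r kvm_intel && modprobe kvm_intel         # reload the module to pick up the option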
11. General virtualization issue?
• Crosscheck w/ SLC6 VMs on Hyper-V
- 0.8% HS06 loss on 4x 8-core
- 3.3% HS06 loss on 1x 32-core SLC6 VM
• No general virtualization overhead issue!
- Rather a feature or configuration issue
• What’s the difference between the VMs on Hyper-V and KVM?
12. NUMA
• Hyper-V VMs have vCPUs pinned to physical NUMA nodes
- Pinned to CPU sets that correspond to the physical NUMA nodes
• Wider OpenStack support for this is available since Kilo (see the flavor sketch below)
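A minimal sketch of how this can be requested through Nova flavor extra specs with the nova CLI of that era (the flavor name is illustrative):

  nova flavor-key m1.batch.32core set hw:numa_nodes=2           # expose two guest NUMA nodes
  nova flavor-key m1.batch.32core set hw:cpu_policy=dedicated   # pin each vCPU to a dedicated host core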
13. NUMA - in the lab
… reduced the overhead to ~3% relative to bare metal
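One way such pinning can be tried by hand on a libvirt guest in the lab is sketched below (domain name and CPU/node numbers are illustrative):

  virsh vcpupin batch-test 0 0                           # pin vCPU 0 to host CPU 0 (repeat for each vCPU)
  virsh numatune batch-test --mode strict --nodeset 0    # keep the guest's memory on host NUMA node 0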
14. Deploying in production
• EPT off; KSM on; NUMA-aware
• System services add ~1-2% overhead
• We got a total overhead of ~5%
15. And then: extremely slow nodes...
• Small fraction of jobs 10x slower
- VMs look OK, actually pretty good
- Hosts: 30-50% system load, >100k IRQ/s (mostly TLB shoot-downs)
• Load attributed to qemu-kvm
- ‘perf top’: 90% in _raw_spin_lock
- ‘systemtap’: paging64_page_fault and kvm_mmu_pte* …
Charts: VM CPU utilization vs. compute node CPU utilization
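The tools mentioned above are the standard ones; a minimal sketch of the kind of command used to attribute the load (the process selection is illustrative):

  perf top -p "$(pgrep -d, qemu-kvm)"    # sample where the qemu-kvm processes spend their CPU time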
16. Back to the drawing board
• Needed to combine the optimizations with EPT on
• Huge pages as a way out?
- Idea: reduce the number of pages to be handled, increase the TLB hit ratio
• 1GB huge pages
- Best HS06 results (with EPT on)
• 2MB huge pages
- Also one of the default sizes
- Performance loss of around 5% compared to bare metal on batch VMs (configuration sketch below)
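A minimal sketch of the corresponding configuration, assuming 2MB pages reserved on the compute node at boot and the flavor extra spec available since Kilo (the hugepage count and flavor name are illustrative):

  # compute node kernel command line: reserve 2MB huge pages at boot
  default_hugepagesz=2M hugepagesz=2M hugepages=28000
  # flavor: back the guest memory with huge pages
  nova flavor-key m1.batch.32core set hw:mem_page_size=2MB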
17. Optimization by numbers
- NUMA + Pinning
- 2MB huge pages
- EPT on
- KSM on

VM size (cores)   Overhead before   Overhead after
4x 8              7.8%              3.3%
2x 16             16%               4.6%
1x 32             20.4%             3-6%
19. Summary
• Reduced the virtualization HS06 overhead to a few percent compared to bare metal
- On full-node VMs!
- NUMA + pinning + huge pages + EPT on + KSM on
• Pre-deployment testing is very difficult
- EPT-off side effects were initially undetected