In this session we'll explore how to measure VM performance and how to evaluate changes to settings or infrastructure that can improve it. We'll also share, from our experience, current best practices for architecting high-performance clouds.
2. 1. Measuring performance and evaluating changes to
settings or infrastructure that can improve it
2. Best current practices for architecting high-performance
clouds
Agenda
3. ● Better application performance -- e.g. time to load a page,
time to rebuild, time to execute a specific query
● Happier customers (in cloud / multi-tenant environments)
● Lower cost per delivered resource (per VM)
○ through higher density
Why
4. “For every fact there is an infinity of hypotheses.”
“The real purpose of the scientific method is to make sure
nature hasn’t misled you into thinking you know something
you actually don’t know.”
Robert M. Pirsig, Zen and the Art of Motorcycle Maintenance
Mandatory inspirational quote
5. What to measure
Application performance
Proxies for application performance
Common synthetic workloads
Throughput vs latency:
2000 kg of X per day (cost/efficiency) vs
10 seconds to the first X (what the user cares about)
8. Proxies for app performance
Latency:
Break down total latency to components
Throughput:
Identify bottlenecks
Measure just the largest contributors, e.g. database insert time.
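For an HTTP-fronted application, one way to break total latency into components is curl's `-w` timing variables; a minimal sketch (the URL is a placeholder):

```shell
# Break one request's total latency into stages (values in seconds):
# DNS lookup, TCP connect, time to first byte, total.
curl -o /dev/null -s -w \
  'dns=%{time_namelookup} connect=%{time_connect} ttfb=%{time_starttransfer} total=%{time_total}\n' \
  http://example.com/
```

The gap between `ttfb` and `connect` is roughly server-side processing; the gap between `total` and `ttfb` is transfer time.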
9. Common synthetic workloads
Throughput:
- 4k random read, iodepth 32 or ∞
- 4k random write, iodepth 32 or ∞
- 4k random read/write 50/50, iodepth 32 or ∞
- Sequential read
- Sequential write
Word of advice: real workloads don’t look like this at all!
10. Common synthetic workloads
Latency:
- random read 4k, iodepth 1
- random write 4k, iodepth 1
Latency under load:
- Same workloads as for throughput, but look at latency
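The synthetic workloads above are typically run with fio. A sketch of a job file covering the 4k random-read throughput and latency variants (`/dev/vdb` is a placeholder test device; swap `rw=randread` for `randwrite`, or `rw=randrw` with `rwmixread=50`, for the other mixes):

```ini
[global]
ioengine=libaio      # Linux native AIO
direct=1             # bypass the page cache
bs=4k
runtime=60
time_based=1
filename=/dev/vdb    # placeholder: a dedicated test device

[randread-throughput]
rw=randread
iodepth=32           # deep queue: measures throughput (IOPS)

[randread-latency]
stonewall            # start only after the previous job finishes
rw=randread
iodepth=1            # queue depth 1: measures pure latency
```

For "latency under load", keep the deep-queue job but report the latency percentiles fio prints rather than IOPS.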
13. ● Compute platform
○ Hardware selection
○ HW tuning
○ OS / hypervisor tuning
● Network
● Storage system
Best practices
14. Typically
- 2x E5-2690v4 -- 28 cores, 56 threads,
@3.2 GHz all-cores turbo
- 256, 384, or 512 GB RAM
- 10/40 GbE NICs, optionally with RDMA
- Firmware versions and BIOS settings matter
- Understand power management -- esp. C-states, P-states and “bias”
- Think of rack-level optimization: how do we get the lowest total cost per
delivered resource?
Host hardware
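A sketch of inspecting the power-management state the slide mentions before tuning it (tool availability and package names vary by distro; requires root for the write operations):

```shell
# C-states: which idle states exist and their wakeup latencies.
cpupower idle-info

# P-states: scaling driver, governor, and turbo frequency range.
cpupower frequency-info
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

# "Bias" = the energy/performance bias MSR; 0 = maximum performance.
x86_energy_perf_policy -r      # read current policy
cpupower set -b 0              # bias toward performance
```

Disabling deep C-states trades idle power for lower wakeup latency; measure before and after rather than assuming the trade is worth it.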
16. Recent Linux kernel, KVM and QEMU
… but beware of the bleeding edge
E.g. qemu-kvm-ev from RHEV (repackaged by CentOS)
tuned-adm virtual-host
tuned-adm virtual-guest
Host OS, guest OS
17. ● Use virtio-net driver
● regular virtio vs vhost_net
● SR-IOV (PCIe pass-through)
Networking
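With libvirt, a virtio NIC uses the in-kernel vhost_net backend by default when it is available; the sketch below makes it explicit (the bridge name and queue count are placeholders):

```xml
<interface type='bridge'>
  <source bridge='br0'/>             <!-- placeholder host bridge -->
  <model type='virtio'/>             <!-- paravirtual virtio-net in the guest -->
  <driver name='vhost' queues='4'/>  <!-- in-kernel vhost_net, multiqueue -->
</interface>
```

For SR-IOV, `<interface type='hostdev'>` passes a virtual function straight through to the guest, bypassing the host network stack entirely, at the cost of complicating live migration.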
18. ● cache=none -- direct IO, bypass host buffer cache
● io=native -- use Linux Native AIO, not POSIX AIO (threads)
● virtio-blk -> dataplane
● virtio-scsi -> multiqueue
● in the guest, raise virtio_blk.queue_depth from 128 to 256
Block I/O
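The host-side options above map onto the libvirt disk XML roughly as follows (paths and queue counts are placeholders):

```xml
<iothreads>1</iothreads>
<devices>
  <disk type='file' device='disk'>
    <!-- cache='none' = direct I/O; io='native' = Linux native AIO -->
    <driver name='qemu' type='raw' cache='none' io='native' iothread='1'/>
    <source file='/var/lib/libvirt/images/vm1.raw'/>  <!-- placeholder -->
    <target dev='vda' bus='virtio'/>  <!-- virtio-blk; iothread = dataplane -->
  </disk>
  <controller type='scsi' model='virtio-scsi'>
    <driver queues='4'/>              <!-- multiqueue virtio-scsi -->
  </controller>
</devices>
```

The guest-side tweak is a kernel module parameter, e.g. `virtio_blk.queue_depth=256` on the guest kernel command line.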
19. - balloon
- KSM (RAM dedup)
- huge pages, THP
- NUMA
- use local-node memory if you can
- route IRQs of network and storage adapters to a core on the node they
are on
Compute - Memory
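Routing an adapter's IRQs to its local node means writing a CPU bitmask to `/proc/irq/<N>/smp_affinity`. A minimal sketch of building that mask (device name, IRQ numbers, and the CPU list are placeholders; assumes fewer than 64 CPUs):

```shell
# Find the NIC's NUMA node and its local CPUs (placeholders, run as root):
#   node=$(cat /sys/class/net/eth0/device/numa_node)
#   local CPUs listed in /sys/devices/system/node/node${node}/cpulist
cpus="0 2 4 6"                 # placeholder: cores local to the adapter

# Build a hex bitmask with one bit set per local CPU.
mask=0
for c in $cpus; do
  mask=$(( mask | (1 << c) ))
done
hexmask=$(printf '%x' "$mask")
echo "$hexmask"                # bits 0,2,4,6 set -> 55
# Then, for each of the adapter's IRQs:
#   echo "$hexmask" > /proc/irq/<N>/smp_affinity
```

The same mask format works for `smp_affinity` of storage-adapter IRQs; on systems with more than 64 CPUs the mask becomes comma-separated 32-bit words.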
21. Typically 4x 10 GbE per hypervisor: 2 for storage, 2 for inter-VM/internet
A typical cluster has just 2 switches, with up to 128x 10 GbE ports at low cost.
40/56 GbE and 25G
VLANs, Jumbo frames, flow control.
RDMA
Networks
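A sketch of the per-interface settings the slide lists (interface and VLAN names are placeholders; requires root and matching switch-side configuration):

```shell
ip link set dev eth0 mtu 9000                          # jumbo frames
ethtool -A eth0 rx on tx on                            # pause-frame flow control
ip link add link eth0 name eth0.100 type vlan id 100   # tagged VLAN
ip link set dev eth0.100 up
```

Jumbo frames only help if every hop (NIC, switch, peer) agrees on the MTU; a mismatch shows up as mysterious stalls on large transfers.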
22. ● Lots of snake oil out there!
● performance numbers from hardware configurations totally unlike what you’d
use in production
● synthetic tests with high iodepth -- 10 nodes, 10 workloads × iodepth 256 each
(because why not)
● testing with ramdisk backend
● synthetic workloads are a poor approximation of the real world (example)
Storage
23. Performance matters for your users.
Work with partners who understand this and help you with it.
Conclusion