Optimizing KVM virtual machines for performance

Boyan Krosnov
Chief of Product
StorPool Storage
Why optimize

● Better application performance -- e.g. time to load a page, time to rebuild,
time to execute a specific query
● Happier customers (in cloud / multi-tenant environments)
● Lower cost per delivered resource (per VM)
○ through higher density
What

● Compute - CPU and memory
● Storage
● Network
● Do you optimize for throughput or for latency?

Where:
● VM, guest OS, drivers, etc.
● Host OS and hypervisor
● Host hardware
● Network
● Storage system
A word on networks

Typically 2x 10GE per hypervisor for storage traffic, with the same or a
separate 2x 10GE pair for internet and inter-VM traffic.
A typical cluster has just 2 switches, with up to 96x 10GE ports at low cost.
We are starting to see 40/56 GigE clusters, and we expect many 25 GigE networks
in the next year.
VLANs, jumbo frames, flow control, RDMA.
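Jumbo frames and VLANs from the list above are per-interface settings; a minimal sketch with iproute2, where the interface name eth0 and VLAN ID 100 are illustrative:

```shell
# Enable jumbo frames on the storage NIC (interface name is an assumption):
ip link set dev eth0 mtu 9000

# Create a VLAN sub-interface for storage traffic (VLAN ID is illustrative):
ip link add link eth0 name eth0.100 type vlan id 100
ip link set dev eth0.100 mtu 9000 up
```

The MTU must match on the switch ports and on every host in the storage network, or large frames will be silently dropped.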
A word on host hardware

Typically:
- 2x E5-2697v3 -- 28 cores, 56 threads, @3.1GHz all-cores turbo
- 256, 384 or 512 GB RAM
- 10/40 GigE NICs with RDMA

- Firmware versions and BIOS settings matter
- Understand power management -- esp. C-states and P-states
- Think of rack-level optimization -- how do we get the lowest total cost per
delivered resource?
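To see the C-state and P-state behaviour mentioned above, the cpupower tool (from the Linux kernel tools package; assumed to be installed on the host) reports what the hardware and governor are actually doing:

```shell
# Show available idle (C-)states and their exit latencies:
cpupower idle-info

# Show the current frequency driver, governor and hardware limits (P-states):
cpupower frequency-info

# A common choice for latency-sensitive hosts is to keep cores at full
# speed; this trades power for more predictable latency:
cpupower frequency-set -g performance
```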
Good references

RHEL7 Virtualization Tuning and Optimization Guide

Also:
https://pve.proxmox.com/wiki/Performance_Tweaks
http://events.linuxfoundation.org/sites/events/files/slides/CloudOpen2013_Khoa_Huynh_v3.pdf
http://www.linux-kvm.org/images/f/f9/2012-forum-virtio-blk-performance-improvement.pdf
http://www.slideshare.net/janghoonsim/kvm-performance-optimization-for-ubuntu

… but don’t trust everything you read. Perform your own benchmarking!
Host OS, guest OS

Recent Linux kernel, KVM and QEMU
… but beware of the bleeding edge
E.g. qemu-kvm-ev from RHEV (repackaged by CentOS)

tuned-adm virtual-host
tuned-adm virtual-guest
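The tuned profiles above are applied with `tuned-adm profile`; a minimal sketch, assuming the tuned daemon is installed and running:

```shell
# On the hypervisor host:
tuned-adm profile virtual-host

# Inside each Linux guest:
tuned-adm profile virtual-guest

# Confirm which profile is active:
tuned-adm active
```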
Networking

● Use virtio-net driver
● regular virtio vs vhost_net
● SR-IOV (PCIe pass-through)
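The regular virtio vs vhost_net choice above is a QEMU configuration switch; a hedged sketch of the relevant flags (VM memory size, tap device id and disk image name are illustrative):

```shell
# virtio-net with the in-kernel vhost_net backend: packet processing
# moves from the QEMU process into a kernel thread, cutting latency.
qemu-system-x86_64 \
    -machine accel=kvm -m 4096 \
    -netdev tap,id=net0,vhost=on \
    -device virtio-net-pci,netdev=net0 \
    disk.img
```

With `vhost=off` (or omitted) the same virtio-net device is emulated entirely in userspace, which is the "regular virtio" case from the slide.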
Block I/O

● cache=none -- direct I/O, bypass the host buffer cache
● io=native -- use Linux native AIO, not POSIX AIO (threads)

● virtio-blk -> dataplane
● virtio-scsi -> multiqueue

● in guest: virtio_blk.queue_depth 128 -> 256
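The cache and AIO settings above map directly onto QEMU's -drive options; a sketch, with the image path illustrative:

```shell
# Direct I/O (cache=none) with Linux native AIO (aio=native); these
# belong together, since aio=native relies on O_DIRECT:
qemu-system-x86_64 \
    -machine accel=kvm -m 4096 \
    -drive file=/var/lib/libvirt/images/vm1.raw,format=raw,if=virtio,cache=none,aio=native

# In the guest, the virtio-blk queue depth is a module parameter, e.g.
# on the guest kernel command line:
#   virtio_blk.queue_depth=256
```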
Compute - Memory

- balloon
- KSM (RAM dedup)
- huge pages, THP
- NUMA
- use local-node memory if you can
- route IRQs of network and storage adapters to a core on the node they are on
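Routing IRQs to the right node, as suggested above, goes through /proc/irq/*/smp_affinity, which takes a hex CPU bitmask. A sketch of building such a mask, assuming CPUs 8-15 sit on the NUMA node that hosts the adapter (the CPU range and IRQ number 42 are illustrative):

```shell
# Build a bitmask covering CPUs 8-15:
mask=0
for cpu in $(seq 8 15); do
    mask=$(( mask | (1 << cpu) ))
done
printf 'mask=%x\n' "$mask"    # CPUs 8-15 -> mask=ff00

# Apply it to the adapter's IRQ -- needs root, and irqbalance may
# override it unless the IRQ is excluded from balancing:
# printf '%x' "$mask" > /proc/irq/42/smp_affinity
```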
Compute - CPU

Pinning
HT
NUMA
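Pinning from the list above can be done with libvirt's virsh; a sketch assuming a guest named vm1 with 2 vCPUs, pinned to host CPUs 2 and 3 on the same NUMA node (all names and numbers are illustrative):

```shell
# Pin each vCPU to a dedicated host CPU:
virsh vcpupin vm1 0 2
virsh vcpupin vm1 1 3

# Show the resulting placement:
virsh vcpupin vm1
```

With HT enabled, keep in mind that sibling threads share a core: pinning two busy vCPUs of unrelated VMs onto siblings of the same core will make them contend.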
Demo
Boyan Krosnov
bk@storpool.com
@bkrosnov
https://storpool.com/
