Optimization of OpenNebula VMs for Higher Performance - Boyan Krosnov
Optimizing KVM virtual machines
Chief of Product
● Better application performance -- e.g. time to load a page, time to rebuild,
time to execute a specific query
● Happier customers (in cloud / multi-tenant environments)
● Lower cost per delivered resource (per VM)
○ through higher density
● Compute - CPU and memory
● Do you optimize for throughput or for latency?
● VM, guest OS, drivers, etc.
● Host OS and hypervisor
● Host hardware
● Storage system
Typically 2x 10GE per hypervisor for storage traffic
Same or separate 2x 10GE for internet and inter-VM traffic
Typical cluster has just 2 switches. Up to 96x 10GE ports at low cost.
We are starting to see 40/56 GigE clusters and we expect many 25 GigE networks
in the next year.
VLANs, Jumbo frames, flow control.
A word on networks
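The network knobs above can be set with standard Linux tooling. A minimal sketch, assuming a storage-facing port named `eth2` and VLAN ID 100 (both illustrative names, not from the original):

```shell
# Enable jumbo frames (MTU 9000) on the storage-facing 10GE port
# ("eth2" is an illustrative interface name):
ip link set dev eth2 mtu 9000

# Inspect pause-frame (flow control) settings with ethtool:
ethtool -a eth2

# Tag a VLAN for inter-VM traffic on the same port (VLAN 100 is a placeholder):
ip link add link eth2 name eth2.100 type vlan id 100
ip link set dev eth2.100 up
```

Note that jumbo frames only help if every hop on the path -- NICs, switches, and peers -- is configured for the larger MTU.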
- 2x E5-2697v3 -- 28 cores, 56 threads, @3.1GHz all-cores turbo
- 256-384-512 GB RAM
- 10/40 GigE NICs with RDMA
- firmware versions and BIOS settings matter
- Understand power management -- esp. C-states and P-states
- Think of rack-level optimization -- how do we get the lowest total cost per delivered resource?
A word on host hardware
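Power management in particular is worth inspecting from the host. A sketch using the standard cpufreq/cpuidle sysfs interfaces:

```shell
# Check the current frequency-scaling (P-state) governor:
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

# Switch all cores to the performance governor:
for c in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo performance > "$c"
done

# List exit latencies of the available C-states -- deep states save power
# but add wakeup latency:
grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/latency

# To cap C-state depth, add kernel command-line parameters such as:
#   intel_idle.max_cstate=1 processor.max_cstate=1
```

Whether capping C-states is a win depends on the workload: it lowers wakeup latency at the cost of power and turbo headroom, which is exactly the kind of trade-off to settle with your own benchmarks.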
… but don’t trust everything you read. Perform your own benchmarking!
Recent Linux kernel, KVM and QEMU
… but beware of the bleeding edge
E.g. qemu-kvm-ev from RHEV (repackaged by CentOS)
Host OS, guest OS
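On CentOS 7, the qemu-kvm-ev build mentioned above is published by the CentOS Virt SIG; a sketch of pulling it in (repository and package names as shipped by CentOS, exact versions will vary):

```shell
# Enable the Virt SIG repository that carries qemu-kvm-ev:
yum install -y centos-release-qemu-ev

# Replace the stock qemu-kvm with the enterprise-virtualization build:
yum install -y qemu-kvm-ev

# Verify the version in use (VMs must be restarted to pick it up):
/usr/libexec/qemu-kvm --version
```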
● Use virtio-net driver
● regular virtio vs vhost_net
● SR-IOV (PCIe pass-through)
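As a sketch of the first two options at the QEMU level (the tap device name, memory size, and PCI address below are illustrative, not from the original):

```shell
# virtio-net backed by the in-kernel vhost_net accelerator
# ("tap0" and "disk.img" are placeholder names):
qemu-system-x86_64 -enable-kvm -m 4096 \
    -netdev tap,id=net0,ifname=tap0,vhost=on \
    -device virtio-net-pci,netdev=net0 \
    disk.img

# SR-IOV alternative: pass one of the NIC's virtual functions straight
# into the guest via VFIO ("0000:03:10.0" is a placeholder VF address):
#   -device vfio-pci,host=0000:03:10.0
```

vhost_net keeps packet processing in the host kernel and usually beats plain userspace virtio; SR-IOV removes the host from the datapath entirely, at the cost of live-migration flexibility.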
● cache=none -- direct IO, bypass host buffer cache
● io=native -- use Linux Native AIO, not POSIX AIO (threads)
● virtio-blk -> dataplane
● virtio-scsi -> multiqueue
● in-guest virtio_blk.queue_depth: 128 -> 256
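A sketch of wiring these storage options together (the image name is illustrative; option syntax is standard QEMU):

```shell
# cache=none (direct I/O, bypassing the host page cache) plus Linux native AIO:
qemu-system-x86_64 -enable-kvm -m 4096 \
    -drive file=disk.img,format=raw,if=virtio,cache=none,aio=native

# Inside the guest, the virtio-blk queue depth is a module parameter;
# raise it on the guest kernel command line:
#   virtio_blk.queue_depth=256
# and confirm the value in effect:
cat /sys/module/virtio_blk/parameters/queue_depth
```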
- KSM (RAM dedup)
- huge pages, THP
- use local-node memory if you can
- route IRQs of network and storage adapters to a core on the node they are on
Compute - Memory
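A sketch of the memory-side knobs on the host (NIC name, IRQ number, page counts, and core list below are illustrative):

```shell
# Turn on KSM (RAM dedup) on the host:
echo 1 > /sys/kernel/mm/ksm/run

# Reserve 1G hugepages at boot via the host kernel command line:
#   default_hugepagesz=1G hugepagesz=1G hugepages=128
# and back guest RAM with them:
qemu-system-x86_64 -enable-kvm -m 8192 \
    -mem-path /dev/hugepages -mem-prealloc disk.img

# Pin a VM's vCPUs and memory to one NUMA node:
numactl --cpunodebind=0 --membind=0 qemu-system-x86_64 ...

# Find which node a NIC sits on, then steer its IRQs to cores on that node
# (IRQ 45 and core 4 are placeholders):
cat /sys/class/net/eth2/device/numa_node
echo 4 > /proc/irq/45/smp_affinity_list
```

KSM trades CPU time for density, so it fits the "lower cost through higher density" goal above but should be benchmarked against latency-sensitive workloads.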