DevConf-2018
Gluster storage is integrated with oVirt as file-based storage using FUSE, enabling all oVirt features with very little special code. However, FUSE is not the most efficient or scalable way to access Gluster storage, resulting in poor virtual machine performance. With the newly added native Gluster support, a VM can access Gluster storage directly in the most efficient way. Decreased storage access latency results in better IOPS, making storage more responsive and improving VM performance. Participants will learn how file system access works for VMs, review the reasons for potential performance issues in hyperconverged setups, and see how to improve them.
1. This presentation is licensed under a Creative Commons Attribution 4.0 International License
Improving Hyperconverged Performance
Denis Chaplygin
Senior Software Engineer
Jan 2018
3. ● Separate storage
○ Stores VM images
○ Needs to be shared between compute hosts
○ Data availability provided by storage
○ Storage access may not be redundant
● Separate compute hosts
○ Host agent (VDSM) manages VMs, storage and networks
● Engine host is just a VM...
○ ...and Hosted Engine makes it highly available
oVirt overview cont’d
5. ● GlusterFS is a general-purpose, scale-out, distributed file system supporting thousands of clients
● Aggregates storage exports over network connections to provide a single unified namespace
● The file system runs completely in userspace, on commodity hardware
● A Gluster cluster is a collection of storage servers
Gluster overview cont’d
6. Two Plus Two Equals Five
● oVirt + Self Hosted Engine + GlusterFS
● Gluster volumes are oVirt storage domains
● Same nodes used to
○ Host the engine
○ Run payload VMs
○ Provide shared storage
● And now storage (thanks to Gluster) is highly available and redundant
Hyperconverged - Integration of oVirt and Gluster.
10. ● VM disk images are stored on shared storage, either block-based or file-based
● VDSM mounts storage domains on each host
● A storage domain is a special on-disk data structure, containing some metadata alongside the VM data
● In the case of filesystem-based storage, VMs are configured to use files in the mounted storage domain directory as their drive images
oVirt VM disk image store cont’d
11. VM typical FOP flow
[Diagram: each file operation crosses the user/kernel boundary on the host: VM → QEMU (user space) → VFS → FUSE (kernel space) → Gluster client (user space) → Gluster volume on the Gluster server]
13. ● libgfapi is a userspace library for accessing data in GlusterFS
● No FUSE mount required
● Speed and latency improve thanks to the reduced overhead
● In the post-Meltdown world, context switches are very expensive
LibGfApi overview
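To make the bullets above concrete, here is a minimal sketch of reading a file over libgfapi from Python, using the libgfapi-python bindings. The server, volume and path arguments are placeholders, a live, reachable Gluster volume is required, and this is illustrative rather than oVirt's own code:

```python
def read_image_header(server, volname, path, size=512):
    """Read the first `size` bytes of a file on a Gluster volume
    through libgfapi -- no FUSE mount, no kernel round-trip.
    Requires the libgfapi-python bindings and a reachable Gluster
    server; all names passed in are placeholders."""
    from gluster import gfapi  # libgfapi-python bindings

    vol = gfapi.Volume(server, volname)
    vol.mount()  # initialises the in-process Gluster client
    try:
        with vol.fopen(path, 'rb') as f:
            return f.read(size)
    finally:
        vol.umount()
```

The whole I/O path stays inside the calling process: the Gluster client translators run in the same address space, which is exactly the FUSE overhead being avoided.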
14. VM disk access path with libgfapi
[Diagram: with libgfapi the kernel-space VFS/FUSE hop disappears: VM → QEMU → Gluster client, all in user space on the host → Gluster volume on the Gluster server]
15. ● QEMU has a GlusterFS block driver that uses libgfapi
● FUSE overhead no longer exists when QEMU works with VM images on Gluster volumes
● gluster[+transport]://[server[:port]]/volname/image[?socket=...]
● Unfortunately, libgfapi support is somewhat limited:
○ Multiple servers cannot be specified
○ Migrations between network and non-network drives are not yet possible
LibGfApi QEMU integration
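The URI syntax above can be illustrated with a small helper. `gluster_uri` is a hypothetical name, not part of QEMU or oVirt; it simply assembles the documented `gluster[+transport]://[server[:port]]/volname/image[?socket=...]` form:

```python
def gluster_uri(volname, image, server=None, port=None,
                transport=None, socket=None):
    """Assemble a QEMU gluster block-driver URI of the form
    gluster[+transport]://[server[:port]]/volname/image[?socket=...]
    Everything except volname and image is optional."""
    scheme = 'gluster' + ('+%s' % transport if transport else '')
    authority = server or ''
    if server and port:
        authority += ':%d' % port
    uri = '%s://%s/%s/%s' % (scheme, authority, volname, image)
    if socket:
        uri += '?socket=%s' % socket
    return uri


# A plain TCP access path to an image on a volume named 'data':
print(gluster_uri('data', 'vm1.img', server='gfs1.example.com',
                  port=24007, transport='tcp'))
# → gluster+tcp://gfs1.example.com:24007/data/vm1.img
```

Note the single-server limitation mentioned above: the URI form accepts only one server, which is why a highly available multi-server specification is not possible this way.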
16. ● Introduces the concept of a disk type - binary (file/block) logic no longer suffices
● Special handling of 'network' disk types during VM creation
● Supports changing the disk type on the fly during storage migrations
● Support for other operations that previously required the actual presence of a file
LibGfApi VDSM support
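The shift away from binary file/block logic can be sketched as a three-valued disk type. The names below are illustrative only, not VDSM's actual identifiers:

```python
from enum import Enum


class DiskType(Enum):
    """Illustrative sketch -- VDSM's real constants differ in detail."""
    FILE = 'file'
    BLOCK = 'block'
    NETWORK = 'network'


def needs_local_path(disk_type):
    """A 'network' disk is addressed by a gluster:// URI, so unlike
    file- and block-based disks it has no local path on the host --
    this is why code that assumed a file or block device must now
    handle a third case."""
    return disk_type is not DiskType.NETWORK
```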
17. ● libgfapi support should be switchable at the engine or cluster level
● libgfapi support is only available in newer VDSM
○ On older versions of VDSM, the engine has to detect and disable the libgfapi feature
● Gluster support during initial VM creation
LibGfApi engine support
18. ● Supported on oVirt 4.2, or on oVirt 4.1 starting from v4.1.6
● VM restart is required
Enabling libgfapi
root# engine-config -s LibgfApiSupported=true --cver=4.2
22. Same scenario as for IOPS
● Just a 2% increase in bandwidth on the single-brick volume
● A huge 22% increase in bandwidth on the replica 3 volume
Bandwidth
23. ● MySQL database running the DVD Store test suite inside a VM
● Compared the average number of transactions per minute, with and without libgfapi enabled
Realistic workload
24. ● Under low-to-moderate load (10-20 simultaneous clients), the increase in transactions per minute with libgfapi enabled is about 11%
● Under higher load (80 simultaneous clients), the increase is about 24%
Realistic workload - Results
26. Summary
● Combining two projects can give you more than just their sum
● Treating Gluster as a typical network filesystem, such as NFS, introduces overhead and disadvantages
○ Fortunately, Gluster has a special, userspace-only library for direct file access
● Removing the FUSE overhead gives you up to a 24% performance boost under database load for free
27. This presentation is licensed under a Creative Commons Attribution 4.0 International License
THANK YOU
http://www.ovirt.org
dchaplyg@redhat.com