
OLS presentation on KVM VirtFS 9P file system pass-through.


  1. IBM Linux Technology Center
     VirtFS: A Virtualization-Aware File System Pass-Through
     Venkateswararao Jujjuri (JV), Linux Symposium 2010
     © 2010 IBM Corporation

  2. Paravirtual Applications and System Services
     - Move virtualization intelligence up into system services.
     - Being explored by the research and academic communities, but largely ignored by mainline.
     - Provides a hybrid environment leveraging security, isolation, and performance.
     - Visibility into guest operations allows the hypervisor to offer a variety of use cases.
       - Desktops, network sharing, file systems.
     - Avoids a layer of indirection and boosts performance.
     - Adding this to the existing device virtualization takes virtualization to the next level.

  3. Paravirtual File Systems
     - A good entry point into paravirtual system services.
     - Virtual storage in the form of virtual disks has many limitations:
       - Can't be shared between multiple guests.
       - Redundant caching.
       - Unnecessary indirection between the file system and the block layer.
     - Using traditional distributed file systems over a virtualized network device is not a solution either:
       - Configuration, management, and encapsulation overheads.
       - Double caching.
       - Different semantics for different file systems.

  4. Use Cases of Paravirtual File Systems
     - Replace the virtual disk as the root file system.
       - Rapid cloning, easy management, secure.
     - Access synthetic file systems between host and guests.
     - Offer file system services to thin clients such as LibraryOS.
     - Cloud computing:
       - A secure window onto the host file system from the guest.
       - Different portions of the same file system shared among different guests.
     - Knowledge of guest activity enables the hypervisor to offer services such as de-dup, snapshots, etc.
     - Better utilization of system resources.

  5. VirtFS
     - Paravirtual file system pass-through between the KVM host and guest.
     - Uses the Plan 9 protocol (9P) between client and server.
       - The 9P2000.L protocol is being developed/defined as part of this effort.
     - The server is part of QEMU and uses the VirtIO transport.
       - The file system is exported to the guest when QEMU is invoked.
     - The client is part of the guest kernel.
       - Mounted on the guest using the mount tag defined during the QEMU invocation.

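To make the mount-tag step concrete, here is a minimal sketch that assembles the guest-side mount command, assuming a Linux guest whose kernel has the v9fs client with VirtIO transport. `trans=virtio` and `version=` are v9fs mount options; the function name, the tag `hostshare`, and the mount point `/mnt/host` are hypothetical, and `9p2000.L` assumes a kernel that already carries the 9P2000.L support described in this talk.

```python
def guest_mount_argv(mount_tag: str, mount_point: str, version: str = "9p2000.L") -> list[str]:
    """Build the guest-side mount command for a VirtFS export (hypothetical helper).

    The mount tag must match the mount_tag given to QEMU on the host;
    trans=virtio selects the VirtIO transport of the v9fs client.
    """
    options = f"trans=virtio,version={version}"
    return ["mount", "-t", "9p", "-o", options, mount_tag, mount_point]


if __name__ == "__main__":
    # Hypothetical tag and mount point; the printed command would be run as root in the guest.
    print(" ".join(guest_mount_argv("hostshare", "/mnt/host")))
```

The tag is only a name: it has to equal the `mount_tag` chosen when QEMU was started, which is how the guest selects one export among several.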
  6. Plan 9 Overview
     - The Plan 9 OS was developed by AT&T Bell Laboratories (Lucent Technologies).
     - Intended to address Unix shortcomings:
       - A seamless distributed system with integrated, secure network resource sharing.
     - Three core design principles:
       - A single set of simple, well-defined interfaces to services.
       - A simple protocol to securely distribute the interfaces across any network.
       - A dynamic hierarchical structure to organize these interfaces.
     - Unix pioneered the concept of treating devices as files; Plan 9 took the metaphor further by using file operations as the simple, well-defined interface to all system and application services.

  7. 9P Overview
     - 9P is the abstract interface used to access resources under Plan 9.
     - Any transport can be used; the only requirement is that it be reliable and in-order.
     - Made it into Linux kernel 2.6.14, with major changes in 2.6.24.
     - Part of Linux mainline, with VirtIO transport support.
     - 9P2000.u extension:
       - For POSIX adoption, the protocol was extended with the 9P2000.u version during the Linux port.
       - Provided support for numeric uid/gid and extended operations to support symlinks, links, special files, etc.
       - Did not include full support for Linux operations.

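Every 9P message shares a small framing header: a 4-byte total size (which counts itself), a 1-byte message type, and a 2-byte tag, all little-endian, followed by type-specific fields; strings are a 2-byte length followed by UTF-8 bytes. The sketch below packs and parses a Tversion message under that layout (type 100, sent with the special tag NOTAG = 0xFFFF). It is an illustrative encoder written for this example, not the v9fs or QEMU code, and the helper names are hypothetical.

```python
import struct

TVERSION = 100   # 9P2000 message type numbers
RVERSION = 101
NOTAG = 0xFFFF   # tag reserved for version messages

HEADER = struct.Struct("<IBH")  # size[4] type[1] tag[2], little-endian, no padding


def pack_string(s: str) -> bytes:
    """9P string: len[2] followed by UTF-8 bytes (no NUL terminator)."""
    data = s.encode("utf-8")
    return struct.pack("<H", len(data)) + data


def pack_message(mtype: int, tag: int, body: bytes) -> bytes:
    """Prefix a body with the 9P header; size includes the header itself."""
    return HEADER.pack(HEADER.size + len(body), mtype, tag) + body


def pack_tversion(msize: int, version: str) -> bytes:
    """Tversion: size[4] Tversion tag[2] msize[4] version[s]."""
    return pack_message(TVERSION, NOTAG, struct.pack("<I", msize) + pack_string(version))


def unpack_version(msg: bytes) -> tuple[int, int, int, str]:
    """Parse a Tversion/Rversion message into (type, tag, msize, version)."""
    size, mtype, tag = HEADER.unpack_from(msg, 0)
    if size != len(msg):
        raise ValueError("size field does not match message length")
    (msize,) = struct.unpack_from("<I", msg, HEADER.size)
    (slen,) = struct.unpack_from("<H", msg, HEADER.size + 4)
    start = HEADER.size + 6
    return mtype, tag, msize, msg[start:start + slen].decode("utf-8")


if __name__ == "__main__":
    msg = pack_tversion(8192, "9P2000.L")
    print(len(msg), msg.hex())
    print(unpack_version(msg))
```

Because the size field comes first and covers the whole message, a receiver on any reliable, in-order byte stream can read 4 bytes, then read exactly the rest of the message.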
  8. 9P2000.L Protocol Extension
     - Aims to address 9P2000.u protocol deficiencies while keeping the core protocol elements intact.
     - Placed in a separate, complementary op-code name space.
       - No changes to the existing operations.
     - The protocol version is negotiated during the initial handshake.
     - All protocol extensions are optional. If a server doesn't support a particular extension, it returns an error, and a well-behaved client falls back to other extensions or to core 9P2000 operations.

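The fallback behaviour can be sketched as a client that offers its most capable dialect first and steps down when the server answers `unknown`, which is the reply 9P2000 prescribes for a version string the server does not understand. The server below is a hypothetical in-process stand-in that echoes a supported version and otherwise replies `unknown`; the preference order and function names are assumptions for illustration.

```python
# Hypothetical client preference: most capable dialect first.
CLIENT_PREFERENCE = ["9P2000.L", "9P2000.u", "9P2000"]


def make_server(supported: set[str]):
    """Return a hypothetical server's Tversion handler."""
    def handle_tversion(requested: str) -> str:
        # Echo the requested dialect if known, otherwise reply "unknown".
        return requested if requested in supported else "unknown"
    return handle_tversion


def negotiate(handle_tversion, preference=CLIENT_PREFERENCE) -> str:
    """Offer each dialect in turn until the server accepts one."""
    for offered in preference:
        reply = handle_tversion(offered)
        if reply in preference:
            # A server may echo the offer or name an older dialect it knows.
            return reply
    raise ConnectionError("no common 9P dialect")


if __name__ == "__main__":
    print(negotiate(make_server({"9P2000.L", "9P2000.u", "9P2000"})))
    print(negotiate(make_server({"9P2000.u", "9P2000"})))
    print(negotiate(make_server({"9P2000"})))
```

Since the .L operations live in their own op-code space, a client that ends up on 9P2000.u or plain 9P2000 simply never issues them.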
  9. KVM and QEMU
     - Kernel-based Virtual Machine (KVM)
       - A full virtualization solution for Linux on x86 hardware with virtualization extensions (Intel VT-x / AMD-V).
       - A set of Linux kernel modules (kvm.ko plus kvm-intel.ko or kvm-amd.ko) offers a special process mode to user-space processes.
     - Quick EMUlator (QEMU)
       - Uses the interfaces provided by KVM to offer full system virtualization.
       - Emulates standard PC hardware such as IDE disks, VGA graphics, PCI devices, etc.
       - Any I/O request a guest OS makes is intercepted and routed to user mode, where the QEMU process emulates it.

 10. VirtIO Transport
      - A paravirtual I/O bus based on a hypervisor-neutral DMA API.
      - Offers lockless ring queues between the guest and the host to enable zero-copy bulk data transfer.
      - The VirtIO PCI transport allows VirtFS to be implemented so that guest-driven I/O operations can be zero-copy.

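The ring mechanics can be illustrated with a simplified model of a virtqueue's available and used rings: the guest publishes descriptor heads by advancing a free-running 16-bit index, and the host consumes entries up to that index and reports completions on the used ring. Each index has a single writer, which is what lets the real rings work without locks. This toy is single-threaded and deliberately omits the descriptor table layout, memory barriers, and notifications of real VirtIO; the class and method names are hypothetical.

```python
class ToyVirtqueue:
    """Simplified avail/used rings with free-running 16-bit indices (not real VirtIO layout)."""

    def __init__(self, size: int):
        if size <= 0 or size & (size - 1):
            raise ValueError("queue size must be a power of two")
        self.size = size
        self.avail_ring = [0] * size
        self.avail_idx = 0          # written only by the guest
        self.used_ring = [(0, 0)] * size
        self.used_idx = 0           # written only by the host
        self.last_avail_idx = 0     # host-private: next available entry to consume
        self.last_used_idx = 0      # guest-private: next used entry to reap

    def guest_add(self, desc_head: int) -> None:
        """Guest posts a request (a descriptor chain head) on the available ring."""
        if ((self.avail_idx - self.last_used_idx) & 0xFFFF) >= self.size:
            raise BufferError("ring full")
        self.avail_ring[self.avail_idx % self.size] = desc_head
        # Publish only after the slot is written (a real guest needs a write barrier here).
        self.avail_idx = (self.avail_idx + 1) & 0xFFFF

    def host_process(self, handler) -> int:
        """Host consumes every newly available entry and reports completions."""
        done = 0
        while self.last_avail_idx != self.avail_idx:
            head = self.avail_ring[self.last_avail_idx % self.size]
            written = handler(head)  # e.g. service a 9P request, return bytes written
            self.used_ring[self.used_idx % self.size] = (head, written)
            self.used_idx = (self.used_idx + 1) & 0xFFFF
            self.last_avail_idx = (self.last_avail_idx + 1) & 0xFFFF
            done += 1
        return done

    def guest_reap(self) -> list:
        """Guest collects (head, length) completions from the used ring."""
        out = []
        while self.last_used_idx != self.used_idx:
            out.append(self.used_ring[self.last_used_idx % self.size])
            self.last_used_idx = (self.last_used_idx + 1) & 0xFFFF
        return out


if __name__ == "__main__":
    vq = ToyVirtqueue(4)
    for head in (0, 2, 5):
        vq.guest_add(head)
    vq.host_process(lambda head: 128)
    print(vq.guest_reap())
```

Power-of-two queue sizes keep `idx % size` consistent when the 16-bit indices wrap from 65535 back to 0.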
 11. VirtFS Block Diagram
      [Figure: VirtFS block diagram. Guest side: Apps on Guest; VFS Interface; VirtFS (v9fs) Client (Guest Kernel). Link: VirtIO Ring. Host side: VirtFS Server (v9fs server in QEMU, Host User Space); GPFS API; ClusterFS; VFS Interface; Host Kernel; Hardware.]

 12. VirtFS Implementation
      - KVM, QEMU, and VirtIO present an ideal platform for the VirtFS server.
      - Two types of virtual devices:
        - virtio-9p-pci, used to transport protocol messages and data between the host and the guest.
        - fsdev, used to define characteristics of the exported file system, such as fs type and security model.

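The two devices are paired on the QEMU command line through an fsdev id. The sketch below assembles such an argument list. `-fsdev local,...` and `-device virtio-9p-pci,...` with `fsdev=`, `mount_tag=`, and `security_model=` are QEMU options, but the export path, id, tag, and helper name are hypothetical, and the set of accepted `security_model` values depends on the QEMU version; the check here is an assumption that only admits the two models described in this talk.

```python
def qemu_virtfs_args(path: str, fsdev_id: str, mount_tag: str,
                     security_model: str = "mapped") -> list[str]:
    """Return QEMU arguments exporting `path` to the guest over virtio-9p (hypothetical helper)."""
    # Assumption: restrict to the two models discussed here; newer QEMU accepts more.
    if security_model not in ("mapped", "passthrough"):
        raise ValueError(f"unsupported security model: {security_model}")
    return [
        # fsdev: what is exported and how guest credentials are handled
        "-fsdev", f"local,id={fsdev_id},path={path},security_model={security_model}",
        # virtio-9p-pci: the transport device the guest sees, bound to that fsdev
        "-device", f"virtio-9p-pci,fsdev={fsdev_id},mount_tag={mount_tag}",
    ]


if __name__ == "__main__":
    print(" ".join(qemu_virtfs_args("/srv/export", "fsdev0", "hostshare")))
```

With this split, the security model is a property of the fsdev, while the guest only ever sees the PCI device and its mount tag.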
 13. Security
      - Client-based security enforcement.
      - The server makes sure the client never reaches beyond the exported portion of the file system.
      - Two models of security enforcement:
        - One completely isolates the guest user domain from that of the host.
          - Eliminates the need for root squashing.
          - No setuid/setgid exposures.
          - Complete isolation enhances security.
          - Not very portable.
        - The other shares user domains between the host and the guests.
          - Follows the traditional network file system model.
          - If not careful, it is susceptible to security holes.

 14. Security Model – Mapped
      - The VirtFS server intercepts and maps all file object create and get/set attribute requests from the client.
      - Files are created with the VirtFS server's user credentials.
      - The client user's credentials are stored in extended attributes.
        - Extended user attributes are allowed for regular files and directories only.
        - For special files, corresponding regular files are created on the file server and the appropriate mode bits are added to the extended attributes.
      - This enhances security.
        - The guest user domain is completely isolated from the host's.
      - Symlinks can't be followed locally on the file server.
      - The file system becomes "VirtFS-ized".

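A minimal in-memory model of the mapped model: the host object is always a regular file or a directory owned by the server's credentials with owner-only permissions, while the guest's uid, gid, mode, and device number live in extended attributes that are translated back on stat. The `user.virtfs.*` attribute names, the server uid/gid, and the helper names are hypothetical choices for this sketch, and the 0600/0700 host modes simply mirror the host listing on the next slide; this is not QEMU's code.

```python
import stat
from dataclasses import dataclass, field

# Hypothetical credentials the VirtFS server process runs as on the host.
SERVER_UID, SERVER_GID = 107, 107


@dataclass
class HostInode:
    """What actually exists on the host file system (in-memory stand-in)."""
    mode: int
    uid: int
    gid: int
    xattrs: dict = field(default_factory=dict)  # real xattrs hold bytes; ints here


def mapped_create(guest_mode: int, guest_uid: int, guest_gid: int, rdev: int = 0) -> HostInode:
    """Create an object on behalf of the guest under the mapped model."""
    # Directories stay directories; FIFOs, sockets, device nodes, and symlinks
    # all become plain regular files on the host.
    if stat.S_ISDIR(guest_mode):
        host_mode = stat.S_IFDIR | 0o700
    else:
        host_mode = stat.S_IFREG | 0o600
    inode = HostInode(host_mode, SERVER_UID, SERVER_GID)
    # The guest's view is kept only in (hypothetically named) extended attributes.
    inode.xattrs.update({
        "user.virtfs.uid": guest_uid,
        "user.virtfs.gid": guest_gid,
        "user.virtfs.mode": guest_mode,  # includes the file-type bits
        "user.virtfs.rdev": rdev,
    })
    return inode


def mapped_chmod(inode: HostInode, perms: int) -> None:
    """Guest chmod: only the stored guest mode changes, never the host mode."""
    old = inode.xattrs["user.virtfs.mode"]
    inode.xattrs["user.virtfs.mode"] = stat.S_IFMT(old) | stat.S_IMODE(perms)


def mapped_stat(inode: HostInode) -> tuple[str, int, int]:
    """Return (mode string, uid, gid) as the guest sees them."""
    x = inode.xattrs
    return stat.filemode(x["user.virtfs.mode"]), x["user.virtfs.uid"], x["user.virtfs.gid"]


if __name__ == "__main__":
    fifo = mapped_create(stat.S_IFIFO | 0o644, 611, 611)
    print("host: ", stat.filemode(fifo.mode), fifo.uid, fifo.gid)
    print("guest:", *mapped_stat(fifo))
```

Because the host never sees the guest's real modes or owners, a guest root cannot plant setuid binaries or device nodes that mean anything on the host.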
 15. Security Model – Mapped (Cont...)
      On host (ls -l output):
        drwx------. 2 virfsuid virtfsgid 4096 2010-05-11 09:19 adir
        -rw-------. 1 virfsuid virtfsgid 0 2010-05-11 09:36 afifo
        -rw-------. 2 virfsuid virtfsgid 0 2010-05-11 09:19 afile
        -rw-------. 2 virfsuid virtfsgid 0 2010-05-11 09:19 alink
        -rw-------. 1 virfsuid virtfsgid 0 2010-05-11 09:57 asocket1
        -rw-------. 1 virfsuid virtfsgid 0 2010-05-11 09:32 blkdev
        -rw-------. 1 virfsuid virtfsgid 0 2010-05-11 09:33 chardev
        -rw-------. 1 root root 6 2010-05-11 09:20 asymlink
      On guest (ls -l output):
        drwxr-xr-x 2 guestuser guestuser 4096 2010-05-11 12:19 adir
        prw-r--r-- 1 guestuser guestuser 0 2010-05-11 12:36 afifo
        -rw-r--r-- 2 guestuser guestuser 0 2010-05-11 12:19 afile
        -rw-r--r-- 2 guestuser guestuser 0 2010-05-11 12:19 alink
        srwxr-xr-x 1 guestuser guestuser 0 2010-05-11 12:57 asocket1
        brw-r--r-- 1 guestuser guestuser 0, 0 2010-05-11 12:32 blkdev
        crw-r--r-- 1 guestuser guestuser 4, 5 2010-05-11 12:33 chardev
        lrwxrwxrwx 1 root root 6 2010-05-11 12:20 asymlink -> afile

 16. Security Model – Passthrough
      - All requests are passed directly to the underlying file system without any interception.
      - File system objects on the file server are created with the client user's credentials.
        - Two methods to do this:
          - setuid/setgid during creation.
          - chmod/chown immediately after creation.
      - All special files are created as-is.
      - Portable between NFS/CIFS.
      - Susceptible to security issues.
        - Client root can create files on the file server with root privileges if the file server is running as root.
      - Symlinks can be followed locally.

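The second passthrough method (create, then chown/chmod) can be sketched with standard Unix `os` calls. Handing a file to an arbitrary guest uid requires the server to run as root, which is exactly the exposure noted on the slide above; the demo therefore passes the caller's own uid/gid so it runs unprivileged. The function name is hypothetical and error handling is kept minimal.

```python
import os
import stat
import tempfile


def passthrough_create(path: str, mode: int, client_uid: int, client_gid: int) -> None:
    """Create a regular file, then hand it to the client's credentials (hypothetical helper)."""
    # O_EXCL: fail instead of silently taking over an existing file.
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    try:
        os.fchown(fd, client_uid, client_gid)  # needs root for a foreign uid
        os.fchmod(fd, mode)                    # exact mode; fchmod is not filtered by the umask
    except OSError:
        os.unlink(path)  # don't leave a server-owned file behind
        raise
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "file1")
        passthrough_create(p, 0o777, os.getuid(), os.getgid())
        st = os.stat(p)
        print(stat.filemode(st.st_mode), st.st_uid, st.st_gid)
```

Between `open` and `fchown` the file briefly belongs to the server, which is one reason the setuid/setgid-during-creation method exists as an alternative.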
 17. Security Model – Passthrough (Cont...)
      On host:
        # grep 611 /etc/passwd
        hostuser:x:611:611::/home/hostuser:/bin/bash
        # ls -l
        -rwxrwxrwx. 2 hostuser hostuser 0 2010-05-12 18:14 file1
        -rwxrwxrwx. 2 hostuser hostuser 0 2010-05-12 18:14 link1
        srwxrwxr-x. 1 hostuser hostuser 0 2010-05-12 18:27 mysock
        lrwxrwxrwx. 1 hostuser hostuser 5 2010-05-12 18:25 symlink1 -> file1
      On guest:
        $ grep 611 /etc/passwd
        guestuser:x:611:611::/home/guestuser:/bin/bash
        $ ls -l
        -rwxrwxrwx 2 guestuser guestuser 0 2010-05-12 21:14 file1
        -rwxrwxrwx 2 guestuser guestuser 0 2010-05-12 21:14 link1
        srwxrwxr-x 1 guestuser guestuser 0 2010-05-12 21:27 mysock
        lrwxrwxrwx 1 guestuser guestuser 5 2010-05-12 21:25 symlink1 -> file1

 18. ACL Implementation
      - Access Control Lists (ACLs) allow fine-grained control.
      - No universal standard.
        - Linux offers POSIX ACLs, but they are not versatile or rich enough to support NFSv4.
        - A Rich ACL patch set for Linux is on the mailing list.
      - Strategy for VirtFS:
        - Enforcement at the client.
        - Support only one ACL model.
        - Start with POSIX ACLs.
        - Help Rich ACLs make it into mainline.
        - Convert to Rich ACLs once they are available in mainline.

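Enforcing POSIX ACLs at the client means running the POSIX ACL access-check order, as described in the Linux acl(5) manual page: the owner entry first, then named users, then the owning group and named groups (any single matching entry may grant), then other, with the mask entry limiting everything except the owner and other entries. The sketch below is a simplified rendering of that order with hypothetical types and names; it ignores privileged processes (for example CAP_DAC_OVERRIDE).

```python
from dataclasses import dataclass, field
from typing import Optional

R, W, X = 4, 2, 1  # permission bits, as in rwx


@dataclass
class PosixAcl:
    """Hypothetical in-memory POSIX ACL."""
    owner_uid: int
    owning_gid: int
    user_obj: int                                 # ACL_USER_OBJ
    group_obj: int                                # ACL_GROUP_OBJ
    other: int                                    # ACL_OTHER
    users: dict = field(default_factory=dict)     # ACL_USER: uid -> perms
    groups: dict = field(default_factory=dict)    # ACL_GROUP: gid -> perms
    mask: Optional[int] = None                    # ACL_MASK; a valid ACL has one if users/groups exist


def acl_permits(acl: PosixAcl, uid: int, gids: set, want: int) -> bool:
    """Return True if a process with effective uid and groups `gids` gets all bits in `want`."""
    def has(perms: int) -> bool:
        return (perms & want) == want

    mask = acl.mask if acl.mask is not None else 0o7
    if uid == acl.owner_uid:
        return has(acl.user_obj)              # the mask never applies to the owner
    if uid in acl.users:
        return has(acl.users[uid] & mask)     # a named user stops the search here
    matching = []
    if acl.owning_gid in gids:
        matching.append(acl.group_obj)
    matching += [p for g, p in acl.groups.items() if g in gids]
    if matching:
        # Any single matching group entry may grant; entries are not unioned.
        return any(has(p & mask) for p in matching)
    return has(acl.other)                     # the mask never applies to other


if __name__ == "__main__":
    acl = PosixAcl(owner_uid=611, owning_gid=611, user_obj=R | W, group_obj=R | W,
                   other=R, users={700: R | W | X}, groups={800: R | W}, mask=R)
    print(acl_permits(acl, 700, {700}, R), acl_permits(acl, 700, {700}, W))
```

Doing this check in the guest keeps the server simple, but it relies on the client being honest, which matches the client-based enforcement stated earlier in the talk.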
 19. Where Are We?
      - The VirtFS server is in QEMU mainline.
      - The security model patchset has been accepted into QEMU mainline and is part of QEMU 0.13.
      - Several patches have made it into mainline Linux.
      - Fedora 13 and Ubuntu Lucid can mount VirtFS (9P2000.u).
      - Making good progress on 9P2000.L: all the VFS calls required to pass the Tuxera POSIX test suite are implemented. These patches are either on the list or already accepted.
      - A patchset that generalizes the worker thread infrastructure in QEMU is in mainline; work is under way to convert the current single-threaded server into a multi-threaded one using that infrastructure.
      - Working on POSIX ACLs and a byte-range lock implementation.

 20. Comparison with NFS and CIFS
      [Charts: Sequential Read; Sequential Write]

 21. Comparison with blockdev
      [Charts: Sequential Read; Sequential Write]

 22. Next Steps
      - A fully Linux-compliant, complete 9P2000.L protocol.
      - ACL implementation.
      - Page cache sharing between host and guest(s).
      - dcache sharing between host and guest(s).
      - Interfacing with other file system APIs.
      - Enable consistent caching.
      - NFS and CIFS exportability.
      - Making it a rootfs for guests instead of using root volumes.
      - Ongoing stability, scalability, and performance improvements.

 23. Conclusions
      - Huge potential for specialized file systems in the virtualization space.
        - Growth in the cloud space will be a major catalyst.
      - A lot of scope for innovation.
      - A step towards paravirtual system services.

 24. Acknowledgments (IBM and community)
      - Anthony Liguori, QEMU maintainer
      - Eric Van Hensbergen, 9P maintainer
      - Jim Garlick
      - Venkateswararao Jujjuri (JV)
      - Blue Swirl
      - Badari Pulavarty
      - Aneesh Kumar
      - Sripathi Kodi
      - Mohan Kumar
      - Gautham R Shenoy

 25. Questions?