XPDS14: MirageOS 2.0: branch consistency for Xen Stub Domains - Anil Madhavapeddy, University of Cambridge



  1. 1. MIRAGEOS 2.0: BRANCH CONSISTENCY FOR XEN STUB DOMAINS Dave Scott (Citrix Systems, @mugofsoup), Thomas Gazagnaire (University of Cambridge, @eriangazag), Anil Madhavapeddy (University of Cambridge, @avsm). http://openmirage.org http://decks.openmirage.org/xendevsummit14/
  2. 2. INTRODUCING MIRAGE OS 2.0 These slides were written using Mirage on OSX. They are hosted in a 938kB Xen unikernel written in statically type-safe OCaml, including device drivers and network stack. Their application logic is just a couple of source files, written independently of any OS dependencies. Running on an ARM CubieBoard2, and hosted on the cloud. Binaries are small enough to track the entire deployment in Git!
  4. 4. NEW FEATURES IN 2.0 Mirage OS 2.0 is an important step forward, supporting more, and more diverse, backends with much greater modularity. For information about the new components we cannot cover here, see openmirage.org: Xen/ARM, for running unikernels on embedded devices. Irmin, Git-like distributed branchable storage. OCaml-TLS, a from-scratch native OCaml TLS stack. Vchan, for low-latency inter-VM communication. Ctypes, modular C foreign function bindings.
  5. 5. THIS XEN DEV SUMMIT TALK We focus on how we have been using Mirage to improve the core Xenstore toolstack using Irmin, on a performance and distribution future for Xenstore, and on plans for upstreaming our patches. But first, some background...
  6. 6. IRMIN: MIRAGE 2.0 STORAGE Irmin is our library database that follows the modular design principles of MirageOS (https://github.com/mirage/irmin). Runs in both userspace and kernelspace. A key = value store (sound familiar?). Git-style: commit, branch, merge. Preserves history by default. Backend support for in-memory, Git and HTTP/REST stores. Mirage unikernels thus version control all their data, and have a distributed provenance graph of all activities.
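
To make the "Git-style key = value store" idea concrete, here is a minimal in-memory sketch in plain OCaml. It is not the Irmin API: every type and function name below (`commit`, `store`, `set`, `fork`, `log`) is hypothetical, and merging is left out. It only shows that writes append commits, that branching is cheap, and that history is kept by default.

```ocaml
(* A toy, in-memory model of a Git-style key = value store.
   All names here are hypothetical; this is not the Irmin API. *)
module StringMap = Map.Make (String)

type commit = {
  message : string;
  parents : commit list;        (* history is never discarded *)
  tree : string StringMap.t;    (* full snapshot of keys and values *)
}

type store = { mutable branches : commit StringMap.t }

let root = { message = "init"; parents = []; tree = StringMap.empty }

let create () = { branches = StringMap.singleton "master" root }

let head store branch = StringMap.find branch store.branches

(* Writing a key appends a new commit on top of the branch head. *)
let set store ~branch ~message key value =
  let parent = head store branch in
  let commit =
    { message; parents = [ parent ]; tree = StringMap.add key value parent.tree }
  in
  store.branches <- StringMap.add branch commit store.branches

let get store ~branch key = StringMap.find_opt key (head store branch).tree

(* Branching is O(1): the new name points at the same commit. *)
let fork store ~src ~dst =
  store.branches <- StringMap.add dst (head store src) store.branches

(* First-parent history of a commit, newest first. *)
let rec log commit =
  commit.message
  :: (match commit.parents with [] -> [] | parent :: _ -> log parent)

let () =
  let s = create () in
  set s ~branch:"master" ~message:"add foo" "foo" "bar";
  fork s ~src:"master" ~dst:"txn";
  set s ~branch:"txn" ~message:"change foo" "foo" "baz";
  assert (get s ~branch:"master" "foo" = Some "bar");
  assert (get s ~branch:"txn" "foo" = Some "baz");
  assert (log (head s "txn") = [ "change foo"; "add foo"; "init" ])
```

Each commit holds a full snapshot here for simplicity; a real store would share unchanged subtrees between commits.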
  7. 7. BASE CONCEPTS: OBJECT DAG (OR THE "BLOB STORE") Append-only and easily distributed. Provides stable serialisation of structured values. Backend-independent storage: memory or on-disk persistence, encryption or plaintext. Position- and architecture-independent pointers, such as via SHA1 checksum of blocks.
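
A content-addressed blob store can be sketched in a few lines. This is an illustration only: the module and function names are hypothetical, and the standard library's `Digest` (MD5) stands in for SHA1 so the example has no external dependencies.

```ocaml
(* A toy append-only, content-addressed blob store.
   Assumption: the stdlib's Digest (MD5) stands in for SHA1;
   module and function names are hypothetical. *)
module Blob_store = struct
  type t = (string, string) Hashtbl.t   (* hex digest -> serialised value *)

  let create () : t = Hashtbl.create 16

  (* The key is derived from the content alone, so it is the same on
     every machine and for every backend. Existing entries are never
     overwritten or removed. *)
  let add (t : t) (value : string) : string =
    let key = Digest.to_hex (Digest.string value) in
    if not (Hashtbl.mem t key) then Hashtbl.add t key value;
    key

  let find (t : t) (key : string) : string option = Hashtbl.find_opt t key
end

let () =
  let store = Blob_store.create () in
  let k1 = Blob_store.add store "vm-config" in
  let k2 = Blob_store.add store "vm-config" in
  assert (k1 = k2);                     (* identical content, identical key *)
  assert (Hashtbl.length store = 1);    (* stored only once *)
  assert (Blob_store.find store k1 = Some "vm-config")
```

Because a key depends only on the bytes stored, two hosts can exchange objects without coordinating on addresses, which is what makes the store easy to distribute.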
  8. 8. BASE CONCEPTS: HISTORY DAG (OR THE "GIT STORE") Append-only and easily distributed. Can be stored in the Object DAG store. Keeps track of history. Ordered audit log of all operations. Useful for merge (3-way merge is easier than 2-way). Snapshots and reverting operations for free.
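
Why history helps merging: with only two versions, a key present on one side and absent on the other could be an addition or a deletion. With the common ancestor available, the question is decidable per key. The sketch below is a generic three-way merge over string maps, not Irmin's merge implementation; the function names are hypothetical.

```ocaml
module StringMap = Map.Make (String)

(* Three-way merge of one key; [old] is its value in the common ancestor.
   Values are options so that "absent" is handled like any other value. *)
let merge_value key ~old a b =
  if a = b then Ok a                (* both sides agree *)
  else if a = old then Ok b         (* only the second side changed it *)
  else if b = old then Ok a         (* only the first side changed it *)
  else Error key                    (* both changed it, differently *)

(* Merge two maps [a] and [b] that both descend from [old].
   Returns the merged map, or the first conflicting key. *)
let merge ~old a b =
  let keys =
    List.sort_uniq compare
      (List.concat_map
         (fun m -> List.map fst (StringMap.bindings m))
         [ old; a; b ])
  in
  List.fold_left
    (fun acc key ->
      match acc with
      | Error _ -> acc
      | Ok merged -> (
          let find m = StringMap.find_opt key m in
          match merge_value key ~old:(find old) (find a) (find b) with
          | Ok None -> Ok merged                          (* key deleted *)
          | Ok (Some v) -> Ok (StringMap.add key v merged)
          | Error k -> Error k))
    (Ok StringMap.empty) keys

let of_list l = StringMap.of_seq (List.to_seq l)

let () =
  let old = of_list [ ("/vm/1", "a") ] in
  let a = of_list [ ("/vm/1", "a"); ("/vm/2", "b") ] in
  let b = of_list [ ("/vm/1", "a"); ("/vm/3", "c") ] in
  match merge ~old a b with
  | Ok m ->
      assert (
        StringMap.bindings m
        = [ ("/vm/1", "a"); ("/vm/2", "b"); ("/vm/3", "c") ])
  | Error _ -> assert false
```

Two sides that add different keys merge cleanly; a conflict is reported only when both sides change the same key to different values.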
  10. 10. IRMIN TOOLING opam update && opam install irmin Command-line frontend that uses: storage: in-memory format or Git network: custom format, Git or HTTP/REST interface: JSON interface for storing content easily OCaml library that supplies: merge-friendly data structures backend implementations (Git, HTTP/REST)
  12. 12. XENSTORE: VM METADATA Xenstore is our configuration database that stores VM metadata in directories (à la Plan 9). Runs in either userspace or kernelspace (just like Mirage). A key = value store (just like Irmin). Logs history by default (just like Irmin...). TRANSACTION_START → branch; TRANSACTION_END → merge. The "original plan" in 2002 was for seamless distribution across hosts/clusters/clouds. What happened? Unfortunately the previous transaction implementations all suck.
  13. 13. XENSTORE: CONFLICTS Terrible performance impact: a transaction involves 100 RPCs to set it up (one per r/w op), only to be aborted and retried. Longer-lived transactions have a greater chance of conflicting than shorter ones, and a conflict means repeating the whole long transaction. Concurrent transactions can lead to live-lock: try starting lots of VMs in parallel! Much time was wasted removing transactions (from xend).
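
A back-of-the-envelope model of the retry cost, under loudly stated assumptions: each attempt conflicts independently with probability `p`, and an aborted attempt has already paid for all of its RPCs. The expected number of attempts is then 1/(1 - p). Both the model and the probabilities below are illustrative, not measurements; `expected_rpcs` is a hypothetical helper.

```ocaml
(* Expected RPCs to commit a transaction of [rpcs] operations when each
   attempt independently conflicts with probability [p] and is retried
   from scratch. Illustrative model only. *)
let expected_rpcs ~rpcs ~p =
  if p < 0.0 || p >= 1.0 then invalid_arg "p must be in [0, 1)";
  float_of_int rpcs /. (1.0 -. p)

let () =
  Printf.printf "%.0f\n" (expected_rpcs ~rpcs:100 ~p:0.5);   (* prints 200 *)
  Printf.printf "%.0f\n" (expected_rpcs ~rpcs:100 ~p:0.9)    (* prints 1000 *)
```

The cost grows without bound as `p` approaches 1, which is the live-lock case: many parallel transactions keep invalidating each other.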
  14. 14. XENSTORE: CONFLICTS Conflicts between Xenstore transactions are so devastating, we try hard to avoid transactions altogether. However they aren't going away.
  15. 15. XENSTORE: CONFLICTS Observe: typical Xenstore transactions (e.g. creating domains) shouldn't conflict. It's a flawed merging algorithm. If we were managing domain configurations in git, we would simply merge or rebase and it would work. Therefore the Irmin Xenstore simply does:

         DB.View.merge_path ~origin db [] transaction >>= function
         | `Ok () -> return true
         | `Conflict msg ->
           (* if merge doesn't work, try rebase *)
           DB.View.rebase_path ~origin db [] transaction >>= function
           | `Ok () -> return true
           | `Conflict msg ->
             (* A true conflict: tell the client *)
             ...
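
The merge-then-rebase fallback can be modelled without Irmin. The sketch below is a simplified, self-contained model: the actual semantics of `DB.View.merge_path` and `DB.View.rebase_path` are defined by Irmin, and the types `op` and `transaction` and the functions `merge`, `rebase` and `commit` are hypothetical. Here a merge fails when the head and the transaction changed the same key differently, and a rebase replays the transaction's operations on the head, failing only if a recorded read would now return something else.

```ocaml
module StringMap = Map.Make (String)

type tree = string StringMap.t

type op =
  | Read of string * string option   (* key, and the value the client saw *)
  | Write of string * string

(* [origin] is the snapshot the transaction started from. *)
type transaction = { origin : tree; ops : op list }

(* Keys whose final value in the transaction differs from [origin]. *)
let changes txn =
  let view =
    List.fold_left
      (fun t -> function Write (k, v) -> StringMap.add k v t | Read _ -> t)
      txn.origin txn.ops
  in
  StringMap.filter (fun k v -> StringMap.find_opt k txn.origin <> Some v) view

(* Three-way merge against the current head. *)
let merge ~head txn =
  let changed = changes txn in
  let clash k mine =
    let now = StringMap.find_opt k head in
    now <> StringMap.find_opt k txn.origin && now <> Some mine
  in
  match StringMap.choose_opt (StringMap.filter clash changed) with
  | Some (k, _) -> Error ("merge conflict on " ^ k)
  | None -> Ok (StringMap.union (fun _ _ mine -> Some mine) head changed)

(* Replay the operations on the head; stale reads are true conflicts. *)
let rebase ~head txn =
  List.fold_left
    (fun acc op ->
      match acc, op with
      | Error _, _ -> acc
      | Ok tree, Write (k, v) -> Ok (StringMap.add k v tree)
      | Ok tree, Read (k, seen) ->
          if StringMap.find_opt k tree = seen then acc
          else Error ("rebase conflict on " ^ k))
    (Ok head) txn.ops

(* Try merge first; if it fails, fall back to rebase. *)
let commit ~head txn =
  match merge ~head txn with
  | Ok tree -> Ok tree
  | Error _ -> rebase ~head txn

let () =
  let key = "/local/domain/1/name" in
  let origin = StringMap.singleton key "vm1" in
  let head = StringMap.add key "renamed" origin in
  let blind = { origin; ops = [ Write (key, "vm1-new") ] } in
  let checked =
    { origin; ops = [ Read (key, Some "vm1"); Write (key, "vm1-new") ] }
  in
  assert (Result.is_error (merge ~head blind));    (* both sides changed the key *)
  assert (Result.is_ok (commit ~head blind));      (* rebase replays the write *)
  assert (Result.is_error (commit ~head checked))  (* the read is now stale *)
```

In this model only a transaction whose decisions depended on data that has since changed is reported to the client as a conflict.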
  17. 17. XENSTORE: TRANSACTIONS Big transactions give you: high-level intent, useful for debugging and tracing; minimal merge commits (1 per transaction); minimal backend I/O (1 op per commit); after a crash during a transaction, the client can be told to "abort retry". Solving the performance problems with big transactions in previous implementations greatly improves the overall health of Xenstore.
  18. 18. XENSTORE: RELIABILITY What happens if Xenstore crashes? Rings full of partially read/written packets. No reconnection protocol in common use (there is a proposal on xen-devel, but it will be years before we can rely on it). Per-connection state in Xenstore: watch registrations, pending watch events. If Xenstore is restarted, many of the rings will be broken... you'll probably have to reboot the host.
  19. 19. XENSTORE: RELIABILITY Irmin to the rescue! Data structure libraries built on top of Irmin, for example mergeable queues. Use these for (e.g.) pending watch events. We can persist partially read/written packets so fragments can be recovered over restart. We can persist connection information (i.e. ring information from an Introduce) and auto-reconnect on start. Added bonus: easy to introspect state via xenstore-ls: can see each registered watch, queue, etc.
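
One way to picture a mergeable queue is as a three-way merge over a list of uniquely identified items. The sketch below is not the Irmin queue library; the types and the `merge` function are hypothetical, and it assumes every item carries an id that is unique across branches (for example a connection id plus a counter).

```ocaml
(* Toy mergeable queue. Assumption: every item carries an id that is
   unique across branches. *)
type 'a item = { id : string; value : 'a }

type 'a queue = 'a item list          (* front of the queue first *)

let mem q item = List.exists (fun i -> i.id = item.id) q

(* Three-way merge: keep the ancestor items that neither side popped,
   then append the items each side pushed since the ancestor. *)
let merge ~old a b =
  let kept = List.filter (fun i -> mem a i && mem b i) old in
  let pushed q = List.filter (fun i -> not (mem old i)) q in
  kept @ pushed a @ pushed b

let () =
  let ev id = { id; value = "watch event " ^ id } in
  let old = [ ev "e1"; ev "e2" ] in
  let a = [ ev "e2"; ev "e3" ] in          (* popped e1, pushed e3 *)
  let b = [ ev "e1"; ev "e2"; ev "e4" ] in (* pushed e4 *)
  let merged = merge ~old a b in
  assert (List.map (fun i -> i.id) merged = [ "e2"; "e3"; "e4" ])
```

Pops and pushes made on different branches never conflict in this model, which is the property that makes such a structure suitable for state like pending watch events.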
  20. 20. XENSTORE: TRACING When a bug is reported, the normal procedure is: stare at Xenstore logs for a very long time; slowly deduce the state at the time the bug manifested (swearing and cursing is strictly optional). With Irmin+Xenstore, one can simply: git checkout to the revision; inspect the state with ls. In the future: git bisect automation!
  21. 21. XENSTORE: TRACING

         $ git log --oneline --graph --decorate --all
         ...
         | | * | 1787fd2 Domain 0: merging transaction 394
         | | |/
         | * | 0d1521c Domain 0: merging transaction 395
         | |/
         * | 731356e Domain 0: merging transaction 396
         |/
         * 8795514 Domain 0: merging transaction 365
         * 74f35b5 Domain 0: merging transaction 364
         * acdd503 Domain 0: merging transaction 363
  22. 22. XENSTORE: DATA STORAGE Xenstore contains VM metadata (/vm) and domain metadata (/local/domain). But VM metadata is duplicated elsewhere and copied in/out: xl config files and the xapi database (insert cloud toolstack here). With current daemons, it is unwise to persist large data. What if Xenstore could store and distribute this data efficiently, and if application data could be persisted reliably?
  23. 23. XENSTORE: THE DATA Irmin to the rescue! Check in VM metadata to Irmin: clone, pull and push to move between hosts; expose to host via FUSE, for Plan9 filesystem goodness; maybe one day even echo start > VM/uuid/ctl (FUSE code at https://github.com/dsheets/profuse). VM data could be checked in to Irmin: very important for unikernels that have no native storage.
  24. 24. XENSTORE: UPSTREAMING Advanced prototype exists using Mirage libraries, but doesn't fully pass the unit test suite. Before upstreaming: write a fixed-size backend for block devices; preserving history is a good default, but history does need to be squashed from time to time. Upstream patches: switch to using opam to build Xenstore; reproducible builds via a custom Xen remote; allows using modern OCaml libraries (Lwt, Mirage, etc.). In Xapi, delete the existing db and replace it with Xenstore 2.0.
  25. 25. XENSTORE: CODE Prototype+unit tests at https://github.com/mirage/ocaml-xenstore-server (can build without Xen on MacOS X now):

         opam init --comp=4.01.0
         eval `opam config env`
         opam pin irmin git://github.com/mirage/irmin
         opam install xenstore irmin shared-memory-ring xen-evtchn io-page
         git clone git://github.com/mirage/ocaml-xenstore-server
         cd ocaml-xenstore-server
         make
         ./main.native --enable-unix --path /tmp/test-socket --database /tmp/db &
         ./cli.native -path /tmp/test-socket write foo=bar
         ./cli.native -path /tmp/test-socket read foo
         cd /tmp/db; git log
  26. 26. HTTP://OPENMIRAGE.ORG/ Featuring blog posts about Mirage OS 2.0 by: Amir Chaudhry, Thomas Gazagnaire, David Kaloper, Thomas Leonard, Jon Ludlam, Hannes Mehnert, Mindy Preston, Dave Scott, and Jeremy Yallop. Mindy Preston and Jyotsna Prakash from OPW/GSoC will also be talking about their projects in the community panel! More Irmin+Xenstore posts with details: "Introduction to Irmin" and "Using Irmin to add fault-tolerance to Xenstore".