Xen Memory Management


Published on

Basic overview of Xen Hypervisor Memory Management

Published in: Technology
  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Xen Memory Management

  1. 1. Vancouver, February 2009 Memory management in (x86) Xen Tim Deegan
  2. 2. Xen’s memory services • Memory management • Allocating memory to guests, scrubbing free memory • Tracking memory usage with reference counts and types  Heap allocators and the frametable. • Virtual memory • Protecting guests from each other • Enforcing typing rules, e.g. read-only areas • Providing translation services between address spaces  MMU hypercalls, shadow pagetables, hardware-assisted paging © 2008 Citrix Systems, Inc. — All rights reserved 2
  3. 3. Terminology • Virtual address/Physical address/Machine address • Frame vs. Page • PFN: physical frame number • Guest’s abstraction for tracking/allocating RAM • Usually fairly contiguous • GFN: guest frame number • Guest’s idea of what hardware addresses are • Used in guest pagetables • MFN: machine frame number • Actual hardware addresses © 2008 Citrix Systems, Inc. — All rights reserved 3
  4. 4. Basic memory management • Buddy allocator hands out frames • Each guest has a max number of frames • Frame-table records for each frame: • Owner, if any • Linked list of other frames owned by this guest • Reference count (must be zero to free the frame) • Type, and a refcount for the type (must be zero to change type) • TLB-flush-avoidance timestamp © 2008 Citrix Systems, Inc. — All rights reserved 4
  5. 5. PV pagetables, a.k.a. direct paging • PFN  MFN table managed by the guest • Shared MFN  PFN table provided by Xen • GFN == MFN, so pagetables can be used directly by the hardware • Xen checks the contents of the guest pagetables before allowing the hardware to see them. © 2008 Citrix Systems, Inc. — All rights reserved 5
  6. 6. Enforcing isolation • Guest pagetables must have a pagetable type • Xen checks that page contents obey the typing rules before allowing them to take on PT type • Typing rules: • No mapping other guests’ frames • No read-write mappings of frames with PT type • Modifying an already-typed PT needs a call to Xen to check the modification obeys the rules. (Or trap-and-emulate assistance from Xen.) © 2008 Citrix Systems, Inc. — All rights reserved 6
  7. 7. Grant Tables • Guest-supplied ACLs allowing other guests to map their frames • Mapper makes a hypercall with a domid, an opaque index, and the address of a PTE • Xen checks that entry in the mappee’s grant table and if it’s OK, modifies the PTE • Needs explicit unmap hypercall when finished • Also available: grant-copy, where Xen memcpy()s from/to a granted frame instead of mapping it. © 2008 Citrix Systems, Inc. — All rights reserved 7
  8. 8. HVM pagetables • PFN  MFN table managed by Xen • GFN == PFN so need another layer of translation • Guest won’t cooperate in enforcing access control • Two options: • Xen builds shadow copies of guest pagetables with the extra translations and controls added; or • Hardware support for using a second set of pagetables containing extra translations and controls © 2008 Citrix Systems, Inc. — All rights reserved 8
  9. 9. Shadow pagetables • Keep Xen-maintained copies of guest frames that we think are being used as pagetables • Guest never sees the shadows so we can add any translations and restrictions we like • 13 different kinds of shadows depending on what kind of pagetable we think it is: a single frame can have up to 10 shadows at once • Also have three kinds of shadows for faking out superpages (2MB of contiguous PFNs does not mean 2MB of contiguous MFNs) © 2008 Citrix Systems, Inc. — All rights reserved 9
  10. 10. Shadow pagetables: building • Start with an empty top-level shadow of the PFN in CR3 • On pagefault, shadow the entries in the PT walk, making new shadows at each level if necessary. • Each shadow entry is the guest entry with the GFN replaces by an MFN (of the next-level shadow or of guest memory) and extra access restrictions: • Pages that have shadows are mapped read-only. • Extra restrictions can be specified in the PFN  MFN table. • We can restrict write access to guest’s frames for tracking page- dirtying during live migration. © 2008 Citrix Systems, Inc. — All rights reserved 10
  11. 11. Shadow pagetables: maintenance • Shadowed pages are always kept read-only. • When the guest writes to a shadowed frame, Xen’s pagefault handler must: • Emulate the current instruction to figure out what’s being written; • Write the new value into the guest pagetable; and • Update the equivalent parts of all shadows of the frame. © 2008 Citrix Systems, Inc. — All rights reserved 11
  12. 12. Shadow pagetables: tearing back down • Shadowing a frame is expensive • Thousands of cycles for trap and emulation of every write. • Easy to tell when a page becomes a PT; harder to tell when it stops: • Reference count based on higher-level shadows and CR3 contents, but hard to know when a PFN’s been used in CR3 for the last time • Guess based on odd-looking page contents • Guess based on memory access patterns • Get PV drivers to give us hints • Recycle under memory pressure by approximating LRU © 2008 Citrix Systems, Inc. — All rights reserved 12
  13. 13. Optimizations • Tagged TLBs (AMD’s ASID; Intel’s VPID) allow us to avoid a TLB flush on every VMEXIT/VMENTER • In theory can do even better now that Win2k8 supports context switching without TLB flushing. • Shadowing not-present entries with invalid entries lets us fast-track “real” pagefaults back to the guest • Out-of-sync shadows: let the guest write directly to the lowest level of pagetables and sync up the shadows whenever a hardware TLB would re-read (TLB flush, page faults, higher-level writes) © 2008 Citrix Systems, Inc. — All rights reserved 13
  14. 14. Hardware-assisted paging • Xen supplies a second set of pagetables describing the PFN  MFN translation and extra restrictions • CPU takes a pointer to this as well as a (PFN- space) CR3 value from the guest • MMU hardware applies the composition of the two translations and the intersection of the access rights © 2008 Citrix Systems, Inc. — All rights reserved 14
  15. 15. Hardware-assisted paging: performance Avoid expensive trap + emulate on writes to PTs, and extra logic on pagefault path TLB fill can now take 20 memory accesses! CPU’s TLB is much smaller than the set of shadows we can maintain • AMD’s RVI gives +10% performance over shadows on some workloads, -10% on others; Intel’s EPT seems more consistently better than shadowing • Performance depends heavily on using superpage mappings in the second pagetable © 2008 Citrix Systems, Inc. — All rights reserved 15
  16. 16. Fin © 2008 Citrix Systems, Inc. — All rights reserved 16