SlideShare a Scribd company logo
Vancouver, February 2009




Memory management in (x86) Xen




 Tim Deegan
Xen’s memory services
• Memory management
  • Allocating memory to guests, scrubbing free memory
  • Tracking memory usage with reference counts and types
   Heap allocators and the frametable.
• Virtual memory
  • Protecting guests from each other
  • Enforcing typing rules, e.g. read-only areas
  • Providing translation services between address spaces
   MMU hypercalls, shadow pagetables, hardware-assisted paging




               © 2008 Citrix Systems, Inc. — All rights reserved   2
Terminology
• Virtual address/Physical address/Machine address
• Frame vs. Page
• PFN: physical frame number
  • Guest’s abstraction for tracking/allocating RAM
  • Usually fairly contiguous
• GFN: guest frame number
  • Guest’s idea of what hardware addresses are
  • Used in guest pagetables
• MFN: machine frame number
  • Actual hardware addresses
                © 2008 Citrix Systems, Inc. — All rights reserved   3
Basic memory management
• Buddy allocator hands out frames
• Each guest has a max number of frames
• Frame-table records for each frame:
  •   Owner, if any
  •   Linked list of other frames owned by this guest
  •   Reference count (must be zero to free the frame)
  •   Type, and a refcount for the type (must be zero to change type)
  •   TLB-flush-avoidance timestamp




                  © 2008 Citrix Systems, Inc. — All rights reserved     4
PV pagetables, a.k.a. direct paging
• PFN  MFN table managed by the guest
• Shared MFN  PFN table provided by Xen
• GFN == MFN, so pagetables can be used directly
 by the hardware
• Xen checks the contents of the guest pagetables
 before allowing the hardware to see them.




            © 2008 Citrix Systems, Inc. — All rights reserved   5
Enforcing isolation
• Guest pagetables must have a pagetable type
• Xen checks that page contents obey the typing
 rules before allowing them to take on PT type
• Typing rules:
  • No mapping other guests’ frames
  • No read-write mappings of frames with PT type
• Modifying an already-typed PT needs a call to Xen
 to check the modification obeys the rules.
   (Or trap-and-emulate assistance from Xen.)


               © 2008 Citrix Systems, Inc. — All rights reserved   6
Grant Tables
• Guest-supplied ACLs allowing other guests to map
 their frames
• Mapper makes a hypercall with a domid, an
 opaque index, and the address of a PTE
• Xen checks that entry in the mappee’s grant table
 and if it’s OK, modifies the PTE
• Needs explicit unmap hypercall when finished
• Also available: grant-copy, where Xen memcpy()s
 from/to a granted frame instead of mapping it.

            © 2008 Citrix Systems, Inc. — All rights reserved   7
HVM pagetables
• PFN  MFN table managed by Xen
• GFN == PFN so need another layer of translation
• Guest won’t cooperate in enforcing access control
• Two options:
   • Xen builds shadow copies of guest pagetables
    with the extra translations and controls added; or
  • Hardware support for using a second set of
    pagetables containing extra translations and
    controls
            © 2008 Citrix Systems, Inc. — All rights reserved   8
Shadow pagetables
• Keep Xen-maintained copies of guest frames that
 we think are being used as pagetables
• Guest never sees the shadows so we can add any
 translations and restrictions we like
• 13 different kinds of shadows depending on what
 kind of pagetable we think it is: a single frame can
 have up to 10 shadows at once
• Also have three kinds of shadows for faking out
 superpages (2MB of contiguous PFNs does not
 mean 2MB of contiguous MFNs)
            © 2008 Citrix Systems, Inc. — All rights reserved   9
Shadow pagetables: building
• Start with an empty top-level shadow of the PFN in
 CR3
• On pagefault, shadow the entries in the PT walk,
 making new shadows at each level if necessary.
• Each shadow entry is the guest entry with the GFN
 replaces by an MFN (of the next-level shadow or of
 guest memory) and extra access restrictions:
  • Pages that have shadows are mapped read-only.
  • Extra restrictions can be specified in the PFN  MFN table.
  • We can restrict write access to guest’s frames for tracking page-
    dirtying during live migration.

                 © 2008 Citrix Systems, Inc. — All rights reserved      10
Shadow pagetables: maintenance
• Shadowed pages are always kept read-only.
• When the guest writes to a shadowed frame, Xen’s
 pagefault handler must:
  • Emulate the current instruction to figure out what’s being written;
  • Write the new value into the guest pagetable; and
  • Update the equivalent parts of all shadows of the frame.




                 © 2008 Citrix Systems, Inc. — All rights reserved        11
Shadow pagetables: tearing back down
• Shadowing a frame is expensive
  • Thousands of cycles for trap and emulation of every write.
• Easy to tell when a page becomes a PT; harder to
 tell when it stops:
  • Reference count based on higher-level shadows and CR3 contents,
      but hard to know when a PFN’s been used in CR3 for the last time
  •   Guess based on odd-looking page contents
  •   Guess based on memory access patterns
  •   Get PV drivers to give us hints
  •   Recycle under memory pressure by approximating LRU



                  © 2008 Citrix Systems, Inc. — All rights reserved      12
Optimizations
• Tagged TLBs (AMD’s ASID; Intel’s VPID) allow us
 to avoid a TLB flush on every VMEXIT/VMENTER
  • In theory can do even better now that Win2k8 supports context
   switching without TLB flushing.

• Shadowing not-present entries with invalid entries
 lets us fast-track “real” pagefaults back to the guest
• Out-of-sync shadows: let the guest write directly to
 the lowest level of pagetables and sync up the
 shadows whenever a hardware TLB would re-read
 (TLB flush, page faults, higher-level writes)

                © 2008 Citrix Systems, Inc. — All rights reserved   13
Hardware-assisted paging
• Xen supplies a second set of pagetables describing
 the PFN  MFN translation and extra restrictions
• CPU takes a pointer to this as well as a (PFN-
 space) CR3 value from the guest
• MMU hardware applies the composition of the two
 translations and the intersection of the access
 rights




            © 2008 Citrix Systems, Inc. — All rights reserved   14
Hardware-assisted paging: performance
Avoid expensive trap + emulate on writes to PTs,
 and extra logic on pagefault path
TLB fill can now take 20 memory accesses!
CPU’s TLB is much smaller than the set of
 shadows we can maintain
• AMD’s RVI gives +10% performance over shadows
 on some workloads, -10% on others; Intel’s EPT
 seems more consistently better than shadowing
• Performance depends heavily on using superpage
 mappings in the second pagetable
            © 2008 Citrix Systems, Inc. — All rights reserved   15
Fin




© 2008 Citrix Systems, Inc. — All rights reserved    16

More Related Content

What's hot

Fosdem 18: Securing embedded Systems using Virtualization
Fosdem 18: Securing embedded Systems using VirtualizationFosdem 18: Securing embedded Systems using Virtualization
Fosdem 18: Securing embedded Systems using Virtualization
The Linux Foundation
 
Virtualization - Kernel Virtual Machine (KVM)
Virtualization - Kernel Virtual Machine (KVM)Virtualization - Kernel Virtual Machine (KVM)
Virtualization - Kernel Virtual Machine (KVM)
Wan Leung Wong
 
Redesigning Xen Memory Sharing (Grant) Mechanism
Redesigning Xen Memory Sharing (Grant) MechanismRedesigning Xen Memory Sharing (Grant) Mechanism
Redesigning Xen Memory Sharing (Grant) Mechanism
The Linux Foundation
 

What's hot (20)

SR-IOV Introduce
SR-IOV IntroduceSR-IOV Introduce
SR-IOV Introduce
 
XPDS13: Xen in OSS based In–Vehicle Infotainment Systems - Artem Mygaiev, Glo...
XPDS13: Xen in OSS based In–Vehicle Infotainment Systems - Artem Mygaiev, Glo...XPDS13: Xen in OSS based In–Vehicle Infotainment Systems - Artem Mygaiev, Glo...
XPDS13: Xen in OSS based In–Vehicle Infotainment Systems - Artem Mygaiev, Glo...
 
Kvm virtualization platform
Kvm virtualization platformKvm virtualization platform
Kvm virtualization platform
 
Xen Hypervisor
Xen HypervisorXen Hypervisor
Xen Hypervisor
 
Linux kernel architecture
Linux kernel architectureLinux kernel architecture
Linux kernel architecture
 
Project meeting: Android Graphics Architecture Overview
Project meeting: Android Graphics Architecture OverviewProject meeting: Android Graphics Architecture Overview
Project meeting: Android Graphics Architecture Overview
 
Rootlinux17: Hypervisors on ARM - Overview and Design Choices by Julien Grall...
Rootlinux17: Hypervisors on ARM - Overview and Design Choices by Julien Grall...Rootlinux17: Hypervisors on ARM - Overview and Design Choices by Julien Grall...
Rootlinux17: Hypervisors on ARM - Overview and Design Choices by Julien Grall...
 
Fosdem 18: Securing embedded Systems using Virtualization
Fosdem 18: Securing embedded Systems using VirtualizationFosdem 18: Securing embedded Systems using Virtualization
Fosdem 18: Securing embedded Systems using Virtualization
 
Kernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea Arcangeli
Kernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea ArcangeliKernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea Arcangeli
Kernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea Arcangeli
 
Rootlinux17: An introduction to Xen Project Virtualisation
Rootlinux17:  An introduction to Xen Project VirtualisationRootlinux17:  An introduction to Xen Project Virtualisation
Rootlinux17: An introduction to Xen Project Virtualisation
 
淺談探索 Linux 系統設計之道
淺談探索 Linux 系統設計之道 淺談探索 Linux 系統設計之道
淺談探索 Linux 系統設計之道
 
Booting Android: bootloaders, fastboot and boot images
Booting Android: bootloaders, fastboot and boot imagesBooting Android: bootloaders, fastboot and boot images
Booting Android: bootloaders, fastboot and boot images
 
Design and Concepts of Android Graphics
Design and Concepts of Android GraphicsDesign and Concepts of Android Graphics
Design and Concepts of Android Graphics
 
Virtualization - Kernel Virtual Machine (KVM)
Virtualization - Kernel Virtual Machine (KVM)Virtualization - Kernel Virtual Machine (KVM)
Virtualization - Kernel Virtual Machine (KVM)
 
Cache coloring Xen Summit 2020
Cache coloring Xen Summit 2020Cache coloring Xen Summit 2020
Cache coloring Xen Summit 2020
 
Computing Performance: On the Horizon (2021)
Computing Performance: On the Horizon (2021)Computing Performance: On the Horizon (2021)
Computing Performance: On the Horizon (2021)
 
FD.io Vector Packet Processing (VPP)
FD.io Vector Packet Processing (VPP)FD.io Vector Packet Processing (VPP)
FD.io Vector Packet Processing (VPP)
 
Enable DPDK and SR-IOV for containerized virtual network functions with zun
Enable DPDK and SR-IOV for containerized virtual network functions with zunEnable DPDK and SR-IOV for containerized virtual network functions with zun
Enable DPDK and SR-IOV for containerized virtual network functions with zun
 
Redesigning Xen Memory Sharing (Grant) Mechanism
Redesigning Xen Memory Sharing (Grant) MechanismRedesigning Xen Memory Sharing (Grant) Mechanism
Redesigning Xen Memory Sharing (Grant) Mechanism
 
Virtualization with KVM (Kernel-based Virtual Machine)
Virtualization with KVM (Kernel-based Virtual Machine)Virtualization with KVM (Kernel-based Virtual Machine)
Virtualization with KVM (Kernel-based Virtual Machine)
 

Similar to Xen Memory Management

Xen and the Art of Virtualization
Xen and the Art of VirtualizationXen and the Art of Virtualization
Xen and the Art of Virtualization
Susheel Thakur
 
Vmwareperformancetroubleshooting 100224104321-phpapp02
Vmwareperformancetroubleshooting 100224104321-phpapp02Vmwareperformancetroubleshooting 100224104321-phpapp02
Vmwareperformancetroubleshooting 100224104321-phpapp02
Suresh Kumar
 
Oscon 2012 : From Datacenter to the Cloud - Featuring Xen and XCP
Oscon 2012 : From Datacenter to the Cloud - Featuring Xen and XCPOscon 2012 : From Datacenter to the Cloud - Featuring Xen and XCP
Oscon 2012 : From Datacenter to the Cloud - Featuring Xen and XCP
The Linux Foundation
 
2virtualizationtechnologyoverview 13540659831745-phpapp02-121127193019-phpapp01
2virtualizationtechnologyoverview 13540659831745-phpapp02-121127193019-phpapp012virtualizationtechnologyoverview 13540659831745-phpapp02-121127193019-phpapp01
2virtualizationtechnologyoverview 13540659831745-phpapp02-121127193019-phpapp01
Vietnam Open Infrastructure User Group
 

Similar to Xen Memory Management (20)

Xen and the Art of Virtualization
Xen and the Art of VirtualizationXen and the Art of Virtualization
Xen and the Art of Virtualization
 
More on Virtualization 3.pptx
More on Virtualization 3.pptxMore on Virtualization 3.pptx
More on Virtualization 3.pptx
 
6. Live VM migration
6. Live VM migration6. Live VM migration
6. Live VM migration
 
5. IO virtualization
5. IO virtualization5. IO virtualization
5. IO virtualization
 
003-vmm.pptx
003-vmm.pptx003-vmm.pptx
003-vmm.pptx
 
17-virtualization.pptx
17-virtualization.pptx17-virtualization.pptx
17-virtualization.pptx
 
Vmwareperformancetroubleshooting 100224104321-phpapp02 (1)
Vmwareperformancetroubleshooting 100224104321-phpapp02 (1)Vmwareperformancetroubleshooting 100224104321-phpapp02 (1)
Vmwareperformancetroubleshooting 100224104321-phpapp02 (1)
 
Vmwareperformancetroubleshooting 100224104321-phpapp02
Vmwareperformancetroubleshooting 100224104321-phpapp02Vmwareperformancetroubleshooting 100224104321-phpapp02
Vmwareperformancetroubleshooting 100224104321-phpapp02
 
XPDS13: Zero-copy display of guest framebuffers using GEM - John Baboval, Citrix
XPDS13: Zero-copy display of guest framebuffers using GEM - John Baboval, CitrixXPDS13: Zero-copy display of guest framebuffers using GEM - John Baboval, Citrix
XPDS13: Zero-copy display of guest framebuffers using GEM - John Baboval, Citrix
 
Xen & virtualization
Xen & virtualizationXen & virtualization
Xen & virtualization
 
Porting Xen Paravirtualization to MIPS Architecture
Porting Xen Paravirtualization to MIPS ArchitecturePorting Xen Paravirtualization to MIPS Architecture
Porting Xen Paravirtualization to MIPS Architecture
 
Current and Future of Non-Volatile Memory on Linux
Current and Future of Non-Volatile Memory on LinuxCurrent and Future of Non-Volatile Memory on Linux
Current and Future of Non-Volatile Memory on Linux
 
Oscon 2012 : From Datacenter to the Cloud - Featuring Xen and XCP
Oscon 2012 : From Datacenter to the Cloud - Featuring Xen and XCPOscon 2012 : From Datacenter to the Cloud - Featuring Xen and XCP
Oscon 2012 : From Datacenter to the Cloud - Featuring Xen and XCP
 
network ram parallel computing
network ram parallel computingnetwork ram parallel computing
network ram parallel computing
 
Xen in Safety-Critical Systems - Critical Summit 2022
Xen in Safety-Critical Systems - Critical Summit 2022Xen in Safety-Critical Systems - Critical Summit 2022
Xen in Safety-Critical Systems - Critical Summit 2022
 
OSSEU18: NVDIMM and Virtualization - George Dunlap, Citrix
OSSEU18: NVDIMM and Virtualization  - George Dunlap, CitrixOSSEU18: NVDIMM and Virtualization  - George Dunlap, Citrix
OSSEU18: NVDIMM and Virtualization - George Dunlap, Citrix
 
2virtualizationtechnologyoverview 13540659831745-phpapp02-121127193019-phpapp01
2virtualizationtechnologyoverview 13540659831745-phpapp02-121127193019-phpapp012virtualizationtechnologyoverview 13540659831745-phpapp02-121127193019-phpapp01
2virtualizationtechnologyoverview 13540659831745-phpapp02-121127193019-phpapp01
 
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013
 
XPDDS18: NVDIMM Overview - George Dunlap, Citrix
XPDDS18: NVDIMM Overview - George Dunlap, Citrix XPDDS18: NVDIMM Overview - George Dunlap, Citrix
XPDDS18: NVDIMM Overview - George Dunlap, Citrix
 
Virtualization for Emerging Memory Devices
Virtualization for Emerging Memory DevicesVirtualization for Emerging Memory Devices
Virtualization for Emerging Memory Devices
 

More from The Linux Foundation

More from The Linux Foundation (20)

ELC2019: Static Partitioning Made Simple
ELC2019: Static Partitioning Made SimpleELC2019: Static Partitioning Made Simple
ELC2019: Static Partitioning Made Simple
 
XPDDS19: How TrenchBoot is Enabling Measured Launch for Open-Source Platform ...
XPDDS19: How TrenchBoot is Enabling Measured Launch for Open-Source Platform ...XPDDS19: How TrenchBoot is Enabling Measured Launch for Open-Source Platform ...
XPDDS19: How TrenchBoot is Enabling Measured Launch for Open-Source Platform ...
 
XPDDS19 Keynote: Xen in Automotive - Artem Mygaiev, Director, Technology Solu...
XPDDS19 Keynote: Xen in Automotive - Artem Mygaiev, Director, Technology Solu...XPDDS19 Keynote: Xen in Automotive - Artem Mygaiev, Director, Technology Solu...
XPDDS19 Keynote: Xen in Automotive - Artem Mygaiev, Director, Technology Solu...
 
XPDDS19 Keynote: Xen Project Weather Report 2019 - Lars Kurth, Director of Op...
XPDDS19 Keynote: Xen Project Weather Report 2019 - Lars Kurth, Director of Op...XPDDS19 Keynote: Xen Project Weather Report 2019 - Lars Kurth, Director of Op...
XPDDS19 Keynote: Xen Project Weather Report 2019 - Lars Kurth, Director of Op...
 
XPDDS19 Keynote: Unikraft Weather Report
XPDDS19 Keynote:  Unikraft Weather ReportXPDDS19 Keynote:  Unikraft Weather Report
XPDDS19 Keynote: Unikraft Weather Report
 
XPDDS19 Keynote: Secret-free Hypervisor: Now and Future - Wei Liu, Software E...
XPDDS19 Keynote: Secret-free Hypervisor: Now and Future - Wei Liu, Software E...XPDDS19 Keynote: Secret-free Hypervisor: Now and Future - Wei Liu, Software E...
XPDDS19 Keynote: Secret-free Hypervisor: Now and Future - Wei Liu, Software E...
 
XPDDS19 Keynote: Patch Review for Non-maintainers - George Dunlap, Citrix Sys...
XPDDS19 Keynote: Patch Review for Non-maintainers - George Dunlap, Citrix Sys...XPDDS19 Keynote: Patch Review for Non-maintainers - George Dunlap, Citrix Sys...
XPDDS19 Keynote: Patch Review for Non-maintainers - George Dunlap, Citrix Sys...
 
XPDDS19: Memories of a VM Funk - Mihai Donțu, Bitdefender
XPDDS19: Memories of a VM Funk - Mihai Donțu, BitdefenderXPDDS19: Memories of a VM Funk - Mihai Donțu, Bitdefender
XPDDS19: Memories of a VM Funk - Mihai Donțu, Bitdefender
 
OSSJP/ALS19: The Road to Safety Certification: Overcoming Community Challeng...
OSSJP/ALS19:  The Road to Safety Certification: Overcoming Community Challeng...OSSJP/ALS19:  The Road to Safety Certification: Overcoming Community Challeng...
OSSJP/ALS19: The Road to Safety Certification: Overcoming Community Challeng...
 
OSSJP/ALS19: The Road to Safety Certification: How the Xen Project is Making...
 OSSJP/ALS19: The Road to Safety Certification: How the Xen Project is Making... OSSJP/ALS19: The Road to Safety Certification: How the Xen Project is Making...
OSSJP/ALS19: The Road to Safety Certification: How the Xen Project is Making...
 
XPDDS19: Speculative Sidechannels and Mitigations - Andrew Cooper, Citrix
XPDDS19: Speculative Sidechannels and Mitigations - Andrew Cooper, CitrixXPDDS19: Speculative Sidechannels and Mitigations - Andrew Cooper, Citrix
XPDDS19: Speculative Sidechannels and Mitigations - Andrew Cooper, Citrix
 
XPDDS19: Keeping Coherency on Arm: Reborn - Julien Grall, Arm ltd
XPDDS19: Keeping Coherency on Arm: Reborn - Julien Grall, Arm ltdXPDDS19: Keeping Coherency on Arm: Reborn - Julien Grall, Arm ltd
XPDDS19: Keeping Coherency on Arm: Reborn - Julien Grall, Arm ltd
 
XPDDS19: QEMU PV Backend 'qdevification'... What Does it Mean? - Paul Durrant...
XPDDS19: QEMU PV Backend 'qdevification'... What Does it Mean? - Paul Durrant...XPDDS19: QEMU PV Backend 'qdevification'... What Does it Mean? - Paul Durrant...
XPDDS19: QEMU PV Backend 'qdevification'... What Does it Mean? - Paul Durrant...
 
XPDDS19: Status of PCI Emulation in Xen - Roger Pau Monné, Citrix Systems R&D
XPDDS19: Status of PCI Emulation in Xen - Roger Pau Monné, Citrix Systems R&DXPDDS19: Status of PCI Emulation in Xen - Roger Pau Monné, Citrix Systems R&D
XPDDS19: Status of PCI Emulation in Xen - Roger Pau Monné, Citrix Systems R&D
 
XPDDS19: [ARM] OP-TEE Mediator in Xen - Volodymyr Babchuk, EPAM Systems
XPDDS19: [ARM] OP-TEE Mediator in Xen - Volodymyr Babchuk, EPAM SystemsXPDDS19: [ARM] OP-TEE Mediator in Xen - Volodymyr Babchuk, EPAM Systems
XPDDS19: [ARM] OP-TEE Mediator in Xen - Volodymyr Babchuk, EPAM Systems
 
XPDDS19: Bringing Xen to the Masses: The Story of Building a Community-driven...
XPDDS19: Bringing Xen to the Masses: The Story of Building a Community-driven...XPDDS19: Bringing Xen to the Masses: The Story of Building a Community-driven...
XPDDS19: Bringing Xen to the Masses: The Story of Building a Community-driven...
 
XPDDS19: Will Robots Automate Your Job Away? Streamlining Xen Project Contrib...
XPDDS19: Will Robots Automate Your Job Away? Streamlining Xen Project Contrib...XPDDS19: Will Robots Automate Your Job Away? Streamlining Xen Project Contrib...
XPDDS19: Will Robots Automate Your Job Away? Streamlining Xen Project Contrib...
 
XPDDS19: Client Virtualization Toolstack in Go - Nick Rosbrook & Brendan Kerr...
XPDDS19: Client Virtualization Toolstack in Go - Nick Rosbrook & Brendan Kerr...XPDDS19: Client Virtualization Toolstack in Go - Nick Rosbrook & Brendan Kerr...
XPDDS19: Client Virtualization Toolstack in Go - Nick Rosbrook & Brendan Kerr...
 
XPDDS19: Core Scheduling in Xen - Jürgen Groß, SUSE
XPDDS19: Core Scheduling in Xen - Jürgen Groß, SUSEXPDDS19: Core Scheduling in Xen - Jürgen Groß, SUSE
XPDDS19: Core Scheduling in Xen - Jürgen Groß, SUSE
 
XPDDS19: Implementing AMD MxGPU - Jonathan Farrell, Assured Information Security
XPDDS19: Implementing AMD MxGPU - Jonathan Farrell, Assured Information SecurityXPDDS19: Implementing AMD MxGPU - Jonathan Farrell, Assured Information Security
XPDDS19: Implementing AMD MxGPU - Jonathan Farrell, Assured Information Security
 

Recently uploaded

Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
Bhaskar Mitra
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 

Recently uploaded (20)

Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and Planning
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through Observability
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeFree and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
 
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCustom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
 
UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
In-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsIn-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT Professionals
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 

Xen Memory Management

  • 1. Vancouver, February 2009 Memory management in (x86) Xen Tim Deegan
  • 2. Xen’s memory services • Memory management • Allocating memory to guests, scrubbing free memory • Tracking memory usage with reference counts and types  Heap allocators and the frametable. • Virtual memory • Protecting guests from each other • Enforcing typing rules, e.g. read-only areas • Providing translation services between address spaces  MMU hypercalls, shadow pagetables, hardware-assisted paging © 2008 Citrix Systems, Inc. — All rights reserved 2
  • 3. Terminology • Virtual address/Physical address/Machine address • Frame vs. Page • PFN: physical frame number • Guest’s abstraction for tracking/allocating RAM • Usually fairly contiguous • GFN: guest frame number • Guest’s idea of what hardware addresses are • Used in guest pagetables • MFN: machine frame number • Actual hardware addresses © 2008 Citrix Systems, Inc. — All rights reserved 3
  • 4. Basic memory management • Buddy allocator hands out frames • Each guest has a max number of frames • Frame-table records for each frame: • Owner, if any • Linked list of other frames owned by this guest • Reference count (must be zero to free the frame) • Type, and a refcount for the type (must be zero to change type) • TLB-flush-avoidance timestamp © 2008 Citrix Systems, Inc. — All rights reserved 4
  • 5. PV pagetables, a.k.a. direct paging • PFN  MFN table managed by the guest • Shared MFN  PFN table provided by Xen • GFN == MFN, so pagetables can be used directly by the hardware • Xen checks the contents of the guest pagetables before allowing the hardware to see them. © 2008 Citrix Systems, Inc. — All rights reserved 5
  • 6. Enforcing isolation • Guest pagetables must have a pagetable type • Xen checks that page contents obey the typing rules before allowing them to take on PT type • Typing rules: • No mapping other guests’ frames • No read-write mappings of frames with PT type • Modifying an already-typed PT needs a call to Xen to check the modification obeys the rules. (Or trap-and-emulate assistance from Xen.) © 2008 Citrix Systems, Inc. — All rights reserved 6
  • 7. Grant Tables • Guest-supplied ACLs allowing other guests to map their frames • Mapper makes a hypercall with a domid, an opaque index, and the address of a PTE • Xen checks that entry in the mappee’s grant table and if it’s OK, modifies the PTE • Needs explicit unmap hypercall when finished • Also available: grant-copy, where Xen memcpy()s from/to a granted frame instead of mapping it. © 2008 Citrix Systems, Inc. — All rights reserved 7
  • 8. HVM pagetables • PFN  MFN table managed by Xen • GFN == PFN so need another layer of translation • Guest won’t cooperate in enforcing access control • Two options: • Xen builds shadow copies of guest pagetables with the extra translations and controls added; or • Hardware support for using a second set of pagetables containing extra translations and controls © 2008 Citrix Systems, Inc. — All rights reserved 8
  • 9. Shadow pagetables • Keep Xen-maintained copies of guest frames that we think are being used as pagetables • Guest never sees the shadows so we can add any translations and restrictions we like • 13 different kinds of shadows depending on what kind of pagetable we think it is: a single frame can have up to 10 shadows at once • Also have three kinds of shadows for faking out superpages (2MB of contiguous PFNs does not mean 2MB of contiguous MFNs) © 2008 Citrix Systems, Inc. — All rights reserved 9
  • 10. Shadow pagetables: building • Start with an empty top-level shadow of the PFN in CR3 • On pagefault, shadow the entries in the PT walk, making new shadows at each level if necessary. • Each shadow entry is the guest entry with the GFN replaces by an MFN (of the next-level shadow or of guest memory) and extra access restrictions: • Pages that have shadows are mapped read-only. • Extra restrictions can be specified in the PFN  MFN table. • We can restrict write access to guest’s frames for tracking page- dirtying during live migration. © 2008 Citrix Systems, Inc. — All rights reserved 10
  • 11. Shadow pagetables: maintenance • Shadowed pages are always kept read-only. • When the guest writes to a shadowed frame, Xen’s pagefault handler must: • Emulate the current instruction to figure out what’s being written; • Write the new value into the guest pagetable; and • Update the equivalent parts of all shadows of the frame. © 2008 Citrix Systems, Inc. — All rights reserved 11
  • 12. Shadow pagetables: tearing back down • Shadowing a frame is expensive • Thousands of cycles for trap and emulation of every write. • Easy to tell when a page becomes a PT; harder to tell when it stops: • Reference count based on higher-level shadows and CR3 contents, but hard to know when a PFN’s been used in CR3 for the last time • Guess based on odd-looking page contents • Guess based on memory access patterns • Get PV drivers to give us hints • Recycle under memory pressure by approximating LRU © 2008 Citrix Systems, Inc. — All rights reserved 12
  • 13. Optimizations • Tagged TLBs (AMD’s ASID; Intel’s VPID) allow us to avoid a TLB flush on every VMEXIT/VMENTER • In theory can do even better now that Win2k8 supports context switching without TLB flushing. • Shadowing not-present entries with invalid entries lets us fast-track “real” pagefaults back to the guest • Out-of-sync shadows: let the guest write directly to the lowest level of pagetables and sync up the shadows whenever a hardware TLB would re-read (TLB flush, page faults, higher-level writes) © 2008 Citrix Systems, Inc. — All rights reserved 13
  • 14. Hardware-assisted paging • Xen supplies a second set of pagetables describing the PFN  MFN translation and extra restrictions • CPU takes a pointer to this as well as a (PFN- space) CR3 value from the guest • MMU hardware applies the composition of the two translations and the intersection of the access rights © 2008 Citrix Systems, Inc. — All rights reserved 14
  • 15. Hardware-assisted paging: performance Avoid expensive trap + emulate on writes to PTs, and extra logic on pagefault path TLB fill can now take 20 memory accesses! CPU’s TLB is much smaller than the set of shadows we can maintain • AMD’s RVI gives +10% performance over shadows on some workloads, -10% on others; Intel’s EPT seems more consistently better than shadowing • Performance depends heavily on using superpage mappings in the second pagetable © 2008 Citrix Systems, Inc. — All rights reserved 15
  • 16. Fin © 2008 Citrix Systems, Inc. — All rights reserved 16