Dulloor xen-summit

VM Memory Allocation Schemes and
PV NUMA Guests

Dulloor Rao

Xen Summit AMD 2010

Agenda
● Motivation
● VM memory allocation strategies –
CONFINED, SPLIT, STRIPED
● AUTOMATIC (default) allocation scheme
● PV NUMA Guests
● Summary


Motivation – NUMA Overheads


Motivation – NUMA Overheads
● CPU0 and CPU1 are Hyper-Threads.
● CPU0 and CPU2 are on the same node.
● CPU0 and CPU8 are on different nodes.
● Overheads are due to both Cache Hierarchy (L1/L2/LLC) and
Memory Organization (NUMA)
● Modified cache-coherency state – the cacheline is present only in
the current cache and is dirty. It must be written back to main
memory before any other read of that (now stale) memory location.
● Substantial overhead in accessing remote node's memory.


Motivation – NUMA-related OS
Optimizations (Linux as example)
● OS employs many optimizations to reduce
inter-node memory accesses – memory
management, scheduler, OS data-structures,
etc.
● OS defines multiple NUMA allocation policies
(MPOL_{DEFAULT/BIND/PREFERRED/INTERLEAVE})
to suit different applications. DEFAULT
is local allocation.
● Significant performance improvement from
system-level NUMA optimizations.

Motivation – NUMA-related
Application Optimizations (Linux)

● DEFAULT memory policy (of allocating from local
node) and a NUMA-aware scheduler reduce the
inter-node accesses.
● Libraries and tools (libnuma and numactl on Linux) are
provided to select the appropriate memory placement policy
for specific application requirements.
● CONCLUSION – NUMA-related optimizations at
OS-level and Application-level are too important
and too many to ignore or discard.

Motivation – Virtualization on
NUMA platforms (Issues)
● Ad-hoc and Minimum-Effort VM memory allocation
schemes.
● For instance, Xen tries to allocate all the memory for
a VM from a single memory node and pin the VM to
the node, for a one-to-one mapping between a VM
and a node.
● Not always possible to allocate from a single node –
VM size, node memory fragmentation, etc.
● Dynamic memory interfaces (such as memory
ballooning) could still disrupt the mapping by
allocating from some other node.

Motivation – Virtualization on
NUMA platforms (Issues)


VM Memory Allocation Strategies
● CONFINED : Allocate the entire VM memory from a single
node. Goal : Maximize performance.

● SPLIT : Allocate the VM memory from a set of nodes by
splitting equally across the nodes. Goal : Maximize
performance (with Enlightenment).

● STRIPED : Interleave the VM memory across a set of
nodes. Goal : Predictable (average) performance.
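The three strategies can be pictured as different page-to-node layouts. The following is only an illustrative sketch, not Xen code: the function names, the page-granular model, and the `stripe` parameter are assumptions made for this example.

```python
def confined(num_pages, node):
    """CONFINED: every guest page comes from a single node."""
    return [node] * num_pages


def split(num_pages, nodes):
    """SPLIT: one contiguous, near-equal chunk of guest memory per node."""
    base, extra = divmod(num_pages, len(nodes))
    layout = []
    for i, node in enumerate(nodes):
        # The first `extra` nodes absorb the remainder, one page each.
        layout.extend([node] * (base + (1 if i < extra else 0)))
    return layout


def striped(num_pages, nodes, stripe=1):
    """STRIPED: interleave round-robin in units of `stripe` pages (assumed unit)."""
    return [nodes[(page // stripe) % len(nodes)] for page in range(num_pages)]


if __name__ == "__main__":
    print(confined(8, 0))
    print(split(8, [0, 1]))
    print(striped(8, [0, 1]))
```

With SPLIT, each contiguous chunk can be presented to the guest as one virtual node, whereas STRIPED spreads accesses evenly so that no guest-visible topology is needed.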


VM Memory Allocation Strategies -
CONFINED


SPLIT


STRIPED


Automatic VM Memory Allocation
Scheme
● TRY : Allocate CONFINED using Best-Fit-Decreasing
(BFD).
● TRY : Allocate SPLIT using Best-Fit-Decreasing (BFD),
if the guest is NUMA-enabled. Enlighten the guest.
● Allocate STRIPED using First-Fit-Increasing (FFI).
● BFD returns the minimal subset of nodes.
● FFI returns the maximal subset of nodes. It is used with
STRIPED to reduce the fragmentation of free node
memory.
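The fallback order above (CONFINED, then SPLIT, then STRIPED) can be sketched as follows. The slides do not spell out the BFD and FFI heuristics, so this sketch makes its own assumptions: "minimal subset" is modelled as trying the smallest node count first, "maximal subset" as trying the largest node count first, and ties are broken by picking the nodes with the least free memory that still fit. `pick_nodes`, `auto_allocate`, and the `free` dictionary are hypothetical names, not Xen interfaces.

```python
import math


def pick_nodes(free, size, k):
    """Return k nodes that can each hold an equal share of `size`, or None.

    Assumption: prefer nodes with the least free memory that still fit,
    so that large free regions are left intact.
    """
    share = math.ceil(size / k)
    fits = sorted((n for n in free if free[n] >= share),
                  key=lambda n: (free[n], n))
    return fits[:k] if len(fits) >= k else None


def auto_allocate(free, size, guest_is_numa_aware):
    """Try CONFINED, then SPLIT (NUMA-aware guests only), then STRIPED."""
    nodes = pick_nodes(free, size, 1)
    if nodes:
        return "CONFINED", nodes
    if guest_is_numa_aware:
        # Minimal subset: smallest node count that works.
        for k in range(2, len(free) + 1):
            nodes = pick_nodes(free, size, k)
            if nodes:
                return "SPLIT", nodes
    # Maximal subset: largest node count that works.
    for k in range(len(free), 1, -1):
        nodes = pick_nodes(free, size, k)
        if nodes:
            return "STRIPED", nodes
    return None, []


if __name__ == "__main__":
    free_pages = {0: 4, 1: 6, 2: 10}
    print(auto_allocate(free_pages, 5, True))
    print(auto_allocate(free_pages, 12, True))
    print(auto_allocate(free_pages, 12, False))
```

In this model a request that fits on one node is always CONFINED, and a guest that cannot consume NUMA information never receives a SPLIT layout.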


VM Memory Allocation Strategy -
SPLIT
● Used to construct a strict one-to-one mapping
between virtual nodes and physical nodes.
● HVM : Export the VM memory layout using
ACPI tables. VM constructs virtual nodes.
● PV : Export the VM memory layout using Virtual
NUMA Enlightenment. VM constructs and
maintains virtual nodes.


PV NUMA Guest - Enlightenment


PV NUMA Guest -
Construction of Virtual Nodes

● Guest reads the Virtual NUMA Enlightenment using
a hypercall.

● Guest constructs the (virtual) nodes and (virtual)
cpu-to-node mappings.

● Guest (virtual) node distances reflect the actual
distances between the underlying physical nodes.
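The construction steps above can be sketched as follows, assuming the enlightenment gives the guest a virtual-to-physical node mapping and that the physical distance matrix is available in SLIT-style form. The function names and the even, contiguous vCPU assignment are assumptions for illustration; the actual enlightenment layout is not described in these slides.

```python
def virtual_distances(phys_dist, vnode_to_pnode):
    """Project the physical distance matrix onto the guest's virtual nodes."""
    return [[phys_dist[p][q] for q in vnode_to_pnode] for p in vnode_to_pnode]


def vcpu_to_vnode(num_vcpus, num_vnodes):
    """Assumed policy: assign vCPUs to virtual nodes in contiguous blocks."""
    return [vcpu * num_vnodes // num_vcpus for vcpu in range(num_vcpus)]


if __name__ == "__main__":
    # Hypothetical 4-node host with SLIT-style distances (10 = local).
    phys = [
        [10, 20, 20, 30],
        [20, 10, 30, 20],
        [20, 30, 10, 20],
        [30, 20, 20, 10],
    ]
    print(virtual_distances(phys, [0, 3]))
    print(vcpu_to_vnode(4, 2))
```

Because the virtual matrix is a projection of the physical one, the guest's own NUMA policies see the real cost of crossing between its virtual nodes.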


PV NUMA Guest –
Construction of Virtual Nodes


PV NUMA Guest –
Maintenance of Virtual Nodes
● Dynamic memory interfaces could
increase/decrease/exchange the VM memory
reservations, e.g., ballooning (see the table in slide 7).

● Modify the interfaces to use Virtual NUMA
Enlightenment. Maintain the strict mapping
between Virtual and Physical nodes.


PV NUMA Guest -


PV NUMA Guest –
● Strict approach could lead to starvation in
CONFINED/SPLIT VMs.
● Under memory pressure, relax the strict one-to-one
mapping between virtual and physical nodes.
● Provide a mechanism for guests to look up the
physical node ID corresponding to a guest
physical address.
● Periodically sweep through the VM memory and
converge back to the original state (repeating indefinitely).
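A relaxed mapping with a periodic sweep might look like the sketch below. It is a toy model under stated assumptions: pages are tracked in plain lists, `page_pnode` stands in for the node-ID lookup mechanism, and `free` counts free pages per physical node. None of these names come from Xen or the guest kernel.

```python
def misplaced_pages(page_vnode, page_pnode, vnode_to_pnode):
    """Guest pages whose backing physical node differs from their home node."""
    return [pfn for pfn, vnode in enumerate(page_vnode)
            if page_pnode[pfn] != vnode_to_pnode[vnode]]


def sweep(page_vnode, page_pnode, vnode_to_pnode, free):
    """One sweep: move misplaced pages home wherever the home node has room.

    Returns the number of pages moved. Calling this periodically converges
    toward the strict mapping as memory pressure eases.
    """
    moved = 0
    for pfn in misplaced_pages(page_vnode, page_pnode, vnode_to_pnode):
        home = vnode_to_pnode[page_vnode[pfn]]
        if free.get(home, 0) > 0:
            current = page_pnode[pfn]
            free[home] -= 1                         # take a page on the home node
            free[current] = free.get(current, 0) + 1  # release the borrowed page
            page_pnode[pfn] = home
            moved += 1
    return moved


if __name__ == "__main__":
    page_vnode = [0, 0, 1, 1]      # home virtual node of each guest page
    page_pnode = [0, 2, 1, 2]      # physical node currently backing each page
    free = {0: 1, 1: 0, 2: 0}
    print(sweep(page_vnode, page_pnode, [0, 1], free), page_pnode)
```

A sweep that moves nothing does not imply convergence; pages can stay misplaced until the home node regains free memory, which is why the sweep has to keep running.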


Results – linpack benchmark


Summary
● VM Memory Allocation Strategies for NUMA –
CONFINED/SPLIT/STRIPED.
● Automatic VM Memory Allocation Scheme.
● NUMA Guests with SPLIT strategy :
  ● HVM – Inform using SLIT/SRAT ACPI tables
  ● PV – Inform using Enlightenment
● PV NUMA Guests
  ● Construction of Virtual Nodes
  ● Maintenance of Virtual Nodes (e.g., Ballooning)

Questions ?


Thank You !

