4. Motivation – NUMA Overheads
● CPU0 and CPU1 are Hyper-Threads.
● CPU0 and CPU2 are on the same node.
● CPU0 and CPU8 are on different nodes.
● Overheads stem from both the cache hierarchy (L1/L2/LLC) and the memory organization (NUMA).
● Modified cache-coherency state: the cacheline is present only in the current cache and is dirty. It must be written back to main memory before any other read of that memory location is permitted.
● Accessing a remote node's memory incurs substantial overhead.
Xen Summit AMD 2010
5. Motivation – NUMA-related OS Optimizations (Linux as example)
● The OS employs many optimizations to reduce inter-node memory accesses: in memory management, the scheduler, OS data structures, etc.
● The OS defines multiple NUMA allocation policies (MPOL_DEFAULT, MPOL_BIND, MPOL_PREFERRED, MPOL_INTERLEAVE) to suit different applications. DEFAULT means local allocation.
● System-level NUMA optimizations yield significant performance improvements.
6. Motivation – NUMA-related Application Optimizations (Linux)
● The DEFAULT memory policy (allocate from the local node) and a NUMA-aware scheduler together reduce inter-node accesses.
● Tools and libraries (numactl and libnuma on Linux) let applications select the memory placement policy that suits their specific requirements.
● CONCLUSION: NUMA-related optimizations at the OS and application levels are too important, and too numerous, to ignore or discard.
7. Motivation – Virtualization on NUMA Platforms (Issues)
● VM memory allocation schemes are ad hoc and minimum-effort.
● For instance, Xen tries to allocate all of a VM's memory from a single memory node and to pin the VM to that node, giving a one-to-one mapping between the VM and the node.
● Allocating from a single node is not always possible, e.g. because of the VM's size or fragmentation of node memory.
● Dynamic memory interfaces (such as memory ballooning) can still disrupt the mapping by allocating from some other node.
9. VM Memory Allocation Strategies
● CONFINED: Allocate all of the VM's memory from a single node. Goal: maximize performance.
● SPLIT: Allocate the VM's memory from a set of nodes, splitting it equally across them. Goal: maximize performance (with the guest enlightened about the layout).
● STRIPED: Interleave the VM's memory across a set of nodes. Goal: predictable (average) performance.
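As a minimal sketch, assuming page-granularity placement and a hypothetical helper `page_to_node` (neither is specified in the slides), the three strategies differ only in which physical node backs each guest page:

```python
from typing import List


def page_to_node(strategy: str, nodes: List[int], npages: int) -> List[int]:
    """Return the physical node backing each guest page (list index = guest page).

    Hypothetical illustration; real placement granularity may be coarser.
    """
    k = len(nodes)
    if strategy == "CONFINED":
        # All memory comes from exactly one node.
        if k != 1:
            raise ValueError("CONFINED uses exactly one node")
        return [nodes[0]] * npages
    if strategy == "SPLIT":
        # Equal contiguous chunks, one per node (sizes differ by at most one page).
        return [nodes[i * k // npages] for i in range(npages)]
    if strategy == "STRIPED":
        # Round-robin interleave of pages across the nodes.
        return [nodes[i % k] for i in range(npages)]
    raise ValueError(f"unknown strategy: {strategy}")
```

For six pages on nodes [0, 1], SPLIT gives [0, 0, 0, 1, 1, 1] while STRIPED gives [0, 1, 0, 1, 0, 1]. SPLIT keeps each node's memory contiguous, which is what lets a guest describe it as one virtual node per range; STRIPED has no such structure, so it aims at average rather than local performance.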
13. Automatic VM Memory Allocation Scheme
● TRY: Allocate CONFINED using Best-Fit-Decreasing (BFD).
● TRY: If the guest is NUMA-enabled, allocate SPLIT using Best-Fit-Decreasing (BFD) and enlighten the guest.
● Otherwise, allocate STRIPED using First-Fit-Increasing (FFI).
● BFD returns the minimal subset of nodes.
● FFI returns the maximal subset of nodes; it is used with STRIPED to reduce fragmentation of free node memory.
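The slides do not define exactly how BFD and FFI order their candidates, so the sketch below rests on an assumed interpretation: BFD tries the fewest nodes first (the per-node share "decreases" as nodes are added) and picks the nodes whose free memory fits that share most tightly; FFI tries the most nodes first (the share "increases" as nodes are dropped) and takes the first fitting nodes in increasing order of free memory. The function names and the use of MiB are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Optional, Tuple


def bfd(free: Dict[int, int], size: int, counts: Iterable[int]) -> Optional[List[int]]:
    """Best-Fit-Decreasing (assumed): fewest nodes first, tightest fit per node."""
    for k in counts:  # per-node share shrinks as k grows
        share = math.ceil(size / k)
        # Nodes that can hold the share, tightest fit first (ties broken by node ID).
        fits = sorted((n for n in free if free[n] >= share), key=lambda n: (free[n], n))
        if len(fits) >= k:
            return sorted(fits[:k])
    return None


def ffi(free: Dict[int, int], size: int) -> Optional[List[int]]:
    """First-Fit-Increasing (assumed): most nodes first, scanning by increasing free memory."""
    order = sorted(free, key=lambda n: (free[n], n))
    for k in range(len(free), 0, -1):  # per-node share grows as k shrinks
        share = math.ceil(size / k)
        chosen = [n for n in order if free[n] >= share][:k]
        if len(chosen) == k:
            return sorted(chosen)
    return None


def place_vm(free: Dict[int, int], size: int, numa_enabled: bool) -> Tuple[str, List[int]]:
    """Try CONFINED, then SPLIT (NUMA-enabled guests only), then STRIPED."""
    nodes = bfd(free, size, [1])
    if nodes is not None:
        return "CONFINED", nodes
    if numa_enabled:
        nodes = bfd(free, size, range(2, len(free) + 1))
        if nodes is not None:
            return "SPLIT", nodes
    nodes = ffi(free, size)
    if nodes is not None:
        return "STRIPED", nodes
    raise MemoryError(f"cannot place a VM of {size} MiB")
```

With free memory {0: 3000, 1: 1500, 2: 1200, 3: 800} (MiB), a 1000 MiB VM lands CONFINED on node 2 (the tightest fit), while a 3200 MiB VM becomes SPLIT over [0, 1, 2] for a NUMA-enabled guest and STRIPED over all four nodes otherwise. Taking the tightest-fitting nodes for CONFINED/SPLIT leaves large free regions intact for later VMs, and spreading STRIPED VMs widely consumes the small leftovers.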
14. VM Memory Allocation Strategy – SPLIT
● Used to construct a strict one-to-one mapping
between virtual nodes and physical nodes.
● HVM: Export the VM memory layout through ACPI tables (SRAT/SLIT); the VM constructs the virtual nodes.
● PV: Export the VM memory layout through the Virtual NUMA Enlightenment; the VM constructs and maintains the virtual nodes.
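The exported layout can be pictured as one contiguous guest-physical range per virtual node, each backed by exactly one physical node. The sketch below is an illustration under stated assumptions: 4 KiB pages, no memory holes (a real layout must also skip regions such as the MMIO hole below 4 GiB), and hypothetical names `VNodeMemRange` and `split_layout`. For HVM such ranges would correspond to SRAT memory-affinity entries; for PV they would travel in the enlightenment.

```python
from dataclasses import dataclass
from typing import List

PAGE_SIZE = 4096  # assumption: 4 KiB guest pages


@dataclass(frozen=True)
class VNodeMemRange:
    vnode: int  # virtual node ID the guest will see
    pnode: int  # physical node backing it (strict one-to-one)
    base: int   # guest-physical start address, in bytes
    size: int   # length of the range, in bytes


def split_layout(pnodes: List[int], mem_bytes: int) -> List[VNodeMemRange]:
    """Carve the VM's guest-physical space into one contiguous range per node.

    Virtual node v is backed by physical node pnodes[v]; page counts differ
    by at most one when the memory does not divide evenly.
    """
    if len(set(pnodes)) != len(pnodes):
        raise ValueError("SPLIT needs distinct physical nodes")
    npages = mem_bytes // PAGE_SIZE
    k = len(pnodes)
    ranges, start = [], 0
    for v, p in enumerate(pnodes):
        end = (v + 1) * npages // k
        ranges.append(VNodeMemRange(v, p, start * PAGE_SIZE, (end - start) * PAGE_SIZE))
        start = end
    return ranges
```

For example, `split_layout([2, 5], 8 * 4096)` yields virtual node 0 on physical node 2 at [0, 16384) and virtual node 1 on physical node 5 at [16384, 32768).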
16. PV NUMA Guest – Construction of Virtual Nodes
● Guest reads the Virtual NUMA Enlightenment using
a hypercall.
● Guest constructs the (virtual) nodes and (virtual)
cpu-to-node mappings.
● Guest (virtual) node distances reflect the actual
distances between the underlying physical nodes.
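Once the guest has the enlightenment data, deriving its topology is mostly a table lookup. The sketch below assumes the hypercall returns three things (the field names are hypothetical, not Xen's actual structure): the physical node backing each virtual node, the virtual node of each vCPU, and a physical distance matrix in the ACPI SLIT convention, where 10 means local. Each virtual distance is simply the distance between the backing physical nodes.

```python
from typing import Dict, List, Tuple


def build_virtual_topology(
    vnode_to_pnode: List[int],
    vcpu_to_vnode: List[int],
    pdist: List[List[int]],
) -> Tuple[List[List[int]], Dict[int, List[int]]]:
    """Derive the guest's node distance table and per-node vCPU lists.

    vnode_to_pnode: physical node backing each virtual node (hypothetical field).
    vcpu_to_vnode:  virtual node of each vCPU, indexed by vCPU ID (hypothetical field).
    pdist:          physical node distance matrix, SLIT-style (10 = local).
    """
    nv = len(vnode_to_pnode)
    # Virtual distance = distance between the underlying physical nodes.
    vdist = [[pdist[vnode_to_pnode[i]][vnode_to_pnode[j]] for j in range(nv)]
             for i in range(nv)]
    # Invert the vCPU-to-node map into node-to-vCPU lists.
    cpus_of_node: Dict[int, List[int]] = {v: [] for v in range(nv)}
    for cpu, v in enumerate(vcpu_to_vnode):
        cpus_of_node[v].append(cpu)
    return vdist, cpus_of_node
```

On a four-node host, a guest whose two virtual nodes sit on physical nodes 1 and 3 sees a 2x2 table copied from rows and columns 1 and 3 of the host matrix, so its NUMA-aware scheduler and allocator weigh remote accesses as the hardware does.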
17. PV NUMA Guest – Construction of Virtual Nodes
18. PV NUMA Guest – Maintenance of Virtual Nodes
● Dynamic memory interfaces can increase, decrease, or exchange the VM's memory reservation, e.g. ballooning (see the table in slide 7).
● Modify these interfaces to use the Virtual NUMA Enlightenment, so that they maintain the strict mapping between virtual and physical nodes.
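A strict, enlightenment-aware balloon can be modelled as per-virtual-node requests that the hypervisor serves only from the backing physical node. The class below is a hypothetical model, loosely mirroring Xen's XENMEM_increase_reservation and XENMEM_decrease_reservation memory operations; its names, the page counts, and the zero initial reservation are assumptions for illustration.

```python
from typing import Dict, List


class StrictBalloon:
    """Hypothetical model of a NUMA-aware balloon under the strict mapping."""

    def __init__(self, host_free: Dict[int, int], vnode_to_pnode: List[int]):
        self.host_free = dict(host_free)           # free pages per physical node
        self.v2p = list(vnode_to_pnode)            # backing physical node per vnode
        self.reserved = [0] * len(vnode_to_pnode)  # pages currently held per vnode

    def increase(self, vnode: int, pages: int) -> int:
        """Balloon deflation: back more pages of `vnode`, only from its own
        physical node. Returns the number actually granted (may be short)."""
        pnode = self.v2p[vnode]
        granted = min(pages, self.host_free[pnode])
        self.host_free[pnode] -= granted
        self.reserved[vnode] += granted
        return granted

    def decrease(self, vnode: int, pages: int) -> int:
        """Balloon inflation: hand pages of `vnode` back to its physical node."""
        released = min(pages, self.reserved[vnode])
        self.reserved[vnode] -= released
        self.host_free[self.v2p[vnode]] += released
        return released
```

With host free memory {0: 100, 1: 50} pages and virtual node 1 on physical node 1, asking for 80 pages for virtual node 1 yields only 50, even though node 0 still has 100 free. That short grant is the starvation risk that the relaxed scheme on slide 20 addresses.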
19. PV NUMA Guest – Maintenance of Virtual Nodes
20. PV NUMA Guest – Maintenance of Virtual Nodes
● The strict approach can lead to memory starvation in CONFINED/SPLIT VMs when their backing nodes run out of free memory.
● Under memory pressure, relax the strict one-to-one mapping between virtual and physical nodes.
● Provide a mechanism for guests to look up the physical node ID corresponding to a guest physical address.
● Periodically sweep through the VM's memory to converge back to the original mapping (the sweep repeats indefinitely).
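The relaxed scheme can be sketched as three pieces: a fallback allocation when the home node is full, a guest-physical-address lookup, and a sweep that moves misplaced pages home once space frees up. Everything below is a hypothetical model under stated assumptions (4 KiB pages, fallback to the node with the most free pages, one page moved per unit of free space); a real migration would also copy page contents and update the guest's physical-to-machine mapping, which the model omits. Calling `sweep()` repeatedly, e.g. from a timer, stands in for the indefinite sweep on the slide.

```python
from typing import Dict, List, Optional

PAGE_SIZE = 4096  # assumption: 4 KiB guest pages


class RelaxedPlacement:
    """Hypothetical model: relaxed vnode-to-pnode mapping with lookup and sweep."""

    def __init__(self, host_free: Dict[int, int], vnode_to_pnode: List[int]):
        self.host_free = dict(host_free)   # free pages per physical node
        self.v2p = list(vnode_to_pnode)    # home physical node of each vnode
        self.backing: Dict[int, int] = {}  # gpfn -> physical node backing it
        self.owner: Dict[int, int] = {}    # gpfn -> virtual node it belongs to

    def populate(self, gpfn: int, vnode: int) -> bool:
        """Back one guest page: home node first, else the node with most free pages."""
        home = self.v2p[vnode]
        others = sorted((p for p in self.host_free if p != home),
                        key=lambda p: (-self.host_free[p], p))
        for p in [home] + others:
            if self.host_free[p] > 0:
                self.host_free[p] -= 1
                self.backing[gpfn] = p
                self.owner[gpfn] = vnode
                return True
        return False  # every node is full

    def release(self, gpfn: int) -> None:
        """Guest gives a page back (e.g. balloon inflation)."""
        p = self.backing.pop(gpfn)
        del self.owner[gpfn]
        self.host_free[p] += 1

    def lookup_node(self, gpa: int) -> Optional[int]:
        """Physical node backing a guest-physical address, or None if unbacked."""
        return self.backing.get(gpa // PAGE_SIZE)

    def sweep(self) -> int:
        """One pass: move misplaced pages home where space allows; return the count."""
        moved = 0
        for gpfn, p in list(self.backing.items()):
            home = self.v2p[self.owner[gpfn]]
            if p != home and self.host_free[home] > 0:
                self.host_free[home] -= 1
                self.host_free[p] += 1
                self.backing[gpfn] = home
                moved += 1
        return moved
```

For instance, if virtual node 0's home node has room for one page, a second page spills to another node, and `lookup_node` reports that node; after the guest releases the first page, the next sweep pulls the spilled page back home.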
22. Summary
● VM Memory Allocation Strategies for NUMA: CONFINED/SPLIT/STRIPED.
● Automatic VM Memory Allocation Scheme.
● NUMA Guests with the SPLIT strategy:
  ● HVM: inform the guest using the SLIT/SRAT ACPI tables
  ● PV: inform the guest using the Virtual NUMA Enlightenment
● PV NUMA Guests:
  ● Construction of Virtual Nodes
  ● Maintenance of Virtual Nodes (e.g. ballooning)