Linux NUMA & Databases: Perils and Opportunities


What is NUMA
● Stands for Non-Uniform Memory Access
○ Non-uniform to whom?
○ Von Neumann bottleneck.
○ Cache coherent NUMA
● How does it work
○ Memory is placed local to the processes.
○ Balancing access to data over the available processors on multiple nodes.
● Large memory installations are becoming the norm
○ The i2 series on AWS.
○ Databases are the main consumers.
● Constraints
○ Speed of light
○ Interconnect saturation

What is NUMA (contd)
● Constraints
○ Speed of light
■ Higher latency of accessing remote memory.
○ Interconnect saturation
■ Performance counters.
● Slow abundant memory
○ Fast limited memory
● Cache coherence
○ Processor threads and cores share resources
■ Execution units (between HT threads)
■ Cache (between threads and cores)
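
The remote-memory latency constraint above can be read off the node distance table that Linux exports. The following is a minimal sketch, assuming the sysfs path /sys/devices/system/node/nodeN/distance (not named in the slides) and contiguous node IDs starting at 0; the function names are illustrative only.

```python
from pathlib import Path


def parse_distance_row(text):
    """Parse one sysfs distance row such as '10 21' into a list of ints."""
    return [int(tok) for tok in text.split()]


def read_distance_matrix(base="/sys/devices/system/node"):
    """Return {node_id: [distance to node 0, node 1, ...]}.

    Assumption: each nodeN directory holds a 'distance' file. The result is
    empty on machines that do not expose the NUMA sysfs tree.
    """
    matrix = {}
    for node_dir in Path(base).glob("node[0-9]*"):
        dist_file = node_dir / "distance"
        if dist_file.is_file():
            node_id = int(node_dir.name[len("node"):])
            matrix[node_id] = parse_distance_row(dist_file.read_text())
    return matrix


def remote_penalty(matrix):
    """Worst remote distance divided by local distance, per node.

    Assumes node IDs are contiguous from 0 so that row[node] is the
    local distance.
    """
    return {node: max(row) / row[node] for node, row in matrix.items()}
```

Calling remote_penalty(read_distance_matrix()) on a two-socket machine would typically give a ratio above 1.0 per node; the values are relative costs reported by firmware, not measured latencies.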

Exotic cases
● Network cards
● PCIe storage
● NVRAM
● Nodes without memory
● Nodes without processors
● Unbalanced
● Central/Large memory
● big.LITTLE architecture
● GPU

NUMA complications
● Unmovable memory
● KSM
● THP
● Interrupt balancing and locality

Tools/libraries for NUMA
● Supported by Linux since 2.5
○ Symmetric and CPU/Memory
● numactl
● hwloc / lstopo
● numad
● numatop
● libnuma
● numastat
● taskset
● KVM for simulation and testing
● perf

Tools/libraries for NUMA (contd)
● KVM for simulation and testing
● Useful for testing databases.
qemu-system-x86_64 -enable-kvm \
  -drive file=./debian-8.1-lxc-puppet.qcow2 \
  -net nic,macaddr=52:54:00:00:EE:03 -net vde \
  -smp sockets=2,cores=2,threads=2,maxcpus=16 \
  -numa node,nodeid=0,cpus=0-3 \
  -numa node,nodeid=1,cpus=4-7 \
  -numa node,nodeid=2,cpus=8-15 \
  -m 2G
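
Inside such a guest it can help to check that the kernel sees the node layout that was requested. The sketch below assumes the per-node CPU list is exposed at /sys/devices/system/node/nodeN/cpulist (a path not mentioned in the slides) and uses illustrative function names; it only parses and reports, it changes nothing.

```python
from pathlib import Path


def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-3,8' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty cpulist means the node has no online CPUs
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def read_node_cpus(base="/sys/devices/system/node"):
    """Return {node_id: set of CPU ids} for every node directory found."""
    nodes = {}
    for node_dir in Path(base).glob("node[0-9]*"):
        cpulist = node_dir / "cpulist"
        if cpulist.is_file():
            node_id = int(node_dir.name[len("node"):])
            nodes[node_id] = parse_cpulist(cpulist.read_text())
    return nodes
```

With the command line above, nodes 0 and 1 would be expected to report CPUs 0-3 and 4-7. CPUs 8-15 exceed the eight CPUs started by -smp, so node 2 may well show an empty list until CPUs are hot-plugged.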

Tunings and observables
● /proc/zoneinfo
○ sysctl vm.zone_reclaim_mode or /proc/sys/vm/zone_reclaim_mode
○ /proc/sys/vm/min_unmapped_ratio
● /proc/meminfo
● /proc/vmstat
● Ftrace
● Cgroup hierarchy
○ memory
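
As a rough illustration of what /proc/zoneinfo offers, the sketch below extracts the free page count and the min/low/high watermarks per (node, zone). The layout it expects is an assumption based on common kernels, where each zone starts with a "Node N, zone NAME" header followed by "pages free", "min", "low" and "high" lines; the function names are hypothetical.

```python
import re

HEADER = re.compile(r"^Node (\d+), zone\s+(\S+)")
# Matches 'pages free 5000' as well as bare 'min 1000' style lines.
# Per-CPU pageset lines use 'high:' with a colon and are not matched.
FIELD = re.compile(r"^\s+(?:pages )?(free|min|low|high)\s+(\d+)\s*$")


def parse_zoneinfo(text):
    """Return {(node, zone): {'free': n, 'min': n, 'low': n, 'high': n}}."""
    zones = {}
    current = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            key = (int(header.group(1)), header.group(2))
            current = zones.setdefault(key, {})
            continue
        field = FIELD.match(line)
        if field and current is not None:
            # Keep the first occurrence of each field within a zone.
            current.setdefault(field.group(1), int(field.group(2)))
    return zones


def zones_below_low(zones):
    """List the zones whose free pages have dropped under the low watermark."""
    return sorted(
        key for key, z in zones.items()
        if "free" in z and "low" in z and z["free"] < z["low"]
    )
```

On a live system, zones_below_low(parse_zoneinfo(open("/proc/zoneinfo").read())) gives a quick view of which node is under pressure while the others still have free memory.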

Tunings and observables (contd)
● ACPI
○ SLIT and SRAT
● Per process:
○ /proc/<pid>/numa_maps
○ /proc/<pid>/sched
● Auto NUMA balancing
○ CONFIG_NUMA_BALANCING in /proc/config.gz
● get_mempolicy(2), mbind(2), migrate_pages(2), move_pages(2), set_mempolicy(2), sched_getaffinity(2)
● libnuma: numa(3)
○ Higher-level abstraction, e.g. numa_set_localalloc()
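
To show what /proc/<pid>/numa_maps provides, here is a small sketch that sums the per-node page counts of a process. It assumes the usual line format, in which each mapping carries tokens of the form N<node>=<pages>; the function name and the sample line in the usage note are illustrative.

```python
import re
from collections import Counter

NODE_PAGES = re.compile(r"\bN(\d+)=(\d+)\b")


def pages_per_node(numa_maps_text):
    """Sum the N<node>=<pages> tokens over all mappings of a process.

    Counts are in pages of each mapping's own page size, so huge-page
    mappings are under-represented if totals are read as 4 kB pages.
    """
    totals = Counter()
    for line in numa_maps_text.splitlines():
        for node, pages in NODE_PAGES.findall(line):
            totals[int(node)] += int(pages)
    return dict(totals)
```

For example, a line such as "7f2c80000000 default anon=513 dirty=513 N0=200 N1=313 kernelpagesize_kB=4" contributes 200 pages to node 0 and 313 to node 1. For a live process, pass the contents of /proc/<pid>/numa_maps.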

AutoNUMA
● CPU follows memory
○ Reschedule tasks on same nodes as memory
● Memory follows CPU
○ Copy memory pages to same nodes as tasks/threads
● Heuristics
○ Fault statistics
○ Task grouping
○ Multi-resource optimization - cache, cpu, memory, starvation
■ Avoid thrashing
● Only CPU and memory?
○ For others, use manual pinning!
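
One way to observe the fault statistics that AutoNUMA relies on is through the NUMA hinting counters in /proc/vmstat. This is a sketch under the assumption that the kernel exposes numa_hint_faults and numa_hint_faults_local there (they are present when automatic NUMA balancing is built in); the helper names are hypothetical.

```python
def parse_vmstat(text):
    """Turn 'name value' lines from /proc/vmstat into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def hint_fault_locality(stats):
    """Fraction of NUMA hinting faults that hit memory on the local node.

    Returns None when no hinting faults were recorded, for example when
    automatic balancing is disabled or the counters are absent.
    """
    total = stats.get("numa_hint_faults", 0)
    if total == 0:
        return None
    return stats.get("numa_hint_faults_local", 0) / total
```

A ratio close to 1.0 suggests tasks and their memory already sit on the same nodes; a low ratio together with a growing numa_pages_migrated counter suggests the balancer is still moving pages.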

NUMA Policies
● MPOL_DEFAULT
● MPOL_BIND
● MPOL_INTERLEAVE
○ Memory striping in hardware
● MPOL_PREFERRED
● mbind(2) flags: MPOL_MF_MOVE | MPOL_MF_MOVE_ALL
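
The policies above can be applied to an unmodified program from the outside with numactl. The sketch below maps each policy name to the corresponding numactl memory option and builds an argument list; the helper name and the idea of wrapping a database binary this way are illustrative assumptions, not something prescribed by the slides.

```python
# numactl options that select the memory policy for the launched program.
POLICY_FLAGS = {
    "MPOL_BIND": "--membind=",
    "MPOL_INTERLEAVE": "--interleave=",
    "MPOL_PREFERRED": "--preferred=",
}


def numactl_command(policy, nodes, program):
    """Build an argv list that starts `program` under the given policy.

    policy  -- one of MPOL_DEFAULT, MPOL_BIND, MPOL_INTERLEAVE, MPOL_PREFERRED
    nodes   -- node specification as numactl expects it, e.g. "0", "0,1", "all"
    program -- argv list of the program to run
    """
    if policy == "MPOL_DEFAULT":
        return list(program)  # default policy: no wrapper needed
    if policy not in POLICY_FLAGS:
        raise ValueError(f"unsupported policy: {policy}")
    return ["numactl", POLICY_FLAGS[policy] + nodes] + list(program)
```

For instance, numactl_command("MPOL_INTERLEAVE", "all", ["mysqld"]) yields the argument list for `numactl --interleave=all mysqld`, which could be handed to subprocess.run.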

Databases
● Most databases support multiple cores and NUMA.
○ MAP_ANONYMOUS and O_DIRECT are common
● Most default to interleaving to avoid zone imbalance issues
○ Effects
■ Swapping due to Reclaim
■ OOM
○ Downsides to interleaving
○ MySQL, Cassandra et al.
● Pattern of accesses
○ Cause of imbalance
● Duality of applications vs. OS
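
Zone imbalance of the kind described above can be spotted by comparing free memory per node. The following sketch assumes the per-node file /sys/devices/system/node/nodeN/meminfo with lines such as "Node 0 MemFree: 1048576 kB" (a path not named in the slides); function names are illustrative.

```python
import re

MEMINFO_LINE = re.compile(r"^Node (\d+) (\w+):\s+(\d+)")


def parse_node_meminfo(text):
    """Return {node_id: {field: kB}} from per-node meminfo text."""
    nodes = {}
    for line in text.splitlines():
        m = MEMINFO_LINE.match(line)
        if m:
            node, field, value = int(m.group(1)), m.group(2), int(m.group(3))
            nodes.setdefault(node, {})[field] = value
    return nodes


def free_fraction(nodes):
    """Free memory as a fraction of total memory, per node."""
    return {
        node: fields["MemFree"] / fields["MemTotal"]
        for node, fields in nodes.items()
        if fields.get("MemTotal")
    }
```

If one node reports a free fraction near zero while another is mostly empty, a process allocating on the full node can trigger reclaim or swapping even though the machine as a whole has plenty of free memory.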

Reclaim
● Swappiness
○ Anon vs. file-backed
● Zone reclaim
○ Single process can span multiple zones
○ Imbalance without any strategies
○ Watermarks
○ Databases suffer the most
■ They carry a lot of state!
○ Types of reclaim
● Imbalance
○ Why does this happen

Access Pattern Optimizations
● Thread pool
○ Reuse of threads with longer lifetime
○ Explicit or implicit bind
■ numa_set_localalloc / numa_set_preferred
■ sched_setaffinity
■ CONFIG_NO_HZ and latency
● Global heaps - buffer pool, JVM
○ Allocation by proxy
○ mbind and MPOL_BIND
○ MAP_POPULATE (why? - First touch policy)
○ Node_set_preferred
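
The explicit-bind thread pool idea can be sketched with Python's wrapper around sched_setaffinity(2). This is an illustration only, Linux-specific, and assumes one worker per CPU; the helper name make_pinned_pool is hypothetical. On Linux, passing pid 0 applies the mask to the calling thread, so each worker pins itself when it starts.

```python
import os
import queue
from concurrent.futures import ThreadPoolExecutor


def make_pinned_pool(cpus):
    """Create a thread pool with one long-lived worker pinned to each CPU.

    cpus -- iterable of CPU ids the current process is allowed to run on.
    """
    cpus = list(cpus)
    cpu_queue = queue.SimpleQueue()
    for cpu in cpus:
        cpu_queue.put(cpu)

    def pin_worker():
        # Runs once in each new worker thread; pid 0 means "calling thread".
        os.sched_setaffinity(0, {cpu_queue.get()})

    # max_workers equals the number of CPUs, so the queue never runs dry.
    return ThreadPoolExecutor(max_workers=len(cpus), initializer=pin_worker)
```

Because the workers are reused for the lifetime of the pool, memory they touch first is allocated on their own node under the default first-touch behaviour, which is the point of combining long-lived threads with binding.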

Access Patterns (contd)
● Split Pools
○ Independent pools of memory in a database, e.g. multiple buffer pool instances
● Multiple instances
○ Mostly for simple databases.
■ Redis
○ Containers
● Hybrid
○ Linux kernel - boot and init
○ MySQL / InnoDB
■ MPOL_LOCAL for threads
■ MPOL_INTERLEAVE for global heaps
● Task Grouping
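
For the multiple-instances pattern, one common arrangement is to run one instance per node with both CPU and memory bound to that node. The sketch below builds such command lines for redis-server; the port numbering, the one-instance-per-node layout and the helper name are assumptions made for illustration.

```python
def per_node_commands(nodes, base_port=6379):
    """Build one numactl-wrapped redis-server argv per NUMA node.

    Each instance is restricted to the CPUs and memory of a single node
    and listens on its own port (base_port, base_port + 1, ...).
    """
    commands = []
    for index, node in enumerate(nodes):
        commands.append([
            "numactl",
            f"--cpunodebind={node}",
            f"--membind={node}",
            "redis-server",
            "--port", str(base_port + index),
        ])
    return commands
```

Binding memory strictly with --membind means an instance that outgrows its node will hit reclaim or the OOM killer rather than spill to a remote node, so instance sizes need to fit within a single node.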

Credits!
● http://queue.acm.org/detail.cfm?id=2513149
● http://www.linux-kvm.org/images/7/75/01x07b-NumaAutobalancing.pdf
● http://events.linuxfoundation.org/sites/events/files/slides/Normal%20and%20Exotic%20use%20cases%20for%20NUMA%20features.pdf
● https://en.wikipedia.org/wiki/Non-uniform_memory_access
● https://lihz1990.gitbooks.io/transoflptg/content/02.%E7%9B%91%E6%8E%A7%E5%92%8C%E5%8E%8B%E6%B5%8B%E5%B7%A5%E5%85%B7/sample-output-of-the-numastat-command.png

More from Raghavendra Prabhu