SlideShare a Scribd company logo
Linux NUMA
&
Databases
Perils and Opportunities
NUMA Reference architecture
What is NUMA
● Stands for Non Uniform Memory Access
○ Non Uniform to whom.
○ Von Neumann bottleneck.
○ Cache coherent NUMA
● How does it work
○ Memory is placed local to the processes.
○ Balancing access to data over the available processors on multiple nodes.
● Large memory installations are becoming the norm
○ The i2 series on AWS.
○ Databases are the main consumers.
● Constraints
○ Speed of light
○ Interconnect saturation
What is NUMA
● Constraints
○ Speed of light
■ Higher latency of accessing remote memory.
○ Interconnect saturation
■ Performance counters.
● Slow abundant memory
○ Fast limited memory
● Cache coherence
○ Processor threads and cores share resources
■ Execution units (between HT threads)
■ Cache (between threads and cores)
Exotic cases
● Network cards
● PCIe storage
● NVRAM
● Nodes without memory
● Nodes without processors
● Unbalanced
● Central/Large memory
● Big Little architecture
● GPU
NUMA complications
● Unmovable memory
● KSM
● THP
● Interrupt balancing and locality
Tools/libraries for NUMA
● Supported by Linux since 2.5
○ Symmetric and CPU/Memory
● Numactl
● Hwloc / lstopo
● Numad
● Numatop
● Libnuma
● Numastat
● Taskset
● KVM for simulation and testing
● Perf
Tools/libraries for NUMA
● KVM for simulation and testing
● Useful for testing databases.
qemu-system-x86_64 -enable-kvm -drive file=./debian-8.1-lxc-puppet.qcow2 -net nic,
macaddr=52:54:00:00:EE:03 -net vde -smp sockets=2,cores=2,threads=2,maxcpus=16 -
numa node,nodeid=0,cpus=0-3 -numa node,nodeid=1,cpus=4-7 -numa node,nodeid=2,
cpus=8-15 -m 2G
Tunings and observables
● /proc/zoneinfo
○ Sysctl vm.zone_reclaim_mode OR /proc/sys/vm/zone_reclaim
○ /proc/sys/vm/min_unmapped_ratio
● /proc/meminfo
● /proc/vmstat
● Ftrace
● Cgroup hierarchy
○ memory
Tunings and observables
● ACPI
○ SLIT and SRAT
● Per process:
○ /proc/<pid>/numa_maps
○ /proc/<pid>/sched
● Auto NUMA balancing
○ CONFIG_NUMA_BALANCING in /proc/config.gz
● get_mempolicy(2), mbind(2), migrate_pages(2), move_pages(2),
set_mempolicy(2), sched_getaffinity(2)
● Libnuma (3)
○ Higher abstraction - numa_set_localalloc
Numa statistics
Numa statistics
AutoNUMA
● CPU follows memory
○ Reschedule tasks on same nodes as memory
● Memory follows CPU
○ Copy memory pages to same nodes as tasks/threads
● Heuristics
○ Fault statistics
○ Task grouping
○ Multi-resource optimization - cache, cpu, memory, starvation
■ Avoid thrashing
● Only CPU and memory?
○ For others, use manual pinning!
NUMA Policies
● MPOL_DEFAULT
● MPOL_BIND
● MPOL_INTERLEAVE
○ Memory striping in hardware
● MPOL_PREFERRED
● MPOL_MF_MOVE | MPOL_MF_MOVE_ALL
Databases
● Most databases support multiple cores and NUMA.
○ MAP_ANONYMOUS and O_DIRECT are common
● Most default to interleaving to avoid zone imbalance issues
○ Effects
■ Swapping due to Reclaim
■ OOM
○ Downsides to interleaving
○ MySQL, Cassandra et.al.
● Pattern of accesses
○ Cause of imbalance
● Duality of Applications v/s OS
Reclaim
● Swappiness
○ Anon v/s File-backed
● Zone reclaim
○ Single process can span multiple zones
○ Imbalance without any strategies
○ Watermarks
○ Databases suffer the most
■ They carry a lot of state!
○ Types of reclaim
● Imbalance
○ Why does this happen
Access Pattern Optimizations
● Thread pool
○ Reuse of threads with longer lifetime
○ Explicit or implicit bind
■ Numa_set_localalloc / numa_set_preferred
■ Sched_setaffinity
■ CONFIG_NO_HZ and latency
● Global heaps - buffer pool, JVM
○ Allocation by proxy
○ Mbind and MPOL_BIND
○ MAP_POPULATE (why? - First touch policy)
○ Node_set_preferred
Access Patterns (contd)
● Split Pools
○ Independent pools of memory in a database Ex: Multiple buffer pool instances
● Multiple instances
○ Mostly for simple databases.
■ Redis
○ Containers
● Hybird
○ Linux kernel - boot and init
○ MySQL / InnoDB
■ MPOL_LOCAL for threads
■ MPOL_INTERLEAVE for global heaps
● Task Grouping
Credits!
● http://queue.acm.org/detail.cfm?id=2513149
● www.linux-kvm.org/images/7/75/01x07b-NumaAutobalancing.pdf
● http://events.linuxfoundation.org/sites/events/files/slides/Normal%
20and%20Exotic%20use%20cases%20for%20NUMA%20features.pdf
● https://en.wikipedia.org/wiki/Non-uniform_memory_access
● https://lihz1990.gitbooks.io/transoflptg/content/02.%E7%9B%91%E6%
8E%A7%E5%92%8C%E5%8E%8B%E6%B5%8B%E5%B7%A5%E5%85%
B7/sample-output-of-the-numastat-command.png
Linux NUMA & Databases: Perils and Opportunities

More Related Content

What's hot

Non-Uniform Memory Access ( NUMA)
Non-Uniform Memory Access ( NUMA)Non-Uniform Memory Access ( NUMA)
Non-Uniform Memory Access ( NUMA)
Nakul Manchanda
 
Shared-Memory Multiprocessors
Shared-Memory MultiprocessorsShared-Memory Multiprocessors
Shared-Memory Multiprocessors
Salvatore La Bua
 
Computer architecture abhmail
Computer architecture abhmailComputer architecture abhmail
Computer architecture abhmail
Om Prakash
 
Linux memory consumption
Linux memory consumptionLinux memory consumption
Linux memory consumption
haish
 
Hardware multithreading
Hardware multithreadingHardware multithreading
Hardware multithreading
Fraboni Ec
 
Multithreaded processors ppt
Multithreaded processors pptMultithreaded processors ppt
Multithreaded processors ppt
Siddhartha Anand
 
Linux Locking Mechanisms
Linux Locking MechanismsLinux Locking Mechanisms
Linux Locking Mechanisms
Kernel TLV
 
C for Cuda - Small Introduction to GPU computing
C for Cuda - Small Introduction to GPU computingC for Cuda - Small Introduction to GPU computing
C for Cuda - Small Introduction to GPU computing
IPALab
 
Continguous Memory Allocator in the Linux Kernel
Continguous Memory Allocator in the Linux KernelContinguous Memory Allocator in the Linux Kernel
Continguous Memory Allocator in the Linux Kernel
Kernel TLV
 
Shared Memory Multi Processor
Shared Memory Multi ProcessorShared Memory Multi Processor
Shared Memory Multi Processor
babuece
 
Multithreading computer architecture
 Multithreading computer architecture  Multithreading computer architecture
Multithreading computer architecture
Haris456
 
Recent advancements in cache technology
Recent advancements in cache technologyRecent advancements in cache technology
Recent advancements in cache technology
Paras Nath Chaudhary
 
Architecture of TPU, GPU and CPU
Architecture of TPU, GPU and CPUArchitecture of TPU, GPU and CPU
Architecture of TPU, GPU and CPU
GlobalLogic Ukraine
 
What is simultaneous multithreading
What is simultaneous multithreadingWhat is simultaneous multithreading
What is simultaneous multithreading
Fraboni Ec
 
Warehouse scale computer
Warehouse scale computerWarehouse scale computer
Warehouse scale computer
Hassan A-j
 
Parallel architecture
Parallel architectureParallel architecture
Parallel architectureMr SMAK
 
Graphics processing uni computer archiecture
Graphics processing uni computer archiectureGraphics processing uni computer archiecture
Graphics processing uni computer archiecture
Haris456
 
Let's Talk Locks!
Let's Talk Locks!Let's Talk Locks!
Let's Talk Locks!
C4Media
 

What's hot (20)

Non-Uniform Memory Access ( NUMA)
Non-Uniform Memory Access ( NUMA)Non-Uniform Memory Access ( NUMA)
Non-Uniform Memory Access ( NUMA)
 
Shared-Memory Multiprocessors
Shared-Memory MultiprocessorsShared-Memory Multiprocessors
Shared-Memory Multiprocessors
 
Computer architecture abhmail
Computer architecture abhmailComputer architecture abhmail
Computer architecture abhmail
 
Linux memory consumption
Linux memory consumptionLinux memory consumption
Linux memory consumption
 
Hardware multithreading
Hardware multithreadingHardware multithreading
Hardware multithreading
 
Multithreaded processors ppt
Multithreaded processors pptMultithreaded processors ppt
Multithreaded processors ppt
 
Lecture1
Lecture1Lecture1
Lecture1
 
Linux Locking Mechanisms
Linux Locking MechanismsLinux Locking Mechanisms
Linux Locking Mechanisms
 
C for Cuda - Small Introduction to GPU computing
C for Cuda - Small Introduction to GPU computingC for Cuda - Small Introduction to GPU computing
C for Cuda - Small Introduction to GPU computing
 
Lecture5
Lecture5Lecture5
Lecture5
 
Continguous Memory Allocator in the Linux Kernel
Continguous Memory Allocator in the Linux KernelContinguous Memory Allocator in the Linux Kernel
Continguous Memory Allocator in the Linux Kernel
 
Shared Memory Multi Processor
Shared Memory Multi ProcessorShared Memory Multi Processor
Shared Memory Multi Processor
 
Multithreading computer architecture
 Multithreading computer architecture  Multithreading computer architecture
Multithreading computer architecture
 
Recent advancements in cache technology
Recent advancements in cache technologyRecent advancements in cache technology
Recent advancements in cache technology
 
Architecture of TPU, GPU and CPU
Architecture of TPU, GPU and CPUArchitecture of TPU, GPU and CPU
Architecture of TPU, GPU and CPU
 
What is simultaneous multithreading
What is simultaneous multithreadingWhat is simultaneous multithreading
What is simultaneous multithreading
 
Warehouse scale computer
Warehouse scale computerWarehouse scale computer
Warehouse scale computer
 
Parallel architecture
Parallel architectureParallel architecture
Parallel architecture
 
Graphics processing uni computer archiecture
Graphics processing uni computer archiectureGraphics processing uni computer archiecture
Graphics processing uni computer archiecture
 
Let's Talk Locks!
Let's Talk Locks!Let's Talk Locks!
Let's Talk Locks!
 

Viewers also liked

Kvm performance optimization for ubuntu
Kvm performance optimization for ubuntuKvm performance optimization for ubuntu
Kvm performance optimization for ubuntuSim Janghoon
 
Kernel Recipes 2016 - Kernel documentation: what we have and where it’s going
Kernel Recipes 2016 - Kernel documentation: what we have and where it’s goingKernel Recipes 2016 - Kernel documentation: what we have and where it’s going
Kernel Recipes 2016 - Kernel documentation: what we have and where it’s going
Anne Nicolas
 
Tackling the Management Challenges of Server Consolidation on Multi-core Systems
Tackling the Management Challenges of Server Consolidation on Multi-core SystemsTackling the Management Challenges of Server Consolidation on Multi-core Systems
Tackling the Management Challenges of Server Consolidation on Multi-core SystemsThe Linux Foundation
 
BKK16-404A PCI Development Meeting
BKK16-404A PCI Development MeetingBKK16-404A PCI Development Meeting
BKK16-404A PCI Development Meeting
Linaro
 
Reverse engineering for_beginners-en
Reverse engineering for_beginners-enReverse engineering for_beginners-en
Reverse engineering for_beginners-en
Andri Yabu
 
Specification-Based Test Program Generation for ARM VMSAv8-64 MMUs
Specification-Based Test Program Generation for ARM VMSAv8-64 MMUsSpecification-Based Test Program Generation for ARM VMSAv8-64 MMUs
Specification-Based Test Program Generation for ARM VMSAv8-64 MMUs
Alexander Kamkin
 
Virtualization overheads
Virtualization overheadsVirtualization overheads
Virtualization overheads
Sandeep Joshi
 
Docker and friends at Linux Days 2014 in Prague
Docker and friends at Linux Days 2014 in PragueDocker and friends at Linux Days 2014 in Prague
Docker and friends at Linux Days 2014 in Prague
tomasbart
 
Linux numa evolution
Linux numa evolutionLinux numa evolution
Linux numa evolution
Lukas Pirl
 
BKK16-104 sched-freq
BKK16-104 sched-freqBKK16-104 sched-freq
BKK16-104 sched-freq
Linaro
 
Cgroup resource mgmt_v1
Cgroup resource mgmt_v1Cgroup resource mgmt_v1
Cgroup resource mgmt_v1sprdd
 
Gc and-pagescan-attacks-by-linux
Gc and-pagescan-attacks-by-linuxGc and-pagescan-attacks-by-linux
Gc and-pagescan-attacks-by-linuxCuong Tran
 
Known basic of NFV Features
Known basic of NFV FeaturesKnown basic of NFV Features
Known basic of NFV Features
Raul Leite
 
SFO15-TR9: PSCI, ACPI (and UEFI to boot)
SFO15-TR9: PSCI, ACPI (and UEFI to boot)SFO15-TR9: PSCI, ACPI (and UEFI to boot)
SFO15-TR9: PSCI, ACPI (and UEFI to boot)
Linaro
 
P4, EPBF, and Linux TC Offload
P4, EPBF, and Linux TC OffloadP4, EPBF, and Linux TC Offload
P4, EPBF, and Linux TC Offload
Open-NFP
 
Cpu scheduling(suresh)
Cpu scheduling(suresh)Cpu scheduling(suresh)
Cpu scheduling(suresh)Nagarajan
 
Cat @ scale
Cat @ scaleCat @ scale
Cat @ scale
Rohit Jnagal
 
VHPC'12: Pre-Copy and Post-Copy VM Live Migration for Memory Intensive Applic...
VHPC'12: Pre-Copy and Post-Copy VM Live Migration for Memory Intensive Applic...VHPC'12: Pre-Copy and Post-Copy VM Live Migration for Memory Intensive Applic...
VHPC'12: Pre-Copy and Post-Copy VM Live Migration for Memory Intensive Applic...aidanshribman
 
CPU Caches
CPU CachesCPU Caches
CPU Caches
shinolajla
 

Viewers also liked (20)

Kvm performance optimization for ubuntu
Kvm performance optimization for ubuntuKvm performance optimization for ubuntu
Kvm performance optimization for ubuntu
 
Kernel Recipes 2016 - Kernel documentation: what we have and where it’s going
Kernel Recipes 2016 - Kernel documentation: what we have and where it’s goingKernel Recipes 2016 - Kernel documentation: what we have and where it’s going
Kernel Recipes 2016 - Kernel documentation: what we have and where it’s going
 
Tackling the Management Challenges of Server Consolidation on Multi-core Systems
Tackling the Management Challenges of Server Consolidation on Multi-core SystemsTackling the Management Challenges of Server Consolidation on Multi-core Systems
Tackling the Management Challenges of Server Consolidation on Multi-core Systems
 
BKK16-404A PCI Development Meeting
BKK16-404A PCI Development MeetingBKK16-404A PCI Development Meeting
BKK16-404A PCI Development Meeting
 
Reverse engineering for_beginners-en
Reverse engineering for_beginners-enReverse engineering for_beginners-en
Reverse engineering for_beginners-en
 
Dulloor xen-summit
Dulloor xen-summitDulloor xen-summit
Dulloor xen-summit
 
Specification-Based Test Program Generation for ARM VMSAv8-64 MMUs
Specification-Based Test Program Generation for ARM VMSAv8-64 MMUsSpecification-Based Test Program Generation for ARM VMSAv8-64 MMUs
Specification-Based Test Program Generation for ARM VMSAv8-64 MMUs
 
Virtualization overheads
Virtualization overheadsVirtualization overheads
Virtualization overheads
 
Docker and friends at Linux Days 2014 in Prague
Docker and friends at Linux Days 2014 in PragueDocker and friends at Linux Days 2014 in Prague
Docker and friends at Linux Days 2014 in Prague
 
Linux numa evolution
Linux numa evolutionLinux numa evolution
Linux numa evolution
 
BKK16-104 sched-freq
BKK16-104 sched-freqBKK16-104 sched-freq
BKK16-104 sched-freq
 
Cgroup resource mgmt_v1
Cgroup resource mgmt_v1Cgroup resource mgmt_v1
Cgroup resource mgmt_v1
 
Gc and-pagescan-attacks-by-linux
Gc and-pagescan-attacks-by-linuxGc and-pagescan-attacks-by-linux
Gc and-pagescan-attacks-by-linux
 
Known basic of NFV Features
Known basic of NFV FeaturesKnown basic of NFV Features
Known basic of NFV Features
 
SFO15-TR9: PSCI, ACPI (and UEFI to boot)
SFO15-TR9: PSCI, ACPI (and UEFI to boot)SFO15-TR9: PSCI, ACPI (and UEFI to boot)
SFO15-TR9: PSCI, ACPI (and UEFI to boot)
 
P4, EPBF, and Linux TC Offload
P4, EPBF, and Linux TC OffloadP4, EPBF, and Linux TC Offload
P4, EPBF, and Linux TC Offload
 
Cpu scheduling(suresh)
Cpu scheduling(suresh)Cpu scheduling(suresh)
Cpu scheduling(suresh)
 
Cat @ scale
Cat @ scaleCat @ scale
Cat @ scale
 
VHPC'12: Pre-Copy and Post-Copy VM Live Migration for Memory Intensive Applic...
VHPC'12: Pre-Copy and Post-Copy VM Live Migration for Memory Intensive Applic...VHPC'12: Pre-Copy and Post-Copy VM Live Migration for Memory Intensive Applic...
VHPC'12: Pre-Copy and Post-Copy VM Live Migration for Memory Intensive Applic...
 
CPU Caches
CPU CachesCPU Caches
CPU Caches
 

Similar to Linux NUMA & Databases: Perils and Opportunities

Linux Memory Basics for SysAdmins - ChinaNetCloud Training
Linux Memory Basics for SysAdmins - ChinaNetCloud TrainingLinux Memory Basics for SysAdmins - ChinaNetCloud Training
Linux Memory Basics for SysAdmins - ChinaNetCloud Training
ChinaNetCloud
 
An Introduction to Apache Cassandra
An Introduction to Apache CassandraAn Introduction to Apache Cassandra
An Introduction to Apache Cassandra
Saeid Zebardast
 
SNIA SDC 2016 final
SNIA SDC 2016 finalSNIA SDC 2016 final
SNIA SDC 2016 final
Dan Lambright
 
Overview on NUMA
Overview on NUMAOverview on NUMA
Overview on NUMA
Abed Maatalla
 
Vanquishing Latency Outliers in the Lightbits LightOS Software Defined Storag...
Vanquishing Latency Outliers in the Lightbits LightOS Software Defined Storag...Vanquishing Latency Outliers in the Lightbits LightOS Software Defined Storag...
Vanquishing Latency Outliers in the Lightbits LightOS Software Defined Storag...
ScyllaDB
 
Multiprocessor_YChen.ppt
Multiprocessor_YChen.pptMultiprocessor_YChen.ppt
Multiprocessor_YChen.ppt
AberaZeleke1
 
Threads - Why Can't You Just Play Nicely With Your Memory_
Threads - Why Can't You Just Play Nicely With Your Memory_Threads - Why Can't You Just Play Nicely With Your Memory_
Threads - Why Can't You Just Play Nicely With Your Memory_Robert Burrell Donkin
 
A Journey into Hexagon: Dissecting Qualcomm Basebands
A Journey into Hexagon: Dissecting Qualcomm BasebandsA Journey into Hexagon: Dissecting Qualcomm Basebands
A Journey into Hexagon: Dissecting Qualcomm Basebands
Priyanka Aash
 
Caffe + H2O - By Cyprien noel
Caffe + H2O - By Cyprien noelCaffe + H2O - By Cyprien noel
Caffe + H2O - By Cyprien noel
Sri Ambati
 
Scalability broad strokes
Scalability   broad strokesScalability   broad strokes
Scalability broad strokes
Gagan Bajpai
 
Principles of High Load - Vilnius January 2015
Principles of High Load - Vilnius January 2015Principles of High Load - Vilnius January 2015
Principles of High Load - Vilnius January 2015
Peter Milne
 
network ram parallel computing
network ram parallel computingnetwork ram parallel computing
network ram parallel computing
Niranjana Ambadi
 
Distributed Databases - Concepts & Architectures
Distributed Databases - Concepts & ArchitecturesDistributed Databases - Concepts & Architectures
Distributed Databases - Concepts & Architectures
Daniel Marcous
 
Measuring a 25 and 40Gb/s Data Plane
Measuring a 25 and 40Gb/s Data PlaneMeasuring a 25 and 40Gb/s Data Plane
Measuring a 25 and 40Gb/s Data Plane
Open-NFP
 
Linux Stammtisch Munich: Ceph - Overview, Experiences and Outlook
Linux Stammtisch Munich: Ceph - Overview, Experiences and OutlookLinux Stammtisch Munich: Ceph - Overview, Experiences and Outlook
Linux Stammtisch Munich: Ceph - Overview, Experiences and Outlook
Danny Al-Gaaf
 
The Dark Side Of Go -- Go runtime related problems in TiDB in production
The Dark Side Of Go -- Go runtime related problems in TiDB  in productionThe Dark Side Of Go -- Go runtime related problems in TiDB  in production
The Dark Side Of Go -- Go runtime related problems in TiDB in production
PingCAP
 
Unit IV Memory.pptx
Unit IV  Memory.pptxUnit IV  Memory.pptx
Unit IV Memory.pptx
madhukarvnimbalkar
 
3 computer memory
3   computer memory3   computer memory
3 computer memory
arslanzafar13162
 

Similar to Linux NUMA & Databases: Perils and Opportunities (20)

Linux Memory Basics for SysAdmins - ChinaNetCloud Training
Linux Memory Basics for SysAdmins - ChinaNetCloud TrainingLinux Memory Basics for SysAdmins - ChinaNetCloud Training
Linux Memory Basics for SysAdmins - ChinaNetCloud Training
 
An Introduction to Apache Cassandra
An Introduction to Apache CassandraAn Introduction to Apache Cassandra
An Introduction to Apache Cassandra
 
SNIA SDC 2016 final
SNIA SDC 2016 finalSNIA SDC 2016 final
SNIA SDC 2016 final
 
Overview on NUMA
Overview on NUMAOverview on NUMA
Overview on NUMA
 
Vanquishing Latency Outliers in the Lightbits LightOS Software Defined Storag...
Vanquishing Latency Outliers in the Lightbits LightOS Software Defined Storag...Vanquishing Latency Outliers in the Lightbits LightOS Software Defined Storag...
Vanquishing Latency Outliers in the Lightbits LightOS Software Defined Storag...
 
Multiprocessor_YChen.ppt
Multiprocessor_YChen.pptMultiprocessor_YChen.ppt
Multiprocessor_YChen.ppt
 
Threads - Why Can't You Just Play Nicely With Your Memory_
Threads - Why Can't You Just Play Nicely With Your Memory_Threads - Why Can't You Just Play Nicely With Your Memory_
Threads - Why Can't You Just Play Nicely With Your Memory_
 
A Journey into Hexagon: Dissecting Qualcomm Basebands
A Journey into Hexagon: Dissecting Qualcomm BasebandsA Journey into Hexagon: Dissecting Qualcomm Basebands
A Journey into Hexagon: Dissecting Qualcomm Basebands
 
Caffe + H2O - By Cyprien noel
Caffe + H2O - By Cyprien noelCaffe + H2O - By Cyprien noel
Caffe + H2O - By Cyprien noel
 
Scalability broad strokes
Scalability   broad strokesScalability   broad strokes
Scalability broad strokes
 
Principles of High Load - Vilnius January 2015
Principles of High Load - Vilnius January 2015Principles of High Load - Vilnius January 2015
Principles of High Load - Vilnius January 2015
 
network ram parallel computing
network ram parallel computingnetwork ram parallel computing
network ram parallel computing
 
Distributed Databases - Concepts & Architectures
Distributed Databases - Concepts & ArchitecturesDistributed Databases - Concepts & Architectures
Distributed Databases - Concepts & Architectures
 
Measuring a 25 and 40Gb/s Data Plane
Measuring a 25 and 40Gb/s Data PlaneMeasuring a 25 and 40Gb/s Data Plane
Measuring a 25 and 40Gb/s Data Plane
 
Linux Stammtisch Munich: Ceph - Overview, Experiences and Outlook
Linux Stammtisch Munich: Ceph - Overview, Experiences and OutlookLinux Stammtisch Munich: Ceph - Overview, Experiences and Outlook
Linux Stammtisch Munich: Ceph - Overview, Experiences and Outlook
 
The Dark Side Of Go -- Go runtime related problems in TiDB in production
The Dark Side Of Go -- Go runtime related problems in TiDB  in productionThe Dark Side Of Go -- Go runtime related problems in TiDB  in production
The Dark Side Of Go -- Go runtime related problems in TiDB in production
 
Computer memory
Computer memoryComputer memory
Computer memory
 
Unit IV Memory.pptx
Unit IV  Memory.pptxUnit IV  Memory.pptx
Unit IV Memory.pptx
 
3 computer memory
3   computer memory3   computer memory
3 computer memory
 
Multicore
MulticoreMulticore
Multicore
 

More from Raghavendra Prabhu

Orchestrating Cassandra with Kubernetes Operator and PaaSTA
Orchestrating Cassandra with Kubernetes Operator and PaaSTAOrchestrating Cassandra with Kubernetes Operator and PaaSTA
Orchestrating Cassandra with Kubernetes Operator and PaaSTA
Raghavendra Prabhu
 
Orchestrating Cassandra with Kubernetes
Orchestrating Cassandra with KubernetesOrchestrating Cassandra with Kubernetes
Orchestrating Cassandra with Kubernetes
Raghavendra Prabhu
 
Cassandra Operator with Yelp PaaSTA
Cassandra Operator with Yelp PaaSTACassandra Operator with Yelp PaaSTA
Cassandra Operator with Yelp PaaSTA
Raghavendra Prabhu
 
Safe and Fast Automation on AWS for Fun and Profit
Safe and Fast Automation on AWS for Fun and ProfitSafe and Fast Automation on AWS for Fun and Profit
Safe and Fast Automation on AWS for Fun and Profit
Raghavendra Prabhu
 
Orchestrating Cassandra with Kubernetes: Challenges and Opportunities
Orchestrating Cassandra with Kubernetes: Challenges and OpportunitiesOrchestrating Cassandra with Kubernetes: Challenges and Opportunities
Orchestrating Cassandra with Kubernetes: Challenges and Opportunities
Raghavendra Prabhu
 
Pass Elk: CAP Theorem since 90s and Beyond
Pass Elk: CAP Theorem since 90s and BeyondPass Elk: CAP Theorem since 90s and Beyond
Pass Elk: CAP Theorem since 90s and Beyond
Raghavendra Prabhu
 
Cassandra in Docker at Yelp: Opportunities and Challenges
Cassandra in Docker at Yelp: Opportunities and ChallengesCassandra in Docker at Yelp: Opportunities and Challenges
Cassandra in Docker at Yelp: Opportunities and Challenges
Raghavendra Prabhu
 
Taskerman: A Distributed Cluster Task Manager
Taskerman: A Distributed Cluster Task ManagerTaskerman: A Distributed Cluster Task Manager
Taskerman: A Distributed Cluster Task Manager
Raghavendra Prabhu
 
Taskerman - a distributed cluster task manager
Taskerman - a distributed cluster task managerTaskerman - a distributed cluster task manager
Taskerman - a distributed cluster task manager
Raghavendra Prabhu
 
Clusternaut: Orchestrating  Percona XtraDB Cluster with Kubernetes
Clusternaut:  Orchestrating  Percona XtraDB Cluster with KubernetesClusternaut:  Orchestrating  Percona XtraDB Cluster with Kubernetes
Clusternaut: Orchestrating  Percona XtraDB Cluster with Kubernetes
Raghavendra Prabhu
 
Clusternaut: Orchestrating Percona XtraDB Cluster with Kubernetes.
Clusternaut: Orchestrating Percona XtraDB Cluster with Kubernetes.Clusternaut: Orchestrating Percona XtraDB Cluster with Kubernetes.
Clusternaut: Orchestrating Percona XtraDB Cluster with Kubernetes.
Raghavendra Prabhu
 
Working from home - fun, facts and scares!
Working from home -  fun, facts and scares!Working from home -  fun, facts and scares!
Working from home - fun, facts and scares!
Raghavendra Prabhu
 
Securing databases with systemd for containers and services
Securing databases with systemd for containers and services Securing databases with systemd for containers and services
Securing databases with systemd for containers and services
Raghavendra Prabhu
 
Corpus collapsum: Partition tolerance testing of Galera with Docker and NetEm
Corpus collapsum: Partition tolerance testing of Galera with Docker and NetEm Corpus collapsum: Partition tolerance testing of Galera with Docker and NetEm
Corpus collapsum: Partition tolerance testing of Galera with Docker and NetEm
Raghavendra Prabhu
 
Dock'em: Distributed Systems Testing with NetEm and Docker
Dock'em: Distributed Systems Testing with NetEm and Docker Dock'em: Distributed Systems Testing with NetEm and Docker
Dock'em: Distributed Systems Testing with NetEm and Docker
Raghavendra Prabhu
 
Galera with Docker: How Synchronous Replication and Linux Containers mesh tog...
Galera with Docker: How Synchronous Replication and Linux Containers mesh tog...Galera with Docker: How Synchronous Replication and Linux Containers mesh tog...
Galera with Docker: How Synchronous Replication and Linux Containers mesh tog...
Raghavendra Prabhu
 
Jutsu or Dô: Open documentation: continuous process than a body
Jutsu or Dô: Open documentation: continuous process than a body Jutsu or Dô: Open documentation: continuous process than a body
Jutsu or Dô: Open documentation: continuous process than a body
Raghavendra Prabhu
 
Corpus collapsum: Partition tolerance of Galera in a noisy high load environment
Corpus collapsum: Partition tolerance of Galera in a noisy high load environmentCorpus collapsum: Partition tolerance of Galera in a noisy high load environment
Corpus collapsum: Partition tolerance of Galera in a noisy high load environment
Raghavendra Prabhu
 
Corpus collapsum: Partition tolerance of Galera put to test
Corpus collapsum: Partition tolerance of Galera put to testCorpus collapsum: Partition tolerance of Galera put to test
Corpus collapsum: Partition tolerance of Galera put to test
Raghavendra Prabhu
 
Acidic clusters - Review of contemporary ACID-compliant databases with synchr...
Acidic clusters - Review of contemporary ACID-compliant databases with synchr...Acidic clusters - Review of contemporary ACID-compliant databases with synchr...
Acidic clusters - Review of contemporary ACID-compliant databases with synchr...
Raghavendra Prabhu
 

More from Raghavendra Prabhu (20)

Orchestrating Cassandra with Kubernetes Operator and PaaSTA
Orchestrating Cassandra with Kubernetes Operator and PaaSTAOrchestrating Cassandra with Kubernetes Operator and PaaSTA
Orchestrating Cassandra with Kubernetes Operator and PaaSTA
 
Orchestrating Cassandra with Kubernetes
Orchestrating Cassandra with KubernetesOrchestrating Cassandra with Kubernetes
Orchestrating Cassandra with Kubernetes
 
Cassandra Operator with Yelp PaaSTA
Cassandra Operator with Yelp PaaSTACassandra Operator with Yelp PaaSTA
Cassandra Operator with Yelp PaaSTA
 
Safe and Fast Automation on AWS for Fun and Profit
Safe and Fast Automation on AWS for Fun and ProfitSafe and Fast Automation on AWS for Fun and Profit
Safe and Fast Automation on AWS for Fun and Profit
 
Orchestrating Cassandra with Kubernetes: Challenges and Opportunities
Orchestrating Cassandra with Kubernetes: Challenges and OpportunitiesOrchestrating Cassandra with Kubernetes: Challenges and Opportunities
Orchestrating Cassandra with Kubernetes: Challenges and Opportunities
 
Pass Elk: CAP Theorem since 90s and Beyond
Pass Elk: CAP Theorem since 90s and BeyondPass Elk: CAP Theorem since 90s and Beyond
Pass Elk: CAP Theorem since 90s and Beyond
 
Cassandra in Docker at Yelp: Opportunities and Challenges
Cassandra in Docker at Yelp: Opportunities and ChallengesCassandra in Docker at Yelp: Opportunities and Challenges
Cassandra in Docker at Yelp: Opportunities and Challenges
 
Taskerman: A Distributed Cluster Task Manager
Taskerman: A Distributed Cluster Task ManagerTaskerman: A Distributed Cluster Task Manager
Taskerman: A Distributed Cluster Task Manager
 
Taskerman - a distributed cluster task manager
Taskerman - a distributed cluster task managerTaskerman - a distributed cluster task manager
Taskerman - a distributed cluster task manager
 
Clusternaut: Orchestrating  Percona XtraDB Cluster with Kubernetes
Clusternaut:  Orchestrating  Percona XtraDB Cluster with KubernetesClusternaut:  Orchestrating  Percona XtraDB Cluster with Kubernetes
Clusternaut: Orchestrating  Percona XtraDB Cluster with Kubernetes
 
Clusternaut: Orchestrating Percona XtraDB Cluster with Kubernetes.
Clusternaut: Orchestrating Percona XtraDB Cluster with Kubernetes.Clusternaut: Orchestrating Percona XtraDB Cluster with Kubernetes.
Clusternaut: Orchestrating Percona XtraDB Cluster with Kubernetes.
 
Working from home - fun, facts and scares!
Working from home -  fun, facts and scares!Working from home -  fun, facts and scares!
Working from home - fun, facts and scares!
 
Securing databases with systemd for containers and services
Securing databases with systemd for containers and services Securing databases with systemd for containers and services
Securing databases with systemd for containers and services
 
Corpus collapsum: Partition tolerance testing of Galera with Docker and NetEm
Corpus collapsum: Partition tolerance testing of Galera with Docker and NetEm Corpus collapsum: Partition tolerance testing of Galera with Docker and NetEm
Corpus collapsum: Partition tolerance testing of Galera with Docker and NetEm
 
Dock'em: Distributed Systems Testing with NetEm and Docker
Dock'em: Distributed Systems Testing with NetEm and Docker Dock'em: Distributed Systems Testing with NetEm and Docker
Dock'em: Distributed Systems Testing with NetEm and Docker
 
Galera with Docker: How Synchronous Replication and Linux Containers mesh tog...
Galera with Docker: How Synchronous Replication and Linux Containers mesh tog...Galera with Docker: How Synchronous Replication and Linux Containers mesh tog...
Galera with Docker: How Synchronous Replication and Linux Containers mesh tog...
 
Jutsu or Dô: Open documentation: continuous process than a body
Jutsu or Dô: Open documentation: continuous process than a body Jutsu or Dô: Open documentation: continuous process than a body
Jutsu or Dô: Open documentation: continuous process than a body
 
Corpus collapsum: Partition tolerance of Galera in a noisy high load environment
Corpus collapsum: Partition tolerance of Galera in a noisy high load environmentCorpus collapsum: Partition tolerance of Galera in a noisy high load environment
Corpus collapsum: Partition tolerance of Galera in a noisy high load environment
 
Corpus collapsum: Partition tolerance of Galera put to test
Corpus collapsum: Partition tolerance of Galera put to testCorpus collapsum: Partition tolerance of Galera put to test
Corpus collapsum: Partition tolerance of Galera put to test
 
Acidic clusters - Review of contemporary ACID-compliant databases with synchr...
Acidic clusters - Review of contemporary ACID-compliant databases with synchr...Acidic clusters - Review of contemporary ACID-compliant databases with synchr...
Acidic clusters - Review of contemporary ACID-compliant databases with synchr...
 

Recently uploaded

AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
SamSarthak3
 
Gen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdfGen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdf
gdsczhcet
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
Osamah Alsalih
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
gerogepatton
 
Railway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdfRailway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdf
TeeVichai
 
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
Amil Baba Dawood bangali
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
karthi keyan
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
Neometrix_Engineering_Pvt_Ltd
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
AJAYKUMARPUND1
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation & Control
 
Cosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdfCosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdf
Kamal Acharya
 
ethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.pptethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.ppt
Jayaprasanna4
 
WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234
AafreenAbuthahir2
 
HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generation
Robbie Edward Sayers
 
ethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.pptethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.ppt
Jayaprasanna4
 
addressing modes in computer architecture
addressing modes  in computer architectureaddressing modes  in computer architecture
addressing modes in computer architecture
ShahidSultan24
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
Kamal Acharya
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
Massimo Talia
 
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdf
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdfCOLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdf
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdf
Kamal Acharya
 
Vaccine management system project report documentation..pdf
Vaccine management system project report documentation..pdfVaccine management system project report documentation..pdf
Vaccine management system project report documentation..pdf
Kamal Acharya
 

Recently uploaded (20)

AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
 
Gen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdfGen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdf
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
 
Railway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdfRailway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdf
 
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
 
Cosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdfCosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdf
 
ethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.pptethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.ppt
 
WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234
 
HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generation
 
ethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.pptethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.ppt
 
addressing modes in computer architecture
addressing modes  in computer architectureaddressing modes  in computer architecture
addressing modes in computer architecture
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
 
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdf
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdfCOLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdf
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdf
 
Vaccine management system project report documentation..pdf
Vaccine management system project report documentation..pdfVaccine management system project report documentation..pdf
Vaccine management system project report documentation..pdf
 

Linux NUMA & Databases: Perils and Opportunities

  • 3.
  • 4. What is NUMA ● Stands for Non Uniform Memory Access ○ Non Uniform to whom. ○ Von Neumann bottleneck. ○ Cache coherent NUMA ● How does it work ○ Memory is placed local to the processes. ○ Balancing access to data over the available processors on multiple nodes. ● Large memory installations are becoming the norm ○ The i2 series on AWS. ○ Databases are the main consumers. ● Constraints ○ Speed of light ○ Interconnect saturation
  • 5. What is NUMA ● Constraints ○ Speed of light ■ Higher latency of accessing remote memory. ○ Interconnect saturation ■ Performance counters. ● Slow abundant memory ○ Fast limited memory ● Cache coherence ○ Processor threads and cores share resources ■ Execution units (between HT threads) ■ Cache (between threads and cores)
  • 6. Exotic cases ● Network cards ● PCIe storage ● NVRAM ● Nodes without memory ● Nodes without processors ● Unbalanced ● Central/Large memory ● Big Little architecture ● GPU
  • 7. NUMA complications ● Unmovable memory ● KSM ● THP ● Interrupt balancing and locality
  • 8. Tools/libraries for NUMA ● Supported by Linux since 2.5 ○ Symmetric and CPU/Memory ● Numactl ● Hwloc / lstopo ● Numad ● Numatop ● Libnuma ● Numastat ● Taskset ● KVM for simulation and testing ● Perf
  • 9. Tools/libraries for NUMA ● KVM for simulation and testing ● Useful for testing databases. qemu-system-x86_64 -enable-kvm -drive file=./debian-8.1-lxc-puppet.qcow2 -net nic, macaddr=52:54:00:00:EE:03 -net vde -smp sockets=2,cores=2,threads=2,maxcpus=16 - numa node,nodeid=0,cpus=0-3 -numa node,nodeid=1,cpus=4-7 -numa node,nodeid=2, cpus=8-15 -m 2G
  • 10. Tunings and observables ● /proc/zoneinfo ○ Sysctl vm.zone_reclaim_mode OR /proc/sys/vm/zone_reclaim ○ /proc/sys/vm/min_unmapped_ratio ● /proc/meminfo ● /proc/vmstat ● Ftrace ● Cgroup hierarchy ○ memory
  • 11. Tunings and observables ● ACPI ○ SLIT and SRAT ● Per process: ○ /proc/<pid>/numa_maps ○ /proc/<pid>/sched ● Auto NUMA balancing ○ CONFIG_NUMA_BALANCING in /proc/config.gz ● get_mempolicy(2), mbind(2), migrate_pages(2), move_pages(2), set_mempolicy(2), sched_getaffinity(2) ● Libnuma (3) ○ Higher abstraction - numa_set_localalloc
  • 14. AutoNUMA ● CPU follows memory ○ Reschedule tasks on same nodes as memory ● Memory follows CPU ○ Copy memory pages to same nodes as tasks/threads ● Heuristics ○ Fault statistics ○ Task grouping ○ Multi-resource optimization - cache, cpu, memory, starvation ■ Avoid thrashing ● Only CPU and memory? ○ For others, use manual pinning!
  • 15. NUMA Policies ● MPOL_DEFAULT ● MPOL_BIND ● MPOL_INTERLEAVE ○ Memory striping in hardware ● MPOL_PREFERRED ● MPOL_MF_MOVE | MPOL_MF_MOVE_ALL
  • 16. Databases ● Most databases support multiple cores and NUMA. ○ MAP_ANONYMOUS and O_DIRECT are common ● Most default to interleaving to avoid zone imbalance issues ○ Effects ■ Swapping due to Reclaim ■ OOM ○ Downsides to interleaving ○ MySQL, Cassandra et.al. ● Pattern of accesses ○ Cause of imbalance ● Duality of Applications v/s OS
  • 17. Reclaim ● Swappiness ○ Anon v/s File-backed ● Zone reclaim ○ Single process can span multiple zones ○ Imbalance without any strategies ○ Watermarks ○ Databases suffer the most ■ They carry a lot of state! ○ Types of reclaim ● Imbalance ○ Why does this happen
  • 18. Access Pattern Optimizations ● Thread pool ○ Reuse of threads with longer lifetime ○ Explicit or implicit bind ■ Numa_set_localalloc / numa_set_preferred ■ Sched_setaffinity ■ CONFIG_NO_HZ and latency ● Global heaps - buffer pool, JVM ○ Allocation by proxy ○ Mbind and MPOL_BIND ○ MAP_POPULATE (why? - First touch policy) ○ Node_set_preferred
  • 19. Access Patterns (contd) ● Split Pools ○ Independent pools of memory in a database Ex: Multiple buffer pool instances ● Multiple instances ○ Mostly for simple databases. ■ Redis ○ Containers ● Hybird ○ Linux kernel - boot and init ○ MySQL / InnoDB ■ MPOL_LOCAL for threads ■ MPOL_INTERLEAVE for global heaps ● Task Grouping
  • 20. Credits! ● http://queue.acm.org/detail.cfm?id=2513149 ● www.linux-kvm.org/images/7/75/01x07b-NumaAutobalancing.pdf ● http://events.linuxfoundation.org/sites/events/files/slides/Normal% 20and%20Exotic%20use%20cases%20for%20NUMA%20features.pdf ● https://en.wikipedia.org/wiki/Non-uniform_memory_access ● https://lihz1990.gitbooks.io/transoflptg/content/02.%E7%9B%91%E6% 8E%A7%E5%92%8C%E5%8E%8B%E6%B5%8B%E5%B7%A5%E5%85% B7/sample-output-of-the-numastat-command.png