SlideShare a Scribd company logo

Kernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea Arcangeli

Andrea will provide a short high level view of the most notable milestones in the evolution of the Linux Virtual Memory over the years. He will then focus on the various Memory Management features such as Transparent Huge Pages(THP), automatic NUMA balancing and userfaultd/postcopy live migration of Kernel Virtual Machines (KVM). Andrea will cover best practices, providing the audience with an understanding of when and how to leverage these features in their environments. Andrea Arcangeli, Red Hat

1 of 50
Download to read offline
Copyright © 2017 Red Hat Inc.
1
20 years of Linux Virtual Memory20 years of Linux Virtual Memory
Red Hat, Inc.
Andrea Arcangeli <aarcange at redhat.com>
Kernel Recipes, Paris
29 Sep 2017
Copyright © 2017 Red Hat Inc.
2
Topics
●
Milestones in the evolution of the Virtual
Memory subsystem
●
Virtual Memory latest innovations
– Automatic NUMA balancing
– THP developments
– KSMscale
– userfaultfd
●
Postcopy live Migration, etc..
Copyright © 2017 Red Hat Inc.
3
Virtual Memory (simplified)
Virtual pages
They cost “nothing”
Practically unlimited
on 64bit archs
Physical pages
They cost money!
This is the RAM
arrows = pagetables
virtual to physical mapping
Copyright © 2017 Red Hat Inc.
4
PageTablesPageTables
pgd
pmd
pte
pud
pte
pmd
pte pte
pmd
pte
pud
pte
pmd
pte pte
2^9 = 512 arrows, not just 2
●
Common code and x86 pagetable format is a tree
●
All pagetables are 4KB in size
●
Total: grep PageTables /proc/meminfo
●
(((2**9)**4)*4096)>>48 = 1 → 48bits → 256 TiB
●
5 levels in v4.13 → (48+9) bits → 57bits → 128 PiB
– Build time to avoid slowdown
Copyright © 2017 Red Hat Inc.
5
The Fabric of the Virtual Memory
●
The fabric are all those data structures that connects to the hardware
constrained structures like pagetables and that collectively create all the
software abstractions we're accustomed to
– tasks, processes, virtual memory areas, mmap (glibc malloc) ...
●
The fabric is the most black and white part of the Virtual Memory
●
The algorithms doing the computations on those data structures are the
Virtual Memory heuristics
– They need to solve hard problems with no guaranteed perfect solution
– i.e. when it's the right time to start to unmap pages (swappiness)
●
Some of the design didn't change: we still measure how hard it is to
free memory while we're trying to free it
●
All free memory is used as cache and we overcommit by default (not
excessively by default)
– Android uses: echo 1 >/proc/sys/vm/overcommit_memory
Copyright © 2017 Red Hat Inc.
6
Physical page and struct page
PAGE_SIZE (4KiB) large physical pagestruct page
64 bytes
PAGE
SIZE
4KiB
...Physical RAMmem_map
64/4096
1.56%
of the RAM
Ad

Recommended

Kernel Recipes 2019 - No NMI? No Problem! – Implementing Arm64 Pseudo-NMI
Kernel Recipes 2019 - No NMI? No Problem! – Implementing Arm64 Pseudo-NMIKernel Recipes 2019 - No NMI? No Problem! – Implementing Arm64 Pseudo-NMI
Kernel Recipes 2019 - No NMI? No Problem! – Implementing Arm64 Pseudo-NMIAnne Nicolas
 
HES2011 - Tarjei Mandt – Kernel Pool Exploitation on Windows 7
HES2011 - Tarjei Mandt – Kernel Pool Exploitation on Windows 7HES2011 - Tarjei Mandt – Kernel Pool Exploitation on Windows 7
HES2011 - Tarjei Mandt – Kernel Pool Exploitation on Windows 7Hackito Ergo Sum
 
Glibc内存管理ptmalloc源代码分析4
Glibc内存管理ptmalloc源代码分析4Glibc内存管理ptmalloc源代码分析4
Glibc内存管理ptmalloc源代码分析4hans511002
 
Troubleshooting Apache Cloudstack
Troubleshooting Apache CloudstackTroubleshooting Apache Cloudstack
Troubleshooting Apache CloudstackRadhika Puthiyetath
 
Lxc で始めるケチケチ仮想化生活?!
Lxc で始めるケチケチ仮想化生活?!Lxc で始めるケチケチ仮想化生活?!
Lxc で始めるケチケチ仮想化生活?!Etsuji Nakai
 
Part 02 Linux Kernel Module Programming
Part 02 Linux Kernel Module ProgrammingPart 02 Linux Kernel Module Programming
Part 02 Linux Kernel Module ProgrammingTushar B Kute
 

More Related Content

What's hot

How Linux Processes Your Network Packet - Elazar Leibovich
How Linux Processes Your Network Packet - Elazar LeibovichHow Linux Processes Your Network Packet - Elazar Leibovich
How Linux Processes Your Network Packet - Elazar LeibovichDevOpsDays Tel Aviv
 
The Microkernel Mach Under NeXTSTEP
The Microkernel Mach Under NeXTSTEPThe Microkernel Mach Under NeXTSTEP
The Microkernel Mach Under NeXTSTEPGregor Schmidt
 
Android internals 07 - Android graphics (rev_1.1)
Android internals 07 - Android graphics (rev_1.1)Android internals 07 - Android graphics (rev_1.1)
Android internals 07 - Android graphics (rev_1.1)Egor Elizarov
 
The Linux Kernel Implementation of Pipes and FIFOs
The Linux Kernel Implementation of Pipes and FIFOsThe Linux Kernel Implementation of Pipes and FIFOs
The Linux Kernel Implementation of Pipes and FIFOsDivye Kapoor
 
Course 102: Lecture 9: Input Output Internals
Course 102: Lecture 9: Input Output Internals Course 102: Lecture 9: Input Output Internals
Course 102: Lecture 9: Input Output Internals Ahmed El-Arabawy
 
Linux MMAP & Ioremap introduction
Linux MMAP & Ioremap introductionLinux MMAP & Ioremap introduction
Linux MMAP & Ioremap introductionGene Chang
 
仮想化環境におけるパケットフォワーディング
仮想化環境におけるパケットフォワーディング仮想化環境におけるパケットフォワーディング
仮想化環境におけるパケットフォワーディングTakuya ASADA
 
Let's trace Linux Lernel with KGDB @ COSCUP 2021
Let's trace Linux Lernel with KGDB @ COSCUP 2021Let's trace Linux Lernel with KGDB @ COSCUP 2021
Let's trace Linux Lernel with KGDB @ COSCUP 2021Jian-Hong Pan
 
Linux Memory Management with CMA (Contiguous Memory Allocator)
Linux Memory Management with CMA (Contiguous Memory Allocator)Linux Memory Management with CMA (Contiguous Memory Allocator)
Linux Memory Management with CMA (Contiguous Memory Allocator)Pankaj Suryawanshi
 
from Binary to Binary: How Qemu Works
from Binary to Binary: How Qemu Worksfrom Binary to Binary: How Qemu Works
from Binary to Binary: How Qemu WorksZhen Wei
 
Static partitioning virtualization on RISC-V
Static partitioning virtualization on RISC-VStatic partitioning virtualization on RISC-V
Static partitioning virtualization on RISC-VRISC-V International
 
Linux Kernel Booting Process (2) - For NLKB
Linux Kernel Booting Process (2) - For NLKBLinux Kernel Booting Process (2) - For NLKB
Linux Kernel Booting Process (2) - For NLKBshimosawa
 
ebpf and IO Visor: The What, how, and what next!
ebpf and IO Visor: The What, how, and what next!ebpf and IO Visor: The What, how, and what next!
ebpf and IO Visor: The What, how, and what next!Affan Syed
 
XPDDS17: Reworking the ARM GIC Emulation & Xen Challenges in the ARM ITS Emu...
XPDDS17:  Reworking the ARM GIC Emulation & Xen Challenges in the ARM ITS Emu...XPDDS17:  Reworking the ARM GIC Emulation & Xen Challenges in the ARM ITS Emu...
XPDDS17: Reworking the ARM GIC Emulation & Xen Challenges in the ARM ITS Emu...The Linux Foundation
 
インテルMEの秘密 - チップセットに隠されたコードと、それが一体何をするかを見出す方法 - by イゴール・スコチンスキー - Igor Skochinsky
インテルMEの秘密 - チップセットに隠されたコードと、それが一体何をするかを見出す方法 - by イゴール・スコチンスキー - Igor SkochinskyインテルMEの秘密 - チップセットに隠されたコードと、それが一体何をするかを見出す方法 - by イゴール・スコチンスキー - Igor Skochinsky
インテルMEの秘密 - チップセットに隠されたコードと、それが一体何をするかを見出す方法 - by イゴール・スコチンスキー - Igor SkochinskyCODE BLUE
 

What's hot (20)

淺談探索 Linux 系統設計之道
淺談探索 Linux 系統設計之道 淺談探索 Linux 系統設計之道
淺談探索 Linux 系統設計之道
 
How Linux Processes Your Network Packet - Elazar Leibovich
How Linux Processes Your Network Packet - Elazar LeibovichHow Linux Processes Your Network Packet - Elazar Leibovich
How Linux Processes Your Network Packet - Elazar Leibovich
 
The Microkernel Mach Under NeXTSTEP
The Microkernel Mach Under NeXTSTEPThe Microkernel Mach Under NeXTSTEP
The Microkernel Mach Under NeXTSTEP
 
Android internals 07 - Android graphics (rev_1.1)
Android internals 07 - Android graphics (rev_1.1)Android internals 07 - Android graphics (rev_1.1)
Android internals 07 - Android graphics (rev_1.1)
 
The Linux Kernel Implementation of Pipes and FIFOs
The Linux Kernel Implementation of Pipes and FIFOsThe Linux Kernel Implementation of Pipes and FIFOs
The Linux Kernel Implementation of Pipes and FIFOs
 
Course 102: Lecture 9: Input Output Internals
Course 102: Lecture 9: Input Output Internals Course 102: Lecture 9: Input Output Internals
Course 102: Lecture 9: Input Output Internals
 
Linux MMAP & Ioremap introduction
Linux MMAP & Ioremap introductionLinux MMAP & Ioremap introduction
Linux MMAP & Ioremap introduction
 
from Source to Binary: How GNU Toolchain Works
from Source to Binary: How GNU Toolchain Worksfrom Source to Binary: How GNU Toolchain Works
from Source to Binary: How GNU Toolchain Works
 
The Internals of "Hello World" Program
The Internals of "Hello World" ProgramThe Internals of "Hello World" Program
The Internals of "Hello World" Program
 
仮想化環境におけるパケットフォワーディング
仮想化環境におけるパケットフォワーディング仮想化環境におけるパケットフォワーディング
仮想化環境におけるパケットフォワーディング
 
Linux Internals - Interview essentials - 1.0
Linux Internals - Interview essentials - 1.0Linux Internals - Interview essentials - 1.0
Linux Internals - Interview essentials - 1.0
 
Let's trace Linux Lernel with KGDB @ COSCUP 2021
Let's trace Linux Lernel with KGDB @ COSCUP 2021Let's trace Linux Lernel with KGDB @ COSCUP 2021
Let's trace Linux Lernel with KGDB @ COSCUP 2021
 
Linux Memory Management with CMA (Contiguous Memory Allocator)
Linux Memory Management with CMA (Contiguous Memory Allocator)Linux Memory Management with CMA (Contiguous Memory Allocator)
Linux Memory Management with CMA (Contiguous Memory Allocator)
 
from Binary to Binary: How Qemu Works
from Binary to Binary: How Qemu Worksfrom Binary to Binary: How Qemu Works
from Binary to Binary: How Qemu Works
 
Static partitioning virtualization on RISC-V
Static partitioning virtualization on RISC-VStatic partitioning virtualization on RISC-V
Static partitioning virtualization on RISC-V
 
Linux Kernel Booting Process (2) - For NLKB
Linux Kernel Booting Process (2) - For NLKBLinux Kernel Booting Process (2) - For NLKB
Linux Kernel Booting Process (2) - For NLKB
 
ebpf and IO Visor: The What, how, and what next!
ebpf and IO Visor: The What, how, and what next!ebpf and IO Visor: The What, how, and what next!
ebpf and IO Visor: The What, how, and what next!
 
XPDDS17: Reworking the ARM GIC Emulation & Xen Challenges in the ARM ITS Emu...
XPDDS17:  Reworking the ARM GIC Emulation & Xen Challenges in the ARM ITS Emu...XPDDS17:  Reworking the ARM GIC Emulation & Xen Challenges in the ARM ITS Emu...
XPDDS17: Reworking the ARM GIC Emulation & Xen Challenges in the ARM ITS Emu...
 
Construct an Efficient and Secure Microkernel for IoT
Construct an Efficient and Secure Microkernel for IoTConstruct an Efficient and Secure Microkernel for IoT
Construct an Efficient and Secure Microkernel for IoT
 
インテルMEの秘密 - チップセットに隠されたコードと、それが一体何をするかを見出す方法 - by イゴール・スコチンスキー - Igor Skochinsky
インテルMEの秘密 - チップセットに隠されたコードと、それが一体何をするかを見出す方法 - by イゴール・スコチンスキー - Igor SkochinskyインテルMEの秘密 - チップセットに隠されたコードと、それが一体何をするかを見出す方法 - by イゴール・スコチンスキー - Igor Skochinsky
インテルMEの秘密 - チップセットに隠されたコードと、それが一体何をするかを見出す方法 - by イゴール・スコチンスキー - Igor Skochinsky
 

Viewers also liked

Kernel Recipes 2017 - Linux Kernel Release Model - Greg KH
Kernel Recipes 2017 - Linux Kernel Release Model - Greg KHKernel Recipes 2017 - Linux Kernel Release Model - Greg KH
Kernel Recipes 2017 - Linux Kernel Release Model - Greg KHAnne Nicolas
 
Kernel Recipes 2017 - Build farm again - Willy Tarreau
Kernel Recipes 2017 - Build farm again - Willy TarreauKernel Recipes 2017 - Build farm again - Willy Tarreau
Kernel Recipes 2017 - Build farm again - Willy TarreauAnne Nicolas
 
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven RostedtKernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven RostedtAnne Nicolas
 
Inside Google's Numbers in 2017
Inside Google's Numbers in 2017Inside Google's Numbers in 2017
Inside Google's Numbers in 2017Rand Fishkin
 
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017Carol Smith
 

Viewers also liked (7)

Kernel Recipes 2017 - Linux Kernel Release Model - Greg KH
Kernel Recipes 2017 - Linux Kernel Release Model - Greg KHKernel Recipes 2017 - Linux Kernel Release Model - Greg KH
Kernel Recipes 2017 - Linux Kernel Release Model - Greg KH
 
Kernel Recipes 2017 - Build farm again - Willy Tarreau
Kernel Recipes 2017 - Build farm again - Willy TarreauKernel Recipes 2017 - Build farm again - Willy Tarreau
Kernel Recipes 2017 - Build farm again - Willy Tarreau
 
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven RostedtKernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
Kernel Recipes 2017 - Understanding the Linux kernel via ftrace - Steven Rostedt
 
Inside Google's Numbers in 2017
Inside Google's Numbers in 2017Inside Google's Numbers in 2017
Inside Google's Numbers in 2017
 
10 facts about jobs in the future
10 facts about jobs in the future10 facts about jobs in the future
10 facts about jobs in the future
 
The AI Rush
The AI RushThe AI Rush
The AI Rush
 
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
 

Similar to Kernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea Arcangeli

Continguous Memory Allocator in the Linux Kernel
Continguous Memory Allocator in the Linux KernelContinguous Memory Allocator in the Linux Kernel
Continguous Memory Allocator in the Linux KernelKernel TLV
 
DRBD + OpenStack (Openstack Live Prague 2016)
DRBD + OpenStack (Openstack Live Prague 2016)DRBD + OpenStack (Openstack Live Prague 2016)
DRBD + OpenStack (Openstack Live Prague 2016)Jaroslav Jacjuk
 
The Forefront of the Development for NVDIMM on Linux Kernel
The Forefront of the Development for NVDIMM on Linux KernelThe Forefront of the Development for NVDIMM on Linux Kernel
The Forefront of the Development for NVDIMM on Linux KernelYasunori Goto
 
Virtualization for Emerging Memory Devices
Virtualization for Emerging Memory DevicesVirtualization for Emerging Memory Devices
Virtualization for Emerging Memory DevicesTakahiro Hirofuchi
 
We4IT lcty 2013 - infra-man - domino run faster
We4IT lcty 2013 - infra-man - domino run faster We4IT lcty 2013 - infra-man - domino run faster
We4IT lcty 2013 - infra-man - domino run faster We4IT Group
 
Automation of Hadoop cluster operations in Arm Treasure Data
Automation of Hadoop cluster operations in Arm Treasure DataAutomation of Hadoop cluster operations in Arm Treasure Data
Automation of Hadoop cluster operations in Arm Treasure DataYan Wang
 
Software Design for Persistent Memory Systems
Software Design for Persistent Memory SystemsSoftware Design for Persistent Memory Systems
Software Design for Persistent Memory SystemsC4Media
 
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, VectorizedData Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, VectorizedHostedbyConfluent
 
JerryScript: An ultra-lighteweight JavaScript Engine for the Internet of Things
JerryScript: An ultra-lighteweight JavaScript Engine for the Internet of ThingsJerryScript: An ultra-lighteweight JavaScript Engine for the Internet of Things
JerryScript: An ultra-lighteweight JavaScript Engine for the Internet of ThingsSamsung Open Source Group
 
DB2 for z/OS - Starter's guide to memory monitoring and control
DB2 for z/OS - Starter's guide to memory monitoring and controlDB2 for z/OS - Starter's guide to memory monitoring and control
DB2 for z/OS - Starter's guide to memory monitoring and controlFlorence Dubois
 
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14AMD Developer Central
 
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...DataWorks Summit
 
Scaling Cassandra for Big Data
Scaling Cassandra for Big DataScaling Cassandra for Big Data
Scaling Cassandra for Big DataDataStax Academy
 
lecture11_GPUArchCUDA01.pptx
lecture11_GPUArchCUDA01.pptxlecture11_GPUArchCUDA01.pptx
lecture11_GPUArchCUDA01.pptxssuser413a98
 

Similar to Kernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea Arcangeli (20)

Linux Huge Pages
Linux Huge PagesLinux Huge Pages
Linux Huge Pages
 
Continguous Memory Allocator in the Linux Kernel
Continguous Memory Allocator in the Linux KernelContinguous Memory Allocator in the Linux Kernel
Continguous Memory Allocator in the Linux Kernel
 
DRBD + OpenStack (Openstack Live Prague 2016)
DRBD + OpenStack (Openstack Live Prague 2016)DRBD + OpenStack (Openstack Live Prague 2016)
DRBD + OpenStack (Openstack Live Prague 2016)
 
The Forefront of the Development for NVDIMM on Linux Kernel
The Forefront of the Development for NVDIMM on Linux KernelThe Forefront of the Development for NVDIMM on Linux Kernel
The Forefront of the Development for NVDIMM on Linux Kernel
 
Time For D.I.M.E?
Time For D.I.M.E?Time For D.I.M.E?
Time For D.I.M.E?
 
Virtualization for Emerging Memory Devices
Virtualization for Emerging Memory DevicesVirtualization for Emerging Memory Devices
Virtualization for Emerging Memory Devices
 
Time For DIME
Time For DIMETime For DIME
Time For DIME
 
We4IT lcty 2013 - infra-man - domino run faster
We4IT lcty 2013 - infra-man - domino run faster We4IT lcty 2013 - infra-man - domino run faster
We4IT lcty 2013 - infra-man - domino run faster
 
Automation of Hadoop cluster operations in Arm Treasure Data
Automation of Hadoop cluster operations in Arm Treasure DataAutomation of Hadoop cluster operations in Arm Treasure Data
Automation of Hadoop cluster operations in Arm Treasure Data
 
Software Design for Persistent Memory Systems
Software Design for Persistent Memory SystemsSoftware Design for Persistent Memory Systems
Software Design for Persistent Memory Systems
 
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, VectorizedData Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
 
VM-aware Adaptive Storage Cache Prefetching
VM-aware Adaptive Storage Cache PrefetchingVM-aware Adaptive Storage Cache Prefetching
VM-aware Adaptive Storage Cache Prefetching
 
JerryScript: An ultra-lighteweight JavaScript Engine for the Internet of Things
JerryScript: An ultra-lighteweight JavaScript Engine for the Internet of ThingsJerryScript: An ultra-lighteweight JavaScript Engine for the Internet of Things
JerryScript: An ultra-lighteweight JavaScript Engine for the Internet of Things
 
DB2 for z/OS - Starter's guide to memory monitoring and control
DB2 for z/OS - Starter's guide to memory monitoring and controlDB2 for z/OS - Starter's guide to memory monitoring and control
DB2 for z/OS - Starter's guide to memory monitoring and control
 
Rendering Battlefield 4 with Mantle
Rendering Battlefield 4 with MantleRendering Battlefield 4 with Mantle
Rendering Battlefield 4 with Mantle
 
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
 
Persistent memory
Persistent memoryPersistent memory
Persistent memory
 
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...
 
Scaling Cassandra for Big Data
Scaling Cassandra for Big DataScaling Cassandra for Big Data
Scaling Cassandra for Big Data
 
lecture11_GPUArchCUDA01.pptx
lecture11_GPUArchCUDA01.pptxlecture11_GPUArchCUDA01.pptx
lecture11_GPUArchCUDA01.pptx
 

More from Anne Nicolas

Kernel Recipes 2019 - Driving the industry toward upstream first
Kernel Recipes 2019 - Driving the industry toward upstream firstKernel Recipes 2019 - Driving the industry toward upstream first
Kernel Recipes 2019 - Driving the industry toward upstream firstAnne Nicolas
 
Kernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernel
Kernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernelKernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernel
Kernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernelAnne Nicolas
 
Kernel Recipes 2019 - Metrics are money
Kernel Recipes 2019 - Metrics are moneyKernel Recipes 2019 - Metrics are money
Kernel Recipes 2019 - Metrics are moneyAnne Nicolas
 
Kernel Recipes 2019 - Kernel documentation: past, present, and future
Kernel Recipes 2019 - Kernel documentation: past, present, and futureKernel Recipes 2019 - Kernel documentation: past, present, and future
Kernel Recipes 2019 - Kernel documentation: past, present, and futureAnne Nicolas
 
Embedded Recipes 2019 - Knowing your ARM from your ARSE: wading through the t...
Embedded Recipes 2019 - Knowing your ARM from your ARSE: wading through the t...Embedded Recipes 2019 - Knowing your ARM from your ARSE: wading through the t...
Embedded Recipes 2019 - Knowing your ARM from your ARSE: wading through the t...Anne Nicolas
 
Kernel Recipes 2019 - GNU poke, an extensible editor for structured binary data
Kernel Recipes 2019 - GNU poke, an extensible editor for structured binary dataKernel Recipes 2019 - GNU poke, an extensible editor for structured binary data
Kernel Recipes 2019 - GNU poke, an extensible editor for structured binary dataAnne Nicolas
 
Kernel Recipes 2019 - Analyzing changes to the binary interface exposed by th...
Kernel Recipes 2019 - Analyzing changes to the binary interface exposed by th...Kernel Recipes 2019 - Analyzing changes to the binary interface exposed by th...
Kernel Recipes 2019 - Analyzing changes to the binary interface exposed by th...Anne Nicolas
 
Embedded Recipes 2019 - Remote update adventures with RAUC, Yocto and Barebox
Embedded Recipes 2019 - Remote update adventures with RAUC, Yocto and BareboxEmbedded Recipes 2019 - Remote update adventures with RAUC, Yocto and Barebox
Embedded Recipes 2019 - Remote update adventures with RAUC, Yocto and BareboxAnne Nicolas
 
Embedded Recipes 2019 - Making embedded graphics less special
Embedded Recipes 2019 - Making embedded graphics less specialEmbedded Recipes 2019 - Making embedded graphics less special
Embedded Recipes 2019 - Making embedded graphics less specialAnne Nicolas
 
Embedded Recipes 2019 - Linux on Open Source Hardware and Libre Silicon
Embedded Recipes 2019 - Linux on Open Source Hardware and Libre SiliconEmbedded Recipes 2019 - Linux on Open Source Hardware and Libre Silicon
Embedded Recipes 2019 - Linux on Open Source Hardware and Libre SiliconAnne Nicolas
 
Embedded Recipes 2019 - From maintaining I2C to the big (embedded) picture
Embedded Recipes 2019 - From maintaining I2C to the big (embedded) pictureEmbedded Recipes 2019 - From maintaining I2C to the big (embedded) picture
Embedded Recipes 2019 - From maintaining I2C to the big (embedded) pictureAnne Nicolas
 
Embedded Recipes 2019 - Testing firmware the devops way
Embedded Recipes 2019 - Testing firmware the devops wayEmbedded Recipes 2019 - Testing firmware the devops way
Embedded Recipes 2019 - Testing firmware the devops wayAnne Nicolas
 
Embedded Recipes 2019 - Herd your socs become a matchmaker
Embedded Recipes 2019 - Herd your socs become a matchmakerEmbedded Recipes 2019 - Herd your socs become a matchmaker
Embedded Recipes 2019 - Herd your socs become a matchmakerAnne Nicolas
 
Embedded Recipes 2019 - LLVM / Clang integration
Embedded Recipes 2019 - LLVM / Clang integrationEmbedded Recipes 2019 - LLVM / Clang integration
Embedded Recipes 2019 - LLVM / Clang integrationAnne Nicolas
 
Embedded Recipes 2019 - Introduction to JTAG debugging
Embedded Recipes 2019 - Introduction to JTAG debuggingEmbedded Recipes 2019 - Introduction to JTAG debugging
Embedded Recipes 2019 - Introduction to JTAG debuggingAnne Nicolas
 
Embedded Recipes 2019 - Pipewire a new foundation for embedded multimedia
Embedded Recipes 2019 - Pipewire a new foundation for embedded multimediaEmbedded Recipes 2019 - Pipewire a new foundation for embedded multimedia
Embedded Recipes 2019 - Pipewire a new foundation for embedded multimediaAnne Nicolas
 
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all started
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all startedKernel Recipes 2019 - ftrace: Where modifying a running kernel all started
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all startedAnne Nicolas
 
Kernel Recipes 2019 - Suricata and XDP
Kernel Recipes 2019 - Suricata and XDPKernel Recipes 2019 - Suricata and XDP
Kernel Recipes 2019 - Suricata and XDPAnne Nicolas
 
Kernel Recipes 2019 - Marvels of Memory Auto-configuration (SPD)
Kernel Recipes 2019 - Marvels of Memory Auto-configuration (SPD)Kernel Recipes 2019 - Marvels of Memory Auto-configuration (SPD)
Kernel Recipes 2019 - Marvels of Memory Auto-configuration (SPD)Anne Nicolas
 
Kernel Recipes 2019 - Formal modeling made easy
Kernel Recipes 2019 - Formal modeling made easyKernel Recipes 2019 - Formal modeling made easy
Kernel Recipes 2019 - Formal modeling made easyAnne Nicolas
 

More from Anne Nicolas (20)

Kernel Recipes 2019 - Driving the industry toward upstream first
Kernel Recipes 2019 - Driving the industry toward upstream firstKernel Recipes 2019 - Driving the industry toward upstream first
Kernel Recipes 2019 - Driving the industry toward upstream first
 
Kernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernel
Kernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernelKernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernel
Kernel Recipes 2019 - Hunting and fixing bugs all over the Linux kernel
 
Kernel Recipes 2019 - Metrics are money
Kernel Recipes 2019 - Metrics are moneyKernel Recipes 2019 - Metrics are money
Kernel Recipes 2019 - Metrics are money
 
Kernel Recipes 2019 - Kernel documentation: past, present, and future
Kernel Recipes 2019 - Kernel documentation: past, present, and futureKernel Recipes 2019 - Kernel documentation: past, present, and future
Kernel Recipes 2019 - Kernel documentation: past, present, and future
 
Embedded Recipes 2019 - Knowing your ARM from your ARSE: wading through the t...
Embedded Recipes 2019 - Knowing your ARM from your ARSE: wading through the t...Embedded Recipes 2019 - Knowing your ARM from your ARSE: wading through the t...
Embedded Recipes 2019 - Knowing your ARM from your ARSE: wading through the t...
 
Kernel Recipes 2019 - GNU poke, an extensible editor for structured binary data
Kernel Recipes 2019 - GNU poke, an extensible editor for structured binary dataKernel Recipes 2019 - GNU poke, an extensible editor for structured binary data
Kernel Recipes 2019 - GNU poke, an extensible editor for structured binary data
 
Kernel Recipes 2019 - Analyzing changes to the binary interface exposed by th...
Kernel Recipes 2019 - Analyzing changes to the binary interface exposed by th...Kernel Recipes 2019 - Analyzing changes to the binary interface exposed by th...
Kernel Recipes 2019 - Analyzing changes to the binary interface exposed by th...
 
Embedded Recipes 2019 - Remote update adventures with RAUC, Yocto and Barebox
Embedded Recipes 2019 - Remote update adventures with RAUC, Yocto and BareboxEmbedded Recipes 2019 - Remote update adventures with RAUC, Yocto and Barebox
Embedded Recipes 2019 - Remote update adventures with RAUC, Yocto and Barebox
 
Embedded Recipes 2019 - Making embedded graphics less special
Embedded Recipes 2019 - Making embedded graphics less specialEmbedded Recipes 2019 - Making embedded graphics less special
Embedded Recipes 2019 - Making embedded graphics less special
 
Embedded Recipes 2019 - Linux on Open Source Hardware and Libre Silicon
Embedded Recipes 2019 - Linux on Open Source Hardware and Libre SiliconEmbedded Recipes 2019 - Linux on Open Source Hardware and Libre Silicon
Embedded Recipes 2019 - Linux on Open Source Hardware and Libre Silicon
 
Embedded Recipes 2019 - From maintaining I2C to the big (embedded) picture
Embedded Recipes 2019 - From maintaining I2C to the big (embedded) pictureEmbedded Recipes 2019 - From maintaining I2C to the big (embedded) picture
Embedded Recipes 2019 - From maintaining I2C to the big (embedded) picture
 
Embedded Recipes 2019 - Testing firmware the devops way
Embedded Recipes 2019 - Testing firmware the devops wayEmbedded Recipes 2019 - Testing firmware the devops way
Embedded Recipes 2019 - Testing firmware the devops way
 
Embedded Recipes 2019 - Herd your socs become a matchmaker
Embedded Recipes 2019 - Herd your socs become a matchmakerEmbedded Recipes 2019 - Herd your socs become a matchmaker
Embedded Recipes 2019 - Herd your socs become a matchmaker
 
Embedded Recipes 2019 - LLVM / Clang integration
Embedded Recipes 2019 - LLVM / Clang integrationEmbedded Recipes 2019 - LLVM / Clang integration
Embedded Recipes 2019 - LLVM / Clang integration
 
Embedded Recipes 2019 - Introduction to JTAG debugging
Embedded Recipes 2019 - Introduction to JTAG debuggingEmbedded Recipes 2019 - Introduction to JTAG debugging
Embedded Recipes 2019 - Introduction to JTAG debugging
 
Embedded Recipes 2019 - Pipewire a new foundation for embedded multimedia
Embedded Recipes 2019 - Pipewire a new foundation for embedded multimediaEmbedded Recipes 2019 - Pipewire a new foundation for embedded multimedia
Embedded Recipes 2019 - Pipewire a new foundation for embedded multimedia
 
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all started
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all startedKernel Recipes 2019 - ftrace: Where modifying a running kernel all started
Kernel Recipes 2019 - ftrace: Where modifying a running kernel all started
 
Kernel Recipes 2019 - Suricata and XDP
Kernel Recipes 2019 - Suricata and XDPKernel Recipes 2019 - Suricata and XDP
Kernel Recipes 2019 - Suricata and XDP
 
Kernel Recipes 2019 - Marvels of Memory Auto-configuration (SPD)
Kernel Recipes 2019 - Marvels of Memory Auto-configuration (SPD)Kernel Recipes 2019 - Marvels of Memory Auto-configuration (SPD)
Kernel Recipes 2019 - Marvels of Memory Auto-configuration (SPD)
 
Kernel Recipes 2019 - Formal modeling made easy
Kernel Recipes 2019 - Formal modeling made easyKernel Recipes 2019 - Formal modeling made easy
Kernel Recipes 2019 - Formal modeling made easy
 

Recently uploaded

ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...
ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...
ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...Neo4j
 
Campotel: Telecommunications Infra and Network Builder - Company Profile
Campotel: Telecommunications Infra and Network Builder - Company ProfileCampotel: Telecommunications Infra and Network Builder - Company Profile
Campotel: Telecommunications Infra and Network Builder - Company ProfileCampotelPhilippines
 
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...htrindia
 
The Art of the Possible with Graph by Dr Jim Webber Neo4j.pptx
The Art of the Possible with Graph by Dr Jim Webber Neo4j.pptxThe Art of the Possible with Graph by Dr Jim Webber Neo4j.pptx
The Art of the Possible with Graph by Dr Jim Webber Neo4j.pptxNeo4j
 
Building Products That Think- Bhaskaran Srinivasan & Ashish Gupta
Building Products That Think- Bhaskaran Srinivasan & Ashish GuptaBuilding Products That Think- Bhaskaran Srinivasan & Ashish Gupta
Building Products That Think- Bhaskaran Srinivasan & Ashish GuptaISPMAIndia
 
Battle of React State Managers in frontend applications
Battle of React State Managers in frontend applicationsBattle of React State Managers in frontend applications
Battle of React State Managers in frontend applicationsEvangelia Mitsopoulou
 
IT Nation Evolve event 2024 - Quarter 1
IT Nation Evolve event 2024  - Quarter 1IT Nation Evolve event 2024  - Quarter 1
IT Nation Evolve event 2024 - Quarter 1Inbay UK
 
How we think about an advisor tech stack
How we think about an advisor tech stackHow we think about an advisor tech stack
How we think about an advisor tech stackSummit
 
Avoiding Bad Stats and the Benefits of Playing Trivia with Friends: PancakesC...
Avoiding Bad Stats and the Benefits of Playing Trivia with Friends: PancakesC...Avoiding Bad Stats and the Benefits of Playing Trivia with Friends: PancakesC...
Avoiding Bad Stats and the Benefits of Playing Trivia with Friends: PancakesC...Adrian Sanabria
 
Confoo 2024 Gettings started with OpenAI and data science
Confoo 2024 Gettings started with OpenAI and data scienceConfoo 2024 Gettings started with OpenAI and data science
Confoo 2024 Gettings started with OpenAI and data scienceSusan Ibach
 
"Platform Engineering with Development Containers", Igor Fesenko
"Platform Engineering with Development Containers", Igor Fesenko"Platform Engineering with Development Containers", Igor Fesenko
"Platform Engineering with Development Containers", Igor FesenkoFwdays
 
The Future of Product, by Founder & CEO, Product School
The Future of Product, by Founder & CEO, Product SchoolThe Future of Product, by Founder & CEO, Product School
The Future of Product, by Founder & CEO, Product SchoolProduct School
 
Apex Replay Debugger and Salesforce Platform Events.pptx
Apex Replay Debugger and Salesforce Platform Events.pptxApex Replay Debugger and Salesforce Platform Events.pptx
Apex Replay Debugger and Salesforce Platform Events.pptxmohayyudin7826
 
Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...
Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...
Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...Product School
 
Act Like an Owner, Challenge Like a VC by former CPO, Tripadvisor
Act Like an Owner,  Challenge Like a VC by former CPO, TripadvisorAct Like an Owner,  Challenge Like a VC by former CPO, Tripadvisor
Act Like an Owner, Challenge Like a VC by former CPO, TripadvisorProduct School
 
Leveraging SLF4j for Effective Logging in IBM App Connect Enterprise.docx
Leveraging SLF4j for Effective Logging in IBM App Connect Enterprise.docxLeveraging SLF4j for Effective Logging in IBM App Connect Enterprise.docx
Leveraging SLF4j for Effective Logging in IBM App Connect Enterprise.docxVotarikari Shravan
 
Enterprise Architecture As Strategy - Book Review
Enterprise Architecture As Strategy - Book ReviewEnterprise Architecture As Strategy - Book Review
Enterprise Architecture As Strategy - Book ReviewAshraf Fouad
 
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...Product School
 
Dynamical systems simulation in Python for science and engineering
Dynamical systems simulation in Python for science and engineeringDynamical systems simulation in Python for science and engineering
Dynamical systems simulation in Python for science and engineeringMassimo Talia
 
21ST CENTURY LITERACY FROM TRADITIONAL TO MODERN
21ST CENTURY LITERACY FROM TRADITIONAL TO MODERN21ST CENTURY LITERACY FROM TRADITIONAL TO MODERN
21ST CENTURY LITERACY FROM TRADITIONAL TO MODERNRonnelBaroc
 

Recently uploaded (20)

ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...
ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...
ASTRAZENECA. Knowledge Graphs Powering a Fast-moving Global Life Sciences Org...
 
Campotel: Telecommunications Infra and Network Builder - Company Profile
Campotel: Telecommunications Infra and Network Builder - Company ProfileCampotel: Telecommunications Infra and Network Builder - Company Profile
Campotel: Telecommunications Infra and Network Builder - Company Profile
 
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...
 
The Art of the Possible with Graph by Dr Jim Webber Neo4j.pptx
The Art of the Possible with Graph by Dr Jim Webber Neo4j.pptxThe Art of the Possible with Graph by Dr Jim Webber Neo4j.pptx
The Art of the Possible with Graph by Dr Jim Webber Neo4j.pptx
 
Building Products That Think- Bhaskaran Srinivasan & Ashish Gupta
Building Products That Think- Bhaskaran Srinivasan & Ashish GuptaBuilding Products That Think- Bhaskaran Srinivasan & Ashish Gupta
Building Products That Think- Bhaskaran Srinivasan & Ashish Gupta
 
Battle of React State Managers in frontend applications
Battle of React State Managers in frontend applicationsBattle of React State Managers in frontend applications
Battle of React State Managers in frontend applications
 
IT Nation Evolve event 2024 - Quarter 1
IT Nation Evolve event 2024  - Quarter 1IT Nation Evolve event 2024  - Quarter 1
IT Nation Evolve event 2024 - Quarter 1
 
How we think about an advisor tech stack
How we think about an advisor tech stackHow we think about an advisor tech stack
How we think about an advisor tech stack
 
Avoiding Bad Stats and the Benefits of Playing Trivia with Friends: PancakesC...
Avoiding Bad Stats and the Benefits of Playing Trivia with Friends: PancakesC...Avoiding Bad Stats and the Benefits of Playing Trivia with Friends: PancakesC...
Avoiding Bad Stats and the Benefits of Playing Trivia with Friends: PancakesC...
 
Confoo 2024 Gettings started with OpenAI and data science
Confoo 2024 Gettings started with OpenAI and data scienceConfoo 2024 Gettings started with OpenAI and data science
Confoo 2024 Gettings started with OpenAI and data science
 
"Platform Engineering with Development Containers", Igor Fesenko
"Platform Engineering with Development Containers", Igor Fesenko"Platform Engineering with Development Containers", Igor Fesenko
"Platform Engineering with Development Containers", Igor Fesenko
 
The Future of Product, by Founder & CEO, Product School
The Future of Product, by Founder & CEO, Product SchoolThe Future of Product, by Founder & CEO, Product School
The Future of Product, by Founder & CEO, Product School
 
Apex Replay Debugger and Salesforce Platform Events.pptx
Apex Replay Debugger and Salesforce Platform Events.pptxApex Replay Debugger and Salesforce Platform Events.pptx
Apex Replay Debugger and Salesforce Platform Events.pptx
 
Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...
Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...
Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...
 
Act Like an Owner, Challenge Like a VC by former CPO, Tripadvisor
Act Like an Owner,  Challenge Like a VC by former CPO, TripadvisorAct Like an Owner,  Challenge Like a VC by former CPO, Tripadvisor
Act Like an Owner, Challenge Like a VC by former CPO, Tripadvisor
 
Leveraging SLF4j for Effective Logging in IBM App Connect Enterprise.docx
Leveraging SLF4j for Effective Logging in IBM App Connect Enterprise.docxLeveraging SLF4j for Effective Logging in IBM App Connect Enterprise.docx
Leveraging SLF4j for Effective Logging in IBM App Connect Enterprise.docx
 
Enterprise Architecture As Strategy - Book Review
Enterprise Architecture As Strategy - Book ReviewEnterprise Architecture As Strategy - Book Review
Enterprise Architecture As Strategy - Book Review
 
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...
 
Dynamical systems simulation in Python for science and engineering
Dynamical systems simulation in Python for science and engineeringDynamical systems simulation in Python for science and engineering
Dynamical systems simulation in Python for science and engineering
 
21ST CENTURY LITERACY FROM TRADITIONAL TO MODERN
21ST CENTURY LITERACY FROM TRADITIONAL TO MODERN21ST CENTURY LITERACY FROM TRADITIONAL TO MODERN
21ST CENTURY LITERACY FROM TRADITIONAL TO MODERN
 

Kernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea Arcangeli

  • 1. Copyright © 2017 Red Hat Inc. 1 20 years of Linux Virtual Memory20 years of Linux Virtual Memory Red Hat, Inc. Andrea Arcangeli <aarcange at redhat.com> Kernel Recipes, Paris 29 Sep 2017
  • 2. Copyright © 2017 Red Hat Inc. 2 Topics ● Milestones in the evolution of the Virtual Memory subsystem ● Virtual Memory latest innovations – Automatic NUMA balancing – THP developments – KSMscale – userfaultfd ● Postcopy live Migration, etc..
  • 3. Copyright © 2017 Red Hat Inc. 3 Virtual Memory (simplified) Virtual pages They cost “nothing” Practically unlimited on 64bit archs Physical pages They cost money! This is the RAM arrows = pagetables virtual to physical mapping
  • 4. Copyright © 2017 Red Hat Inc. 4 PageTablesPageTables pgd pmd pte pud pte pmd pte pte pmd pte pud pte pmd pte pte 2^9 = 512 arrows, not just 2 ● Common code and x86 pagetable format is a tree ● All pagetables are 4KB in size ● Total: grep PageTables /proc/meminfo ● (((2**9)**4)*4096)>>48 = 1 → 48bits → 256 TiB ● 5 levels in v4.13 → (48+9) bits → 57bits → 128 PiB – Build time to avoid slowdown
  • 5. Copyright © 2017 Red Hat Inc. 5 The Fabric of the Virtual Memory ● The fabric are all those data structures that connects to the hardware constrained structures like pagetables and that collectively create all the software abstractions we're accustomed to – tasks, processes, virtual memory areas, mmap (glibc malloc) ... ● The fabric is the most black and white part of the Virtual Memory ● The algorithms doing the computations on those data structures are the Virtual Memory heuristics – They need to solve hard problems with no guaranteed perfect solution – i.e. when it's the right time to start to unmap pages (swappiness) ● Some of the design didn't change: we still measure how hard it is to free memory while we're trying to free it ● All free memory is used as cache and we overcommit by default (not excessively by default) – Android uses: echo 1 >/proc/sys/vm/overcommit_memory
  • 6. Copyright © 2017 Red Hat Inc. 6 Physical page and struct page PAGE_SIZE (4KiB) large physical pagestruct page 64 bytes PAGE SIZE 4KiB ...Physical RAMmem_map 64/4096 1.56% of the RAM
  • 7. Copyright © 2017 Red Hat Inc. 7 MM & VMA ● mm_struct aka MM – Memory of the process ● Shared by threads ● vm_area_struct aka VMA – Virtual Memory Area ● Created and teardown by mmap and munmap ● Defines the virtual address space of an “MM”
  • 8. Copyright © 2017 Red Hat Inc. 11 Original bitmap... Active and Inactive list LRU ● The active page LRU preserves the the active memory working set – only the inactive LRU loses information as fast as use-once I/O goes – Introduced in 2001, it works good enough also with an arbitrary balance – Active/inactive list optimum balancing algorithm was solved in 2012-2014 ● shadow radix tree nodes that detect re-faults
  • 9. Copyright © 2017 Red Hat Inc. 13 Active and Inactive list LRU $ grep -i active /proc/meminfo Active: 3555744 kB Inactive: 2511156 kB Active(anon): 2286400 kB Inactive(anon): 1472540 kB Active(file): 1269344 kB Inactive(file): 1038616 kB
  • 10. Copyright © 2017 Red Hat Inc. 14 Active LRU working-set detection ● From Johannes Weiner fault ------------------------+ | +--------------+ | +-------------+ reclaim <- T inactive H <-+-- demotion T active H <--+ +--------------+ +-------------+ | | | +-------------- promotion ------------------+ H = Head, T = Tail +-memory available to cache-+ | | +-inactive------+-active----+ a b | c d e f g h i | J K L M N | +---------------+-----------+
  • 11. Copyright © 2017 Red Hat Inc. 15 lru→inactive_age and radix tree shadow entries inode/file radix_tree root inode/file radix_tree root inode/file radix_tree node inode/file radix_tree node inode/file radix_tree node inode/file radix_tree node pagepage
  • 12. Copyright © 2017 Red Hat Inc. 16 Reclaim saving inactive_age inode/file radix_tree root inode/file radix_tree root inode/file radix_tree node inode/file radix_tree node inode/file radix_tree node inode/file radix_tree node pagepage Exceptional/shadow entry memcg LRU LRU reclaim inactive_age++ Exceptional/shadow entry memcg LRU LRU reclaim inactive_age++
  • 13. Copyright © 2017 Red Hat Inc. 20 Object-based reverse mapping PHYSICAL PAGE Anonymous anon_vma inode (objrmap) vma vma vma vma prio_tree rmap item rmap item vma vma PHYSICAL PAGE Filesystem PHYSICAL PAGE Anonymous PHYSICAL PAGE Filesystem PHYSICAL PAGE KSM
  • 14. Copyright © 2017 Red Hat Inc. 21 Active & inactive + rmap page[0] page[1] page[N] mem_map process virtualmemory process virtualmemory process virtualmemory process virtualmemory active_lruinactive_lru rmap rmap rmap rmap Use-once pages to trash Use-many pages
  • 15. Copyright © 2017 Red Hat Inc. 22 Many more LRUs ● Separated LRU for anon and file backed mappings ● Memcg (memory cgroups) introduced per- memcg LRUs ● Removal of unfreeable pages from LRUs – anonymous memory with no swap – mlocked memory ● Transparent Hugepages in the LRU increase scalability further (lru size decreased 512 times)
  • 16. Copyright © 2017 Red Hat Inc. 23 Recent Virtual Memory trends ● Optimizing the workloads for you, without manual tuning – NUMA hard bindings (numactl) → Automatic NUMA Balancing – Hugetlbfs → Transparent Hugepage – Programs or Virtual Machines duplicating memory → KSM – Page pinning (RDMA/KVM shadow MMU) -> MMU notifier – Private device memory managed by hand and pinned → HMM/UVM (unified virtual memory) for GPU seamlessly computing in GPU memory ● The optimizations can be optionally disabled
  • 17. Copyright © 2017 Red Hat Inc. 30
  • 18. Copyright © 2017 Red Hat Inc. 31 Automatic NUMA Balancing benchmark Intel SandyBridge (Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz) 2 Sockets – 32 Cores with Hyperthreads 256G Memory RHEV 3.6 Host bare metal – 3.10.0-327.el7 (RHEL7.2) VM guest – 3.10.0-324.el7 (RHEL7.2) VM – 32P , 160G (Optimized for Server) Storage – Violin 6616 – 16G Fibre Channel Oracle – 12C , 128G SGA Test – Running Oracle OLTP workload with increasing user count and measuring Trans / min for each run as a metric for comparison
  • 19. Copyright © 2017 Red Hat Inc. 32 10U 20U 40U 80U 100U 0 200000 400000 600000 800000 1000000 1200000 1400000 4 VMs with different NUMA options OLTP workload Auto numa OFF Auto Numa on - No pinning Auto Numa on - Numa pinning User count Trans/min
  • 20. Copyright © 2017 Red Hat Inc. 33 Automatic NUMA balancing configuration ● https://tinyurl.com/zupp9v3 https://access.redhat.com/ ● In RHEL7 Automatic NUMA balancing is enabled when: # numactl --hardware shows multiple nodes ● To disable automatic NUMA balancing: # echo 0 > /proc/sys/kernel/numa_balancing ● To enable automatic NUMA balancing: # echo 1 > /proc/sys/kernel/numa_balancing ● At boot: numa_balancing=enable|disable
  • 21. Copyright © 2017 Red Hat Inc. 34
  • 22. Copyright © 2017 Red Hat Inc. 35 Hugepages ● Traditionally x86 hardware gave us 4KiB pages ● The more memory the bigger the overhead in managing 4KiB pages ● What if you had bigger pages? – 512 times bigger → 2MiB
  • 23. Copyright © 2017 Red Hat Inc. 36 PageTablesPageTables pgd pmd pte pud pte pmd pte pte pmd pte pud pte pmd pte pte 2^9 = 512 arrows, not just 2
  • 24. Copyright © 2017 Red Hat Inc. 37 ● Improve CPU performance – Enlarge TLB size (essential for KVM) – Speed up TLB miss (essential for KVM) ● Need 3 accesses to memory instead of 4 to refill the TLB – Faster to allocate memory initially (minor) – Page colouring inside the hugepage (minor) – Higher scalability of the page LRUs ● Cons – clear_page/copy_page less cache friendly ● Clear faulting subpage last included in v4.13 from Andi Kleen and Ying Hhuang – 28.3% increase in vm-scalability anon-w-seq – higher memory footprint sometime – Direct compaction takes time Benefit of hugepages
  • 25. Copyright © 2017 Red Hat Inc. 38 TLB miss cost:TLB miss cost: number of accesses to memorynumber of accesses to memory host THP off guest THP off host THP on guest THP off host THP off guest THP on host THP on guest THP on novirt THP off novirt THP on 0 5 10 15 20 25 BaremetalVirtualization
  • 26. Copyright © 2017 Red Hat Inc. 39 ● How do we get the benefits of hugetlbfs without having to configure anything? ● Any Linux process will receive 2M pages – if the mmap region is 2M naturally aligned – If compaction succeeds in producing hugepages ● Entirely transparent to userland Transparent Hugepage design
  • 27. Copyright © 2017 Red Hat Inc. 40 ● /sys/kernel/mm/transparent_hugepage/enabled – [always] madvise never ● Always use THP if vma start/end permits – always [madvise] never ● Use THP only inside MADV_HUGEPAGE – Applies to khugepaged too – always madvise [never] ● Never use THP – khugepaged quits ● Default selected at build time THP sysfs enabled
  • 28. Copyright © 2017 Red Hat Inc. 41 ● /sys/kernel/mm/transparent_hugepage/defrag – [always] defer defer+madvise madvise never ● Always use direct compaction (ideal for long lived allocations) – always [defer] defer+madvise madvise never ● Defer compaction asynchronously (kswapd/kcompactd) – always defer [defer+madvise] madvise never ● Direct compaction in MADV_HUGEPAGE, async otherwise – always defer defer+madvise [madvise] never ● Use direct compaction only inside MADV_HUGEPAGE ● khugepaged enabled in the background – always defer defer+madvise madvise [never] ● Never use compaction, quit khugepaged too ● Disabling THP is excessive if only direct compaction is too expensive ● KVM uses MADV_HUGEPAGE – MADV_HUGEPAGE will still use direct compaction THP defrag - compaction control
  • 29. Copyright © 2017 Red Hat Inc. 42 Priming the VM ● To achieve maximum THP utilization you can run the two commands below # cat /proc/buddyinfo Node 0, zone DMA 0 0 1 1 1 1 1 1 0 1 3 Node 0, zone DMA32 8357 4904 2670 472 126 82 8 1 0 3 5 Node 0, zone Normal 5364 38097 608 63 11 1 1 55 0 0 0 # echo 3 >/proc/sys/vm/drop_caches # cat /proc/buddyinfo Node 0, zone DMA 0 0 1 1 1 1 1 1 0 1 3 Node 0, zone DMA32 5989 5445 5235 5022 4420 3583 2424 1331 471 119 25 Node 0, zone Normal 71579 58338 36733 19523 6431 1572 559 282 115 29 2 # echo >/proc/sys/vm/compact_memory # cat /proc/buddyinfo Node 0, zone DMA 0 0 1 1 1 1 1 1 0 1 3 Node 0, zone DMA32 1218 1161 1227 1240 1135 925 688 502 342 305 369 Node 0, zone Normal 7479 5147 5124 4240 2768 1959 1391 814 389 270 149 4k 8k 16k 32k 64k 2M 4M
  • 30. Copyright © 2017 Red Hat Inc. 43 ● From Kirill A. Shutemov and Hugh Dickins – Merged in the v4.8 kernel ● # mount -o remount,huge=always none /dev/shm – Including small files (i.e. < 2MB in size) ● Very high memory usage on small files ● # mount -o remount,huge=never none /dev/shm – Current default ● # mount -o remount,huge=within_size none /dev/shm – Allocate THP if inode i_size can contain hugepages, or if there’s a madvise/fadvise hint ● Ideal for apps benefiting from THP on tmpfs! ● # mount -o remount,huge=advise none /dev/shm – Only if requested through madvise/fadvise THP on tmpfs
  • 31. Copyright © 2017 Red Hat Inc. 44 ● shmem uses an internal tmpfs mount for: SysV SHM, memfd, MAP_SHARED of /dev/zero or MAP_ANONYMOUS, DRM objects, ashmem ● Internal mount is controlled by: $ cat /sys/kernel/mm/transparent_hugepage/shmem_enabled always within_size advise [never] deny force ● deny – Disable THP for all tmpfs mounts ● For debugging ● force – Always use THP for all tmpfs mounts ● For testing THP on shmem
  • 32. Copyright © 2017 Red Hat Inc. 45
  • 33. Copyright © 2017 Red Hat Inc. 46 ● Virtual memory to dedup is practically unlimited (even on 32bit if you deduplicate across multiple processes) ● The same page content can be deduplicated an unlimited number of times KSMscale PageKSM stable_node rmap_item rmap_item Virtual address In userland process (mm, addr1) Virtual address In userland process (mm, addr2)
  • 34. Copyright © 2017 Red Hat Inc. 47 $ cat /sys/kernel/mm/ksm/max_page_sharing 256 ● KSMscale limits the maximum deduplication for each physical PageKSM allocated – Limiting “compression” to x256 is enough – Higher maximum deduplication generates diminishing returns ● To alter the max page sharing: $ echo 2 > /sys/kernel/mm/ksm/run $ echo 512 > /sys/kernel/mm/ksm/max_page_sharing ● Included in v4.13 KSMscale
  • 35. Copyright © 2017 Red Hat Inc. 48
  • 36. Copyright © 2017 Red Hat Inc. 49 Why: Memory Externalization ● Memory externalization is about running a program with part (or all) of its memory residing on a remote node ● Memory is transferred from the memory node to the compute node on access ● Memory can be transferred from the compute node to the memory node if it's not frequently used during memory pressure Node 0Compute node Local Memory Node 0Memory node Remote Memory userfault Node 0Compute node Local Memory Node 0Memory node Remote Memory memory pressure
  • 37. Copyright © 2017 Red Hat Inc. 50 Postcopy live migration ● Postcopy live migration is a form of memory externalization ● When the QEMU compute node (destination) faults on a missing page that resides in the memory node (source) the kernel has no way to fetch the page – Solution: let QEMU in userland handle the pagefault Partially funded by the Orbit European Union project Node 0Compute node Local Memory Node 0Memory node Remote Memory userfault QEMU source QEMU destination Postcopy live migration
  • 38. Copyright © 2017 Red Hat Inc. 51 userfaultfd latency 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 0 200 400 600 800 1000 1200 1400 1600 userfault latency during postcopy live migration - 10Gbit qemu 2.5+ - RHEL7.2+ - stressapptest running in guest <= latency in milliseconds numberofuserfaults Userfaults triggered on pages that were already in network-flight are instantaneous. Background transfer seeks at the last userfault address.
  • 39. Copyright © 2017 Red Hat Inc. 52 KVM precopy live migration 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 Before precopy After precopy During precopy Precopy never completes until the database benchmark completes 10Gbit NIC 120GiB guest Database TPM ~120sec Time to Transfer RAM over network pre copy
  • 40. Copyright © 2017 Red Hat Inc. 53 KVM postcopy live migration 00:10 00:50 01:30 02:10 02:50 03:30 04:10 04:50 05:30 06:10 06:50 07:30 08:10 08:50 09:30 10:10 10:50 11:30 12:10 12:50 13:30 14:10 14:50 15:30 16:10 16:50 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 Before postcopy During postcopy After postcopy # virsh migrate .. --postcopy --timeout <sec> --timeout-postcopy # virsh migrate .. --postcopy --postcopy-after-precopy precopy runs From 5m To 7m postcopy runs From 7m To about ~9m deterministic pre copy post copy khugepaged collapses THPs
  • 41. Copyright © 2017 Red Hat Inc. 54 All available upstream ● Userfaultfd() syscall in Linux Kernel >= v4.3 ● Postcopy live migration in: – QEMU >= v2.5.0 ● Author: David Gilbert @ Red Hat Inc. – Postcopy in Libvirt >= 1.3.4 – OpenStack Nova >= Newton ● In production since RHEL 7.3
  • 42. Copyright © 2017 Red Hat Inc. 55 Live migration total time Total time 0 50 100 150 200 250 300 350 400 450 500 autoconverge postcopy seconds
  • 43. Copyright © 2017 Red Hat Inc. 56 Max UDP latency 0 5 10 15 20 25 precopy timeout autoconverge postcopy seconds Live migration max perceived downtime latency
  • 44. Copyright © 2017 Red Hat Inc. 57 userfaultfd v4.13 features ● The current v4.13 upstream kernel supports: – Missing faults (i.e. missing pages EVENT_PAGEFAULT) ● Anonymous (UFFDIO_COPY, UFFDIO_ZEROPAGE, UFFDIO_WAKE) ● Hugetlbfs (UFFDIO_COPY, UFFDIO_WAKE) ● Shmem (UFFDIO_COPY, UFFDIO_ZEROPAGE, UFFDIO_WAKE) – UFFD_FEATURE_THREAD_ID (to collect vcpu statistics) – UFFD_FEATURE_SIGBUS (raises SIGBUS instead of userfault) – Non cooperative Events ● EVENT_REMOVE (MADV_REMOVE/DONTNEED/FREE) ● EVENT_UNMAP (munmap notification to stop background transfer) ● EVENT_REMAP (non-cooperative process called mremap) ● EVENT_FORK (create new uffd over fork()) ● UFFDIO_COPY returns -ESRCH after exit()
  • 45. Copyright © 2017 Red Hat Inc. 58 userfaultfd in-progress features ● v4.13 rebased aa.git for anonymous memory supports: – UFFDIO_WRITEPROTECT – UFFDIO_COPY_MODE_WP ● Same as UFFDIO_COPY but wrprotected – UFFDIO_REMAP ● monitor can remove memory atomically ● Floating patchset for synchronous EVENT_REMOVE to allow multithreaded non cooperative manager
  • 46. Copyright © 2017 Red Hat Inc. 59 struct uffd_msg/* read() structure */ struct uffd_msg { __u8 event; __u8 reserved1; __u16 reserved2; __u32 reserved3; union { struct { __u64 flags; __u64 address; union { __u32 ptid; } feat; } pagefault; struct { __u32 ufd; } fork; struct { __u64 from; __u64 to; __u64 len; } remap; struct { __u64 start; __u64 end; } remove; struct { /* unused reserved fields */ __u64 reserved1; __u64 reserved2; __u64 reserved3; } reserved; } arg; } __packed; sizeof(struct uffd_msg) 32bit/64bit ABI enforcement Zeros here, can extend with UFFD_FEATURE flags UFFD_EVENT_* tells which part of the union is valid Default cooperative support tracking pagefaults UFFD_EVENT_PAGEFAULT Non cooperative support tracking MM syscalls UFFD_EVENT_FORK UFFD_EVENT_REMAP UFFD_EVENT_REMOVE|UNMAP
  • 47. Copyright © 2017 Red Hat Inc. 60 userfaultfd potential use cases ● Efficient snapshotting (drop fork()) ● JIT/to-native compilers (drop write bits) ● Add robustness to hugetlbfs/tmpfs ● Host enforcement for virtualization memory ballooning ● Distributed shared memory projects: at Berkeley and University of Colorado ● Tracking of anon pages written by other threads: at llnl.gov ● Obsoletes soft-dirty ● Obsoletes volatile pages SIGBUS
  • 48. Copyright © 2017 Red Hat Inc. 61 Git userfaultfd branch ● https://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/log/?h=userfault
  • 49. Copyright © 2017 Red Hat Inc. 62
  • 50. Copyright © 2017 Red Hat Inc. 63 Virtual Memory evolution ● Amazing to see the room for further innovation there was back then – Things constantly looks pretty mature ● They may actually have been considering my hardware back then was much less powerful and not more complex than my cellphone ● Unthinkable to maintain the current level of mission critical complexity by reinventing the wheel in a not Open Source way – Can perhaps be still done in a limited set of laptops and cellphones models, but for how long? ● Innovation in the Virtual Memory space is probably one among the plenty of factors that contributed to Linux success and the KVM lead in OpenStack user base too – KVM (unlike the preceding Hypervisor designs) leverages the power of the Linux Virtual Memory in its entirety