SlideShare a Scribd company logo
Linux CMA (Contiguous Memory Allocator)
Mark Veltzer
<mark.veltzer@gmail.com>
Why contiguous memory?
● DMA
● Sharing between operating systems
Why large contiguous memory?
● Brain dead devices (200MB in one DMA
transfer?).
● Devices that uses large memory chunks.
● Performance of scatter/gather.
● kmalloc (4MB on x86, less on other platforms),
gfp (8MB) limitations.
● And again, sharing between operating
systems...
Solutions to the contiguous memory
problem
● Use small hardware buffers (the most common).
● Use scatter gather in hardware or in software (very
common if small buffers are not available – especially in
general availability products – because of the problems of
the next methods and the unavailability of an IOMMU).
● Reserve a large contiguous buffer (a.k.a the old way – try
not to use me if you can...).
● CMA (you are here! - half way to nirvana).
● IOMMU (a.k.a. the best thing since sliced bread – pure
bliss – your company will build a temple in your honor).
Doing it on your own (no CMA)
● First reserve a large buffer for your needs and tell the kernel to keep it's
filthy grubbing hands off your buffer...
● Find an free large area by looking at /proc/iomem. I found 200M free
buffer at address 800M in this example.
● Pass mem=800M memmap=200M$800M to the kernel at boot (this
actually depends on the kernel version, for some kernels only memmap
is needed, ugh!!!).
● Check it by passing bootmem_debug
● See that the allocation worked by looking at:
– dmesg | grep user (no guarantee, kernel log changes all the time)
– cat /proc/iomem (even better than previous).
– /sys/firmware/memmap/*/* (the raw data – can be machine parsed).
Doing it on your own (no CMA).
Continued...
● Now you want your driver/kernel code to access this
buffer…
● request_mem_region with the same physical address
you passed as a kernel parameter.
● ioremap
● And just you the buffer you get from ioremap.
● Dont forget to iounmap and release_mem_region...
● Lets see an example…
What's wrong with this approach?
● You need to find a free spot from /proc/iomem. This space can
change between board to board, chipset to chipset, at hotplug
time, pci plug&play non determinacy, between architectures and
more.
● The space will not be used for anything, ever. So that space is
lost even you never do anything with it.
● You need to know the size of RAM you want in advance, and be
accurate about it. This is because every byte you allocate is a
byte you can't use for regular system operations.
CMA
● What is it? A piece of code, now in the standard kernel.
● Enabled in the default Linux configuration and by most
current Linux distributions.
● A result of a long line of patches with various
approaches intended to solve the “large contiguous
memory” problem.
● Well integrated, both with the Linux memory allocator
(gfp) and with the Linux DMA subsystem.
How do you use it?
● Make sure it's in your kernel (CONFIG_CMA)
● You can make the default CMA context larger or smaller via various kernel configuration options
(CMA_AREAS, CMA_DEBUG, CMA_SIZE_MBYTES, CMA_SIZE_PERCENTAGE, ...)
● Use the regular DMA API to allocate DMA buffers. That's it.
●
This means that drivers do not need to be changed.
● Allocation time will be a little longer if you ask for large buffers (see next slide). But not by much.
● You can use the special CMA APIS (dma_contiguous_reserve, dma_contiguous_early_fixup,
dma_alloc_from_contiguous, dma_release_from_contiguous, dma_declare_contiguous) but you
don't have too. Try to avoid it.
How does it work?
● Pages in the kernel are marked as movable and non movable.
● Pages which are movable are pages where the physical addresses don't matter. This is true for most pages
(especially user space pages and the page cache which take up most of the space in the system).
●
When you alloc via cma (actually when the DMA api allocs and CMA is enabled) it migrated pages pages which are
migratable (movable) to make physical room for you buffer.
●
While the migration is taking place the CMA uses a feature of the page allocator which allow to mark a page block as
MIGRATE_ISOLATE which prevents the page allocator from allocating these pages while the moving is taking place.
●
Allocating is from CMA contexts/pool. By default there are 7 (controllable from kernel configuration) of those and one
is default. You can create your own and have a private pool for you driver if you are willing to use the special API.
Usually you don't need these but for special considerations (realtime etc) this may be an option.
How does this solve my problems?
● Well, with CMA all of your RAM is used and if
you never ask for that large contiguous space
then you can use all of your systems RAM.
● In addition CMA is integrated with the page
allocator so when you return pages (simply
dealloc via the DMA API) the pages actually go
back to being used by whomever.
● CMA is transparent to drivers while the previous
solution is not.
Issues with CMA
● get_user_pages() is an issue which is still to be
dealt with efficiently. get_user_pages() locks
pages and makes the unmovable. This is dealt
with today by moving pages away as soon as
they are locked which creates a bit of overhead.
● IOMMU is better (see next slide).
IOMMU
● IOMMU solves all of your large contiguous woes (at
least as they pertain to hardware, not to operating
system sharing).
● It also has a security advantage.
● And much better debugging and stability.
● Really, try to use a board with IOMMU… It's well
worth the extra $$$….
● Maybe you already have an iommu? Check
/sys/class/iommu/*...
References
● $KERNEL/doc/kernel-parameters.txt (cma=)
● $KERNEL/drivers/base/dma-contiguous.c
● $KERNEL/include/linux/dma-contiguous.h
● https://lwn.net/Articles/486301/ A deep dive into CMA
● https://lwn.net/Articles/474819/ DMA buffer sharing
● https://lwn.net/Articles/396657/ Older CMA article – some
of it irrelevant.
● https://lwn.net/Articles/480055/ The Android ION memory
allocator (replacement of pmem).

More Related Content

What's hot

Process Scheduler and Balancer in Linux Kernel
Process Scheduler and Balancer in Linux KernelProcess Scheduler and Balancer in Linux Kernel
Process Scheduler and Balancer in Linux Kernel
Haifeng Li
 
Linux MMAP & Ioremap introduction
Linux MMAP & Ioremap introductionLinux MMAP & Ioremap introduction
Linux MMAP & Ioremap introduction
Gene Chang
 
Memory management in Linux kernel
Memory management in Linux kernelMemory management in Linux kernel
Memory management in Linux kernel
Vadim Nikitin
 
QEMU - Binary Translation
QEMU - Binary Translation QEMU - Binary Translation
QEMU - Binary Translation
Jiann-Fuh Liaw
 
ACPI Debugging from Linux Kernel
ACPI Debugging from Linux KernelACPI Debugging from Linux Kernel
ACPI Debugging from Linux Kernel
SUSE Labs Taipei
 
Secure Boot on ARM systems – Building a complete Chain of Trust upon existing...
Secure Boot on ARM systems – Building a complete Chain of Trust upon existing...Secure Boot on ARM systems – Building a complete Chain of Trust upon existing...
Secure Boot on ARM systems – Building a complete Chain of Trust upon existing...
Linaro
 
Monitoring MySQL with DTrace/SystemTap
Monitoring MySQL with DTrace/SystemTapMonitoring MySQL with DTrace/SystemTap
Monitoring MySQL with DTrace/SystemTap
Padraig O'Sullivan
 
리눅스 커널 디버거 KGDB/KDB
리눅스 커널 디버거 KGDB/KDB리눅스 커널 디버거 KGDB/KDB
리눅스 커널 디버거 KGDB/KDB
Manjong Han
 
Linux Kernel - Virtual File System
Linux Kernel - Virtual File SystemLinux Kernel - Virtual File System
Linux Kernel - Virtual File System
Adrian Huang
 
The Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast StorageThe Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast Storage
Kernel TLV
 
Part 01 Linux Kernel Compilation (Ubuntu)
Part 01 Linux Kernel Compilation (Ubuntu)Part 01 Linux Kernel Compilation (Ubuntu)
Part 01 Linux Kernel Compilation (Ubuntu)
Tushar B Kute
 
System Device Tree and Lopper: Concrete Examples - ELC NA 2022
System Device Tree and Lopper: Concrete Examples - ELC NA 2022System Device Tree and Lopper: Concrete Examples - ELC NA 2022
System Device Tree and Lopper: Concrete Examples - ELC NA 2022
Stefano Stabellini
 
Decompressed vmlinux: linux kernel initialization from page table configurati...
Decompressed vmlinux: linux kernel initialization from page table configurati...Decompressed vmlinux: linux kernel initialization from page table configurati...
Decompressed vmlinux: linux kernel initialization from page table configurati...
Adrian Huang
 
Linux Kernel Booting Process (1) - For NLKB
Linux Kernel Booting Process (1) - For NLKBLinux Kernel Booting Process (1) - For NLKB
Linux Kernel Booting Process (1) - For NLKB
shimosawa
 
Linux Initialization Process (2)
Linux Initialization Process (2)Linux Initialization Process (2)
Linux Initialization Process (2)
shimosawa
 
The e820 trap of Linux kernel hibernation
The e820 trap of Linux kernel hibernationThe e820 trap of Linux kernel hibernation
The e820 trap of Linux kernel hibernation
joeylikernel
 
Launch the First Process in Linux System
Launch the First Process in Linux SystemLaunch the First Process in Linux System
Launch the First Process in Linux System
Jian-Hong Pan
 
Memory management in Linux
Memory management in LinuxMemory management in Linux
Memory management in Linux
Raghu Udiyar
 
U Boot or Universal Bootloader
U Boot or Universal BootloaderU Boot or Universal Bootloader
U Boot or Universal Bootloader
Satpal Parmar
 
Page cache in Linux kernel
Page cache in Linux kernelPage cache in Linux kernel
Page cache in Linux kernel
Adrian Huang
 

What's hot (20)

Process Scheduler and Balancer in Linux Kernel
Process Scheduler and Balancer in Linux KernelProcess Scheduler and Balancer in Linux Kernel
Process Scheduler and Balancer in Linux Kernel
 
Linux MMAP & Ioremap introduction
Linux MMAP & Ioremap introductionLinux MMAP & Ioremap introduction
Linux MMAP & Ioremap introduction
 
Memory management in Linux kernel
Memory management in Linux kernelMemory management in Linux kernel
Memory management in Linux kernel
 
QEMU - Binary Translation
QEMU - Binary Translation QEMU - Binary Translation
QEMU - Binary Translation
 
ACPI Debugging from Linux Kernel
ACPI Debugging from Linux KernelACPI Debugging from Linux Kernel
ACPI Debugging from Linux Kernel
 
Secure Boot on ARM systems – Building a complete Chain of Trust upon existing...
Secure Boot on ARM systems – Building a complete Chain of Trust upon existing...Secure Boot on ARM systems – Building a complete Chain of Trust upon existing...
Secure Boot on ARM systems – Building a complete Chain of Trust upon existing...
 
Monitoring MySQL with DTrace/SystemTap
Monitoring MySQL with DTrace/SystemTapMonitoring MySQL with DTrace/SystemTap
Monitoring MySQL with DTrace/SystemTap
 
리눅스 커널 디버거 KGDB/KDB
리눅스 커널 디버거 KGDB/KDB리눅스 커널 디버거 KGDB/KDB
리눅스 커널 디버거 KGDB/KDB
 
Linux Kernel - Virtual File System
Linux Kernel - Virtual File SystemLinux Kernel - Virtual File System
Linux Kernel - Virtual File System
 
The Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast StorageThe Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast Storage
 
Part 01 Linux Kernel Compilation (Ubuntu)
Part 01 Linux Kernel Compilation (Ubuntu)Part 01 Linux Kernel Compilation (Ubuntu)
Part 01 Linux Kernel Compilation (Ubuntu)
 
System Device Tree and Lopper: Concrete Examples - ELC NA 2022
System Device Tree and Lopper: Concrete Examples - ELC NA 2022System Device Tree and Lopper: Concrete Examples - ELC NA 2022
System Device Tree and Lopper: Concrete Examples - ELC NA 2022
 
Decompressed vmlinux: linux kernel initialization from page table configurati...
Decompressed vmlinux: linux kernel initialization from page table configurati...Decompressed vmlinux: linux kernel initialization from page table configurati...
Decompressed vmlinux: linux kernel initialization from page table configurati...
 
Linux Kernel Booting Process (1) - For NLKB
Linux Kernel Booting Process (1) - For NLKBLinux Kernel Booting Process (1) - For NLKB
Linux Kernel Booting Process (1) - For NLKB
 
Linux Initialization Process (2)
Linux Initialization Process (2)Linux Initialization Process (2)
Linux Initialization Process (2)
 
The e820 trap of Linux kernel hibernation
The e820 trap of Linux kernel hibernationThe e820 trap of Linux kernel hibernation
The e820 trap of Linux kernel hibernation
 
Launch the First Process in Linux System
Launch the First Process in Linux SystemLaunch the First Process in Linux System
Launch the First Process in Linux System
 
Memory management in Linux
Memory management in LinuxMemory management in Linux
Memory management in Linux
 
U Boot or Universal Bootloader
U Boot or Universal BootloaderU Boot or Universal Bootloader
U Boot or Universal Bootloader
 
Page cache in Linux kernel
Page cache in Linux kernelPage cache in Linux kernel
Page cache in Linux kernel
 

Similar to Continguous Memory Allocator in the Linux Kernel

gcma: guaranteed contiguous memory allocator
gcma:  guaranteed contiguous memory allocatorgcma:  guaranteed contiguous memory allocator
gcma: guaranteed contiguous memory allocator
SeongJae Park
 
GCMA: Guaranteed Contiguous Memory Allocator
GCMA: Guaranteed Contiguous Memory AllocatorGCMA: Guaranteed Contiguous Memory Allocator
GCMA: Guaranteed Contiguous Memory Allocator
SeongJae Park
 
Kernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea Arcangeli
Kernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea ArcangeliKernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea Arcangeli
Kernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea Arcangeli
Anne Nicolas
 
LCE13: Android Graphics Upstreaming
LCE13: Android Graphics UpstreamingLCE13: Android Graphics Upstreaming
LCE13: Android Graphics Upstreaming
Linaro
 
Controlling Memory Footprint at All Layers: Linux Kernel, Applications, Libra...
Controlling Memory Footprint at All Layers: Linux Kernel, Applications, Libra...Controlling Memory Footprint at All Layers: Linux Kernel, Applications, Libra...
Controlling Memory Footprint at All Layers: Linux Kernel, Applications, Libra...
peknap
 
Linux Huge Pages
Linux Huge PagesLinux Huge Pages
Linux Huge Pages
Geraldo Netto
 
Introduction of ram ddr3
Introduction of ram ddr3Introduction of ram ddr3
Introduction of ram ddr3Technocratz
 
Introduction of ram ddr3
Introduction of ram ddr3Introduction of ram ddr3
Introduction of ram ddr3
Jatin Goyal
 
Virtualization for Emerging Memory Devices
Virtualization for Emerging Memory DevicesVirtualization for Emerging Memory Devices
Virtualization for Emerging Memory Devices
Takahiro Hirofuchi
 
Memory Unit
Memory UnitMemory Unit
Memory Unit
DAVIS THOMAS
 
Working memory in computer
Working memory in computerWorking memory in computer
Working memory in computer
sai shiva hari prasad
 
Pace IT - Introduction to Ram
Pace IT - Introduction to RamPace IT - Introduction to Ram
Pace IT - Introduction to Ram
Pace IT at Edmonds Community College
 
Reliability, Availability and Serviceability on Linux
Reliability, Availability and Serviceability on LinuxReliability, Availability and Serviceability on Linux
Reliability, Availability and Serviceability on Linux
Samsung Open Source Group
 
Memory Hierarchy (RAM and ROM)
Memory Hierarchy (RAM and ROM)Memory Hierarchy (RAM and ROM)
Memory Hierarchy (RAM and ROM)sumanth ch
 
Q4.11: Introduction to eMMC
Q4.11: Introduction to eMMCQ4.11: Introduction to eMMC
Q4.11: Introduction to eMMC
Linaro
 
Enabling Machine Learning on the Edge using SRAM Conserving Efficient Neural ...
Enabling Machine Learning on the Edge using SRAM Conserving Efficient Neural ...Enabling Machine Learning on the Edge using SRAM Conserving Efficient Neural ...
Enabling Machine Learning on the Edge using SRAM Conserving Efficient Neural ...
Bharath Sudharsan
 
Improving MeeGo boot-up time
Improving MeeGo boot-up timeImproving MeeGo boot-up time
Improving MeeGo boot-up time
Hiroshi Doyu
 
Function of memory.4to5
Function of memory.4to5Function of memory.4to5
Function of memory.4to5myrajendra
 

Similar to Continguous Memory Allocator in the Linux Kernel (20)

gcma: guaranteed contiguous memory allocator
gcma:  guaranteed contiguous memory allocatorgcma:  guaranteed contiguous memory allocator
gcma: guaranteed contiguous memory allocator
 
GCMA: Guaranteed Contiguous Memory Allocator
GCMA: Guaranteed Contiguous Memory AllocatorGCMA: Guaranteed Contiguous Memory Allocator
GCMA: Guaranteed Contiguous Memory Allocator
 
Kernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea Arcangeli
Kernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea ArcangeliKernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea Arcangeli
Kernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea Arcangeli
 
LCE13: Android Graphics Upstreaming
LCE13: Android Graphics UpstreamingLCE13: Android Graphics Upstreaming
LCE13: Android Graphics Upstreaming
 
Controlling Memory Footprint at All Layers: Linux Kernel, Applications, Libra...
Controlling Memory Footprint at All Layers: Linux Kernel, Applications, Libra...Controlling Memory Footprint at All Layers: Linux Kernel, Applications, Libra...
Controlling Memory Footprint at All Layers: Linux Kernel, Applications, Libra...
 
Linux Huge Pages
Linux Huge PagesLinux Huge Pages
Linux Huge Pages
 
Introduction of ram ddr3
Introduction of ram ddr3Introduction of ram ddr3
Introduction of ram ddr3
 
Introduction of ram ddr3
Introduction of ram ddr3Introduction of ram ddr3
Introduction of ram ddr3
 
Virtualization for Emerging Memory Devices
Virtualization for Emerging Memory DevicesVirtualization for Emerging Memory Devices
Virtualization for Emerging Memory Devices
 
Memory Unit
Memory UnitMemory Unit
Memory Unit
 
Working memory in computer
Working memory in computerWorking memory in computer
Working memory in computer
 
Pace IT - Introduction to Ram
Pace IT - Introduction to RamPace IT - Introduction to Ram
Pace IT - Introduction to Ram
 
Ch4 memory management
Ch4 memory managementCh4 memory management
Ch4 memory management
 
Memory
MemoryMemory
Memory
 
Reliability, Availability and Serviceability on Linux
Reliability, Availability and Serviceability on LinuxReliability, Availability and Serviceability on Linux
Reliability, Availability and Serviceability on Linux
 
Memory Hierarchy (RAM and ROM)
Memory Hierarchy (RAM and ROM)Memory Hierarchy (RAM and ROM)
Memory Hierarchy (RAM and ROM)
 
Q4.11: Introduction to eMMC
Q4.11: Introduction to eMMCQ4.11: Introduction to eMMC
Q4.11: Introduction to eMMC
 
Enabling Machine Learning on the Edge using SRAM Conserving Efficient Neural ...
Enabling Machine Learning on the Edge using SRAM Conserving Efficient Neural ...Enabling Machine Learning on the Edge using SRAM Conserving Efficient Neural ...
Enabling Machine Learning on the Edge using SRAM Conserving Efficient Neural ...
 
Improving MeeGo boot-up time
Improving MeeGo boot-up timeImproving MeeGo boot-up time
Improving MeeGo boot-up time
 
Function of memory.4to5
Function of memory.4to5Function of memory.4to5
Function of memory.4to5
 

More from Kernel TLV

DPDK In Depth
DPDK In DepthDPDK In Depth
DPDK In Depth
Kernel TLV
 
Building Network Functions with eBPF & BCC
Building Network Functions with eBPF & BCCBuilding Network Functions with eBPF & BCC
Building Network Functions with eBPF & BCC
Kernel TLV
 
SGX Trusted Execution Environment
SGX Trusted Execution EnvironmentSGX Trusted Execution Environment
SGX Trusted Execution Environment
Kernel TLV
 
Fun with FUSE
Fun with FUSEFun with FUSE
Fun with FUSE
Kernel TLV
 
Kernel Proc Connector and Containers
Kernel Proc Connector and ContainersKernel Proc Connector and Containers
Kernel Proc Connector and Containers
Kernel TLV
 
Bypassing ASLR Exploiting CVE 2015-7545
Bypassing ASLR Exploiting CVE 2015-7545Bypassing ASLR Exploiting CVE 2015-7545
Bypassing ASLR Exploiting CVE 2015-7545
Kernel TLV
 
Present Absence of Linux Filesystem Security
Present Absence of Linux Filesystem SecurityPresent Absence of Linux Filesystem Security
Present Absence of Linux Filesystem Security
Kernel TLV
 
OpenWrt From Top to Bottom
OpenWrt From Top to BottomOpenWrt From Top to Bottom
OpenWrt From Top to Bottom
Kernel TLV
 
Make Your Containers Faster: Linux Container Performance Tools
Make Your Containers Faster: Linux Container Performance ToolsMake Your Containers Faster: Linux Container Performance Tools
Make Your Containers Faster: Linux Container Performance Tools
Kernel TLV
 
Emerging Persistent Memory Hardware and ZUFS - PM-based File Systems in User ...
Emerging Persistent Memory Hardware and ZUFS - PM-based File Systems in User ...Emerging Persistent Memory Hardware and ZUFS - PM-based File Systems in User ...
Emerging Persistent Memory Hardware and ZUFS - PM-based File Systems in User ...
Kernel TLV
 
File Systems: Why, How and Where
File Systems: Why, How and WhereFile Systems: Why, How and Where
File Systems: Why, How and Where
Kernel TLV
 
netfilter and iptables
netfilter and iptablesnetfilter and iptables
netfilter and iptables
Kernel TLV
 
KernelTLV Speaker Guidelines
KernelTLV Speaker GuidelinesKernelTLV Speaker Guidelines
KernelTLV Speaker Guidelines
Kernel TLV
 
Userfaultfd: Current Features, Limitations and Future Development
Userfaultfd: Current Features, Limitations and Future DevelopmentUserfaultfd: Current Features, Limitations and Future Development
Userfaultfd: Current Features, Limitations and Future Development
Kernel TLV
 
Linux Kernel Cryptographic API and Use Cases
Linux Kernel Cryptographic API and Use CasesLinux Kernel Cryptographic API and Use Cases
Linux Kernel Cryptographic API and Use Cases
Kernel TLV
 
DMA Survival Guide
DMA Survival GuideDMA Survival Guide
DMA Survival Guide
Kernel TLV
 
FD.IO Vector Packet Processing
FD.IO Vector Packet ProcessingFD.IO Vector Packet Processing
FD.IO Vector Packet Processing
Kernel TLV
 
WiFi and the Beast
WiFi and the BeastWiFi and the Beast
WiFi and the Beast
Kernel TLV
 
Introduction to DPDK
Introduction to DPDKIntroduction to DPDK
Introduction to DPDK
Kernel TLV
 
FreeBSD and Drivers
FreeBSD and DriversFreeBSD and Drivers
FreeBSD and Drivers
Kernel TLV
 

More from Kernel TLV (20)

DPDK In Depth
DPDK In DepthDPDK In Depth
DPDK In Depth
 
Building Network Functions with eBPF & BCC
Building Network Functions with eBPF & BCCBuilding Network Functions with eBPF & BCC
Building Network Functions with eBPF & BCC
 
SGX Trusted Execution Environment
SGX Trusted Execution EnvironmentSGX Trusted Execution Environment
SGX Trusted Execution Environment
 
Fun with FUSE
Fun with FUSEFun with FUSE
Fun with FUSE
 
Kernel Proc Connector and Containers
Kernel Proc Connector and ContainersKernel Proc Connector and Containers
Kernel Proc Connector and Containers
 
Bypassing ASLR Exploiting CVE 2015-7545
Bypassing ASLR Exploiting CVE 2015-7545Bypassing ASLR Exploiting CVE 2015-7545
Bypassing ASLR Exploiting CVE 2015-7545
 
Present Absence of Linux Filesystem Security
Present Absence of Linux Filesystem SecurityPresent Absence of Linux Filesystem Security
Present Absence of Linux Filesystem Security
 
OpenWrt From Top to Bottom
OpenWrt From Top to BottomOpenWrt From Top to Bottom
OpenWrt From Top to Bottom
 
Make Your Containers Faster: Linux Container Performance Tools
Make Your Containers Faster: Linux Container Performance ToolsMake Your Containers Faster: Linux Container Performance Tools
Make Your Containers Faster: Linux Container Performance Tools
 
Emerging Persistent Memory Hardware and ZUFS - PM-based File Systems in User ...
Emerging Persistent Memory Hardware and ZUFS - PM-based File Systems in User ...Emerging Persistent Memory Hardware and ZUFS - PM-based File Systems in User ...
Emerging Persistent Memory Hardware and ZUFS - PM-based File Systems in User ...
 
File Systems: Why, How and Where
File Systems: Why, How and WhereFile Systems: Why, How and Where
File Systems: Why, How and Where
 
netfilter and iptables
netfilter and iptablesnetfilter and iptables
netfilter and iptables
 
KernelTLV Speaker Guidelines
KernelTLV Speaker GuidelinesKernelTLV Speaker Guidelines
KernelTLV Speaker Guidelines
 
Userfaultfd: Current Features, Limitations and Future Development
Userfaultfd: Current Features, Limitations and Future DevelopmentUserfaultfd: Current Features, Limitations and Future Development
Userfaultfd: Current Features, Limitations and Future Development
 
Linux Kernel Cryptographic API and Use Cases
Linux Kernel Cryptographic API and Use CasesLinux Kernel Cryptographic API and Use Cases
Linux Kernel Cryptographic API and Use Cases
 
DMA Survival Guide
DMA Survival GuideDMA Survival Guide
DMA Survival Guide
 
FD.IO Vector Packet Processing
FD.IO Vector Packet ProcessingFD.IO Vector Packet Processing
FD.IO Vector Packet Processing
 
WiFi and the Beast
WiFi and the BeastWiFi and the Beast
WiFi and the Beast
 
Introduction to DPDK
Introduction to DPDKIntroduction to DPDK
Introduction to DPDK
 
FreeBSD and Drivers
FreeBSD and DriversFreeBSD and Drivers
FreeBSD and Drivers
 

Recently uploaded

Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
Juraj Vysvader
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Globus
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
rickgrimesss22
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus
 
A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
Philip Schwarz
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Globus
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
informapgpstrackings
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
Globus
 
Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdf
Cyanic lab
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
NYGGS Automation Suite
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
abdulrafaychaudhry
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
Paco van Beckhoven
 
Corporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMSCorporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMS
Tendenci - The Open Source AMS (Association Management Software)
 
RISE with SAP and Journey to the Intelligent Enterprise
RISE with SAP and Journey to the Intelligent EnterpriseRISE with SAP and Journey to the Intelligent Enterprise
RISE with SAP and Journey to the Intelligent Enterprise
Srikant77
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Globus
 
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdfEnhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Jay Das
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
Matt Welsh
 
Graphic Design Crash Course for beginners
Graphic Design Crash Course for beginnersGraphic Design Crash Course for beginners
Graphic Design Crash Course for beginners
e20449
 

Recently uploaded (20)

Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
 
A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
 
Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdf
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
 
Corporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMSCorporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMS
 
RISE with SAP and Journey to the Intelligent Enterprise
RISE with SAP and Journey to the Intelligent EnterpriseRISE with SAP and Journey to the Intelligent Enterprise
RISE with SAP and Journey to the Intelligent Enterprise
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
 
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdfEnhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
 
Graphic Design Crash Course for beginners
Graphic Design Crash Course for beginnersGraphic Design Crash Course for beginners
Graphic Design Crash Course for beginners
 

Continguous Memory Allocator in the Linux Kernel

  • 1. Linux CMA (Contiguous Memory Allocator) Mark Veltzer <mark.veltzer@gmail.com>
  • 2. Why contiguous memory? ● DMA ● Sharing between operating systems
  • 3. Why large contiguous memory? ● Brain dead devices (200MB in one DMA transfer?). ● Devices that uses large memory chunks. ● Performance of scatter/gather. ● kmalloc (4MB on x86, less on other platforms), gfp (8MB) limitations. ● And again, sharing between operating systems...
  • 4. Solutions to the contiguous memory problem ● Use small hardware buffers (the most common). ● Use scatter gather in hardware or in software (very common if small buffers are not available – especially in general availability products – because of the problems of the next methods and the unavailability of an IOMMU). ● Reserve a large contiguous buffer (a.k.a the old way – try not to use me if you can...). ● CMA (you are here! - half way to nirvana). ● IOMMU (a.k.a. the best thing since sliced bread – pure bliss – your company will build a temple in your honor).
  • 5. Doing it on your own (no CMA) ● First reserve a large buffer for your needs and tell the kernel to keep it's filthy grubbing hands off your buffer... ● Find an free large area by looking at /proc/iomem. I found 200M free buffer at address 800M in this example. ● Pass mem=800M memmap=200M$800M to the kernel at boot (this actually depends on the kernel version, for some kernels only memmap is needed, ugh!!!). ● Check it by passing bootmem_debug ● See that the allocation worked by looking at: – dmesg | grep user (no guarantee, kernel log changes all the time) – cat /proc/iomem (even better than previous). – /sys/firmware/memmap/*/* (the raw data – can be machine parsed).
  • 6. Doing it on your own (no CMA). Continued... ● Now you want your driver/kernel code to access this buffer… ● request_mem_region with the same physical address you passed as a kernel parameter. ● ioremap ● And just you the buffer you get from ioremap. ● Dont forget to iounmap and release_mem_region... ● Lets see an example…
  • 7. What's wrong with this approach? ● You need to find a free spot from /proc/iomem. This space can change between board to board, chipset to chipset, at hotplug time, pci plug&play non determinacy, between architectures and more. ● The space will not be used for anything, ever. So that space is lost even you never do anything with it. ● You need to know the size of RAM you want in advance, and be accurate about it. This is because every byte you allocate is a byte you can't use for regular system operations.
  • 8. CMA ● What is it? A piece of code, now in the standard kernel. ● Enabled in the default Linux configuration and by most current Linux distributions. ● A result of a long line of patches with various approaches intended to solve the “large contiguous memory” problem. ● Well integrated, both with the Linux memory allocator (gfp) and with the Linux DMA subsystem.
  • 9. How do you use it? ● Make sure it's in your kernel (CONFIG_CMA) ● You can make the default CMA context larger or smaller via various kernel configuration options (CMA_AREAS, CMA_DEBUG, CMA_SIZE_MBYTES, CMA_SIZE_PERCENTAGE, ...) ● Use the regular DMA API to allocate DMA buffers. That's it. ● This means that drivers do not need to be changed. ● Allocation time will be a little longer if you ask for large buffers (see next slide). But not by much. ● You can use the special CMA APIS (dma_contiguous_reserve, dma_contiguous_early_fixup, dma_alloc_from_contiguous, dma_release_from_contiguous, dma_declare_contiguous) but you don't have too. Try to avoid it.
  • 10. How does it work? ● Pages in the kernel are marked as movable and non movable. ● Pages which are movable are pages where the physical addresses don't matter. This is true for most pages (especially user space pages and the page cache which take up most of the space in the system). ● When you alloc via cma (actually when the DMA api allocs and CMA is enabled) it migrated pages pages which are migratable (movable) to make physical room for you buffer. ● While the migration is taking place the CMA uses a feature of the page allocator which allow to mark a page block as MIGRATE_ISOLATE which prevents the page allocator from allocating these pages while the moving is taking place. ● Allocating is from CMA contexts/pool. By default there are 7 (controllable from kernel configuration) of those and one is default. You can create your own and have a private pool for you driver if you are willing to use the special API. Usually you don't need these but for special considerations (realtime etc) this may be an option.
  • 11. How does this solve my problems? ● Well, with CMA all of your RAM is used and if you never ask for that large contiguous space then you can use all of your systems RAM. ● In addition CMA is integrated with the page allocator so when you return pages (simply dealloc via the DMA API) the pages actually go back to being used by whomever. ● CMA is transparent to drivers while the previous solution is not.
  • 12. Issues with CMA ● get_user_pages() is an issue which is still to be dealt with efficiently. get_user_pages() locks pages and makes the unmovable. This is dealt with today by moving pages away as soon as they are locked which creates a bit of overhead. ● IOMMU is better (see next slide).
  • 13. IOMMU ● IOMMU solves all of your large contiguous woes (at least as they pertain to hardware, not to operating system sharing). ● It also has a security advantage. ● And much better debugging and stability. ● Really, try to use a board with IOMMU… It's well worth the extra $$$…. ● Maybe you already have an iommu? Check /sys/class/iommu/*...
  • 14. References ● $KERNEL/doc/kernel-parameters.txt (cma=) ● $KERNEL/drivers/base/dma-contiguous.c ● $KERNEL/include/linux/dma-contiguous.h ● https://lwn.net/Articles/486301/ A deep dive into CMA ● https://lwn.net/Articles/474819/ DMA buffer sharing ● https://lwn.net/Articles/396657/ Older CMA article – some of it irrelevant. ● https://lwn.net/Articles/480055/ The Android ION memory allocator (replacement of pmem).