SlideShare a Scribd company logo
GCMA: Guaranteed
Contiguous Memory Allocator
SeongJae Park1
, Minchan Kim2
, Heon Y. Yeom1
Seoul National University1
LG Electronics2
This work by SeongJae Park is licensed under the Creative
Commons Attribution-ShareAlike 3.0 Unported License. To
view a copy of this license, visit
http://creativecommons.org/licenses/by-sa/3.0/.
These slides were presented during
EWiLi 2015 at Amsterdam
(http://syst.univ-brest.fr/ewili2015/)
TL; DR
● Contiguous Memory Allocation is a big problem of embedded system
● Current solutions have weakness
○ Too expensive for embedded system, or
○ Waste memory space, or
○ Too slow and even fail to allocate
● GCMA, our new solution, guarantees
○ Fast latency of contiguous memory allocation
○ Success of contiguous memory allocation
○ and Reasonable memory utilization
● That’s it! ;)
Who Wants Physically Contiguous Memory?
● Virtual Memory avoids the fragmentation problem
○ Applications use virtual address; virtual address can be mapped to any physical address
○ MMU do mapping of virtual address to physical address
● In Linux, it uses demand paging and reclaim
Physical
Memory
Application MMUCPU
Logical address Physical address
We Need Physically Contiguous Memory!
● Devices using DMA
○ Lots of devices need large, contiguous memory for internal buffer
(e.g., 12 Mega-Pixel Camera needs large, contiguous buffer)
○ Because MMU is behind the CPU, devices using DMA cannot use virtual address space
○ Slow latency or failure of allocation can harm user experience or life
(What if your camera does not react while a kitty go away)
Physical
Memory
Application MMUCPU
Logical address Physical address
Device
DMA
Physical address
We Need Physically Contiguous Memory!
● Devices using DMA
○ Lots of devices need large, contiguous memory for internal buffer
(e.g., 12 Mega-Pixel Camera needs large, contiguous buffer)
○ Because MMU is behind the CPU, devices using DMA can not use virtual address space
○ Slow latency or failure of allocation can harm user experience or life
(What if your camera does not react while a kitty go away)
● Huge page, System internal buffer
○ Huge pages can enhance system performance by minimizing TLB miss
○ System needs contiguous memory for internal usage
e.g., struct page *alloc_pages(unsigned int flags, unsigned int order);
NOTE: This research focuses on device case for embedded system, though
H/W Solutions: IOMMU, Scatter / Gather DMA
● IOMMU serves as MMU for devices
● Scatter / Gather DMA let DMA to discontiguous memory area
● Consumes more power and money because it requires additional H/W
● Can be expensive for low-end embedded devices
● Useless for Huge page or system internal physically contiguous memory
Physical
Memory
Application MMUCPU
Logical address Physicals address
Device
DMA
Device address
IOMMU
Physical address
Reserved Area Technique
● Reserves sufficient amount of contiguous area at boot time and
let only contiguous memory allocation to use the area
● Pros: Simple and effective for contiguous memory allocation
● Cons: System memory space utilization could be low if the contiguous
memory allocation does not fully use the reserved area
○ Sacrifice 10% of your phone memory for a kitty that may or may not meet with?
● Widely adopted solution despite of the utilization problem
System Memory
(Narrowed by Reservation)
Reserved
Area
Process
Normal Allocation Contiguous Allocation
CMA: Contiguous Memory Allocator
● Software-based solution in mainline Linux
● Generously extended version of Reserved area technique
○ Mainly focus on memory utilization
○ Let movable pages to use the reserved area
○ If contig mem alloc requires pages being used for movable pages,
move those pages out of reserved area and give the vacant area to contig mem alloc
○ Solves memory utilization problem because general pages are movable
● In other words, CMA gives different priority to clients of the reserved area
○ Primary client: contiguous memory allocation
○ Secondary client: movable page allocation
CMA: in Real-land
● Worst case latency for a photo shot using Camera of Raspberry Pi 2 under
background workload is
● 1.6 Seconds when configured with Reserved area technique,
● 9.8 seconds when configured with CMA; Enough for a kitty to go away
The Phantom Menace Unveiled: Latency of CMA
● Latency of CMA during a photo shot using Camera of Raspberry Pi 2 under
background workload
● It’s clear that latency of Camera is bounded by CMA
CMA: Why So Slow?
● In short, secondary client of CMA is not so nice as expected
○ In this context, ‘nice’ means as unix command ‘nice’ means
● Moving page is expensive task (copying, rmap control, LRU churn, …)
● If someone is holding the page, it could not be moved until the one releases
the page (e.g., get_user_page())
● Result: Unexpected long latency and even failure
● That’s why CMA is not adopted to many devices
● Raspberry Pi does not support CMA officially because of the problem
GCMA: Guaranteed Contiguous Memory Allocator
● GCMA is a variant of CMA that guarantees
○ Fast latency and success of allocation
○ While keeping memory utilization as well
● Main idea is simple
○ Follow primary / secondary client idea of CMA to keep memory utilization
○ Select secondary client as only nice one unlike CMA
■ For fast latency, should be able to vacate the reserved area as soon as required without
any task such as moving content
■ For success, should be out of kernel control
■ For memory utilization, most pages should be able to be secondary client
Secondary Client of GCMA
● Last chance cache for
○ pages being swapped out and
○ clean pages being evicted from file cache
● Pages being swapped out to write-through swap device
○ Could be discarded immediately because the contents are written through to swap device
○ Kernel think it’s already swapped out
○ Most anonymous pages could be swapped out
● Clean pages being evicted from file cache
○ Could be discarded because the contents are in storage already
○ Kernel think it’s already evicted out
○ Lots of file-backed pages could be the case
● Additional pros 1: Already expected to not be accessed again soon
○ Discarding secondary client pages from GCAM will not affect system performance much
● Additional pros 2: Occurs with severe workload, only
○ In peaceful case, zero overhead at all
GCMA: Workflow
● Reserve memory area in boot time
● If a page is swapped out or evicted from file cache, keep the page in reserved
area
○ If system requires the page again, give it back from the reserved area
○ If contiguous memory allocation requires area being used by those pages, discard those
pages and use the area for contiguous memory allocation
System Memory GCMA Reserved
Area
Swap Device
File
Process
Normal Allocation Contiguous Allocation
Reclaim / Swap
Get back if hit
GCMA Implementation
● Used Cleancache / Frontswap interface for second class client of GCMA
rather than inventing the wheel again
● DMEM: Discardable Memory
○ Works as a last chance cache for secondary client pages using GCMA reserved area
○ Manages Index table using hash table and RB-tree
○ Evicts pages in LRU scheme
Reserved Area
Dmem Logic
GCMA Logic
Dmem Interface
CMA Interface
CMA Logic
Cleancache Frontswap
GCMA Implementation: in Short
● Implemented on Linux v3.18
● About 1,500 lines of code
● Ported, evaluated on CuBox and Raspberry Pi 2
● Available under GPL v3 at: https://github.com/sjp38/linux.
gcma/releases/tag/gcma/rfc/v2
GCMA Optimization for Write-through Swap
● Write-through swap device could consume I/O resource
● GCMA recommends compressed memory block device as a swap device
○ Zram is a compressed memory block device in upstream Linux
○ Google also recommends zram as a swap device for low-memory Android devices
Experimental Setup
● Device setup
○ Raspberry Pi 2
○ ARM cortex-A7 900 MHz * 4 cores
○ 1 GiB LPDDR2 SDRAM
○ Class 10 SanDisk 16 GiB microSD card
● Configurations
○ Baseline: Linux rpi-v3.18.11 + 100 MiB swap +
256 MiB reserved area
○ CMA: Linux rpi-v3.18.11 + 100 MiB swap +
256 MiB CMA area
○ GCMA: Linux rpi-v3.18.11 + 100 MiB Zram swap +
256 MiB GCMA area
● Workloads
○ Contig Mem Alloc: Contiguous memory allocation using CMA or GCMA
○ Blogbench: Realistic file system benchmark
○ Camera shot: Repeated shot of picture using Raspberry Pi 2 Camera module with 10 seconds
interval between each shot
Contig Mem Alloc without Background Workload
● Average of 30 allocations
● GCMA shows 14.89x to 129.41x faster latency compared to CMA
● CMA even failed once for 32,768 contiguous pages allocation even though
there is no background task
4MiB Contig Mem Alloc w/o Background Workload
● Latency of CMA is more predictable compared to GCMA
4MiB Contig Mem Allocation w/ Background Task
● Background workload: Blogbench
● CMA consumes 5 seconds (unacceptable!) for one allocation in bad case
Camera Latency w/ Background Task
● Camera is a realistic, important workload of Raspberry Pi 2
● Background task: Blogbench
● GCMA keeps latency as fast as Reserved area technique configuration
Blogbench on CMA / GCMA
● Scores are normalized to scores of Baseline configuration
● ‘/ cam’ means camera workload as a background workload
● System under GCMA even outperforms CMA with realistic workload owing to
its light overhead
● Allocation using CMA even degrades system performance because it should
move out pages in reserved area
Conclusion
● Contiguous memory allocation is still a big problem for embedded device
○ H/W solutions are too expensive
○ S/W solutions like CMA is not effective
○ Most system uses Reserved area technique despite of low memory utilization
● GCMA guarantees fast latency, success of allocation and reasonable memory
utilization
○ Allocation latency of GCMA is as fast as Reserved area technique
○ System performance under GCMA is is similar or outperforms the performance under CMA
Thank You
Any question?

More Related Content

What's hot

DB Latency Using DRAM + PMem in App Direct & Memory Modes
DB Latency Using DRAM + PMem in App Direct & Memory ModesDB Latency Using DRAM + PMem in App Direct & Memory Modes
DB Latency Using DRAM + PMem in App Direct & Memory Modes
ScyllaDB
 
Vanquishing Latency Outliers in the Lightbits LightOS Software Defined Storag...
Vanquishing Latency Outliers in the Lightbits LightOS Software Defined Storag...Vanquishing Latency Outliers in the Lightbits LightOS Software Defined Storag...
Vanquishing Latency Outliers in the Lightbits LightOS Software Defined Storag...
ScyllaDB
 
µCLinux on Pluto 6 Project presentation
µCLinux on Pluto 6 Project presentationµCLinux on Pluto 6 Project presentation
µCLinux on Pluto 6 Project presentation
edlangley
 
Performance: Observe and Tune
Performance: Observe and TunePerformance: Observe and Tune
Performance: Observe and Tune
Paul V. Novarese
 
Kernel Recipes 2017 - What's new in the world of storage for Linux - Jens Axboe
Kernel Recipes 2017 - What's new in the world of storage for Linux - Jens AxboeKernel Recipes 2017 - What's new in the world of storage for Linux - Jens Axboe
Kernel Recipes 2017 - What's new in the world of storage for Linux - Jens Axboe
Anne Nicolas
 
Get Lower Latency and Higher Throughput for Java Applications
Get Lower Latency and Higher Throughput for Java ApplicationsGet Lower Latency and Higher Throughput for Java Applications
Get Lower Latency and Higher Throughput for Java Applications
ScyllaDB
 
Where Did All These Cycles Go?
Where Did All These Cycles Go?Where Did All These Cycles Go?
Where Did All These Cycles Go?
ScyllaDB
 
LAS16-TR04: Using tracing to tune and optimize EAS (English)
LAS16-TR04: Using tracing to tune and optimize EAS (English)LAS16-TR04: Using tracing to tune and optimize EAS (English)
LAS16-TR04: Using tracing to tune and optimize EAS (English)
Linaro
 
BKK16-208 EAS
BKK16-208 EASBKK16-208 EAS
BKK16-208 EAS
Linaro
 
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farm
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farmKernel Recipes 2016 - Speeding up development by setting up a kernel build farm
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farm
Anne Nicolas
 
Practical SystemTAP basics: Perl memory profiling
Practical SystemTAP basics: Perl memory profilingPractical SystemTAP basics: Perl memory profiling
Practical SystemTAP basics: Perl memory profiling
Lubomir Rintel
 
Let’s Fix Logging Once and for All
Let’s Fix Logging Once and for AllLet’s Fix Logging Once and for All
Let’s Fix Logging Once and for All
ScyllaDB
 
Linux rt in financial markets
Linux rt in financial marketsLinux rt in financial markets
Linux rt in financial markets
Adrien Mahieux
 
Using eBPF to Measure the k8s Cluster Health
Using eBPF to Measure the k8s Cluster HealthUsing eBPF to Measure the k8s Cluster Health
Using eBPF to Measure the k8s Cluster Health
ScyllaDB
 
Luca Abeni - Real-Time Virtual Machines with Linux and kvm
Luca Abeni - Real-Time Virtual Machines with Linux and kvmLuca Abeni - Real-Time Virtual Machines with Linux and kvm
Luca Abeni - Real-Time Virtual Machines with Linux and kvm
linuxlab_conf
 
Linux kernel memory allocators
Linux kernel memory allocatorsLinux kernel memory allocators
Linux kernel memory allocators
Hao-Ran Liu
 
Keeping Latency Low and Throughput High with Application-level Priority Manag...
Keeping Latency Low and Throughput High with Application-level Priority Manag...Keeping Latency Low and Throughput High with Application-level Priority Manag...
Keeping Latency Low and Throughput High with Application-level Priority Manag...
ScyllaDB
 
Extreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 Instance
Extreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 InstanceExtreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 Instance
Extreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 Instance
ScyllaDB
 
Using QEMU for cross development
Using QEMU for cross developmentUsing QEMU for cross development
Using QEMU for cross development
Tetsuyuki Kobayashi
 
Process scheduling
Process schedulingProcess scheduling
Process scheduling
Hao-Ran Liu
 

What's hot (20)

DB Latency Using DRAM + PMem in App Direct & Memory Modes
DB Latency Using DRAM + PMem in App Direct & Memory ModesDB Latency Using DRAM + PMem in App Direct & Memory Modes
DB Latency Using DRAM + PMem in App Direct & Memory Modes
 
Vanquishing Latency Outliers in the Lightbits LightOS Software Defined Storag...
Vanquishing Latency Outliers in the Lightbits LightOS Software Defined Storag...Vanquishing Latency Outliers in the Lightbits LightOS Software Defined Storag...
Vanquishing Latency Outliers in the Lightbits LightOS Software Defined Storag...
 
µCLinux on Pluto 6 Project presentation
µCLinux on Pluto 6 Project presentationµCLinux on Pluto 6 Project presentation
µCLinux on Pluto 6 Project presentation
 
Performance: Observe and Tune
Performance: Observe and TunePerformance: Observe and Tune
Performance: Observe and Tune
 
Kernel Recipes 2017 - What's new in the world of storage for Linux - Jens Axboe
Kernel Recipes 2017 - What's new in the world of storage for Linux - Jens AxboeKernel Recipes 2017 - What's new in the world of storage for Linux - Jens Axboe
Kernel Recipes 2017 - What's new in the world of storage for Linux - Jens Axboe
 
Get Lower Latency and Higher Throughput for Java Applications
Get Lower Latency and Higher Throughput for Java ApplicationsGet Lower Latency and Higher Throughput for Java Applications
Get Lower Latency and Higher Throughput for Java Applications
 
Where Did All These Cycles Go?
Where Did All These Cycles Go?Where Did All These Cycles Go?
Where Did All These Cycles Go?
 
LAS16-TR04: Using tracing to tune and optimize EAS (English)
LAS16-TR04: Using tracing to tune and optimize EAS (English)LAS16-TR04: Using tracing to tune and optimize EAS (English)
LAS16-TR04: Using tracing to tune and optimize EAS (English)
 
BKK16-208 EAS
BKK16-208 EASBKK16-208 EAS
BKK16-208 EAS
 
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farm
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farmKernel Recipes 2016 - Speeding up development by setting up a kernel build farm
Kernel Recipes 2016 - Speeding up development by setting up a kernel build farm
 
Practical SystemTAP basics: Perl memory profiling
Practical SystemTAP basics: Perl memory profilingPractical SystemTAP basics: Perl memory profiling
Practical SystemTAP basics: Perl memory profiling
 
Let’s Fix Logging Once and for All
Let’s Fix Logging Once and for AllLet’s Fix Logging Once and for All
Let’s Fix Logging Once and for All
 
Linux rt in financial markets
Linux rt in financial marketsLinux rt in financial markets
Linux rt in financial markets
 
Using eBPF to Measure the k8s Cluster Health
Using eBPF to Measure the k8s Cluster HealthUsing eBPF to Measure the k8s Cluster Health
Using eBPF to Measure the k8s Cluster Health
 
Luca Abeni - Real-Time Virtual Machines with Linux and kvm
Luca Abeni - Real-Time Virtual Machines with Linux and kvmLuca Abeni - Real-Time Virtual Machines with Linux and kvm
Luca Abeni - Real-Time Virtual Machines with Linux and kvm
 
Linux kernel memory allocators
Linux kernel memory allocatorsLinux kernel memory allocators
Linux kernel memory allocators
 
Keeping Latency Low and Throughput High with Application-level Priority Manag...
Keeping Latency Low and Throughput High with Application-level Priority Manag...Keeping Latency Low and Throughput High with Application-level Priority Manag...
Keeping Latency Low and Throughput High with Application-level Priority Manag...
 
Extreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 Instance
Extreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 InstanceExtreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 Instance
Extreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 Instance
 
Using QEMU for cross development
Using QEMU for cross developmentUsing QEMU for cross development
Using QEMU for cross development
 
Process scheduling
Process schedulingProcess scheduling
Process scheduling
 

Viewers also liked

Git inter-snapshot public
Git  inter-snapshot publicGit  inter-snapshot public
Git inter-snapshot public
SeongJae Park
 
Porting golang development environment developed with golang
Porting golang development environment developed with golangPorting golang development environment developed with golang
Porting golang development environment developed with golang
SeongJae Park
 
An introduction to_golang.avi
An introduction to_golang.aviAn introduction to_golang.avi
An introduction to_golang.avi
SeongJae Park
 
Let the contribution begin (EST futures)
Let the contribution begin  (EST futures)Let the contribution begin  (EST futures)
Let the contribution begin (EST futures)
SeongJae Park
 
Deep dark side of git - prologue
Deep dark side of git - prologueDeep dark side of git - prologue
Deep dark side of git - prologueSeongJae Park
 
(Live) build and run golang web server on android.avi
(Live) build and run golang web server on android.avi(Live) build and run golang web server on android.avi
(Live) build and run golang web server on android.avi
SeongJae Park
 
Develop Android/iOS app using golang
Develop Android/iOS app using golangDevelop Android/iOS app using golang
Develop Android/iOS app using golang
SeongJae Park
 
Sw install with_without_docker
Sw install with_without_dockerSw install with_without_docker
Sw install with_without_docker
SeongJae Park
 
Deep dark-side of git: How git works internally
Deep dark-side of git: How git works internallyDeep dark-side of git: How git works internally
Deep dark-side of git: How git works internally
SeongJae Park
 
Develop Android app using Golang
Develop Android app using GolangDevelop Android app using Golang
Develop Android app using Golang
SeongJae Park
 

Viewers also liked (11)

Git inter-snapshot public
Git  inter-snapshot publicGit  inter-snapshot public
Git inter-snapshot public
 
Porting golang development environment developed with golang
Porting golang development environment developed with golangPorting golang development environment developed with golang
Porting golang development environment developed with golang
 
An introduction to_golang.avi
An introduction to_golang.aviAn introduction to_golang.avi
An introduction to_golang.avi
 
Let the contribution begin (EST futures)
Let the contribution begin  (EST futures)Let the contribution begin  (EST futures)
Let the contribution begin (EST futures)
 
Deep dark side of git - prologue
Deep dark side of git - prologueDeep dark side of git - prologue
Deep dark side of git - prologue
 
(Live) build and run golang web server on android.avi
(Live) build and run golang web server on android.avi(Live) build and run golang web server on android.avi
(Live) build and run golang web server on android.avi
 
Develop Android/iOS app using golang
Develop Android/iOS app using golangDevelop Android/iOS app using golang
Develop Android/iOS app using golang
 
Sw install with_without_docker
Sw install with_without_dockerSw install with_without_docker
Sw install with_without_docker
 
Deep dark-side of git: How git works internally
Deep dark-side of git: How git works internallyDeep dark-side of git: How git works internally
Deep dark-side of git: How git works internally
 
Develop Android app using Golang
Develop Android app using GolangDevelop Android app using Golang
Develop Android app using Golang
 
Memory management
Memory managementMemory management
Memory management
 

Similar to gcma: guaranteed contiguous memory allocator

Continguous Memory Allocator in the Linux Kernel
Continguous Memory Allocator in the Linux KernelContinguous Memory Allocator in the Linux Kernel
Continguous Memory Allocator in the Linux Kernel
Kernel TLV
 
Virtualization for Emerging Memory Devices
Virtualization for Emerging Memory DevicesVirtualization for Emerging Memory Devices
Virtualization for Emerging Memory Devices
Takahiro Hirofuchi
 
Linux Memory Basics for SysAdmins - ChinaNetCloud Training
Linux Memory Basics for SysAdmins - ChinaNetCloud TrainingLinux Memory Basics for SysAdmins - ChinaNetCloud Training
Linux Memory Basics for SysAdmins - ChinaNetCloud Training
ChinaNetCloud
 
LCE13: Android Graphics Upstreaming
LCE13: Android Graphics UpstreamingLCE13: Android Graphics Upstreaming
LCE13: Android Graphics Upstreaming
Linaro
 
IBM MQ Disaster Recovery
IBM MQ Disaster RecoveryIBM MQ Disaster Recovery
IBM MQ Disaster Recovery
MarkTaylorIBM
 
LCA13: Memory Hotplug on Android
LCA13: Memory Hotplug on AndroidLCA13: Memory Hotplug on Android
LCA13: Memory Hotplug on Android
Linaro
 
Memory Management in Amoeba
Memory Management in AmoebaMemory Management in Amoeba
Memory Management in Amoeba
Ramesh Adhikari
 
The Dark Side Of Go -- Go runtime related problems in TiDB in production
The Dark Side Of Go -- Go runtime related problems in TiDB  in productionThe Dark Side Of Go -- Go runtime related problems in TiDB  in production
The Dark Side Of Go -- Go runtime related problems in TiDB in production
PingCAP
 
RAMinate Invited Talk at NII
RAMinate Invited Talk at NIIRAMinate Invited Talk at NII
RAMinate Invited Talk at NII
Takahiro Hirofuchi
 
Sc19 ibm hms final
Sc19 ibm hms finalSc19 ibm hms final
Sc19 ibm hms final
Yutaka Kawai
 
Webinar: Eliminate Backups and Simplify DR with Hybrid Cloud Storage
Webinar: Eliminate Backups and Simplify DR with Hybrid Cloud StorageWebinar: Eliminate Backups and Simplify DR with Hybrid Cloud Storage
Webinar: Eliminate Backups and Simplify DR with Hybrid Cloud Storage
Storage Switzerland
 
Task migration using CRIU
Task migration using CRIUTask migration using CRIU
Task migration using CRIU
Rohit Jnagal
 
Enhanced Live Migration for Intensive Memory Loads
Enhanced Live Migration for Intensive Memory LoadsEnhanced Live Migration for Intensive Memory Loads
Enhanced Live Migration for Intensive Memory Loads
Samsung Open Source Group
 
Literature survey presentation
Literature survey presentationLiterature survey presentation
Literature survey presentationKarthik Iyr
 
Recent advancements in cache technology
Recent advancements in cache technologyRecent advancements in cache technology
Recent advancements in cache technology
Paras Nath Chaudhary
 
ch9_virMem.pdf
ch9_virMem.pdfch9_virMem.pdf
ch9_virMem.pdf
HoNguyn746501
 
Kernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea Arcangeli
Kernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea ArcangeliKernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea Arcangeli
Kernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea Arcangeli
Anne Nicolas
 
Improving MeeGo boot-up time
Improving MeeGo boot-up timeImproving MeeGo boot-up time
Improving MeeGo boot-up time
Hiroshi Doyu
 
Measuring a 25 and 40Gb/s Data Plane
Measuring a 25 and 40Gb/s Data PlaneMeasuring a 25 and 40Gb/s Data Plane
Measuring a 25 and 40Gb/s Data Plane
Open-NFP
 
RAMinate ACM SoCC 2016 Talk
RAMinate ACM SoCC 2016 TalkRAMinate ACM SoCC 2016 Talk
RAMinate ACM SoCC 2016 Talk
Takahiro Hirofuchi
 

Similar to gcma: guaranteed contiguous memory allocator (20)

Continguous Memory Allocator in the Linux Kernel
Continguous Memory Allocator in the Linux KernelContinguous Memory Allocator in the Linux Kernel
Continguous Memory Allocator in the Linux Kernel
 
Virtualization for Emerging Memory Devices
Virtualization for Emerging Memory DevicesVirtualization for Emerging Memory Devices
Virtualization for Emerging Memory Devices
 
Linux Memory Basics for SysAdmins - ChinaNetCloud Training
Linux Memory Basics for SysAdmins - ChinaNetCloud TrainingLinux Memory Basics for SysAdmins - ChinaNetCloud Training
Linux Memory Basics for SysAdmins - ChinaNetCloud Training
 
LCE13: Android Graphics Upstreaming
LCE13: Android Graphics UpstreamingLCE13: Android Graphics Upstreaming
LCE13: Android Graphics Upstreaming
 
IBM MQ Disaster Recovery
IBM MQ Disaster RecoveryIBM MQ Disaster Recovery
IBM MQ Disaster Recovery
 
LCA13: Memory Hotplug on Android
LCA13: Memory Hotplug on AndroidLCA13: Memory Hotplug on Android
LCA13: Memory Hotplug on Android
 
Memory Management in Amoeba
Memory Management in AmoebaMemory Management in Amoeba
Memory Management in Amoeba
 
The Dark Side Of Go -- Go runtime related problems in TiDB in production
The Dark Side Of Go -- Go runtime related problems in TiDB  in productionThe Dark Side Of Go -- Go runtime related problems in TiDB  in production
The Dark Side Of Go -- Go runtime related problems in TiDB in production
 
RAMinate Invited Talk at NII
RAMinate Invited Talk at NIIRAMinate Invited Talk at NII
RAMinate Invited Talk at NII
 
Sc19 ibm hms final
Sc19 ibm hms finalSc19 ibm hms final
Sc19 ibm hms final
 
Webinar: Eliminate Backups and Simplify DR with Hybrid Cloud Storage
Webinar: Eliminate Backups and Simplify DR with Hybrid Cloud StorageWebinar: Eliminate Backups and Simplify DR with Hybrid Cloud Storage
Webinar: Eliminate Backups and Simplify DR with Hybrid Cloud Storage
 
Task migration using CRIU
Task migration using CRIUTask migration using CRIU
Task migration using CRIU
 
Enhanced Live Migration for Intensive Memory Loads
Enhanced Live Migration for Intensive Memory LoadsEnhanced Live Migration for Intensive Memory Loads
Enhanced Live Migration for Intensive Memory Loads
 
Literature survey presentation
Literature survey presentationLiterature survey presentation
Literature survey presentation
 
Recent advancements in cache technology
Recent advancements in cache technologyRecent advancements in cache technology
Recent advancements in cache technology
 
ch9_virMem.pdf
ch9_virMem.pdfch9_virMem.pdf
ch9_virMem.pdf
 
Kernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea Arcangeli
Kernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea ArcangeliKernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea Arcangeli
Kernel Recipes 2017 - 20 years of Linux Virtual Memory - Andrea Arcangeli
 
Improving MeeGo boot-up time
Improving MeeGo boot-up timeImproving MeeGo boot-up time
Improving MeeGo boot-up time
 
Measuring a 25 and 40Gb/s Data Plane
Measuring a 25 and 40Gb/s Data PlaneMeasuring a 25 and 40Gb/s Data Plane
Measuring a 25 and 40Gb/s Data Plane
 
RAMinate ACM SoCC 2016 Talk
RAMinate ACM SoCC 2016 TalkRAMinate ACM SoCC 2016 Talk
RAMinate ACM SoCC 2016 Talk
 

More from SeongJae Park

Biscuit: an operating system written in go
Biscuit:  an operating system written in goBiscuit:  an operating system written in go
Biscuit: an operating system written in go
SeongJae Park
 
DO YOU WANT TO USE A VCS
DO YOU WANT TO USE A VCSDO YOU WANT TO USE A VCS
DO YOU WANT TO USE A VCS
SeongJae Park
 
Experimental android hacking using reflection
Experimental android hacking using reflectionExperimental android hacking using reflection
Experimental android hacking using reflection
SeongJae Park
 
ash
ashash
Hacktime for adk
Hacktime for adkHacktime for adk
Hacktime for adk
SeongJae Park
 
Let the contribution begin
Let the contribution beginLet the contribution begin
Let the contribution begin
SeongJae Park
 
Touch Android Without Touching
Touch Android Without TouchingTouch Android Without Touching
Touch Android Without Touching
SeongJae Park
 
AOSP에 컨트리뷰션 하기 dev festx korea 2012 presentation
AOSP에 컨트리뷰션 하기   dev festx korea 2012 presentationAOSP에 컨트리뷰션 하기   dev festx korea 2012 presentation
AOSP에 컨트리뷰션 하기 dev festx korea 2012 presentation
SeongJae Park
 

More from SeongJae Park (8)

Biscuit: an operating system written in go
Biscuit:  an operating system written in goBiscuit:  an operating system written in go
Biscuit: an operating system written in go
 
DO YOU WANT TO USE A VCS
DO YOU WANT TO USE A VCSDO YOU WANT TO USE A VCS
DO YOU WANT TO USE A VCS
 
Experimental android hacking using reflection
Experimental android hacking using reflectionExperimental android hacking using reflection
Experimental android hacking using reflection
 
ash
ashash
ash
 
Hacktime for adk
Hacktime for adkHacktime for adk
Hacktime for adk
 
Let the contribution begin
Let the contribution beginLet the contribution begin
Let the contribution begin
 
Touch Android Without Touching
Touch Android Without TouchingTouch Android Without Touching
Touch Android Without Touching
 
AOSP에 컨트리뷰션 하기 dev festx korea 2012 presentation
AOSP에 컨트리뷰션 하기   dev festx korea 2012 presentationAOSP에 컨트리뷰션 하기   dev festx korea 2012 presentation
AOSP에 컨트리뷰션 하기 dev festx korea 2012 presentation
 

Recently uploaded

Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
Peter Spielvogel
 

Recently uploaded (20)

Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 

gcma: guaranteed contiguous memory allocator

  • 1. GCMA: Guaranteed Contiguous Memory Allocator SeongJae Park1 , Minchan Kim2 , Heon Y. Yeom1 Seoul National University1 LG Electronics2
  • 2. This work by SeongJae Park is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/.
  • 3. These slides were presented during EWiLi 2015 at Amsterdam (http://syst.univ-brest.fr/ewili2015/)
  • 4. TL; DR ● Contiguous Memory Allocation is a big problem of embedded system ● Current solutions have weakness ○ Too expensive for embedded system, or ○ Waste memory space, or ○ Too slow and even fail to allocate ● GCMA, our new solution, guarantees ○ Fast latency of contiguous memory allocation ○ Success of contiguous memory allocation ○ and Reasonable memory utilization ● That’s it! ;)
  • 5. Who Wants Physically Contiguous Memory? ● Virtual Memory avoids the fragmentation problem ○ Applications use virtual address; virtual address can be mapped to any physical address ○ MMU do mapping of virtual address to physical address ● In Linux, it uses demand paging and reclaim Physical Memory Application MMUCPU Logical address Physical address
  • 6. We Need Physically Contiguous Memory! ● Devices using DMA ○ Lots of devices need large, contiguous memory for internal buffer (e.g., 12 Mega-Pixel Camera needs large, contiguous buffer) ○ Because MMU is behind the CPU, devices using DMA cannot use virtual address space ○ Slow latency or failure of allocation can harm user experience or life (What if your camera does not react while a kitty go away) Physical Memory Application MMUCPU Logical address Physical address Device DMA Physical address
  • 7. We Need Physically Contiguous Memory! ● Devices using DMA ○ Lots of devices need large, contiguous memory for internal buffer (e.g., 12 Mega-Pixel Camera needs large, contiguous buffer) ○ Because MMU is behind the CPU, devices using DMA can not use virtual address space ○ Slow latency or failure of allocation can harm user experience or life (What if your camera does not react while a kitty go away) ● Huge page, System internal buffer ○ Huge pages can enhance system performance by minimizing TLB miss ○ System needs contiguous memory for internal usage e.g., struct page *alloc_pages(unsigned int flags, unsigned int order); NOTE: This research focuses on device case for embedded system, though
  • 8. H/W Solutions: IOMMU, Scatter / Gather DMA ● IOMMU serves as MMU for devices ● Scatter / Gather DMA let DMA to discontiguous memory area ● Consumes more power and money because it requires additional H/W ● Can be expensive for low-end embedded devices ● Useless for Huge page or system internal physically contiguous memory Physical Memory Application MMUCPU Logical address Physicals address Device DMA Device address IOMMU Physical address
  • 9. Reserved Area Technique ● Reserves sufficient amount of contiguous area at boot time and let only contiguous memory allocation to use the area ● Pros: Simple and effective for contiguous memory allocation ● Cons: System memory space utilization could be low if the contiguous memory allocation does not fully use the reserved area ○ Sacrifice 10% of your phone memory for a kitty that may or may not meet with? ● Widely adopted solution despite of the utilization problem System Memory (Narrowed by Reservation) Reserved Area Process Normal Allocation Contiguous Allocation
  • 10. CMA: Contiguous Memory Allocator ● Software-based solution in mainline Linux ● Generously extended version of Reserved area technique ○ Mainly focus on memory utilization ○ Let movable pages to use the reserved area ○ If contig mem alloc requires pages being used for movable pages, move those pages out of reserved area and give the vacant area to contig mem alloc ○ Solves memory utilization problem because general pages are movable ● In other words, CMA gives different priority to clients of the reserved area ○ Primary client: contiguous memory allocation ○ Secondary client: movable page allocation
  • 11. CMA: in Real-land ● Worst case latency for a photo shot using Camera of Raspberry Pi 2 under background workload is ● 1.6 Seconds when configured with Reserved area technique, ● 9.8 seconds when configured with CMA; Enough for a kitty to go away
  • 12. The Phantom Menace Unveiled: Latency of CMA ● Latency of CMA during a photo shot using Camera of Raspberry Pi 2 under background workload ● It’s clear that latency of Camera is bounded by CMA
  • 13. CMA: Why So Slow? ● In short, secondary client of CMA is not so nice as expected ○ In this context, ‘nice’ means as unix command ‘nice’ means ● Moving page is expensive task (copying, rmap control, LRU churn, …) ● If someone is holding the page, it could not be moved until the one releases the page (e.g., get_user_page()) ● Result: Unexpected long latency and even failure ● That’s why CMA is not adopted to many devices ● Raspberry Pi does not support CMA officially because of the problem
  • 14. GCMA: Guaranteed Contiguous Memory Allocator ● GCMA is a variant of CMA that guarantees ○ Fast latency and success of allocation ○ While keeping memory utilization as well ● Main idea is simple ○ Follow primary / secondary client idea of CMA to keep memory utilization ○ Select secondary client as only nice one unlike CMA ■ For fast latency, should be able to vacate the reserved area as soon as required without any task such as moving content ■ For success, should be out of kernel control ■ For memory utilization, most pages should be able to be secondary client
  • 15. Secondary Client of GCMA ● Last chance cache for ○ pages being swapped out and ○ clean pages being evicted from file cache ● Pages being swapped out to write-through swap device ○ Could be discarded immediately because the contents are written through to swap device ○ Kernel think it’s already swapped out ○ Most anonymous pages could be swapped out ● Clean pages being evicted from file cache ○ Could be discarded because the contents are in storage already ○ Kernel think it’s already evicted out ○ Lots of file-backed pages could be the case ● Additional pros 1: Already expected to not be accessed again soon ○ Discarding secondary client pages from GCAM will not affect system performance much ● Additional pros 2: Occurs with severe workload, only ○ In peaceful case, zero overhead at all
  • 16. GCMA: Workflow ● Reserve memory area in boot time ● If a page is swapped out or evicted from file cache, keep the page in reserved area ○ If system requires the page again, give it back from the reserved area ○ If contiguous memory allocation requires area being used by those pages, discard those pages and use the area for contiguous memory allocation System Memory GCMA Reserved Area Swap Device File Process Normal Allocation Contiguous Allocation Reclaim / Swap Get back if hit
  • 17. GCMA Implementation ● Used Cleancache / Frontswap interface for second class client of GCMA rather than inventing the wheel again ● DMEM: Discardable Memory ○ Works as a last chance cache for secondary client pages using GCMA reserved area ○ Manages Index table using hash table and RB-tree ○ Evicts pages in LRU scheme Reserved Area Dmem Logic GCMA Logic Dmem Interface CMA Interface CMA Logic Cleancache Frontswap
  • 18. GCMA Implementation: in Short ● Implemented on Linux v3.18 ● About 1,500 lines of code ● Ported, evaluated on CuBox and Raspberry Pi 2 ● Available under GPL v3 at: https://github.com/sjp38/linux. gcma/releases/tag/gcma/rfc/v2
  • 19. GCMA Optimization for Write-through Swap ● Write-through swap device could consume I/O resource ● GCMA recommends compressed memory block device as a swap device ○ Zram is a compressed memory block device in upstream Linux ○ Google also recommends zram as a swap device for low-memory Android devices
  • 20. Experimental Setup ● Device setup ○ Raspberry Pi 2 ○ ARM cortex-A7 900 MHz * 4 cores ○ 1 GiB LPDDR2 SDRAM ○ Class 10 SanDisk 16 GiB microSD card ● Configurations ○ Baseline: Linux rpi-v3.18.11 + 100 MiB swap + 256 MiB reserved area ○ CMA: Linux rpi-v3.18.11 + 100 MiB swap + 256 MiB CMA area ○ GCMA: Linux rpi-v3.18.11 + 100 MiB Zram swap + 256 MiB GCMA area ● Workloads ○ Contig Mem Alloc: Contiguous memory allocation using CMA or GCMA ○ Blogbench: Realistic file system benchmark ○ Camera shot: Repeated shot of picture using Raspberry Pi 2 Camera module with 10 seconds interval between each shot
  • 21. Contig Mem Alloc without Background Workload ● Average of 30 allocations ● GCMA shows 14.89x to 129.41x faster latency compared to CMA ● CMA even failed once for 32,768 contiguous pages allocation even though there is no background task
  • 22. 4MiB Contig Mem Alloc w/o Background Workload ● Latency of CMA is more predictable compared to GCMA
  • 23. 4MiB Contig Mem Allocation w/ Background Task ● Background workload: Blogbench ● CMA consumes 5 seconds (unacceptable!) for one allocation in bad case
  • 24. Camera Latency w/ Background Task ● Camera is a realistic, important workload of Raspberry Pi 2 ● Background task: Blogbench ● GCMA keeps latency as fast as Reserved area technique configuration
  • 25. Blogbench on CMA / GCMA ● Scores are normalized to scores of Baseline configuration ● ‘/ cam’ means camera workload as a background workload ● System under GCMA even outperforms CMA with realistic workload owing to its light overhead ● Allocation using CMA even degrades system performance because it should move out pages in reserved area
  • 26. Conclusion ● Contiguous memory allocation is still a big problem for embedded device ○ H/W solutions are too expensive ○ S/W solutions like CMA is not effective ○ Most system uses Reserved area technique despite of low memory utilization ● GCMA guarantees fast latency, success of allocation and reasonable memory utilization ○ Allocation latency of GCMA is as fast as Reserved area technique ○ System performance under GCMA is is similar or outperforms the performance under CMA