Buddy System
5/1 OSDI
Big picture for Linux memory management architecture
Source: https://blog.csdn.net/S_E_A_N/article/details/5829978
NUMA & Node
• NUMA (hardware perspective)
  • Stands for the Non-Uniform Memory Access model.
  • Memory access time differs depending on which CPU accesses which memory.
  • Provides scalable memory bandwidth.
• Node (software perspective)
  • Physical memory is partitioned into nodes, each associated with a CPU (or group of CPUs).
  • A CPU preferably uses memory on its “closer” node, which is faster to access than a remote one.
  • Each node is split into different zones.
Core API for memory management
• Allocate
  • void *kmalloc(size_t size, gfp_t flags): allocate physically contiguous memory, typically smaller than the page size.
  • void *vmalloc(unsigned long size): allocate virtually contiguous memory.
  • void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags): allocate an object from a slab cache.
• Free
  • void kvfree(const void *addr): free memory allocated by kmalloc or vmalloc.
  • void kmem_cache_free(struct kmem_cache *cachep, void *objp): return an object to its slab cache.
Data Structures
Zone & Zone Type (gfp.h, mmzone.h)
• zone_start_pfn: start index of the mem_map entries belonging to the zone.
• free_area[MAX_ORDER]: table of free blocks, one entry per order.
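Free Area & Page (mmzone.h, mm_types.h)
• free_area.free_list: links the free blocks of pages.
• free_area.nr_free: number of free blocks of this order.
• page.lru: links the first page of a block into free_list.
• page.private: stores the block's order in the buddy system.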
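[Figure: a zone (name, zone_start_pfn, free_area[MAX_ORDER], …) whose free_area entries for orders 0, 1, 2, 3, …, k, …, MAX_ORDER - 1 each hold a free_list linking the first pages of free blocks in mem_map through page.lru]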
Init
Every zone has its own free_area!
page_alloc.c:
• Initialize every free_list.
• Set nr_free to zero.
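The data structures above and their initialization can be modeled in user space. This is a simplified sketch, not kernel code: MAX_ORDER = 4 and NR_PAGES = 16 are hypothetical small values, the hand-written list_head stands in for the kernel's list implementation, the zone embeds its own mem_map, and each free_area has a single free_list (the kernel keeps one per migrate type, see the Migrate Type section). zone_init_free_lists() is modeled on the kernel function of the same name in page_alloc.c.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ORDER 4   /* hypothetical: orders 0..3, blocks of 1..8 pages */
#define NR_PAGES  16  /* hypothetical zone size in pages */

/* Minimal stand-in for the kernel's circular doubly linked list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

struct page {
    struct list_head lru;  /* links the first page of a free block into free_list */
    unsigned long private; /* order of the block while it sits in the buddy system */
};

struct free_area {
    struct list_head free_list; /* free blocks of this order */
    unsigned long nr_free;      /* number of free blocks (not pages) of this order */
};

struct zone {
    const char *name;
    unsigned long zone_start_pfn;          /* first pfn of the zone */
    struct free_area free_area[MAX_ORDER]; /* one entry per order */
    struct page mem_map[NR_PAGES];         /* simplification: the zone embeds its pages */
};

/* Init: every order gets an empty free_list and nr_free = 0. */
static void zone_init_free_lists(struct zone *zone)
{
    for (unsigned int order = 0; order < MAX_ORDER; order++) {
        list_init(&zone->free_area[order].free_list);
        zone->free_area[order].nr_free = 0;
    }
}
```

Because every zone carries its own free_area array, this initialization runs once per zone.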
Allocate Page
Page allocation API
• Different users may require different zones and sizes.
• Linux provides a rich set of page allocation functions.
• A gfp_t flags argument tells the allocator how to behave.
Source: https://www.kernel.org/doc/gorman/html/understand/understand009.html
If the requested order is zero, take the page from the per-CPU page lists for higher performance.
Otherwise, enter the critical section (the zone lock) and call the buddy allocator, which looks for the smallest block that satisfies the request.
Find the smallest free block, searching from order to MAX_ORDER - 1.
• Remove the found page from free_list.
• Remove the page order (private).
• Decrease nr_free of free_area.
• Split the page block.
• Split page block
Split page if necessary.
• Add buddy to head of free_list.
• Increase nr_free of free_area.
• Reset buddy order(private).
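The search-then-split procedure on the two slides above can be sketched in user-space C. This is a simplified model, not the kernel's code: MAX_ORDER = 4 and NR_PAGES = 16 are hypothetical, a pfn equals its index in mem_map, there is a single free_list per order (no migrate types), there is no locking, and add_free_block() is a hypothetical helper for seeding the free lists. rmqueue_smallest() and expand() follow the structure of the kernel's __rmqueue_smallest() and expand() in page_alloc.c, with simplified signatures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ORDER 4   /* hypothetical: orders 0..3 */
#define NR_PAGES  16  /* hypothetical zone size; pfn == index into mem_map */

struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }
static void list_add(struct list_head *n, struct list_head *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}
static void list_del(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = n->prev = n;
}

struct page {
    struct list_head lru;  /* links the first page of a free block */
    unsigned long private; /* block order while free */
    bool buddy;            /* stand-in for the kernel's PageBuddy() flag */
};
struct free_area { struct list_head free_list; unsigned long nr_free; };
struct zone { struct free_area free_area[MAX_ORDER]; struct page mem_map[NR_PAGES]; };

static struct page *lru_to_page(struct list_head *l)
{
    return (struct page *)((char *)l - offsetof(struct page, lru));
}
static void set_page_order(struct page *p, unsigned int order) { p->private = order; p->buddy = true; }
static void rmv_page_order(struct page *p) { p->private = 0; p->buddy = false; }

static void zone_init(struct zone *z)
{
    for (unsigned int o = 0; o < MAX_ORDER; o++) {
        list_init(&z->free_area[o].free_list);
        z->free_area[o].nr_free = 0;
    }
    for (unsigned int i = 0; i < NR_PAGES; i++)
        rmv_page_order(&z->mem_map[i]);  /* every page starts out allocated */
}

/* Hypothetical helper: put the block starting at pfn on its order's free list. */
static void add_free_block(struct zone *z, unsigned long pfn, unsigned int order)
{
    assert(order < MAX_ORDER && pfn % (1UL << order) == 0 && pfn < NR_PAGES);
    list_add(&z->mem_map[pfn].lru, &z->free_area[order].free_list);
    z->free_area[order].nr_free++;
    set_page_order(&z->mem_map[pfn], order);
}

/* Split an order-`high` block down to order `low`; each upper half (the buddy) is freed. */
static void expand(struct page *page, unsigned int low, unsigned int high,
                   struct free_area *area)
{
    unsigned long size = 1UL << high;

    while (high > low) {
        area--;                                       /* 1. previous order */
        high--;                                       /* 2. decrease high */
        size >>= 1;                                   /* 3. halve size */
        list_add(&page[size].lru, &area->free_list);  /* 4. buddy to free_list */
        area->nr_free++;                              /* 5. increase nr_free */
        set_page_order(&page[size], high);            /* 6. set the buddy's order */
    }
}

/* Find the smallest free block whose order is at least `order`, unlink it, split it. */
static struct page *rmqueue_smallest(struct zone *z, unsigned int order)
{
    for (unsigned int cur = order; cur < MAX_ORDER; cur++) {
        struct free_area *area = &z->free_area[cur];
        if (list_empty(&area->free_list))
            continue;                                 /* empty: try the next order */
        struct page *page = lru_to_page(area->free_list.next);
        list_del(&page->lru);                         /* 1. unlink from free_list */
        rmv_page_order(page);                         /* 2. remove the page order */
        area->nr_free--;                              /* 3. decrease nr_free */
        expand(page, order, cur, area);               /* 4. low = order, high = cur */
        return page;
    }
    return NULL;                                      /* no block is large enough */
}
```

Seeding one order-3 block at pfn 8 and asking for order 1 reproduces the walkthrough that follows: page + 4 returns to the order-2 list, page + 2 to the order-1 list, and nr_free ends up 0, 1, 1, 0 for orders 0 to 3.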
__rmqueue_smallest: ask for an order-1 page.
[Figure: mem_map holds one free order-3 block (private = 3); nr_free per order is 0, 0, 0, 1 for orders 0 to 3 (orders up to MAX_ORDER - 1 elided)]
__rmqueue_smallest (current_order = 1)
[Figure: nr_free per order is still 0, 0, 0, 1 for orders 0 to 3; the free order-3 block has private = 3]
1. free_list is empty.
2. Try the next order.
__rmqueue_smallest (current_order = 2)
[Figure: nr_free per order is still 0, 0, 0, 1 for orders 0 to 3; the free order-3 block has private = 3]
1. free_list is empty.
2. Try the next order.
__rmqueue_smallest (current_order = 3)
[Figure: after steps 1 to 3 below, nr_free is 0 for orders 0 to 3]
free_list is not empty.
1. Unlink the page from free_list.
2. Remove the page order.
3. Decrease nr_free.
4. Call expand with low = order, high = current_order.
expand (high: 3, low: 1, size: 8)
[Figure: *area points at the order-3 free_area; page is the start of the 8-page block in mem_map; nr_free is 0 for orders 0 to 3]
expand (high: 2, low: 1, size: 4)
[Figure: the buddy at page + 4 (private = 2) is now on the order-2 free_list; nr_free per order is 0, 0, 1, 0 for orders 0 to 3]
1. Point area to the previous order.
2. Decrease high.
3. Halve size.
4. Add the buddy to free_list.
5. Increase nr_free.
6. Set the buddy's page order.
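expand (high: 1, low: 1, size: 2)
[Figure: the buddy at page + 2 (private = 1) is now on the order-1 free_list, and page + 4 (private = 2) stays on the order-2 list; nr_free per order is 0, 1, 1, 0 for orders 0 to 3]
1. Point area to the previous order.
2. Decrease high.
3. Halve size.
4. Add the buddy to free_list.
5. Increase nr_free.
6. Set the buddy's page order.
Now high == low, so splitting stops and page is returned as the order-1 block.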
Free Page
Handle the order-0 and higher-order cases separately.
Enter the critical section and coalesce the buddies if needed.
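• Find the buddy index corresponding to the current order.
• If the buddy can be merged:
  • Unlink the buddy from free_list.
  • Decrease nr_free of free_area.
  • Remove the order of the buddy.
  • Update the index and page to the combined block, then try the next order.
• When no more merging is possible:
  • Set the page order.
  • Link the page to free_list.
  • Increase nr_free of free_area.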
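The coalescing loop can be sketched with a simplified user-space model: MAX_ORDER = 4 and NR_PAGES = 16 are hypothetical, a pfn equals its mem_map index, there is one free_list per order, and there is no locking. free_one_page() follows the shape of the kernel's __free_one_page(), using buddy_pfn = pfn ^ (1 << order) and combined_pfn = buddy_pfn & pfn as in the walkthrough below; add_free_block() is a hypothetical helper for setting up the initial state.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ORDER 4   /* hypothetical: orders 0..3 */
#define NR_PAGES  16  /* hypothetical zone size; pfn == index into mem_map */

struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static void list_add(struct list_head *n, struct list_head *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}
static void list_del(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = n->prev = n;
}

struct page {
    struct list_head lru;  /* links the first page of a free block */
    unsigned long private; /* block order while free */
    bool buddy;            /* stand-in for the kernel's PageBuddy() flag */
};
struct free_area { struct list_head free_list; unsigned long nr_free; };
struct zone { struct free_area free_area[MAX_ORDER]; struct page mem_map[NR_PAGES]; };

static void set_page_order(struct page *p, unsigned int order) { p->private = order; p->buddy = true; }
static void rmv_page_order(struct page *p) { p->private = 0; p->buddy = false; }

static void zone_init(struct zone *z)
{
    for (unsigned int o = 0; o < MAX_ORDER; o++) {
        list_init(&z->free_area[o].free_list);
        z->free_area[o].nr_free = 0;
    }
    for (unsigned int i = 0; i < NR_PAGES; i++)
        rmv_page_order(&z->mem_map[i]);  /* every page starts out allocated */
}

/* Hypothetical helper: put the block starting at pfn on its order's free list. */
static void add_free_block(struct zone *z, unsigned long pfn, unsigned int order)
{
    assert(order < MAX_ORDER && pfn % (1UL << order) == 0 && pfn < NR_PAGES);
    list_add(&z->mem_map[pfn].lru, &z->free_area[order].free_list);
    z->free_area[order].nr_free++;
    set_page_order(&z->mem_map[pfn], order);
}

/* Merge conditions: the buddy is free in the buddy system and has the same order.
 * Single-zone model, so the same-zone condition always holds. */
static bool page_is_buddy(const struct page *buddy, unsigned int order)
{
    return buddy->buddy && buddy->private == order;
}

/* Free the order-`order` block at pfn, coalescing with free buddies. */
static void free_one_page(struct zone *z, unsigned long pfn, unsigned int order)
{
    while (order < MAX_ORDER - 1) {
        unsigned long buddy_pfn = pfn ^ (1UL << order);   /* find the buddy index */
        if (buddy_pfn >= NR_PAGES || !page_is_buddy(&z->mem_map[buddy_pfn], order))
            break;                                         /* cannot merge: stop */
        struct page *buddy = &z->mem_map[buddy_pfn];
        list_del(&buddy->lru);                             /* unlink the buddy */
        z->free_area[order].nr_free--;                     /* decrease nr_free */
        rmv_page_order(buddy);                             /* remove the buddy's order */
        pfn &= buddy_pfn;                                  /* combined_pfn = buddy_pfn & pfn */
        order++;                                           /* try one order higher */
    }
    struct page *page = &z->mem_map[pfn];                  /* first page of the merged block */
    set_page_order(page, order);                           /* set the page order */
    list_add(&page->lru, &z->free_area[order].free_list);  /* link it to free_list */
    z->free_area[order].nr_free++;                         /* increase nr_free */
}
```

Starting from the walkthrough's state (a free order-1 block at pfn 8 and a free order-2 block at pfn 12) and freeing the order-1 block at pfn 10 merges twice and leaves a single order-3 block at pfn 8.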
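We can merge a buddy if:
• The buddy is in the buddy system and free.
• The page and its buddy have the same order.
• The page and its buddy are in the same zone.
Why check that the buddy is free and in the buddy system?
The buddy pfn is computed purely arithmetically, so the page found there may just as well be allocated or reserved.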
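The three merge conditions can be written as one predicate. This is a hypothetical sketch: the fields in_buddy, order and zone_id stand in for what the kernel reads through PageBuddy(), the page's stored order and its zone, and can_merge() plays the role of the kernel's page_is_buddy() without reproducing its exact signature.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified page descriptor for this sketch. */
struct page {
    bool in_buddy;       /* free and owned by the buddy system */
    unsigned int order;  /* meaningful only while in_buddy is true */
    int zone_id;         /* zone the page belongs to */
};

static bool can_merge(const struct page *self, const struct page *buddy, unsigned int order)
{
    assert(self != buddy);                  /* a block is never its own buddy */
    if (!buddy->in_buddy)                   /* allocated or reserved: not free */
        return false;
    if (buddy->order != order)              /* free, but part of a block of another order */
        return false;
    return buddy->zone_id == self->zone_id; /* never merge across zones */
}
```

The cheap checks come first; the kernel's page_is_buddy() likewise does the zone comparison last.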
__free_one_page: free the page at pfn 10 with order: 1.
[Figure: mem_map holds a free order-1 block (private = 1) and a free order-2 block (private = 2); nr_free per order is 0, 1, 1, 0 for orders 0 to 3]
__free_one_page (order: 1, pfn: 10, buddy_pfn: 8)
[Figure: the buddy sits at page + (8 - 10); after the steps below, nr_free per order is 0, 0, 1, 0 for orders 0 to 3, and the order-2 block (private = 2) is still free]
buddy_pfn = pfn ⨁ (1 << order): 0010 ⨁ 1010 = 1000, i.e. 8
1. Find the buddy.
2. Unlink the buddy.
3. Decrease nr_free of free_area.
4. Remove the order of the buddy.
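The buddy-index arithmetic on this slide fits in one line: an order-n block and its buddy differ only in bit n of the pfn, so XOR with 1 << order flips between them. The sketch mirrors the computation in the kernel's __find_buddy_pfn(); the alignment assertion is an addition for illustration.

```c
#include <assert.h>

/* Buddy of the order-`order` block starting at pfn: flip bit `order`. */
static unsigned long find_buddy_pfn(unsigned long pfn, unsigned int order)
{
    assert((pfn & ((1UL << order) - 1)) == 0);  /* a block starts at a multiple of its size */
    return pfn ^ (1UL << order);
}
```

The relation is symmetric (the buddy of 8 at order 1 is 10), and the buddy's struct page is then page + (buddy_pfn - pfn), two entries back in mem_map for pfn 10.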
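__free_one_page (pfn: 10, buddy_pfn: 8, combined_pfn: 8; order is now 2)
[Figure: nr_free per order is 0, 0, 1, 0 for orders 0 to 3; the order-2 block (private = 2) is still free]
combined_pfn = buddy_pfn ∧ pfn: 1010 ∧ 1000 = 1000, i.e. 8
1. Update page (it moves to pfn 8).
2. Update pfn (pfn = combined_pfn = 8).
3. Increase order.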
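One full merge step, from the buddy pfn to the updated page and order, can be checked numerically. merge_step() and struct merge are hypothetical names for this sketch; the formulas are the ones on the slides, and page_offset corresponds to "Update page" (page + (combined_pfn - pfn)).

```c
#include <assert.h>

/* Result of merging the order-`order` block at pfn with its buddy. */
struct merge {
    unsigned long buddy_pfn;     /* pfn ^ (1 << order) */
    unsigned long combined_pfn;  /* buddy_pfn & pfn: start of the merged block */
    long page_offset;            /* how far `page` moves in mem_map */
    unsigned int new_order;      /* order + 1 */
};

static struct merge merge_step(unsigned long pfn, unsigned int order)
{
    struct merge m;

    assert((pfn & ((1UL << order) - 1)) == 0);  /* block is aligned to its size */
    m.buddy_pfn = pfn ^ (1UL << order);
    m.combined_pfn = m.buddy_pfn & pfn;         /* clears bit `order`: the lower block */
    m.page_offset = (long)m.combined_pfn - (long)pfn;
    m.new_order = order + 1;
    return m;
}
```

The first call below matches this slide (page moves back by 2 to pfn 8); the second matches the next two slides, where the page stays at pfn 8 and the block becomes order 3.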
__free_one_page (order: 2, pfn: 8, buddy_pfn: 12)
[Figure: the buddy sits at page + (12 - 8); after the steps below, nr_free is 0 for orders 0 to 3]
buddy_pfn = pfn ⨁ (1 << order): 0100 ⨁ 1000 = 1100, i.e. 12
1. Find the buddy.
2. Unlink the buddy.
3. Decrease nr_free of free_area.
4. Remove the order of the buddy.
__free_one_page (pfn: 8, buddy_pfn: 12, combined_pfn: 8; order is now 3)
[Figure: nr_free is 0 for orders 0 to 3]
combined_pfn = buddy_pfn ∧ pfn: 1000 ∧ 1100 = 1000, i.e. 8
1. Update page (it stays at pfn 8).
2. Update pfn (pfn = combined_pfn = 8).
3. Increase order.
__free_one_page (order: 3, pfn: 8)
[Figure: the merged order-3 block starts at pfn 8 (private = 3); nr_free per order is 0, 0, 0, 1 for orders 0 to 3]
1. Set the page order.
2. Add the page to free_list.
3. Increase nr_free of free_area.
Migrate Type
Scenario
[Figure: the zone's free_area (orders 0, 1, 2, 3, …, k, …, MAX_ORDER - 1), each free_list linking free blocks in mem_map]
Request an order-2 page…
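Migrate types (mmzone.h); MIGRATE_TYPES is the number of migrate types.
• MIGRATE_UNMOVABLE: fixed pages, usually used by the kernel itself.
• MIGRATE_MOVABLE: pages used by user-space applications.
• MIGRATE_RECLAIMABLE: not directly movable, but can be reclaimed (freed).
[Figure: the zone's free_area (orders 0, 1, 2, 3, …, k, …, MAX_ORDER - 1), where each order keeps separate free_lists for MIGRATE_UNMOVABLE, MIGRATE_MOVABLE and MIGRATE_RECLAIMABLE blocks]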
Initially, all pages are marked MIGRATE_MOVABLE.
page_alloc.c
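The effect of migrate types on the data structures can be sketched with counters instead of lists. This is a simplified, hypothetical model: only the three migrate types listed above are included (the kernel's enum in mmzone.h has more, such as MIGRATE_HIGHATOMIC, plus configuration-dependent ones like MIGRATE_CMA and MIGRATE_ISOLATE), nr_free_type[] stands in for the kernel's per-type free lists, and MAX_ORDER and NR_PAGEBLOCKS are hypothetical values. A freed block lands on the list of its pageblock's migrate type, and every pageblock starts as MIGRATE_MOVABLE.

```c
#include <assert.h>

#define MAX_ORDER     4  /* hypothetical */
#define NR_PAGEBLOCKS 4  /* hypothetical number of pageblocks in the zone */

enum migratetype {
    MIGRATE_UNMOVABLE,
    MIGRATE_MOVABLE,
    MIGRATE_RECLAIMABLE,
    MIGRATE_TYPES        /* number of migrate types in this sketch */
};

struct free_area {
    unsigned long nr_free_type[MIGRATE_TYPES]; /* stand-in for one free_list per type */
    unsigned long nr_free;                     /* free blocks of this order, all types */
};

struct zone {
    struct free_area free_area[MAX_ORDER];
    enum migratetype pageblock_type[NR_PAGEBLOCKS]; /* migrate type per pageblock */
};

static void zone_init(struct zone *z)
{
    for (int o = 0; o < MAX_ORDER; o++) {
        for (int t = 0; t < MIGRATE_TYPES; t++)
            z->free_area[o].nr_free_type[t] = 0;
        z->free_area[o].nr_free = 0;
    }
    for (int b = 0; b < NR_PAGEBLOCKS; b++)
        z->pageblock_type[b] = MIGRATE_MOVABLE;  /* every pageblock starts movable */
}

/* A freed block goes on the list matching its pageblock's migrate type. */
static void add_free_block(struct zone *z, int pageblock, unsigned int order)
{
    assert(pageblock >= 0 && pageblock < NR_PAGEBLOCKS && order < MAX_ORDER);
    enum migratetype t = z->pageblock_type[pageblock];
    z->free_area[order].nr_free_type[t]++;
    z->free_area[order].nr_free++;
}
```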
page_alloc.c
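Lender candidates: the migrate types each type may borrow from.
When the free lists of the requested type are empty, borrow from a different migrate type.
__rmqueue_fallback steals a block from the candidates.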
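The lender-candidate lookup can be sketched as below. The fallbacks[] contents follow the table in page_alloc.c of 4.x/5.x-era kernels (check the version you are reading); everything else is a simplified, hypothetical model in which free blocks are only counted and find_fallback() just reports which (order, migrate type) list a block would be stolen from. Like __rmqueue_fallback(), it scans from the largest order down, which the kernel uses to approximate stealing from the pageblock with the most free pages; it omits the kernel's can_steal heuristics and pageblock conversion.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ORDER 4  /* hypothetical */

enum migratetype { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_RECLAIMABLE, MIGRATE_TYPES };

/* Lender candidates per migrate type, in preference order, terminated by MIGRATE_TYPES. */
static const enum migratetype fallbacks[MIGRATE_TYPES][3] = {
    [MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_TYPES },
    [MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_TYPES },
    [MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_TYPES },
};

/* nr_free[order][type]: number of free blocks on each (order, type) list. */
struct zone_counts { unsigned long nr_free[MAX_ORDER][MIGRATE_TYPES]; };

/* Find a block to steal for start_type: largest order first, lenders in table order. */
static bool find_fallback(const struct zone_counts *z, unsigned int order,
                          enum migratetype start_type,
                          unsigned int *found_order, enum migratetype *found_type)
{
    assert(order < MAX_ORDER && start_type < MIGRATE_TYPES);
    for (int cur = MAX_ORDER - 1; cur >= (int)order; cur--) {
        for (int i = 0; fallbacks[start_type][i] != MIGRATE_TYPES; i++) {
            enum migratetype t = fallbacks[start_type][i];
            if (z->nr_free[cur][t] > 0) {
                *found_order = (unsigned int)cur;
                *found_type = t;
                return true;
            }
        }
    }
    return false;  /* no lender has a large enough block */
}
```

For the scenario above (an order-2 request), an unmovable allocation with only movable memory free would take the largest movable block rather than fail.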
The End
Reference
• What is NUMA?
• Memory Allocation Guide
• Memory Management APIs
• Subsystem Trace Points: kmem
• Physical Page Allocation
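• Linux内存管理(一) (Linux Memory Management, Part 1)
• Linux伙伴系統 (The Linux Buddy System)
• 深入浅出Linux内核内存管理基础 (Linux Kernel Memory Management Fundamentals, Explained Simply)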
