* Based on kernel 5.11 (x86_64) – QEMU
* 2-socket CPUs (4 cores/socket)
* 16GB memory
* Kernel parameter: nokaslr norandmaps
* KASAN: disabled
* Userspace: ASLR is disabled
* Legacy BIOS
Slab Allocator in Linux Kernel
Adrian Huang | Aug, 2022
Agenda
• Slab & Buddy System
• Slab Concept
o “Chicken or the egg” problem
o Will not focus on the differences between SLAB/SLUB/SLOB: plenty of information about them is available on the Internet.
o Note: SLUB is the default allocator since 2.6.23.
• [Init & Creation] kmem_cache_init(): Let’s check the initialization details
o How to initialize ‘boot_kmem_cache’ and ‘boot_kmem_cache_node’?
o What are freepointers?
o Data structures: kmem_cache & kmem_cache_node
o [Comparison] Generic kmem_cache_create() & low-level implementation __kmem_cache_create()
• [Cache allocation] slab_alloc_node – Allocate a slab object with a specific node id
o Fast path & slow path
 Will follow kmem_cache_init() flow
• [Cache release/free] kmem_cache_free()
o Fast path & slow path
• kmalloc & slab
Slab & Buddy System
Buddy System
alloc_page(s), __get_free_page(s)
Slab Allocator
kmem_cache_alloc/kmem_cache_free
kmalloc/kfree
glibc: malloc/free
brk/mmap
. . .
vmalloc
User Space
Kernel Space
Hardware
• Balance between brk() and mmap()
• Use brk() if request size < DEFAULT_MMAP_THRESHOLD_MIN (128 KB)
o The heap can be trimmed only if memory is freed at the top end.
o sbrk() is implemented as a library function that uses the brk() system call.
• Use mmap() if request size >= DEFAULT_MMAP_THRESHOLD_MIN (128 KB)
o The allocated memory blocks can be independently released back to the system.
o Deallocated space is not placed on the free list for reuse by later allocations.
o Memory may be wasted because mmap allocations must be page-aligned, and the
kernel must perform the expensive task of zeroing out the allocated memory.
o Note: glibc uses the dynamic mmap threshold
o Detail: `man mallopt`
[glibc] malloc
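A minimal userspace sketch of the threshold behavior described above (the 64 KB and 256 KB request sizes are arbitrary examples chosen to sit on either side of the 128 KB default):

/* Requests below the mmap threshold are served from the brk-managed
 * heap; larger ones get their own anonymous mmap() region. */
#include <malloc.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	/* Pin the threshold so glibc's dynamic mmap threshold cannot move it. */
	mallopt(M_MMAP_THRESHOLD, 128 * 1024);

	void *small = malloc(64 * 1024);	/* < 128 KB: heap (brk/sbrk) */
	void *large = malloc(256 * 1024);	/* >= 128 KB: anonymous mmap */
	printf("small=%p large=%p\n", small, large);

	free(large);	/* munmap(): returned to the kernel immediately */
	free(small);	/* kept on the heap free list for later reuse */
	return 0;
}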
Slab Concept
Slab Allocator
Slab Cache
(page order = 0)
Slab Cache
(page order = 1)
Page #0
slab object
slab object
slab object
slab object
…
Page #N
slab object
slab object
slab object
slab object
slab object
slab object
slab object
slab object
4 objects
per page
Page #1
slab object
…
Page #N
2 objects
per page
slab object
slab object
slab object
slab object
slab object
.
.
.
Memory (object)
allocation
* object = data structure
slab_caches
global list_head
slab
…
slab
slab
…
Page #0
slab object
slab object
1. Allocating and freeing data structures are common operations in the Linux kernel
2. Slab is a generic data-structure caching layer
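To make the caching layer concrete, here is a minimal sketch of the slab API as a kernel module would use it ('struct foo', the cache name and foo_setup() are made-up example names):

#include <linux/list.h>
#include <linux/slab.h>

struct foo {
	int id;
	struct list_head link;
};

static struct kmem_cache *foo_cachep;

static int foo_setup(void)
{
	struct foo *f;

	/* One cache per data structure; freed objects are recycled by the cache. */
	foo_cachep = kmem_cache_create("foo", sizeof(struct foo), 0,
				       SLAB_HWCACHE_ALIGN, NULL);
	if (!foo_cachep)
		return -ENOMEM;

	f = kmem_cache_alloc(foo_cachep, GFP_KERNEL);	/* one object */
	if (f)
		kmem_cache_free(foo_cachep, f);	/* back to the cache, not the buddy system */
	return 0;
}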
Slab Cache
Node-based cache: the freelist concept is implemented in the page struct (not shown)
slab
object
slab
object
..
slab
object
Page
.
.
kmem_cache_cpu
(CPU #0)
struct page *page
void **freelist
struct page *partial
kmem_cache_cpu
(CPU #N)
struct page *page
void **freelist
struct page *partial
slab
object
slab
object
..
slab
object
Page
slab
object
slab
object
..
slab
object
Page
kmem_cache_node
(Node #0)
struct list_head partial
kmem_cache_node
(Node #N)
struct list_head partial
.
.
slab
object
slab
object
..
slab
object
Page
slab
object
slab
object
..
slab
object
Page
kmem_cache
(Slab cache)
__percpu *cpu_slab
*node[MAX_NUMNODES]
percpu cache
Node-based cache
Slab Cache
slab
object
slab
object
..
slab
object
Page
.
.
kmem_cache_cpu (CPU #0)
struct page *page
void **freelist
struct page *partial
kmem_cache_node (Node #0)
struct list_head partial
.
.
slab
object
slab
object
..
slab
object
Page
kmem_cache
(Slab cache)
__percpu *cpu_slab
*node[MAX_NUMNODES]
percpu cache
Node-based cache
[Slab Cache] struct kmem_cache *kmem_cache
slab
object
slab
object
..
slab
object
Page
.
.
kmem_cache_cpu (CPU #0)
struct page *page
void **freelist
struct page *partial
kmem_cache_node (Node #0)
struct list_head partial
.
.
slab
object
slab
object
..
slab
object
Page
kmem_cache
(Slab cache)
__percpu *cpu_slab
*node[MAX_NUMNODES]
percpu cache
Node-based cache
[Slab Cache] struct kmem_cache *kmem_cache_node
kmem_cache_cpu (CPU #0)
struct page *page
void **freelist
struct page *partial
kmem_cache_node (Node #0)
struct list_head partial
kmem_cache
(Slab cache)
__percpu *cpu_slab
*node[MAX_NUMNODES]
percpu cache
Node-based cache
[Slab Cache] struct kmem_cache *sighand_cachep
.
.
.
.
1
2
3
Allocate from
‘kmem_cache_node’ slab cache
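The three structures in these diagrams, heavily abridged from include/linux/slub_def.h and mm/slab.h (v5.11); debug and statistics fields are omitted:

struct kmem_cache_cpu {
	void **freelist;	/* next available object */
	unsigned long tid;	/* globally unique transaction id */
	struct page *page;	/* slab from which we are allocating */
	struct page *partial;	/* per-cpu partial slabs (CONFIG_SLUB_CPU_PARTIAL) */
};

struct kmem_cache_node {
	spinlock_t list_lock;
	unsigned long nr_partial;
	struct list_head partial;	/* per-node partial slabs */
	/* CONFIG_SLUB_DEBUG adds: nr_slabs, total_objects, full list */
};

struct kmem_cache {
	struct kmem_cache_cpu __percpu *cpu_slab;	/* percpu cache */
	unsigned int size;		/* object size including metadata */
	unsigned int object_size;	/* payload size */
	unsigned int offset;		/* free pointer offset inside the object */
	unsigned long min_partial;
	struct kmem_cache_node *node[MAX_NUMNODES];	/* node-based cache */
};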
Slab Allocator Initialization:
“Chicken or the egg” problem
.
.
kmem_cache_cpu (CPU #0)
struct page *page
void **freelist
struct page *partial
kmem_cache_node (Node #0)
struct list_head partial
.
.
kmem_cache = NULL
(Slab cache)
__percpu *cpu_slab
*node[MAX_NUMNODES]
percpu cache
Node-based cache
[Slab Cache] struct kmem_cache *kmem_cache = NULL
.
.
kmem_cache_cpu (CPU #0)
struct page *page
void **freelist
struct page *partial
kmem_cache_node (Node #0)
struct list_head partial
.
.
kmem_cache = NULL
(Slab cache)
__percpu *cpu_slab
*node[MAX_NUMNODES]
percpu cache
Node-based cache
[Slab Cache] struct kmem_cache *kmem_cache_node = NULL
kmem_cache_cpu (CPU #0)
struct page *page
void **freelist
struct page *partial
kmem_cache_node (Node #0)
struct list_head partial
kmem_cache
(Slab cache)
__percpu *cpu_slab
*node[MAX_NUMNODES]
percpu cache
Node-based cache
[Slab Cache] struct kmem_cache *kmem_cache
.
.
.
.
1
2
3
[Cannot work] Allocate from
‘kmem_cache_node’ slab cache
[Slab Allocator Init] Just an example: Not in Linux kernel code
Slab Allocator Initialization:
“Chicken or the egg” problem
[Example] Slab allocator init for kmem_cache
Who initializes global variables ‘kmem_cache’ and ‘kmem_cache_node’?
Slab Allocator Initialization:
“Chicken or the egg” problem
.
.
kmem_cache_cpu (CPU #0)
struct page *page
void **freelist
struct page *partial
kmem_cache_node (Node #0)
struct list_head partial
.
.
kmem_cache = NULL
(Slab cache)
__percpu *cpu_slab
*node[MAX_NUMNODES]
percpu cache
Node-based cache
[Slab Cache] struct kmem_cache *kmem_cache = NULL
.
.
kmem_cache_cpu (CPU #0)
struct page *page
void **freelist
struct page *partial
kmem_cache_node (Node #0)
struct list_head partial
.
.
kmem_cache = NULL
(Slab cache)
__percpu *cpu_slab
*node[MAX_NUMNODES]
percpu cache
Node-based cache
[Slab Cache] struct kmem_cache *kmem_cache_node = NULL
kmem_cache_cpu (CPU #0)
struct page *page
void **freelist
struct page *partial
kmem_cache_node (Node #0)
struct list_head partial
kmem_cache
(Slab cache)
__percpu *cpu_slab
*node[MAX_NUMNODES]
percpu cache
Node-based cache
[Slab Cache] struct kmem_cache *sighand_cachep
.
.
.
.
1
2
3
[Cannot work] Allocate from
‘kmem_cache_node’ slab cache
SLAB allocator: statically initialized ‘kmem_cache_boot’
SLOB allocator: statically initialized ‘kmem_cache_boot’
SLUB allocator: static local variables ‘boot_kmem_cache’ and ‘boot_kmem_cache_node’
[Init & Creation] kmem_cache_init():
Let’s check the initialization details
• How to initialize ‘boot_kmem_cache’ and ‘boot_kmem_cache_node’?
• What are freepointers?
• Data structures: kmem_cache & kmem_cache_node
• [Comparison] Generic kmem_cache_create() & low-level implementation
__kmem_cache_create()
* Call paths are based on SLUB (the unqueued slab allocator)
kmem_cache_init()
start_kernel
1. Declare two static variables ‘boot_kmem_cache’ and ‘boot_kmem_cache_node’
2. Assign the addresses of two static variables to generic/global variables
kmem_cache_node = &boot_kmem_cache_node
kmem_cache = &boot_kmem_cache
__kmem_cache_create init_kmem_cache_nodes
calculate_sizes
kmem_cache_open
alloc_kmem_cache_cpus
mm_init kmem_cache_init
create_boot_cache(kmem_cache_node, …)
create_boot_cache(kmem_cache, …)
kmem_cache =
bootstrap(&boot_kmem_cache)
kmem_cache_node =
bootstrap(&boot_kmem_cache_node)
setup_kmalloc_cache_index_table
create_kmalloc_caches
__alloc_percpu
init_kmem_cache_cpus
Assign cpu id to
kmem_cache_cpu->tid
Allocate/init ‘kmem_cache_node’
structs for all CPU nodes (sockets)
Allocate/init a ‘kmem_cache_cpu’ struct
Calculate page order and objects per slab
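The flow above corresponds to the following abridged excerpt of kmem_cache_init() in mm/slub.c (v5.11); secondary setup and error paths are elided:

void __init kmem_cache_init(void)
{
	static __initdata struct kmem_cache boot_kmem_cache,
		boot_kmem_cache_node;

	/* Point the global variables at the static boot caches. */
	kmem_cache_node = &boot_kmem_cache_node;
	kmem_cache = &boot_kmem_cache;

	create_boot_cache(kmem_cache_node, "kmem_cache_node",
		sizeof(struct kmem_cache_node), SLAB_HWCACHE_ALIGN, 0, 0);

	/* Able to allocate the per node structures */
	slab_state = PARTIAL;

	create_boot_cache(kmem_cache, "kmem_cache",
			offsetof(struct kmem_cache, node) +
				nr_node_ids * sizeof(struct kmem_cache_node *),
			SLAB_HWCACHE_ALIGN, 0, 0);

	/* Re-allocate both boot caches out of themselves and fix up pointers. */
	kmem_cache = bootstrap(&boot_kmem_cache);
	kmem_cache_node = bootstrap(&boot_kmem_cache_node);

	/* Now we can use the kmem_cache to allocate kmalloc slabs */
	setup_kmalloc_cache_index_table();
	create_kmalloc_caches(0);
}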
kmem_cache_node/boot_kmem_cache_node init
kmem_cache
slab Page #0
slab object #0
…
…
…
…
…
…
slab object #63
64 objects
per page
Slub allocator: A generic data structure-cache layer
* object = data structure
object size = sizeof(struct kmem_cache_node) = 64
object size with alignment (SLAB_HWCACHE_ALIGN: 64) = 64
calculate_sizes(): order = 0
kmem_cache_node/boot_kmem_cache_node init
object size = sizeof(struct kmem_cache_node) = 64
object size with alignment (SLAB_HWCACHE_ALIGN: 64) = 64
kmem_cache
slab
Page #0
slab object
slab object
slab object
slab object
Page #1
slab object
slab object
slab object
slab object
16 objects
per page
Slab allocator: A generic data structure-cache layer
* object = data structure
object size = offsetof(struct kmem_cache, node) +
nr_node_ids * sizeof(struct kmem_cache_node *)
nr_node_ids = 2 (2-socket guest OS)
object size = 200 + 2 * 8 = 216
object size with alignment (SLAB_HWCACHE_ALIGN: 64) = 256
calculate_sizes(): order = 1
kmem_cache/boot_kmem_cache init
object size = 200 + 2 * 8 = 216
kmem_cache/boot_kmem_cache init
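The oo values on these slides are easier to read once you know the encoding: SLUB packs the page order and the objects-per-slab count into one word. A self-contained model of the oo_make()/oo_order()/oo_objects() helpers from mm/slub.c, checked against the numbers above:

#include <stdio.h>

#define PAGE_SIZE 4096u
#define OO_SHIFT  16
#define OO_MASK   ((1u << OO_SHIFT) - 1)

static unsigned int oo_make(unsigned int order, unsigned int size)
{
	/* objects per slab = slab bytes / object size */
	return (order << OO_SHIFT) + (PAGE_SIZE << order) / size;
}

int main(void)
{
	unsigned int oo_node  = oo_make(0, 64);		/* kmem_cache_node */
	unsigned int oo_cache = oo_make(1, 256);	/* kmem_cache */

	/* Prints 0x40 (order 0, 64 objects) and 0x10020 (order 1, 32 objects). */
	printf("oo_node=0x%x oo_cache=0x%x\n", oo_node, oo_cache);
	printf("order=%u objects=%u\n",
	       oo_cache >> OO_SHIFT, oo_cache & OO_MASK);
	return 0;
}

This matches the slides: oo = 64 (0x40) for kmem_cache_node (order 0, 64 objects per 4 KB page) and oo = 0x10020 for kmem_cache (order 1, 32 objects per 8 KB slab).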
init_kmem_cache_nodes()
n = kmem_cache_alloc_node()
early_kmem_cache_node_alloc
init_kmem_cache_nodes
for_each_node_state()
slab_state == DOWN
init_kmem_cache_node
kmem_cache->node[node] = n
Y
N
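The decision box above is this function, lightly abridged from mm/slub.c (v5.11):

static int init_kmem_cache_nodes(struct kmem_cache *s)
{
	int node;

	for_each_node_state(node, N_NORMAL_MEMORY) {
		struct kmem_cache_node *n;

		if (slab_state == DOWN) {
			/* One-shot bootstrap path: no slab cache to allocate from yet. */
			early_kmem_cache_node_alloc(node);
			continue;
		}
		/* Generic path: allocate from the 'kmem_cache_node' slab cache. */
		n = kmem_cache_alloc_node(kmem_cache_node, GFP_KERNEL, node);
		if (!n) {
			free_kmem_cache_nodes(s);
			return 0;
		}

		init_kmem_cache_node(n);
		s->node[node] = n;
	}
	return 1;
}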
__kmem_cache_create init_kmem_cache_nodes
calculate_sizes
kmem_cache_open
alloc_kmem_cache_cpus
__alloc_percpu
init_kmem_cache_cpus
Assign cpu id to
kmem_cache_cpu->tid
Allocate/init ‘kmem_cache_node’
structs for all CPU nodes (sockets)
Allocate/init a ‘kmem_cache_cpu’ struct
Calculate page order and objects per slab
create_boot_cache(kmem_cache_node, …)
Special (one-shot) path:
Fix ‘chicken or the egg’ problem
Generic path
early_kmem_cache_node_alloc for ‘kmem_cache_node’
allocate_slab
page = new_slab(kmem_cache_node, …)
early_kmem_cache_node_alloc
page = alloc_slab_page()
page->slab_cache = s
Update freepointers for
the allocated pages
n = page->freelist
kmem_cache_node->node[node] = n
page->freelist =
get_freepointer(kmem_cache_node, n)
init_kmem_cache_node(n)
__add_partial(n, page, DEACTIVATE_TO_HEAD)
inc_slabs_node
page struct (slab, slob or slub)
struct list_head slab_list
struct page *next
struct kmem_cache *slab_cache
void *freelist (first free object)
s_mem: first object - SLAB
int pages
int pobjects
counters - SLUB
inuse: 16 = 64
objects:15 = 64
frozen: 1 = 1
union
struct
(SLUB)
union
struct
(Partial
Page)
kmem_cache
(kmem_cache_node)
__percpu *cpu_slab
min_partial = 5
size = 64
object_size = 64
cpu_partial = 30
min = 64
inuse = 64
align = 64
name =
“kmem_cache_node”
list
*node[MAX_NUMNODES]
max = 64
oo (order objects) = 64
allocate_slab() – freelist/freepointer configuration
offset = 32
slab
object
slab
object
..
slab
object
Page
boot_kmem_cache_node
static local variable
Reference functions: get_freepointer/set_freepointer
allocate_slab
page = new_slab(kmem_cache_node, …)
early_kmem_cache_node_alloc
page = alloc_slab_page()
page->slab_cache = s
Update freepointers for
the allocated pages
n = page->freelist
kmem_cache_node->node[node] = n
page->freelist =
get_freepointer(kmem_cache_node, n)
init_kmem_cache_node(n)
__add_partial(n, page, DEACTIVATE_TO_HEAD)
inc_slabs_node
page struct (slab, slob or slub)
struct list_head slab_list
struct page *next
struct kmem_cache *slab_cache
void *freelist (first free object)
s_mem: first object - SLAB
int pages
int pobjects
counters - SLUB
inuse: 16 = 64
objects:15 = 64
frozen: 1 = 1
union
struct
(SLUB)
union
struct
(Percpu
Partial
Page)
kmem_cache
(kmem_cache_node)
__percpu *cpu_slab
min_partial = 5
size = 64
object_size = 64
cpu_partial = 30
min = 64
inuse = 64
align = 64
name =
“kmem_cache_node”
list
*node[MAX_NUMNODES]
max = 64
oo (order objects) = 64
allocate_slab() – freelist/freepointer configuration
offset = 32
slab
object
slab
object
..
slab
object
Page
Page address = X
X
X + kmem_cache->size
X + kmem_cache->size
slab object #0
X + kmem_cache->size * 2
slab object #1
freepointer in slab object: points to the next free slab object
(stored at the middle of the object → harder for a buffer overflow or underflow to corrupt)
NULL
slab object #n-1
point to the next
free slab object
.
.
.
X + kmem_cache->offset
boot_kmem_cache_node
static local variable
Reference functions: get_freepointer/set_freepointer
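In code, the freepointer walk in the diagram is just pointer arithmetic at s->offset; a simplified sketch of get_freepointer()/set_freepointer() from mm/slub.c (hardening and KASAN details omitted, and a (char *) cast added for clarity):

static inline void *get_freepointer(struct kmem_cache *s, void *object)
{
	/* The link to the next free object lives inside the object itself. */
	return *(void **)((char *)object + s->offset);
}

static inline void set_freepointer(struct kmem_cache *s, void *object, void *fp)
{
	*(void **)((char *)object + s->offset) = fp;	/* link: object -> fp */
}

With offset = 32 and size = 64, object #k sits at X + 64*k and its freepointer at X + 64*k + 32, pointing at object #k+1 right after allocate_slab().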
page struct (slab, slob or slub)
struct list_head slab_list
struct page *next
struct kmem_cache *slab_cache
void *freelist (first free object)
s_mem: first object - SLAB
int pages
int pobjects
counters - SLUB
inuse: 16 = 64 → 1
objects:15 = 64
frozen: 1 = 1 → 0
union
struct
(SLUB)
union
kmem_cache
(kmem_cache_node)
__percpu *cpu_slab
min_partial = 5
size = 64
object_size = 64
cpu_partial = 30
min = 64
inuse = 64
align = 64
name =
“kmem_cache_node”
list
*node[MAX_NUMNODES]
max = 64
oo (order objects) = 64
early_kmem_cache_node_alloc() – Get a slab object
from freelist
slab
object
slab
object
..
slab
object
Page
offset = 32
kmem_cache_node
nr_partial
struct list_head partial
nr_slabs
total_objects
list_lock
struct list_head full
Get a slab object from page->freelist
kmem_cache_node->node[node] = n
static local variable
boot_kmem_cache_node
struct
(Percpu
Partial
Page)
page struct (slab, slob or slub)
struct list_head slab_list
struct page *next
struct kmem_cache *slab_cache
void *freelist (first free object)
s_mem: first object - SLAB
int pages
int pobjects
counters - SLUB
inuse: 16 = 1
objects:15 = 64
frozen: 1 = 0
union
struct
(SLUB)
union
kmem_cache
(kmem_cache_node)
__percpu *cpu_slab
min_partial = 5
size = 64
object_size = 64
cpu_partial = 30
min = 64
inuse = 64
align = 64
name =
“kmem_cache_node”
list
*node[MAX_NUMNODES]
max = 64
oo (order objects) = 64
early_kmem_cache_node_alloc() – Get a slab object
from freelist
slab
object
slab
object
..
slab
object
Page
offset = 32
kmem_cache_node
nr_partial = 0
struct list_head partial
nr_slabs = 0
total_objects = 0
list_lock
struct list_head full
Get a slab object from page->freelist
kmem_cache_node->node[node] = n
static local variable
boot_kmem_cache_node
struct
(Percpu
Partial
Page)
page struct (slab, slob or slub)
struct list_head slab_list
struct page *next
struct kmem_cache *slab_cache
void *freelist (first free object)
s_mem: first object - SLAB
int pages
int pobjects
counters - SLUB
inuse: 16 = 1
objects:15 = 64
frozen: 1 = 0
union
struct
(SLUB)
union
kmem_cache
(kmem_cache_node)
__percpu *cpu_slab
min_partial = 5
size = 64
object_size = 64
cpu_partial = 30
min = 64
inuse = 64
align = 64
name =
“kmem_cache_node”
list
*node[MAX_NUMNODES]
max = 64
oo (order objects) = 64
early_kmem_cache_node_alloc() – Get a slab object
from freelist
slab
object
slab
object
..
slab
object
Page
offset = 32
kmem_cache_node
nr_partial = 0
struct list_head partial
nr_slabs = 0 → 1
total_objects = 0 → 64
list_lock
struct list_head full
Get a slab object from page->freelist
kmem_cache_node->node[node] = n
static local variable
boot_kmem_cache_node
struct
(Percpu
Partial
Page)
page struct (slab, slob or slub)
struct list_head slab_list
struct page *next
struct kmem_cache *slab_cache
void *freelist (first free object)
s_mem: first object - SLAB
int pages
int pobjects
counters - SLUB
inuse: 16 = 1
objects:15 = 64
frozen: 1 = 0
union
struct
(SLUB)
union
struct
(Percpu
Partial
Page)
kmem_cache
(kmem_cache_node)
__percpu *cpu_slab
min_partial = 5
size = 64
object_size = 64
cpu_partial = 30
min = 64
inuse = 64
align = 64
name =
“kmem_cache_node”
list
*node[MAX_NUMNODES]
max = 64
oo (order objects) = 64
early_kmem_cache_node_alloc() – Get a slab object
from freelist
slab
object
slab
object
..
slab
object
Page
offset = 32
kmem_cache_node
nr_partial = 0 → 1
struct list_head partial
nr_slabs = 1
total_objects = 64
list_lock
struct list_head full
Get a slab object from page->freelist
kmem_cache_node->node[node] = n
static local variable
boot_kmem_cache_node
list_head
page struct (slab, slob or slub)
struct list_head slab_list
struct page *next
struct kmem_cache *slab_cache
void *freelist (first free object)
s_mem: first object - SLAB
int pages
int pobjects
counters - SLUB
inuse: 16 = 1
objects:15 = 64
frozen: 1 = 0
union
struct
(SLUB)
union
struct
(Percpu
Partial
Page)
kmem_cache
(kmem_cache_node)
__percpu *cpu_slab
min_partial = 5
size = 64
object_size = 64
cpu_partial = 30
min = 64
inuse = 64
align = 64
name =
“kmem_cache_node”
list
*node[MAX_NUMNODES]
max = 64
oo (order objects) = 64
early_kmem_cache_node_alloc(): configure node #0
slab
object
slab
object
..
slab
object
Page
offset = 32
kmem_cache_node
nr_partial = 1
struct list_head partial
nr_slabs = 1
total_objects = 64
list_lock
struct list_head full
Get a slab object from page->freelist
kmem_cache_node->node[node] = n
list_head
CONFIG_SLUB_DEBUG = y
static local variable
boot_kmem_cache_node
Allocate slab page(s) for kmem_cache_node struct allocation: fixes the “chicken or the egg” problem
page (slab, slob or slub)
struct list_head slab_list
struct page *next
struct kmem_cache *slab_cache
void *freelist (first free object)
s_mem: first object - SLAB
int pages
int pobjects
counters - SLUB
inuse: 1
objects:15 = 64
frozen: 1 = 0
union
struct
(SLUB)
union
struct
(Partial
Page)
kmem_cache
(kmem_cache_node)
__percpu *cpu_slab
min_partial = 5
size = 64
object_size = 64
cpu_partial = 30
min = 64
inuse = 64
align = 64
name =
“kmem_cache_node”
list
*node[MAX_NUMNODES]
max = 64
oo (order objects) = 64
slab
object
slab
object
..
slab
object
Page
offset = 32
kmem_cache_node
nr_partial = 1
struct list_head partial
nr_slabs = 1
total_objects = 64
list_lock
struct list_head full
Get a slab object from page->freelist
kmem_cache_node->node[node] = n
list_head
CONFIG_SLUB_DEBUG = y
early_kmem_cache_node_alloc(): configure node #0
page struct (slab, slob or slub)
struct list_head slab_list
struct page *next
struct kmem_cache *slab_cache
void *freelist (first free object)
s_mem: first object - SLAB
int pages
int pobjects
counters - SLUB
inuse: 16 = 1
objects:15 = 64
frozen: 1 = 0
union
struct
(SLUB)
union
struct
(Percpu
Partial
Page)
kmem_cache
(global variable: kmem_cache_node)
__percpu *cpu_slab = NULL
min_partial = 5
size = 64
object_size = 64
cpu_partial = 30
min = 64
inuse = 64
align = 64
name =
“kmem_cache_node”
list
*node[MAX_NUMNODES]
max = 64
oo (order objects) = 64
slab
object
slab
object
..
slab
object
Page
offset = 32
kmem_cache_node
nr_partial = 1
struct list_head partial
nr_slabs = 1
total_objects = 64
list_lock
struct list_head full
Get a slab object from page->freelist
kmem_cache_node->node[node] = n
list_head
CONFIG_SLUB_DEBUG = y
static local variable
boot_kmem_cache_node
early_kmem_cache_node_alloc(): configure node #0
page struct (slab, slob or slub)
struct list_head slab_list
struct page *next
struct kmem_cache *slab_cache
void *freelist (first free object)
s_mem: first object - SLAB
int pages
int pobjects
counters - SLUB
inuse: 16 = 1
objects:15 = 64
frozen: 1 = 0
union
struct
(SLUB)
union
struct
(Percpu
Partial
Page)
kmem_cache
(global variable: kmem_cache_node)
__percpu *cpu_slab = NULL
min_partial = 5
size = 64
object_size = 64
cpu_partial = 30
min = 64
inuse = 64
align = 64
name =
“kmem_cache_node”
list
*node[MAX_NUMNODES]
max = 64
oo (order objects) = 64
slab
object
slab
object
..
slab
object
Page
offset = 32
kmem_cache_node
nr_partial = 1
struct list_head partial
nr_slabs = 1
total_objects = 64
list_lock
struct list_head full
Get a slab object from page->freelist
kmem_cache_node->node[node] = n
list_head
CONFIG_SLUB_DEBUG = y
kmem_cache_node
nr_partial = 1
struct list_head partial
nr_slabs = 1
total_objects = 64
list_lock
struct list_head full
.
.
static local variable
boot_kmem_cache_node
early_kmem_cache_node_alloc(): Fully-configured
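Putting the previous slides together, an abridged version of early_kmem_cache_node_alloc() from mm/slub.c (v5.11), with the debug/KASAN hooks elided:

static void early_kmem_cache_node_alloc(int node)
{
	struct page *page;
	struct kmem_cache_node *n;

	BUG_ON(kmem_cache_node->size < sizeof(struct kmem_cache_node));

	page = new_slab(kmem_cache_node, GFP_NOWAIT, node);	/* buddy pages */
	BUG_ON(!page);

	n = page->freelist;		/* carve the first object out of the slab */
	BUG_ON(!n);
	page->freelist = get_freepointer(kmem_cache_node, n);
	page->inuse = 1;
	page->frozen = 0;
	kmem_cache_node->node[node] = n;	/* hand it to the boot cache */
	init_kmem_cache_node(n);
	inc_slabs_node(kmem_cache_node, node, page->objects);

	/*
	 * No locks need to be taken here as it has just been
	 * initialized and there is no concurrent access.
	 */
	__add_partial(n, page, DEACTIVATE_TO_HEAD);
}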
[Comparison] Generic kmem_cache_create() & low-level implementation
__kmem_cache_create()
start_kernel
__kmem_cache_create init_kmem_cache_nodes
calculate_sizes
kmem_cache_open
alloc_kmem_cache_cpus
mm_init kmem_cache_init
create_boot_cache(kmem_cache_node, …)
create_boot_cache(kmem_cache, …)
__alloc_percpu
init_kmem_cache_cpus
Assign cpu id to
kmem_cache_cpu->tid
Allocate/init ‘kmem_cache_node’
structs for all CPU nodes (sockets)
Allocate/init a ‘kmem_cache_cpu’ struct
Calculate page order and objects per slab
kmem_cache_create
kmem_cache_create_usercopy
__kmem_cache_alias
create_cache
find_mergeable
kmem_cache_zalloc slab_alloc_node
slab_alloc
kmem_cache_alloc
__kmem_cache_create
list_add(&s->list, &slab_caches)
Allocate a ‘kmem_cache’ struct from
a global variable ‘kmem_cache’
[Cache allocation] slab_alloc_node –
Allocate a slab object with a specific node id
• Fast path & slow path
o Will follow kmem_cache_init() flow
kmem_cache_init()
start_kernel
1. Declare two static variables ‘boot_kmem_cache’ and ‘boot_kmem_cache_node’
2. Assign the addresses of two static variables to generic/global variables
kmem_cache_node = &boot_kmem_cache_node
kmem_cache = &boot_kmem_cache
__kmem_cache_create init_kmem_cache_nodes
calculate_sizes
kmem_cache_open
alloc_kmem_cache_cpus
mm_init kmem_cache_init
create_boot_cache(kmem_cache_node, …)
create_boot_cache(kmem_cache, …)
kmem_cache =
bootstrap(&boot_kmem_cache)
kmem_cache_node =
bootstrap(&boot_kmem_cache_node)
setup_kmalloc_cache_index_table
create_kmalloc_caches
__alloc_percpu
init_kmem_cache_cpus
Assign cpu id to
kmem_cache_cpu->tid
Allocate/init ‘kmem_cache_node’
structs for all CPU nodes (sockets)
Allocate/init a ‘kmem_cache_cpu’ struct
kmem_cache
(global variable: kmem_cache)
__percpu *cpu_slab
min_partial = 5
size = 256
object_size = 216
cpu_partial = 13
min = 0x10
inuse = 216
align = 64
name = “kmem_cache”
list
*node[MAX_NUMNODES]
max = 0x10020
oo (order objects) = 0x10020
kmem_cache: init_kmem_cache_nodes()
offset = 112
kmem_cache_node
nr_partial = 0
struct list_head partial
nr_slabs = 0
total_objects = 0
list_lock
struct list_head full
kmem_cache_node->node[node] = n
CONFIG_SLUB_DEBUG = y
kmem_cache_node
nr_partial = 0
struct list_head partial
nr_slabs = 0
total_objects = 0
list_lock
struct list_head full
.
.
kmem_cache_cpu
void **freelist
tid (transaction id) = cpu id
struct page *page
struct page *partial
stat[NR_SLUB_STAT_ITEMS]
boot_kmem_cache
static variable
Allocate from global variable ‘kmem_cache_node’
* init_kmem_cache_nodes -> kmem_cache_alloc_node
n = kmem_cache_alloc_node()
early_kmem_cache_node_alloc
init_kmem_cache_nodes
for_each_node_state()
slab_state == DOWN
init_kmem_cache_node
kmem_cache->node[node] = n
Y
N
Special (one-shot) path:
Fix ‘chicken or the egg’ problem
Generic path
Get a slab object from c->freelist
kmem_cache_alloc_node
!c->freelist or
!c->page
kmem_cache_alloc_node/slab_alloc_node
__slab_alloc
kmem_cache_alloc slab_alloc slab_alloc_node
NUMA_NO_NODE
__kmalloc_node
kmalloc_node node_id
node_id
___slab_alloc
new_slab_objects
c->page = slub_percpu_partial(c)
Get a slab object from page->freelist
c->page &&
!c->freelist
!c->page && c->partial
get_partial
Get partial page from percpu cache
new_slab allocate_slab
alloc_slab_page
Allocate pages from
Buddy system
!c->page && !c->partial
Get partial page from node cache
(kmem_cache_node)
Y
N
fast path
slow path get_freelist -> __cmpxchg_double_slab
this_cpu_cmpxchg_double()
Fast Path
Get a slab object from kmem_cache_cpu->freelist
.
.
kmem_cache_cpu
(CPU #0)
struct page *page
void **freelist
struct page *partial
kmem_cache_cpu
(CPU #N)
struct page *page
void **freelist
struct page *partial
slab
object
slab
object
..
slab
object
Page
slab
object
slab
object
..
slab
object
Page
kmem_cache
(Slab cache)
__percpu *cpu_slab
percpu cache
.
.
kmem_cache_cpu
(CPU #0)
struct page *page
void **freelist
struct page *partial
kmem_cache_cpu
(CPU #N)
struct page *page
void **freelist
struct page *partial
slab
object
slab
object
..
slab
object
Page
slab
object
slab
object
..
slab
object
Page
percpu cache
slab
object
slab
object
..
Page
slab
object
slab
object
slab
object
slab
object
..
Page
slab
object
slab
object
get a slab object
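The fast path in the chart is a lockless cmpxchg on the percpu freelist; abridged from slab_alloc_node() in mm/slub.c (v5.11), with the tid/preemption setup, statistics and note_cmpxchg_failure() path omitted:

	object = c->freelist;
	page = c->page;
	if (unlikely(!object || !node_match(page, node))) {
		/* Slow path: refill from percpu partial, node partial or buddy. */
		object = __slab_alloc(s, gfpflags, node, addr, c);
	} else {
		void *next_object = get_freepointer_safe(s, object);

		/*
		 * Pop 'object' off the percpu freelist; the tid detects being
		 * migrated to another CPU or preempted in between.
		 */
		if (unlikely(!this_cpu_cmpxchg_double(
				s->cpu_slab->freelist, s->cpu_slab->tid,
				object, tid,
				next_object, next_tid(tid))))
			goto redo;

		prefetch_freepointer(s, next_object);
	}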
Get a slab object from c->freelist
kmem_cache_alloc_node
!c->freelist or
!c->page
slab_alloc_node(): slowpath #1 – Get an object from node’s partial page
__slab_alloc
kmem_cache_alloc slab_alloc slab_alloc_node
NUMA_NO_NODE
__kmalloc_node
kmalloc_node node_id
node_id
___slab_alloc
new_slab_objects
c->page = slub_percpu_partial(c)
c->page &&
!c->freelist
!c->page && c->partial
get_partial
Get partial page from percpu cache
new_slab allocate_slab
alloc_slab_page
Allocate pages from
Buddy system
!c->page && !c->partial
Get partial page from node cache
(kmem_cache_node)
Y
N
fast path
slow path get_freelist -> __cmpxchg_double_slab
this_cpu_cmpxchg_double()
Get a slab object from page->freelist
kmem_cache
(global variable: kmem_cache_node)
__percpu *cpu_slab
min_partial = 5
size = 64
object_size = 64
cpu_partial = 30
min = 64
inuse = 64
align = 64
name =
“kmem_cache_node”
list
*node[MAX_NUMNODES]
max = 64
oo (order objects) = 64
offset = 32
static local variable
boot_kmem_cache_node
kmem_cache_cpu
(CPU #N)
struct page *page = NULL
void **freelist = NULL
struct page *partial = NULL
Get a page (slab) from kmem_cache_node
kmem_cache_node (node 0)
nr_partial = 1
struct list_head partial
nr_slabs = 1
total_objects = 64
list_lock
struct list_head full
kmem_cache_node (node 1)
nr_partial = 1
struct list_head partial
nr_slabs = 1
total_objects = 64
list_lock
struct list_head full
*node[MAX_NUMNODES]
2
3
Get a page (slab) from ‘partial’ member
4
Move node’s partial page
(slab) to percpu cache
page struct (slab, slob or slub): node #0
struct page *next
struct kmem_cache *slab_cache
void *freelist (first free object)
s_mem: first object - SLAB
int pages
int pobjects
counters - SLUB
inuse: 16 = 1
objects:15 = 64
frozen: 1 = 0
union
struct
(SLUB)
union
struct
(Partial
Page)
list_head
struct list_head slab_list
slab
object
slab
object
..
slab
object
Page
slab
object
page struct (slab, slob or slub): node #1
Allocate a slab object
from node #0
1
kmem_cache
(kmem_cache)
*node[]
boot_kmem_cache
static local variable
Allocate slab objects
for node[0] & node[1]
Slowpath #1 – Get an object from node’s partial page
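Steps 3/4 in the diagram (taking a page off the node's partial list and freezing it for a CPU) are acquire_slab(), abridged from mm/slub.c (v5.11):

static inline void *acquire_slab(struct kmem_cache *s,
		struct kmem_cache_node *n, struct page *page,
		int mode, int *objects)
{
	void *freelist;
	unsigned long counters;
	struct page new;

	lockdep_assert_held(&n->list_lock);

	freelist = page->freelist;
	counters = page->counters;
	new.counters = counters;
	*objects = new.objects - new.inuse;
	if (mode) {
		/* Whole freelist goes to the CPU: the page looks fully in use. */
		new.inuse = page->objects;
		new.freelist = NULL;
	} else {
		new.freelist = freelist;
	}

	new.frozen = 1;		/* now owned by a kmem_cache_cpu */

	if (!__cmpxchg_double_slab(s, page,
			freelist, counters,
			new.freelist, new.counters,
			"acquire_slab"))
		return NULL;

	remove_partial(n, page);
	return freelist;
}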
kmem_cache
(global variable: kmem_cache_node)
__percpu *cpu_slab
min_partial = 5
size = 64
object_size = 64
cpu_partial = 30
min = 64
inuse = 64
align = 64
name =
“kmem_cache_node”
list
*node[MAX_NUMNODES]
max = 64
oo (order objects) = 64
offset = 32
static local variable
boot_kmem_cache_node
kmem_cache_cpu
(CPU #N)
struct page *page
void **freelist
struct page *partial = NULL
kmem_cache_node (node 0)
nr_partial = 0
struct list_head partial
nr_slabs = 1
total_objects = 64
list_lock
struct list_head full
kmem_cache_node (node 1)
nr_partial = 1
struct list_head partial
nr_slabs = 1
total_objects = 64
list_lock
struct list_head full
*node[MAX_NUMNODES]
page struct (slab, slob or slub): node #1
Allocate a slab object
from node #0
kmem_cache
(kmem_cache)
*node[]
boot_kmem_cache
static local variable
Return this
object address
page struct (slab, slob or slub): node #0
struct page *next
struct kmem_cache *slab_cache
void *freelist (first free object) = NULL
s_mem: first object - SLAB
int pages
int pobjects
counters - SLUB
inuse: 16 = 64
objects:15 = 64
frozen: 1 = 1
union
struct
(SLUB)
union
struct
(Partial
Page)
struct list_head slab_list = invalid
slab
object
slab
object
..
slab
object
Page
slab
object
Allocate slab objects
for node[0] & node[1]
• node[0]: allocated
Slowpath #1 – Get an object from node’s partial page
kmem_cache
(global variable: kmem_cache_node)
__percpu *cpu_slab
min_partial = 5
size = 64
object_size = 64
cpu_partial = 30
min = 64
inuse = 64
align = 64
name =
“kmem_cache_node”
list
*node[MAX_NUMNODES]
max = 64
oo (order objects) = 64
offset = 32
static local variable
boot_kmem_cache_node
kmem_cache_cpu
(CPU #N)
struct page *page
void **freelist
struct page *partial = NULL
kmem_cache_node (node 0)
nr_partial = 0
struct list_head partial
nr_slabs = 1
total_objects = 64
list_lock
struct list_head full
kmem_cache_node (node 1)
nr_partial = 1
struct list_head partial
nr_slabs = 1
total_objects = 64
list_lock
struct list_head full
*node[MAX_NUMNODES]
page struct (slab, slob or slub): node #1
Allocate a slab object
from node #1
kmem_cache
(kmem_cache)
*node[]
boot_kmem_cache
static local variable
Allocate slab objects
for node[0] & node[1]
• node[0]: allocated
page struct (slab, slob or slub): node #0
struct page *next
struct kmem_cache *slab_cache
void *freelist (first free object) = NULL
s_mem: first object - SLAB
int pages
int pobjects
counters - SLUB
inuse: 16 = 64
objects:15 = 64
frozen: 1 = 1
union
struct
(SLUB)
union
struct
(Partial
Page)
struct list_head slab_list = invalid
slab
object
slab
object
..
slab
object
Page
slab
object
1
2 percpu page’s node
does not match
3
deactivate_slab(): Remove percpu slab
Slowpath #1 – Get an object from node’s partial page
kmem_cache
(global variable: kmem_cache_node)
__percpu *cpu_slab
min_partial = 5
size = 64
object_size = 64
cpu_partial = 30
min = 64
inuse = 64
align = 64
name =
“kmem_cache_node”
list
*node[MAX_NUMNODES]
max = 64
oo (order objects) = 64
offset = 32
static local variable
boot_kmem_cache_node
kmem_cache_cpu
(CPU #N)
struct page *page = NULL
void **freelist = NULL
struct page *partial = NULL
kmem_cache_node (node 0)
nr_partial = 1
struct list_head partial
nr_slabs = 1
total_objects = 64
list_lock
struct list_head full
kmem_cache_node (node 1)
nr_partial = 1
struct list_head partial
nr_slabs = 1
total_objects = 64
list_lock
struct list_head full
*node[MAX_NUMNODES]
Allocate a slab object
from node #1
kmem_cache
(kmem_cache)
*node[]
boot_kmem_cache
static local variable
Allocate slab objects
for node[0] & node[1]
• node[0]: allocated
page struct (slab, slob or slub): node #0
struct page *next
struct kmem_cache *slab_cache
void *freelist (first free object)
s_mem: first object - SLAB
int pages
int pobjects
counters - SLUB
inuse: 16 = 2
objects:15 = 64
frozen: 1 = 0
union
struct
(SLUB)
union
struct
(Partial
Page)
struct list_head slab_list
slab
object
slab
object
..
slab
object
Page
slab
object
1
page struct (slab, slob or slub): node #1
struct page *next
struct kmem_cache *slab_cache
void *freelist (first free object)
s_mem: first object - SLAB
int pages
int pobjects
counters - SLUB
inuse: 16 = 1
objects:15 = 64
frozen: 1 = 0
union
struct
(SLUB)
union
struct
(Partial
Page)
struct list_head slab_list
slab
object
slab
object
..
slab
object
Page
slab
object
freepointers:
reverse!
Slowpath #1 – Get an object from node’s partial page
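The "freepointers: reverse!" callout is a side effect of how deactivate_slab() drains the percpu freelist: remaining objects are pushed back onto the page freelist head-first, so their order flips. A self-contained userspace model of that push loop:

#include <stdio.h>

struct obj { struct obj *next; int id; };

int main(void)
{
	struct obj o[3] = { { &o[1], 0 }, { &o[2], 1 }, { 0, 2 } };
	struct obj *cpu_freelist = &o[0], *page_freelist = 0, *p;

	while (cpu_freelist) {			/* drain the percpu freelist */
		struct obj *next = cpu_freelist->next;
		cpu_freelist->next = page_freelist;	/* push onto page freelist */
		page_freelist = cpu_freelist;
		cpu_freelist = next;
	}
	for (p = page_freelist; p; p = p->next)
		printf("%d ", p->id);		/* prints: 2 1 0 */
	return 0;
}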
kmem_cache
(global variable: kmem_cache_node)
__percpu *cpu_slab
min_partial = 5
size = 64
object_size = 64
cpu_partial = 30
min = 64
inuse = 64
align = 64
name =
“kmem_cache_node”
list
*node[MAX_NUMNODES]
max = 64
oo (order objects) = 64
offset = 32
static local variable
boot_kmem_cache_node
kmem_cache_cpu
(CPU #N)
struct page *page = NULL
void **freelist = NULL
struct page *partial = NULL
kmem_cache_node (node 0)
nr_partial = 1
struct list_head partial
nr_slabs = 1
total_objects = 64
list_lock
struct list_head full
kmem_cache_node (node 1)
nr_partial = 1
struct list_head partial
nr_slabs = 1
total_objects = 64
list_lock
struct list_head full
*node[MAX_NUMNODES]
Allocate a slab object
from node #1
kmem_cache
(kmem_cache)
*node[]
boot_kmem_cache
static local variable
Allocate slab objects
for node[0] & node[1]
• node[0]: allocated
page struct (slab, slob or slub): node #0
struct page *next
struct kmem_cache *slab_cache
void *freelist (first free object)
s_mem: first object - SLAB
int pages
int pobjects
counters - SLUB
inuse: 16 = 2
objects:15 = 64
frozen: 1 = 0
union
struct
(SLUB)
union
struct
(Partial
Page)
struct list_head slab_list
page struct (slab, slob or slub): node #1
struct page *next
struct kmem_cache *slab_cache
void *freelist (first free object)
s_mem: first object - SLAB
int pages
int pobjects
counters - SLUB
inuse: 16 = 1
objects:15 = 64
frozen: 1 = 0
union
struct
(SLUB)
union
struct
(Partial
Page)
struct list_head slab_list
slab
object
slab
object
..
slab
object
Page
slab
object
slab
object
slab
object
..
slab
object
Page
slab
object
freepointers:
reverse!
4
Move node’s partial page
(slab) to percpu cache
Slowpath #1 – Get an object from node’s partial page
kmem_cache
(global variable: kmem_cache_node)
__percpu *cpu_slab
min_partial = 5
size = 64
object_size = 64
cpu_partial = 30
min = 64
inuse = 64
align = 64
name =
“kmem_cache_node”
list
*node[MAX_NUMNODES]
max = 64
oo (order objects) = 64
offset = 32
static local variable
boot_kmem_cache_node
kmem_cache_cpu
(CPU #N)
struct page *page
void **freelist
struct page *partial = NULL
kmem_cache_node (node 0)
nr_partial = 1
struct list_head partial
nr_slabs = 1
total_objects = 64
list_lock
struct list_head full
kmem_cache_node (node 1)
nr_partial = 1
struct list_head partial
nr_slabs = 1
total_objects = 64
list_lock
struct list_head full
*node[MAX_NUMNODES]
Allocate a slab object
from node #1
kmem_cache
(kmem_cache)
*node[]
boot_kmem_cache
static local variable
Allocate slab objects
for node[0] & node[1]
• node[0]: allocated
• node[1]: allocated
page struct (slab, slob or slub): node #1
struct page *next
struct kmem_cache *slab_cache
void *freelist (first free object) = NULL
s_mem: first object - SLAB
int pages
int pobjects
counters - SLUB
inuse: 16 = 64
objects:15 = 64
frozen: 1 = 1
union
struct
(SLUB)
union
struct
(Partial
Page)
struct list_head slab_list = invalid
slab
object
slab
object
..
slab
object
Page
slab
object
page struct (slab, slob or slub): node #0
struct page *next
struct kmem_cache *slab_cache
void *freelist (first free object)
s_mem: first object - SLAB
int pages
int pobjects
counters - SLUB
inuse: 16 = 2
objects:15 = 64
frozen: 1 = 0
union
struct
(SLUB)
union
struct
(Partial
Page)
struct list_head slab_list
slab
object
slab
object
..
slab
object
Page
slab
object
freepointers:
reverse!
Return this
object address
Slowpath #1 – Get an object from node’s partial page
kmem_cache_init()
start_kernel
1. Declare two static variables ‘boot_kmem_cache’ and ‘boot_kmem_cache_node’
2. Assign the addresses of two static variables to generic/global variables
kmem_cache_node = &boot_kmem_cache_node
kmem_cache = &boot_kmem_cache
__kmem_cache_create init_kmem_cache_nodes
calculate_sizes
kmem_cache_open
alloc_kmem_cache_cpus
mm_init kmem_cache_init
create_boot_cache(kmem_cache_node, …)
create_boot_cache(kmem_cache, …)
kmem_cache =
bootstrap(&boot_kmem_cache)
kmem_cache_node =
bootstrap(&boot_kmem_cache_node)
setup_kmalloc_cache_index_table
create_kmalloc_caches
__alloc_percpu
init_kmem_cache_cpus
Assign cpu id to
kmem_cache_cpu->tid
Allocate/init ‘kmem_cache_node’
structs for all CPU nodes (sockets)
Allocate/init a ‘kmem_cache_cpu’ struct
Let’s check the kmem_cache & kmem_cache_node data structures after the
create_boot_cache() calls
kmem_cache
(global variable: kmem_cache_node)
__percpu *cpu_slab
min_partial = 5
size = 64
object_size = 64
cpu_partial = 30
min = 64
inuse = 64
align = 64
name =
“kmem_cache_node”
list
*node[MAX_NUMNODES]
max = 64
oo (order objects) = 64
create_boot_cache(): 1/2
offset = 32
static local variable
boot_kmem_cache_node
kmem_cache_cpu
(CPU #N)
struct page *page
void **freelist
struct page *partial = NULL
kmem_cache_node (node 0)
nr_partial = 1
struct list_head partial
nr_slabs = 1
total_objects = 64
list_lock
struct list_head full
kmem_cache_node (node 1)
nr_partial = 1
struct list_head partial
nr_slabs = 1
total_objects = 64
list_lock
struct list_head full
*node[MAX_NUMNODES]
page struct (slab, slob or slub): node #1
struct page *next
struct kmem_cache *slab_cache
void *freelist (first free object) = NULL
s_mem: first object - SLAB
int pages
int pobjects
counters - SLUB
inuse: 16 = 64
objects:15 = 64
frozen: 1 = 1
union
struct
(SLUB)
union
struct
(Partial
Page)
struct list_head slab_list = invalid
slab
object
slab
object
..
slab
object
Page
slab
object
page struct (slab, slob or slub): node #0
struct page *next
struct kmem_cache *slab_cache
void *freelist (first free object)
s_mem: first object - SLAB
int pages
int pobjects
counters - SLUB
inuse: 16 = 2
objects:15 = 64
frozen: 1 = 0
union
struct
(SLUB)
union
struct
(Partial
Page)
struct list_head slab_list
slab
object
slab
object
..
slab
object
Page
slab
object
freepointers:
reverse!
Return this
object address
kmem_cache
(global variable: kmem_cache)
__percpu *cpu_slab
min_partial = 5
size = 256
object_size = 216
cpu_partial = 13
min = 0x10
inuse = 216
align = 64
name = “kmem_cache”
list
*node[MAX_NUMNODES]
max = 0x10020
oo (order objects) = 0x10020
create_boot_cache(): 2/2
offset = 112
static local variable
boot_kmem_cache kmem_cache_cpu
(CPU #N)
struct page *page = NULL
void **freelist = NULL
struct page *partial = NULL
kmem_cache_node (node 0)
nr_partial = 0
struct list_head partial
nr_slabs = 0
total_objects = 0
list_lock
struct list_head full
kmem_cache_node (node 1)
nr_partial = 0
struct list_head partial
nr_slabs = 0
total_objects = 0
list_lock
struct list_head full
*node[MAX_NUMNODES]
kmem_cache_init()
start_kernel
1. Declare two static variables ‘boot_kmem_cache’ and ‘boot_kmem_cache_node’
2. Assign the addresses of two static variables to generic/global variables
kmem_cache_node = &boot_kmem_cache_node
kmem_cache = &boot_kmem_cache
__kmem_cache_create init_kmem_cache_nodes
calculate_sizes
kmem_cache_open
alloc_kmem_cache_cpus
mm_init kmem_cache_init
create_boot_cache(kmem_cache_node, …)
create_boot_cache(kmem_cache, …)
kmem_cache =
bootstrap(&boot_kmem_cache)
kmem_cache_node =
bootstrap(&boot_kmem_cache_node)
setup_kmalloc_cache_index_table
create_kmalloc_caches
__alloc_percpu
init_kmem_cache_cpus
Assign cpu id to
kmem_cache_cpu->tid
Allocate/init ‘kmem_cache_node’
structs for all CPU nodes (sockets)
Allocate/init a ‘kmem_cache_cpu’ struct
bootstrap
s = kmem_cache_zalloc(kmem_cache, …)
Allocate a kmem_cache struct from global variable ‘kmem_cache’: Generic path
__flush_cpu_slab
list_add(&s->list, &slab_caches)
for_each_kmem_cache_node(s, node, n)
kmem_cache = bootstrap(&boot_kmem_cache)
memcpy(s, static_cache, …)
return s
list_for_each_entry(p, &n->partial, slab_list)
struct page *p;
p->slab_cache = s
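The steps listed above, as code: bootstrap(), abridged from mm/slub.c (v5.11); the CONFIG_SLUB_DEBUG full-list fixup is omitted:

static struct kmem_cache * __init bootstrap(struct kmem_cache *static_cache)
{
	int node;
	struct kmem_cache *s = kmem_cache_zalloc(kmem_cache, GFP_NOWAIT);
	struct kmem_cache_node *n;

	memcpy(s, static_cache, kmem_cache->object_size);

	/* Only the boot CPU is up: flush its percpu slab back to the node. */
	__flush_cpu_slab(s, smp_processor_id());
	for_each_kmem_cache_node(s, node, n) {
		struct page *p;

		/* The pages still point at the static cache; re-point them. */
		list_for_each_entry(p, &n->partial, slab_list)
			p->slab_cache = s;
	}
	list_add(&s->list, &slab_caches);
	return s;
}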
Get a slab object from c->freelist
kmem_cache_alloc_node
!c->freelist or
!c->page
slab_alloc_node(): slowpath #2 – Allocate a page from Buddy system
__slab_alloc
kmem_cache_alloc slab_alloc slab_alloc_node
NUMA_NO_NODE
__kmalloc_node
kmalloc_node node_id
node_id
___slab_alloc
new_slab_objects
c->page = slub_percpu_partial(c)
c->page &&
!c->freelist
!c->page && c->partial
get_partial
Get partial page from percpu cache
new_slab allocate_slab
alloc_slab_page
Allocate pages from
Buddy system
!c->page && !c->partial
Get partial page from node cache
(kmem_cache_node)
Y
N
fast path
slow path get_freelist -> __cmpxchg_double_slab
this_cpu_cmpxchg_double()
bootstrap
s = kmem_cache_zalloc(kmem_cache, …)
Allocate a kmem_cache struct from global variable ‘kmem_cache’
Get a slab object from page->freelist
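When no partial page exists anywhere, SLUB falls through to the buddy system; a simplified sketch of alloc_slab_page() from mm/slub.c (memcg/accounting details omitted):

static inline struct page *alloc_slab_page(struct kmem_cache *s,
		gfp_t flags, int node, struct kmem_cache_order_objects oo)
{
	unsigned int order = oo_order(oo);	/* e.g. order 1 for 'kmem_cache' */

	if (node == NUMA_NO_NODE)
		return alloc_pages(flags, order);
	return __alloc_pages_node(node, flags, order);
}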
kmem_cache
(global variable: kmem_cache)
__percpu *cpu_slab
min_partial = 5
size = 256
object_size = 216
cpu_partial = 13
min = 0x10
inuse = 216
align = 64
name = “kmem_cache”
list
*node[MAX_NUMNODES]
max = 0x10020
oo (order objects) = 0x10020
offset = 112
static local variable
boot_kmem_cache kmem_cache_cpu
(CPU #N)
struct page *page = NULL
void **freelist = NULL
struct page *partial = NULL
kmem_cache_node (node 0)
nr_partial = 0
struct list_head partial
nr_slabs = 0
total_objects = 0
list_lock
struct list_head full
kmem_cache_node (node 1)
nr_partial = 0
struct list_head partial
nr_slabs = 0
total_objects = 0
list_lock
struct list_head full
*node[MAX_NUMNODES]
[slowpath #2] Before: bootstrap(&boot_kmem_cache) -> kmem_cache_zalloc()
kmem_cache
(global variable: kmem_cache)
__percpu *cpu_slab
min_partial = 5
size = 256
object_size = 216
cpu_partial = 13
min = 0x10
inuse = 216
align = 64
name = “kmem_cache”
list
*node[MAX_NUMNODES]
max = 0x10020
oo (order objects) = 0x10020
offset = 112
static local variable
boot_kmem_cache kmem_cache_cpu
(CPU #N)
struct page *page
void **freelist
struct page *partial = NULL
kmem_cache_node (node 0)
nr_partial = 0
struct list_head partial
nr_slabs = 0
total_objects = 0
list_lock
struct list_head full
kmem_cache_node (node 1)
nr_partial = 0
struct list_head partial
nr_slabs = 0
total_objects = 0
list_lock
struct list_head full
*node[MAX_NUMNODES]
page struct (slab, slob or slub): order = 1
struct page *next
struct kmem_cache *slab_cache
void *freelist (first free object) = NULL
s_mem: first object - SLAB
int pages
int pobjects
counters - SLUB
inuse: 16 = 32
objects:15 = 32
frozen: 1 = 1
union
struct
(SLUB)
union
struct
(Partial
Page)
struct list_head slab_list = invalid
slab
object
slab
object
..
slab
object
Slab Page(s)
slab
object
Return this
object address
oo (order objects) bit layout: bits [31:16] = page order, bits [15:0] = objects per slab
kmem_cache_zalloc(kmem_cache, …)
1
2 Allocate from Buddy system
3
slab_alloc_node(): slowpath #2 – Allocate a page from Buddy system
[slowpath #2] After: bootstrap(&boot_kmem_cache) -> kmem_cache_zalloc()
kmem_cache
(global variable: kmem_cache)
__percpu *cpu_slab
min_partial = 5
size = 256
object_size = 216
cpu_partial = 13
min = 0x10
inuse = 216
align = 64
name = “kmem_cache”
list
*node[MAX_NUMNODES]
max = 0x10020
oo (order objects) = 0x10020
offset = 112
static local variable
boot_kmem_cache kmem_cache_cpu
(CPU #N)
struct page *page = NULL
void **freelist = NULL
struct page *partial = NULL
kmem_cache_node (node 0)
nr_partial = 1
struct list_head partial
nr_slabs = 1
total_objects = 32
list_lock
struct list_head full
kmem_cache_node (node 1)
nr_partial = 0
struct list_head partial
nr_slabs = 0
total_objects = 0
list_lock
struct list_head full
*node[MAX_NUMNODES]
page struct (slab, slob or slub): order = 1
struct page *next
struct kmem_cache *slab_cache
void *freelist (first free object) = NULL
s_mem: first object - SLAB
int pages
int pobjects
counters - SLUB
inuse: 16 = 32
objects:15 = 32
frozen: 1 = 1
union
struct
(SLUB)
union
struct
(Partial
Page)
struct list_head slab_list = invalid
slab
object
slab
object
..
slab
object
Slab Page(s)
slab
object
oo (order objects) bit layout: bits [31:16] = page order, bits [15:0] = objects per slab
kmem_cache_zalloc(kmem_cache, …)
X
X
page struct (slab, slob or slub): order = 1
struct page *next
struct kmem_cache *slab_cache
void *freelist (first free object)
s_mem: first object - SLAB
int pages
int pobjects
counters - SLUB
inuse: 16 = 1
objects:15 = 32
frozen: 1 = 0
union
struct
(SLUB)
union
struct
(Partial
Page)
struct list_head slab_list
slab
object
slab
object
..
slab
object
Slab Page(s)
slab
object
freepointers:
reverse!
deactivate_slab(): Remove percpu slab
[slowpath #2] After: bootstrap(&boot_kmem_cache) -> __flush_cpu_slab()
kmem_cache
(global variable: kmem_cache)
__percpu *cpu_slab
min_partial = 5
size = 256
object_size = 216
cpu_partial = 13
min = 0x10
inuse = 216
align = 64
name = “kmem_cache”
list
*node[MAX_NUMNODES]
max = 0x10020
oo (order objects) = 0x10020
offset = 112
kmem_cache_cpu
(CPU #N)
struct page *page = NULL
void **freelist = NULL
struct page *partial = NULL
kmem_cache_node (node 0)
nr_partial = 1
struct list_head partial
nr_slabs = 1
total_objects = 32
list_lock
struct list_head full
kmem_cache_node (node 1)
nr_partial = 0
struct list_head partial
nr_slabs = 0
total_objects = 0
list_lock
struct list_head full
*node[MAX_NUMNODES]
oo (order objects) bit layout: bits [31:16] = page order, bits [15:0] = objects per slab
page struct (slab, slob or slub): order = 1
struct page *next
struct kmem_cache *slab_cache
void *freelist (first free object)
s_mem: first object - SLAB
int pages
int pobjects
counters - SLUB
inuse: 16 = 1
objects:15 = 32
frozen: 1 = 0
union
struct
(SLUB)
union
struct
(Partial
Page)
struct list_head slab_list
slab
object
slab
object
..
slab
object
Slab Page(s)
slab
object
freepointers:
reverse!
kmem_cache = bootstrap(&boot_kmem_cache): after function returns
static local variable
boot_kmem_cache
kmem_cache
global variable
X
kmem_cache_init()
[Flow] start_kernel -> mm_init -> kmem_cache_init:
1. Declare two static variables 'boot_kmem_cache' and 'boot_kmem_cache_node'
2. Assign the addresses of the two static variables to the generic/global variables: kmem_cache_node = &boot_kmem_cache_node, kmem_cache = &boot_kmem_cache
3. create_boot_cache(kmem_cache_node, …) -> __kmem_cache_create -> kmem_cache_open:
   o calculate_sizes
   o init_kmem_cache_nodes: allocate/init 'kmem_cache_node' structs for all CPU nodes (sockets)
   o alloc_kmem_cache_cpus -> __alloc_percpu, init_kmem_cache_cpus: allocate/init a 'kmem_cache_cpu' struct and assign a cpu id to kmem_cache_cpu->tid
4. create_boot_cache(kmem_cache, …)
5. kmem_cache = bootstrap(&boot_kmem_cache)
6. kmem_cache_node = bootstrap(&boot_kmem_cache_node)
7. setup_kmalloc_cache_index_table, create_kmalloc_caches
An abridged version of the function follows.
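For reference, the corresponding code, abridged from kmem_cache_init() in mm/slub.c (v5.11); the memory-hotplug notifier, freelist randomization and the final pr_info() are trimmed.

	void __init kmem_cache_init(void)
	{
		static __initdata struct kmem_cache boot_kmem_cache,
			boot_kmem_cache_node;

		/* Steps 1/2: point the globals at the static boot caches. */
		kmem_cache_node = &boot_kmem_cache_node;
		kmem_cache = &boot_kmem_cache;

		create_boot_cache(kmem_cache_node, "kmem_cache_node",
			sizeof(struct kmem_cache_node), SLAB_HWCACHE_ALIGN, 0, 0);

		/* Able to allocate the per node structures */
		slab_state = PARTIAL;

		create_boot_cache(kmem_cache, "kmem_cache",
				offsetof(struct kmem_cache, node) +
					nr_node_ids * sizeof(struct kmem_cache_node *),
				SLAB_HWCACHE_ALIGN, 0, 0);

		/* Migrate both caches off the static variables. */
		kmem_cache = bootstrap(&boot_kmem_cache);
		kmem_cache_node = bootstrap(&boot_kmem_cache_node);

		/* Now we can use the kmem_cache to allocate kmalloc slabs */
		setup_kmalloc_cache_index_table();
		create_kmalloc_caches(0);
	}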
slab_alloc_node(): slowpath #1 – Get an object from node's partial page
[Flow] Entry points: kmem_cache_alloc -> slab_alloc -> slab_alloc_node (NUMA_NO_NODE); kmem_cache_alloc_node and kmalloc_node -> __kmalloc_node -> slab_alloc_node (node_id).
• fast path: c->freelist and c->page are valid -> get a slab object from c->freelist via this_cpu_cmpxchg_double()
• slow path (!c->freelist or !c->page): __slab_alloc -> ___slab_alloc
   o c->page && !c->freelist -> get_freelist -> __cmpxchg_double_slab
   o !c->page && c->partial -> c->page = slub_percpu_partial(c): get a partial page from the percpu cache
   o !c->page && !c->partial -> new_slab_objects -> get_partial: get a partial page from the node cache (kmem_cache_node); otherwise new_slab -> allocate_slab -> alloc_slab_page: allocate pages from the buddy system
A sketch of the fast path follows.
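A minimal sketch of that fast path, abridged from slab_alloc_node() in mm/slub.c (v5.11); the kasan/memcg hooks, debug paths and some barriers are omitted, and the signature is shortened.

	static void *slab_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)
	{
		struct kmem_cache_cpu *c;
		struct page *page;
		unsigned long tid;
		void *object;

	redo:
		/* Snapshot the percpu cache; tid changes on every update, so
		 * migration to another CPU is caught by the cmpxchg below. */
		c = raw_cpu_ptr(s->cpu_slab);
		tid = READ_ONCE(c->tid);

		object = c->freelist;
		page = c->page;
		if (unlikely(!object || !page || !node_match(page, node))) {
			/* Slow paths #1..#4: refill from page->freelist, the
			 * percpu/node partial lists or the buddy system. */
			object = __slab_alloc(s, gfpflags, node, _RET_IP_, c);
		} else {
			void *next_object = get_freepointer_safe(s, object);

			/* Fast path: pop 'object' off c->freelist atomically. */
			if (unlikely(!this_cpu_cmpxchg_double(
					s->cpu_slab->freelist, s->cpu_slab->tid,
					object, tid,
					next_object, next_tid(tid))))
				goto redo;
		}
		return object;
	}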
bootstrap(): s = kmem_cache_zalloc(kmem_cache, …) allocates a kmem_cache struct from the global variable 'kmem_cache'. For kmem_cache_node = bootstrap(&boot_kmem_cache_node), that allocation gets its slab object from page->freelist (see the sketch below).
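Abridged bootstrap() from mm/slub.c (v5.11); the walk over the node full lists (CONFIG_SLUB_DEBUG) is trimmed.

	static struct kmem_cache * __init bootstrap(struct kmem_cache *static_cache)
	{
		int node;
		struct kmem_cache *s = kmem_cache_zalloc(kmem_cache, GFP_NOWAIT);
		struct kmem_cache_node *n;

		/* Copy the fully initialized static cache into the new object. */
		memcpy(s, static_cache, kmem_cache->object_size);

		/* Deactivate the percpu slab: it was allocated while
		 * page->slab_cache still pointed at the static copy. */
		__flush_cpu_slab(s, smp_processor_id());
		for_each_kmem_cache_node(s, node, n) {
			struct page *p;

			/* Re-point every partial slab page at the new cache. */
			list_for_each_entry(p, &n->partial, slab_list)
				p->slab_cache = s;
		}
		list_add(&s->list, &slab_caches);
		return s;
	}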
kmem_cache_node = bootstrap(&boot_kmem_cache_node) -> kmem_cache_zalloc(kmem_cache, …)
[Diagram] slowpath #1 in action: get_partial_node() -> acquire_slab() detaches the slab page from node 0's partial list (nr_partial: 1 -> 0) and freezes it for the percpu cache: page->freelist becomes NULL, inuse = 32, objects = 32, frozen = 1, slab_list invalid. The first free object's address is returned to kmem_cache_zalloc(). kmem_cache_cpu now holds valid page/freelist pointers; node 0 keeps nr_slabs = 1, total_objects = 32.
kmem_cache_node = bootstrap(&boot_kmem_cache_node) -> kmem_cache_zalloc(…)
[Diagram] The global 'kmem_cache_node' (still the static local variable 'boot_kmem_cache_node' at this point) shows: min_partial = 5, size = 64, object_size = 64, cpu_partial = 30, min = 64, max = 64, oo (order objects) = 64, offset = 32, inuse = 64, align = 64, name = "kmem_cache_node". Node 0 and node 1 each have nr_partial = 1, nr_slabs = 1, total_objects = 64. The node #1 slab page is frozen in the percpu cache (freelist = NULL, inuse = 64, objects = 64, frozen = 1, slab_list invalid); the node #0 page stays on its partial list (inuse = 2, objects = 64, frozen = 0), freepointers in reverse order.
kmem_cache_node = bootstrap(&boot_kmem_cache_node) -> __flush_cpu_slab()
deactivate_slab(): Remove percpu slab (reference from page #49)
[Diagram] After the flush, the percpu kmem_cache_cpu is empty (page = NULL, freelist = NULL, partial = NULL) and both slab pages sit on their node partial lists: the node #0 and node #1 pages each show inuse = 2, objects = 64, frozen = 0, freepointers in reverse order. Node 0 and node 1: nr_partial = 1, nr_slabs = 1, total_objects = 64.
After 'kmem_cache_node = bootstrap(&boot_kmem_cache_node)'
[Diagram] The global 'static struct kmem_cache *kmem_cache_node' now points at a slab-allocated struct; the static local variable 'boot_kmem_cache_node' is no longer used. All field values and node/page state match the previous slide.
kmem_cache_init() (recap)
[Flow] Same call tree as shown earlier: start_kernel -> mm_init -> kmem_cache_init -> create_boot_cache()/bootstrap() for both boot caches, then setup_kmalloc_cache_index_table and create_kmalloc_caches – will talk about these later, in the kmalloc & slab section.
slab_alloc_node(): slowpath #3 – Get a partial page from percpu cache
[Flow] Same call tree as slowpath #1. Let's talk about another slow path: this time ___slab_alloc() finds !c->page && c->partial, so it takes c->page = slub_percpu_partial(c) instead of touching the node cache or the buddy system.
[Diagram] slowpath #3: the percpu partial list is a chain of slab pages linked through page->next, each with its own page->freelist. Step 1: detach the first page from kmem_cache_cpu->partial and make it the active percpu slab.
When to add a page to percpu partial page? – Case 1 (1/2)
[Diagram] Slowpath #1: get a slab object from the node's partial list (new_slab_objects -> get_partial -> get_partial_node -> put_cpu_partial).
• [Just for a scenario] Acquire 3 pages from the node partial list because of the value of kmem_cache->cpu_partial.
put_cpu_partial(): Put page(s) into the percpu partial list.
When to add a page to percpu partial page? – Case 1 (2/2)
[Diagram] The first acquired page becomes the active percpu slab (c->page/c->freelist); put_cpu_partial() chains the remaining pages onto kmem_cache_cpu->partial through page->next, each keeping its own page->freelist.
When to add a page to percpu partial page? – Case 2: slab_free()
[Diagram] slab_free(): move page(s) to the percpu partial list and the node partial list (if possible). Freeing an object of a fully-allocated slab page (step 1) makes that page partial again, so put_cpu_partial() moves it to the percpu partial list (step 2).
* Detail will be discussed in section "[Cache release/free] kmem_cache_free()"
slab_alloc_node(): slowpath #4 – Get a slab object from page->freelist
[Flow] Same call tree: in ___slab_alloc(), c->page && !c->freelist leads to get_freelist -> __cmpxchg_double_slab, which grabs the whole page->freelist for the percpu cache.
[Diagram] slowpath #4, step by step:
1. If c->page is NULL, pull the first percpu partial page: page = c->page = slub_percpu_partial(c), then slub_set_percpu_partial(c, page) advances the list (c->partial = page->next).
2. get_freelist() moves page->freelist into c->freelist and sets page->freelist = NULL; the first object is handed back to the caller.
Note: a page descriptor address lives in the virtual memory map (vmemmap_base) – see page #5 of "Decompressed vmlinux: linux kernel initialization from page table configuration perspective". An abridged get_freelist() follows.
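Abridged get_freelist() from mm/slub.c (v5.11): one __cmpxchg_double_slab() claims the whole page->freelist for the percpu cache, so page->freelist becomes NULL and inuse jumps to page->objects, exactly as in the diagram; the debug assertions are trimmed.

	static inline void *get_freelist(struct kmem_cache *s, struct page *page)
	{
		struct page new;
		unsigned long counters;
		void *freelist;

		do {
			freelist = page->freelist;
			counters = page->counters;

			new.counters = counters;
			new.inuse = page->objects;	/* every object handed out */
			new.frozen = freelist != NULL;	/* stays frozen while usable */
		} while (!__cmpxchg_double_slab(s, page,
				freelist, counters,
				NULL, new.counters,
				"get_freelist"));

		return freelist;
	}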
/proc/slabinfo
* active_objs = num_objs - free_objs
[Cache release/free] kmem_cache_free()
• Fast path & slow path
kmem_cache_free() call path: kmem_cache_free -> slab_free -> do_slab_free.
• fast path (percpu cache): page == c->page -> return the slab object to c->freelist via this_cpu_cmpxchg_double()
• slow path (node cache): page != c->page -> __slab_free -> cmpxchg_double_slab returns the slab object to page->freelist, then:
   o Slowpath #1: put_cpu_partial – move page(s) to percpu partial or node partial (if possible)
   o Slowpath #2: add_partial/remove_partial – move pages to node partial
   o Slowpath #3: all slab objects in the page are free – discard_slab -> __free_slab -> __free_pages back to the buddy system
A sketch of do_slab_free() follows.
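Abridged do_slab_free() from mm/slub.c (v5.11); memcg/kasan hooks, barriers and the cmpxchg-failure statistics are trimmed.

	static void do_slab_free(struct kmem_cache *s, struct page *page,
				 void *head, void *tail, int cnt, unsigned long addr)
	{
		void *tail_obj = tail ? tail : head;
		struct kmem_cache_cpu *c;
		unsigned long tid;

	redo:
		c = raw_cpu_ptr(s->cpu_slab);
		tid = READ_ONCE(c->tid);

		if (likely(page == c->page)) {
			/* Fast path: push the object(s) onto c->freelist. */
			void *freelist = READ_ONCE(c->freelist);

			set_freepointer(s, tail_obj, freelist);
			if (unlikely(!this_cpu_cmpxchg_double(
					s->cpu_slab->freelist, s->cpu_slab->tid,
					freelist, tid,
					head, next_tid(tid))))
				goto redo;
		} else {
			/* Slow path: the object belongs to another slab page. */
			__slab_free(s, page, head, tail_obj, cnt, addr);
		}
	}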
slowpath #1: put_cpu_partial() – Case 1: Move page(s) to percpu partial
[Diagram] Step 1: free one object of a fully-allocated slab page that is not the active percpu slab. Step 2: put_cpu_partial() moves this now-partial page onto kmem_cache_cpu->partial (linked via page->next).
slowpath #1: put_cpu_partial() – Case 2 (1/2)
[Diagram] Step 1: free an object of a fully-allocated page. Step 2: if available objects > kmem_cache->cpu_partial && want_to_drain, call unfreeze_partials() to move the percpu partial pages to the node partial list first.
slowpath #1: put_cpu_partial() – Case 2 (2/2)
[Diagram] Step 3: with the percpu partial list drained to the node partial list, put_cpu_partial() moves the freshly freed page onto the (now empty) percpu partial list. A sketch of put_cpu_partial() follows.
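Abridged put_cpu_partial() from mm/slub.c (v5.11, CONFIG_SLUB_CPU_PARTIAL); stat counters and the irq save/restore around unfreeze_partials() are trimmed.

	static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
	{
		struct page *oldpage;
		int pages, pobjects;

		preempt_disable();
		do {
			pages = 0;
			pobjects = 0;
			oldpage = this_cpu_read(s->cpu_slab->partial);

			if (oldpage) {
				pobjects = oldpage->pobjects;
				pages = oldpage->pages;
				if (drain && pobjects > slub_cpu_partial(s)) {
					/* Case 2: too many cached objects – move
					 * the whole percpu partial list to the
					 * node partial lists first. */
					unfreeze_partials(s, this_cpu_ptr(s->cpu_slab));
					oldpage = NULL;
					pobjects = 0;
					pages = 0;
				}
			}

			/* Case 1: prepend 'page' via page->next and update the
			 * running totals kept in the list head. */
			pages++;
			pobjects += page->objects - page->inuse;
			page->pages = pages;
			page->pobjects = pobjects;
			page->next = oldpage;
		} while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page)
				!= oldpage);
		preempt_enable();
	}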
slowpath #2: add_partial() – Move a page to node partial
[Diagram] Condition: CONFIG_SLUB_DEBUG=y && CONFIG_SLUB_DEBUG_ON=y && !page->freelist (the page is in the percpu cache or is full of objects). Step 1: free an object of such a page. Step 2: add_partial() moves the page to the node partial list.
slowpath #3: remove_partial() and discard_slab()
[Diagram] Condition: all objects in the page(s) are available (free) && kmem_cache_node->nr_partial >= kmem_cache->min_partial. Step 1: free the last in-use object. Step 2: remove_partial() detaches the page from the node partial list. Step 3: free the page(s) to the buddy system: discard_slab -> free_slab -> __free_slab -> __free_pages.
kmalloc & slab
• Call path
• Compound page
kmalloc & slab
struct kmem_cache *kmalloc_caches[NR_KMALLOC_TYPES][KMALLOC_SHIFT_HIGH + 1]
• kmalloc_caches[KMALLOC_NORMAL][]: index 0 = NULL, 1 = kmalloc-96, 2 = kmalloc-192, 3 = kmalloc-8, 4 = kmalloc-16, …, 13 = kmalloc-8192; each entry is a full kmem_cache (__percpu *cpu_slab, *node[MAX_NUMNODES]).
• kmalloc_caches[KMALLOC_RECLAIM][]: same layout, selected by __GFP_RECLAIMABLE.
• kmalloc_caches[KMALLOC_DMA][]: same layout, selected by __GFP_DMA.
Check create_kmalloc_caches() & kmalloc_info. A sketch of the cache selection follows.
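A minimal sketch of how a request maps into this array. pick_kmalloc_cache is a hypothetical helper (not a kernel function) that condenses the real kmalloc_type() and kmalloc_index() logic from include/linux/slab.h (v5.11) into one place.

	static inline struct kmem_cache *pick_kmalloc_cache(size_t size, gfp_t flags)
	{
		enum kmalloc_cache_type type = KMALLOC_NORMAL;

		/* kmalloc_type(): pick the row of kmalloc_caches[][]. */
		if (flags & __GFP_DMA)
			type = KMALLOC_DMA;
		else if (flags & __GFP_RECLAIMABLE)
			type = KMALLOC_RECLAIM;

		/* kmalloc_index(): pick the column from the request size. */
		return kmalloc_caches[type][kmalloc_index(size)];
	}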
kmalloc() & kfree()
• kmalloc:
   o size <= KMALLOC_MAX_CACHE_SIZE (8KB): kmalloc_index -> kmem_cache_alloc_trace -> kmem_cache_alloc
   o size > KMALLOC_MAX_CACHE_SIZE (8KB): kmalloc_large -> kmalloc_order_trace -> kmalloc_order -> alloc_pages with __GFP_COMP (compound page)
• kfree: page = virt_to_head_page(x)
   o a slab page (check PG_slab flag): slab_free
   o not a slab page: __free_pages(page, compound_order(page))
• kmalloc_index: get the index based on size; if size < 8, return KMALLOC_SHIFT_LOW (3); size == 0 means a zero allocation.
A sketch of kfree() follows.
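Abridged kfree() from mm/slub.c (v5.11); tracing, debug hooks and vmstat accounting are trimmed.

	void kfree(const void *x)
	{
		struct page *page;
		void *object = (void *)x;

		if (unlikely(ZERO_OR_NULL_PTR(x)))
			return;

		page = virt_to_head_page(x);
		if (unlikely(!PageSlab(page))) {
			/* kmalloc_large() allocation: a compound page, not a slab. */
			__free_pages(page, compound_order(page));
			return;
		}
		slab_free(page->slab_cache, page, object, NULL, 1, _RET_IP_);
	}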
Compound page
[Diagram] A head page plus tail pages:
• head page: flags |= PG_head
• first tail page only: compound_head, compound_dtor, compound_order, compound_mapcount, compound_nr
• second tail page only: _compound_pad_1 (compound_head), hpage_pinned_refcount, deferred_list
• remaining tail pages: _compound_pad_1 (compound_head)
This layout is established by prep_compound_page(); an abridged version follows.
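Abridged prep_compound_page() from mm/page_alloc.c (v5.11).

	void prep_compound_page(struct page *page, unsigned int order)
	{
		int i;
		int nr_pages = 1 << order;

		__SetPageHead(page);			/* flags |= PG_head */
		for (i = 1; i < nr_pages; i++) {
			struct page *p = page + i;

			set_page_count(p, 0);
			p->mapping = TAIL_MAPPING;
			set_compound_head(p, page);	/* compound_head = head | 1 */
		}

		set_compound_page_dtor(page, COMPOUND_PAGE_DTOR);
		set_compound_order(page, order);	/* also sets compound_nr */
		atomic_set(compound_mapcount_ptr(page), -1);
		if (hpage_pincount_available(page))
			atomic_set(compound_pincount_ptr(page), 0);
	}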
Compound page – Use Cases
• Mainly used for huge pages
• kmalloc: allocation size > 8192 bytes
   o Check kmalloc_order()
Compound page example
Allocation size = 8195 -> kmalloc_large -> kmalloc_order -> alloc_pages(__GFP_COMP): order = 2 -> 4 pages.
[Diagram] Head page: flags |= PG_head. First tail page: compound_head, compound_dtor, compound_order = 2, compound_mapcount, compound_nr = 4. Second tail page: _compound_pad_1 (compound_head), hpage_pinned_refcount, deferred_list.
[Diagram] Same compound-page layout, annotated: in each tail page, bit 0 of the compound_head word is the purpose bit that marks the page as a tail page; the remaining bits point back to the head page. The kmalloc call path and the kmalloc_index rules are as shown above.
Q1: [SLUB] kmalloc: allocation size = 0, 1, 2, or 4 – workable?
Yes. The minimum kmalloc cache size is 8 bytes, so these requests are all served from kmalloc-8 (kmalloc_caches[KMALLOC_NORMAL][3]):
• if size < 8 -> kmalloc_index returns KMALLOC_SHIFT_LOW (3)
• if size == 0 -> zero allocation (no slab object is handed out)
An abridged kmalloc_index() follows.
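Abridged kmalloc_index() from include/linux/slab.h (v5.11), showing why sizes 1-8 all land in kmalloc-8 and why kmalloc-96/kmalloc-192 occupy indexes 1 and 2; entries above 8 KB are trimmed.

	static __always_inline unsigned int kmalloc_index(size_t size)
	{
		if (!size)
			return 0;	/* the caller turns this into a zero allocation */

		if (size <= KMALLOC_MIN_SIZE)
			return KMALLOC_SHIFT_LOW;	/* 3 -> kmalloc-8 */

		if (KMALLOC_MIN_SIZE <= 32 && size > 64 && size <= 96)
			return 1;			/* kmalloc-96 */
		if (KMALLOC_MIN_SIZE <= 64 && size > 128 && size <= 192)
			return 2;			/* kmalloc-192 */
		if (size <=    8) return 3;
		if (size <=   16) return 4;
		if (size <=   32) return 5;
		if (size <=   64) return 6;
		if (size <=  128) return 7;
		if (size <=  256) return 8;
		if (size <=  512) return 9;
		if (size <= 1024) return 10;
		if (size <= 2048) return 11;
		if (size <= 4096) return 12;
		/* ... abridged: the real function continues past 8 KB ... */
		return 13;				/* kmalloc-8192 */
	}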
Reference
• https://www.cnblogs.com/LoyenWang/p/11922887.html
Backup
Slab Cache (backup)
[Diagram] The same "Slab Cache" overview shown earlier in the deck: a kmem_cache with a percpu cache (per-CPU kmem_cache_cpu: page, freelist, partial) and a node-based cache (per-node kmem_cache_node: partial list); the freelist concept is implemented in the page struct (not shown).
Who initializes global variables 'kmem_cache' and 'kmem_cache_node'?
[Diagram] Creating any new cache depends on two global slab caches:
1. [User calls this function] k_cache = kmem_cache_create(): create/get a new cache.
2. [Slab allocator calls this automatically] kmem_cache_alloc(kmem_cache, ..): allocate a 'kmem_cache' object from the global variable 'kmem_cache'.
3. [Slab allocator calls this automatically] kmem_cache_alloc_node(kmem_cache_node, …): allocate 'kmem_cache_node' objects from the global variable 'kmem_cache_node'.
Slab Allocator Initialization: "Chicken or the egg" problem
[Diagram] If the two globals start out as NULL (struct kmem_cache *kmem_cache = NULL, struct kmem_cache *kmem_cache_node = NULL), the scheme above cannot work: creating a cache such as sighand_cachep must allocate from the 'kmem_cache' and 'kmem_cache_node' slab caches, which do not exist yet. Each allocator breaks the cycle with statically initialized boot caches:
• Slab allocator: statically initialized 'kmem_cache_boot'
• Slob allocator: statically initialized 'kmem_cache_boot'
• Slub allocator: static local variables 'boot_kmem_cache' and 'boot_kmem_cache_node'
  • 9. Slab Allocator Initialization: “Chicken or the egg” problem . . kmem_cache_cpu (CPU #0) struct page *page void **freelist struct page *partial kmem_cache_node (Node #0) struct list_head partial . . kmem_cache = NULL (Slab cache) __percpu *cpu_slab *node[MAX_NUMNODES] percpu cache Node-based cache [Slab Cache] struct kmem_cache *kmem_cache = NULL . . kmem_cache_cpu (CPU #0) struct page *page void **freelist struct page *partial kmem_cache_node (Node #0) struct list_head partial . . kmem_cache = NULL (Slab cache) __percpu *cpu_slab *node[MAX_NUMNODES] percpu cache Node-based cache [Slab Cache] struct kmem_cache *kmem_cache_node = NULL kmem_cache_cpu (CPU #0) struct page *page void **freelist struct page *partial kmem_cache_node (Node #0) struct list_head partial kmem_cache (Slab cache) __percpu *cpu_slab *node[MAX_NUMNODES] percpu cache Node-based cache [Slab Cache] struct kmem_cache *sighand_cachep . . . . 1 2 3 [Cannot work] Allocate from ‘kmem_cache_node’ slab cache Slab allocator: statically initialized ‘kmem_cache_boot’ Slob allocator: statically initialized ‘kmem_cache_boot’ Slub allocator: static local variables ‘boot_kmem_cache’ and ‘boot_kmem_cache_node’
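The fix, in miniature: serve the very first cache-descriptor allocations from static storage, then migrate the descriptors into real slab objects once the allocator works. Below is a minimal userspace sketch of that bootstrap pattern; every name in it is a stand-in, not kernel code.

/* Userspace model of SLUB's bootstrap trick: the first cache descriptors
 * live in static storage, and are later copied into objects served by
 * the (now functional) allocator itself. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct cache {                    /* stand-in for struct kmem_cache */
    const char *name;
    size_t object_size;
};

static struct cache boot_cache = { "cache", sizeof(struct cache) };
static struct cache *cache;       /* stand-in for the global 'kmem_cache' */

/* stand-in for kmem_cache_zalloc(); the kernel takes this from a slab page */
static void *cache_zalloc(struct cache *c)
{
    return calloc(1, c->object_size);
}

/* stand-in for bootstrap(): relocate a static descriptor into a slab object */
static struct cache *bootstrap(struct cache *static_cache)
{
    struct cache *s = cache_zalloc(cache);

    memcpy(s, static_cache, static_cache->object_size);
    return s;
}

int main(void)
{
    cache = &boot_cache;            /* phase 1: static storage breaks the cycle */
    cache = bootstrap(&boot_cache); /* phase 2: descriptor now lives on the heap */
    printf("'%s' moved from %p (static) to %p (allocated)\n",
           cache->name, (void *)&boot_cache, (void *)cache);
    free(cache);
    return 0;
}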
  • 10. [Init & Creation] kmem_cache_init(): Let’s check the initialization detail • How to initialize ‘boot_kmem_cache’ and ‘boot_kmem_cache_node’? • What are freepointers? • Data structures: kmem_cache & kmem_cache_node • [Comparison] Generic kmem_cache_create() & low-level implementation __kmem_cache_create() * Call paths are based on SLUB (the unqueued slab allocator)
  • 11. kmem_cache_init() start_kernel 1. Declare two static variables ‘boot_kmem_cache’ and ‘boot_kmem_cache_node’ 2. Assign the addresses of two static variables to generic/global variables kmem_cache_node = &boot_kmem_cache_node kmem_cache = &boot_kmem_cache __kmem_cache_create init_kmem_cache_nodes calculate_sizes kmem_cache_open alloc_kmem_cache_cpus mm_init kmem_cache_init create_boot_cache(kmem_cache_node, …) create_boot_cache(kmem_cache, …) kmem_cache = bootstrap(&boot_kmem_cache) kmem_cache_node = bootstrap(&boot_kmem_cache_node) setup_kmalloc_cache_index_table create_kmalloc_caches __alloc_percpu init_kmem_cache_cpus Assign cpu id to kmem_cache_cpu->tid Allocate/init ‘kmem_cache_node’ structs for all CPU nodes (sockets) Allocate/init a ‘kmem_cache_cpu’ struct Calculate page order and objects per slab
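The chart above is kmem_cache_init() in mm/slub.c. Lightly abridged from v5.11 (guard-page tuning, the memory-hotplug notifier, freelist randomization and cpuhp setup are elided):

void __init kmem_cache_init(void)
{
	static __initdata struct kmem_cache boot_kmem_cache,
		boot_kmem_cache_node;

	kmem_cache_node = &boot_kmem_cache_node;
	kmem_cache = &boot_kmem_cache;

	create_boot_cache(kmem_cache_node, "kmem_cache_node",
		sizeof(struct kmem_cache_node), SLAB_HWCACHE_ALIGN, 0, 0);

	/* Able to allocate the per node structures */
	slab_state = PARTIAL;

	create_boot_cache(kmem_cache, "kmem_cache",
			offsetof(struct kmem_cache, node) +
				nr_node_ids * sizeof(struct kmem_cache_node *),
		       SLAB_HWCACHE_ALIGN, 0, 0);

	kmem_cache = bootstrap(&boot_kmem_cache);
	kmem_cache_node = bootstrap(&boot_kmem_cache_node);

	/* Now we can use the kmem_cache to allocate kmalloc slabs */
	setup_kmalloc_cache_index_table();
	create_kmalloc_caches(0);

	/* ... freelist randomization and cpuhp callback setup elided ... */
}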
  • 12. kmem_cache_node/boot_kmem_cache_node init kmem_cache slab Page #0 slab object #0 … … … … … … slab object #63 64 objects per page Slub allocator: A generic data structure-cache layer * object = data structure object size = sizeof(struct kmem_cache_node) = 64 object size with alignment (SLAB_HWCACHE_ALIGN: 64) = 64 calculate_sizes (): order = 0
  • 13. kmem_cache_node/boot_kmem_cache_node init object size = sizeof(struct kmem_cache_node) = 64 object size with alignment (SLAB_HWCACHE_ALIGN: 64) = 64
  • 14. kmem_cache slab Page #0 slab object slab object slab object slab object Page #1 slab object slab object slab object slab object 16 objects per page Slab allocator: A generic data structure-cache layer * object = data structure object size = offsetof(struct kmem_cache, node) + nr_node_ids * sizeof(struct kmem_cache_node *) nr_node_ids = 2 (2-socket guest OS) object size = 200 + 2 * 8 = 216 object size with alignment (SLAB_HWCACHE_ALIGN: 64) = 256 calculate_sizes (): order = 1 kmem_cache/boot_kmem_cache init
  • 15. object size = 200 + 2 * 8 = 216 kmem_cache/boot_kmem_cache init
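The sizing arithmetic on slides 12-15 can be sanity-checked in a few lines of C. The inputs (the 216-byte raw size, 64-byte cacheline alignment, oo = 0x10020) are read off the slides and depend on kernel version/config:

/* Check the numbers from the sizing slides; inputs come from the slides. */
#include <stdio.h>

#define ALIGN_UP(x, a)  (((x) + (a) - 1) & ~((a) - 1))

int main(void)
{
    unsigned int nr_node_ids = 2;               /* 2-socket guest */
    unsigned int raw = 200 + nr_node_ids * 8;   /* offsetof(node) + node ptrs */
    unsigned int oo  = 0x10020;                 /* kmem_cache's oo from slide 14 */

    printf("object size               = %u\n", raw);               /* 216 */
    printf("aligned (SLAB_HWCACHE 64) = %u\n", ALIGN_UP(raw, 64)); /* 256 */

    /* oo packs the page order in the high 16 bits and the objects per
     * slab in the low 16 bits (cf. SLUB's oo_order()/oo_objects()). */
    printf("oo: order = %u, objects per slab = %u\n", oo >> 16, oo & 0xffff);
    printf("objects per 4K page       = %u\n",
           (oo & 0xffff) >> (oo >> 16));        /* 32 / 2 pages = 16 */
    return 0;
}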
  • 16. init_kmem_cache_nodes() n = kmem_cache_alloc_node() early_kmem_cache_node_alloc init_kmem_cache_nodes for_each_node_state() slab_state == DOWN init_kmem_cache_node kmem_cache->node[node] = n Y N __kmem_cache_create init_kmem_cache_nodes calculate_sizes kmem_cache_open alloc_kmem_cache_cpus __alloc_percpu init_kmem_cache_cpus Assign cpu id to kmem_cache_cpu->tid Allocate/init ‘kmem_cache_node’ structs for all CPU nodes (sockets) Allocate/init a ‘kmem_cache_cpu’ struct Calculate page order and objects per slab create_boot_cache(kmem_cache_node, …) Special (one-shot) path: Fix ‘chicken or the egg’ problem Generic path
  • 17. early_kmem_cache_node_alloc for ‘kmem_cache_node’ allocate_slab page = new_slab(kmem_cache_node, …) early_kmem_cache_node_alloc page = alloc_slab_page() page->slab_cache = s Update freepointers for the allocated pages n = page->freelist kmem_cache_node->node[node] = n page->freelist = get_freepointer(kmem_cache_node, n) init_kmem_cache_node(n) __add_partial(n, page, DEACTIVATE_TO_HEAD) inc_slabs_node
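That flow is early_kmem_cache_node_alloc() in mm/slub.c, which hand-crafts the first allocation instead of calling slab_alloc(). Lightly abridged from v5.11 (node-mismatch warning and debug/KASAN hooks elided):

static void early_kmem_cache_node_alloc(int node)
{
	struct page *page;
	struct kmem_cache_node *n;

	BUG_ON(kmem_cache_node->size < sizeof(struct kmem_cache_node));

	page = new_slab(kmem_cache_node, GFP_NOWAIT, node);
	BUG_ON(!page);

	/* Pop the first object by hand: slab_alloc() is not usable yet. */
	n = page->freelist;
	page->freelist = get_freepointer(kmem_cache_node, n);
	page->inuse = 1;
	page->frozen = 0;
	kmem_cache_node->node[node] = n;
	init_kmem_cache_node(n);
	inc_slabs_node(kmem_cache_node, node, page->objects);

	/*
	 * No locks need to be taken here as it has just been
	 * initialized and there is no concurrent access.
	 */
	__add_partial(n, page, DEACTIVATE_TO_HEAD);
}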
  • 18. page struct (slab, slob or slub) struct list_head slab_list struct page *next struct kmem_cache *slab_cache void *freelist (first free object) s_mem: first object - SLAB int pages int pobjects counters - SLUB inuse: 16 = 64 objects:15 = 64 frozen: 1 = 1 union struct (SLUB) union struct (Partial Page) kmem_cache (kmem_cache_node) __percpu *cpu_slab min_partial = 5 size = 64 object_size = 64 cpu_partial = 30 min = 64 inuse = 64 align = 64 name = “kmem_cache_node” list *node[MAX_NUMNODES] max = 64 oo (order objects) = 64 allocate_slab() – freelist/freepointer configuration offset = 32 slab object slab object .. slab object Page boot_kmem_cache_node static local variable Reference functions: get_freepointer/set_freepointer allocate_slab page = new_slab(kmem_cache_node, …) early_kmem_cache_node_alloc page = alloc_slab_page() page->slab_cache = s Update freepointers for the allocated pages n = page->freelist kmem_cache_node->node[node] = n page->freelist = get_freepointer(kmem_cache_node, n) init_kmem_cache_node(n) __add_partial(n, page, DEACTIVATE_TO_HEAD) inc_slabs_node
  • 19. page struct (slab, slob or slub) struct list_head slab_list struct page *next struct kmem_cache *slab_cache void *freelist (first free object) s_mem: first object - SLAB int pages int pobjects counters - SLUB inuse: 16 = 64 objects:15 = 64 frozen: 1 = 1 union struct (SLUB) union struct (Percpu Partial Page) kmem_cache (kmem_cache_node) __percpu *cpu_slab min_partial = 5 size = 64 object_size = 64 cpu_partial = 30 min = 64 inuse = 64 align = 64 name = “kmem_cache_node” list *node[MAX_NUMNODES] max = 64 oo (order objects) = 64 allocate_slab() – freelist/freepointer configuration offset = 32 slab object slab object .. slab object Page Page address = X X X + kmem_cache->size X + kmem_cache->size slab object #0 X + kmem_cache->size * 2 slab object #1 freepointer in slab object: Point to the next slab object (address: middle of object  avoid buffer overflow or underflow) NULL slab object #n -1 point to the next free slab object . . . X + kmem_cache->offset boot_kmem_cache_node static local variable Reference functions: get_freepointer/set_freepointer
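A runnable userspace model of slides 18-19: carve a 4 KiB “page” into 64-byte objects and thread the freelist through byte offset 32 of each free object (the kmem_cache->offset above); allocation is then a plain pop of the list head. The helper names mirror get_freepointer()/set_freepointer(), but this is a model, not kernel code.

#include <stdio.h>
#include <stdlib.h>

#define PAGE_SZ   4096
#define OBJ_SIZE  64
#define FP_OFFSET 32   /* freepointer lives mid-object, cf. offset = 32 */

static void *get_freepointer(void *object)
{
    return *(void **)((char *)object + FP_OFFSET);
}

static void set_freepointer(void *object, void *fp)
{
    *(void **)((char *)object + FP_OFFSET) = fp;
}

int main(void)
{
    char *page = aligned_alloc(PAGE_SZ, PAGE_SZ);
    int nobj = PAGE_SZ / OBJ_SIZE;     /* 64 objects per page */
    void *freelist = page;             /* page->freelist: first free object */
    void *obj;
    int i;

    /* Chain object #i to object #i+1; the last freepointer is NULL. */
    for (i = 0; i < nobj; i++)
        set_freepointer(page + i * OBJ_SIZE,
                        i + 1 < nobj ? page + (i + 1) * OBJ_SIZE : NULL);

    /* "Allocate": pop the head, exactly what the SLUB fast path does. */
    obj = freelist;
    freelist = get_freepointer(obj);
    printf("allocated %p, next free %p (+%ld bytes)\n",
           obj, freelist, (long)((char *)freelist - (char *)obj));

    free(page);
    return 0;
}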
  • 20. page struct (slab, slob or slub) struct list_head slab_list struct page *next struct kmem_cache *slab_cache void *freelist (first free object) s_mem: first object - SLAB int pages int pobjects counters - SLUB inuse: 16 = 64 → 1 objects:15 = 64 frozen: 1 = 1 → 0 union struct (SLUB) union kmem_cache (kmem_cache_node) __percpu *cpu_slab min_partial = 5 size = 64 object_size = 64 cpu_partial = 30 min = 64 inuse = 64 align = 64 name = “kmem_cache_node” list *node[MAX_NUMNODES] max = 64 oo (order objects) = 64 early_kmem_cache_node_alloc() – Get a slab object from freelist slab object slab object .. slab object Page offset = 32 kmem_cache_node nr_partial struct list_head partial nr_slabs total_objects list_lock struct list_head full Get a slab object from page->freelist kmem_cache_node->node[node] = n static local variable boot_kmem_cache_node struct (Percpu Partial Page)
  • 21. page struct (slab, slob or slub) struct list_head slab_list struct page *next struct kmem_cache *slab_cache void *freelist (first free object) s_mem: first object - SLAB int pages int pobjects counters - SLUB inuse: 16 = 1 objects:15 = 64 frozen: 1 = 0 union struct (SLUB) union kmem_cache (kmem_cache_node) __percpu *cpu_slab min_partial = 5 size = 64 object_size = 64 cpu_partial = 30 min = 64 inuse = 64 align = 64 name = “kmem_cache_node” list *node[MAX_NUMNODES] max = 64 oo (order objects) = 64 early_kmem_cache_node_alloc() – Get a slab object from freelist slab object slab object .. slab object Page offset = 32 kmem_cache_node nr_partial = 0 struct list_head partial nr_slabs = 0 total_objects = 0 list_lock struct list_head full Get a slab object from page->freelist kmem_cache_node->node[node] = n static local variable boot_kmem_cache_node struct (Percpu Partial Page)
  • 22. page struct (slab, slob or slub) struct list_head slab_list struct page *next struct kmem_cache *slab_cache void *freelist (first free object) s_mem: first object - SLAB int pages int pobjects counters - SLUB inuse: 16 = 1 objects:15 = 64 frozen: 1 = 0 union struct (SLUB) union kmem_cache (kmem_cache_node) __percpu *cpu_slab min_partial = 5 size = 64 object_size = 64 cpu_partial = 30 min = 64 inuse = 64 align = 64 name = “kmem_cache_node” list *node[MAX_NUMNODES] max = 64 oo (order objects) = 64 early_kmem_cache_node_alloc() – Get a slab object from freelist slab object slab object .. slab object Page offset = 32 kmem_cache_node nr_partial = 0 struct list_head partial nr_slabs = 0 → 1 total_objects = 0 → 64 list_lock struct list_head full Get a slab object from page->freelist kmem_cache_node->node[node] = n static local variable boot_kmem_cache_node struct (Percpu Partial Page)
  • 23. page struct (slab, slob or slub) struct list_head slab_list struct page *next struct kmem_cache *slab_cache void *freelist (first free object) s_mem: first object - SLAB int pages int pobjects counters - SLUB inuse: 16 = 1 objects:15 = 64 frozen: 1 = 0 union struct (SLUB) union struct (Percpu Partial Page) kmem_cache (kmem_cache_node) __percpu *cpu_slab min_partial = 5 size = 64 object_size = 64 cpu_partial = 30 min = 64 inuse = 64 align = 64 name = “kmem_cache_node” list *node[MAX_NUMNODES] max = 64 oo (order objects) = 64 early_kmem_cache_node_alloc() – Get a slab object from freelist slab object slab object .. slab object Page offset = 32 kmem_cache_node nr_partial = 0 → 1 struct list_head partial nr_slabs = 1 total_objects = 64 list_lock struct list_head full Get a slab object from page->freelist kmem_cache_node->node[node] = n static local variable boot_kmem_cache_node list_head
  • 24. page struct (slab, slob or slub) struct list_head slab_list struct page *next struct kmem_cache *slab_cache void *freelist (first free object) s_mem: first object - SLAB int pages int pobjects counters - SLUB inuse: 16 = 1 objects:15 = 64 frozen: 1 = 0 union struct (SLUB) union struct (Percpu Partial Page) kmem_cache (kmem_cache_node) __percpu *cpu_slab min_partial = 5 size = 64 object_size = 64 cpu_partial = 30 min = 64 inuse = 64 align = 64 name = “kmem_cache_node” list *node[MAX_NUMNODES] max = 64 oo (order objects) = 64 early_kmem_cache_node_alloc(): configure node #0 slab object slab object .. slab object Page offset = 32 kmem_cache_node nr_partial = 1 struct list_head partial nr_slabs = 1 total_objects = 64 list_lock struct list_head full Get a slab object from page->freelist kmem_cache_node->node[node] = n list_head CONFIG_SLUB_DEBUG = y static local variable boot_kmem_cache_node Allocate slab page(s) for kmem_cache_node struct allocation: Fix “the chicken or egg problem”
  • 25. page (slab, slob or slub) struct list_head slab_list struct page *next struct kmem_cache *slab_cache void *freelist (first free object) s_mem: first object - SLAB int pages int pobjects counters - SLUB inuse: 1 objects:15 = 64 frozen: 1 = 0 union struct (SLUB) union struct (Partial Page) kmem_cache (kmem_cache_node) __percpu *cpu_slab min_partial = 5 size = 64 object_size = 64 cpu_partial = 30 min = 64 inuse = 64 align = 64 name = “kmem_cache_node” list *node[MAX_NUMNODES] max = 64 oo (order objects) = 64 slab object slab object .. slab object Page offset = 32 kmem_cache_node nr_partial = 1 struct list_head partial nr_slabs = 1 total_objects = 64 list_lock struct list_head full Get a slab object from page->freelist kmem_cache_node->node[node] = n list_head CONFIG_SLUB_DEBUG = y early_kmem_cache_node_alloc(): configure node #0
  • 26. page struct (slab, slob or slub) struct list_head slab_list struct page *next struct kmem_cache *slab_cache void *freelist (first free object) s_mem: first object - SLAB int pages int pobjects counters - SLUB inuse: 16 = 1 objects:15 = 64 frozen: 1 = 0 union struct (SLUB) union struct (Percpu Partial Page) kmem_cache (global variable: kmem_cache_node) __percpu *cpu_slab = NULL min_partial = 5 size = 64 object_size = 64 cpu_partial = 30 min = 64 inuse = 64 align = 64 name = “kmem_cache_node” list *node[MAX_NUMNODES] max = 64 oo (order objects) = 64 slab object slab object .. slab object Page offset = 32 kmem_cache_node nr_partial = 1 struct list_head partial nr_slabs = 1 total_objects = 64 list_lock struct list_head full Get a slab object from page->freelist kmem_cache_node->node[node] = n list_head CONFIG_SLUB_DEBUG = y static local variable boot_kmem_cache_node early_kmem_cache_node_alloc(): configure node #0
  • 27. page struct (slab, slob or slub) struct list_head slab_list struct page *next struct kmem_cache *slab_cache void *freelist (first free object) s_mem: first object - SLAB int pages int pobjects counters - SLUB inuse: 16 = 1 objects:15 = 64 frozen: 1 = 0 union struct (SLUB) union struct (Percpu Partial Page) kmem_cache (global variable: kmem_cache_node) __percpu *cpu_slab = NULL min_partial = 5 size = 64 object_size = 64 cpu_partial = 30 min = 64 inuse = 64 align = 64 name = “kmem_cache_node” list *node[MAX_NUMNODES] max = 64 oo (order objects) = 64 slab object slab object .. slab object Page offset = 32 kmem_cache_node nr_partial = 1 struct list_head partial nr_slabs = 1 total_objects = 64 list_lock struct list_head full Get a slab object from page->freelist kmem_cache_node->node[node] = n list_head CONFIG_SLUB_DEBUG = y kmem_cache_node nr_partial = 1 struct list_head partial nr_slabs = 1 total_objects = 64 list_lock struct list_head full . . static local variable boot_kmem_cache_node early_kmem_cache_node_alloc(): Fully-configured
  • 29. [Comparison] Generic kmem_cache_create() & low-level implementation __kmem_cache_create() start_kernel __kmem_cache_create init_kmem_cache_nodes calculate_sizes kmem_cache_open alloc_kmem_cache_cpus mm_init kmem_cache_init create_boot_cache(kmem_cache_node, …) create_boot_cache(kmem_cache, …) __alloc_percpu init_kmem_cache_cpus Assign cpu id to kmem_cache_cpu->tid Allocate/init ‘kmem_cache_node’ structs for all CPU nodes (sockets) Allocate/init a ‘kmem_cache_cpu’ struct Calculate page order and objects per slab kmem_cache_create kmem_cache_create_usercopy __kmem_cache_alias create_cache find_mergeable kmem_cache_zalloc slab_alloc_node slab_alloc kmem_cache_alloc __kmem_cache_create list_add(&s->list, &slab_caches) Allocate a ‘kmem_cache’ struct from a global variable ‘kmem_cache’
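For contrast with the boot-time path, ordinary kernel code enters this generic path through the public API. A minimal module-style sketch; 'struct foo' and the cache name are hypothetical:

#include <linux/module.h>
#include <linux/slab.h>

struct foo {
	int a, b;
};

static struct kmem_cache *foo_cachep;

static int __init foo_init(void)
{
	struct foo *obj;

	/* Goes through create_cache() above unless find_mergeable()
	 * aliases it onto an existing, compatible cache. */
	foo_cachep = kmem_cache_create("foo_cache", sizeof(struct foo),
				       0, SLAB_HWCACHE_ALIGN, NULL);
	if (!foo_cachep)
		return -ENOMEM;

	obj = kmem_cache_alloc(foo_cachep, GFP_KERNEL);   /* fast/slow path */
	if (obj)
		kmem_cache_free(foo_cachep, obj);
	return 0;
}

static void __exit foo_exit(void)
{
	kmem_cache_destroy(foo_cachep);
}

module_init(foo_init);
module_exit(foo_exit);
MODULE_LICENSE("GPL");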
  • 30. [Cache allocation] slab_alloc_node – Allocate a slab cache with specific node id • Fast path & slow path o Will follow kmem_cache_init() flow
  • 31. kmem_cache_init() start_kernel 1. Declare two static variables ‘boot_kmem_cache’ and ‘boot_kmem_cache_node’ 2. Assign the addresses of two static variables to generic/global variables kmem_cache_node = &boot_kmem_cache_node kmem_cache = &boot_kmem_cache __kmem_cache_create init_kmem_cache_nodes calculate_sizes kmem_cache_open alloc_kmem_cache_cpus mm_init kmem_cache_init create_boot_cache(kmem_cache_node, …) create_boot_cache(kmem_cache, …) kmem_cache = bootstrap(&boot_kmem_cache) kmem_cache_node = bootstrap(&boot_kmem_cache_node) setup_kmalloc_cache_index_table create_kmalloc_caches __alloc_percpu init_kmem_cache_cpus Assign cpu id to kmem_cache_cpu->tid Allocate/init ‘kmem_cache_node’ structs for all CPU nodes (sockets) Allocate/init a ‘kmem_cache_cpu’ struct
  • 32. kmem_cache (global variable: kmem_cache) __percpu *cpu_slab min_partial = 5 size = 256 object_size = 216 cpu_partial = 13 min = 0x10 inuse = 216 align = 64 name = “kmem_cache” list *node[MAX_NUMNODES] max = 0x10020 oo (order objects) = 0x10020 kmem_cache: init_kmem_cache_nodes() offset = 112 kmem_cache_node nr_partial = 0 struct list_head partial nr_slabs = 0 total_objects = 0 list_lock struct list_head full kmem_cache_node->node[node] = n CONFIG_SLUB_DEBUG = y kmem_cache_node nr_partial = 0 struct list_head partial nr_slabs = 0 total_objects = 0 list_lock struct list_head full . . kmem_cache_cpu void **freelist tid (transaction id) = cpu id struct page *page struct page *partial stat[NR_SLUB_STAT_ITEMS] boot_kmem_cache static variable Allocate from global variable ‘kmem_cache_node’ * init_kmem_cache_nodes -> kmem_cache_alloc_node n = kmem_cache_alloc_node() early_kmem_cache_node_alloc init_kmem_cache_nodes for_each_node_state() slab_state == DOWN init_kmem_cache_node kmem_cache->node[node] = n Y N Special (one-shot) path: Fix ‘chicken or the egg’ problem Generic path
  • 33. Get a slab object from c->freelist kmem_cache_alloc_node !c->freelist or !c-> page kmem_cache_alloc_node/slab_alloc_node __slab_alloc kmem_cache_alloc slab_alloc slab_alloc_node NUMA_NO_NODE __kmalloc_node kmalloc_node node_id node_id ___slab_alloc new_slab_objects c->page = slub_percpu_partial(c) Get a slab object from page->freelist c->page && !c->freelist !c->page && c->partial get_partial Get partial page from percpu cache new_slab allocate_slab alloc_slab_page Allocate pages from Buddy system !c->page && !c->partial Get partial page from node cache (kmem_cache_node) Y N fast path slow path get_freelist -> __cmpxchg_double_slab this_cpu_cmpxchg_double()
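The fast path in this chart is the lockless head of slab_alloc_node(). Condensed from mm/slub.c (v5.11); the pre/post-allocation hooks, kfence and object-init details are elided:

static __always_inline void *slab_alloc_node(struct kmem_cache *s,
		gfp_t gfpflags, int node, unsigned long addr)
{
	void *object;
	struct kmem_cache_cpu *c;
	struct page *page;
	unsigned long tid;

redo:
	/* Snapshot the per-cpu state; tid catches preemption/migration. */
	do {
		tid = this_cpu_read(s->cpu_slab->tid);
		c = raw_cpu_ptr(s->cpu_slab);
	} while (IS_ENABLED(CONFIG_PREEMPTION) &&
		 unlikely(tid != READ_ONCE(c->tid)));

	object = c->freelist;
	page = c->page;
	if (unlikely(!object || !node_match(page, node))) {
		/* slow path: percpu partial -> node partial -> buddy */
		object = __slab_alloc(s, gfpflags, node, addr, c);
	} else {
		void *next_object = get_freepointer_safe(s, object);

		/* Pop the freelist head and bump tid in one atomic step. */
		if (unlikely(!this_cpu_cmpxchg_double(
				s->cpu_slab->freelist, s->cpu_slab->tid,
				object, tid,
				next_object, next_tid(tid)))) {
			note_cmpxchg_failure("slab_alloc", s, tid);
			goto redo;
		}
		prefetch_freepointer(s, next_object);
		stat(s, ALLOC_FASTPATH);
	}
	return object;
}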
  • 34. Fast Path Get a slab object from kmem_cache_cpu->freelist . . kmem_cache_cpu (CPU #0) struct page *page void **freelist struct page *partial kmem_cache_cpu (CPU #N) struct page *page void **freelist struct page *partial slab object slab object .. slab object Page slab object slab object .. slab object Page kmem_cache (Slab cache) __percpu *cpu_slab percpu cache . . kmem_cache_cpu (CPU #0) struct page *page void **freelist struct page *partial kmem_cache_cpu (CPU #N) struct page *page void **freelist struct page *partial slab object slab object .. slab object Page slab object slab object .. slab object Page percpu cache slab object slab object .. Page slab object slab object slab object slab object .. Page slab object slab object get a slab object
  • 35. Get a slab object from c->freelist kmem_cache_alloc_node !c->freelist or !c-> page slab_alloc_node(): slowpath #1 – Get an object from node’s partial page __slab_alloc kmem_cache_alloc slab_alloc slab_alloc_node NUMA_NO_NODE __kmalloc_node kmalloc_node node_id node_id ___slab_alloc new_slab_objects c->page = slub_percpu_partial(c) c->page && !c->freelist !c->page && c->partial get_partial Get partial page from percpu cache new_slab allocate_slab alloc_slab_page Allocate pages from Buddy system !c->page && !c->partial Get partial page from node cache (kmem_cache_node) Y N fast path slow path get_freelist -> __cmpxchg_double_slab this_cpu_cmpxchg_double() Get a slab object from page->freelist
  • 36. kmem_cache (global variable: kmem_cache_node) __percpu *cpu_slab min_partial = 5 size = 64 object_size = 64 cpu_partial = 30 min = 64 inuse = 64 align = 64 name = “kmem_cache_node” list *node[MAX_NUMNODES] max = 64 oo (order objects) = 64 offset = 32 static local variable boot_kmem_cache_node kmem_cache_cpu (CPU #N) struct page *page = NULL void **freelist = NULL struct page *partial = NULL Get a page (slab) from kmem_cache_node kmem_cache_node (node 0) nr_partial = 1 struct list_head partial nr_slabs = 1 total_objects = 64 list_lock struct list_head full kmem_cache_node (node 1) nr_partial = 1 struct list_head partial nr_slabs = 1 total_objects = 64 list_lock struct list_head full *node[MAX_NUMNODES] 2 3 Get a page (slab) from ‘partial’ member 4 Move node’s partial page (slab) to percpu cache page struct (slab, slob or slub): node #0 struct page *next struct kmem_cache *slab_cache void *freelist (first free object) s_mem: first object - SLAB int pages int pobjects counters - SLUB inuse: 16 = 1 objects:15 = 64 frozen: 1 = 0 union struct (SLUB) union struct (Partial Page) list_head struct list_head slab_list slab object slab object .. slab object Page slab object page struct (slab, slob or slub): node #1 Allocate a slab object from node #0 1 kmem_cache (kmem_cache) *node[] boot_kmem_cache static local variable Allocate slab objects for node[0] & node[1] Slowpath #1 – Get an object from node’s partial page
  • 40. kmem_cache (global variable: kmem_cache_node) __percpu *cpu_slab min_partial = 5 size = 64 object_size = 64 cpu_partial = 30 min = 64 inuse = 64 align = 64 name = “kmem_cache_node” list *node[MAX_NUMNODES] max = 64 oo (order objects) = 64 offset = 32 static local variable boot_kmem_cache_node kmem_cache_cpu (CPU #N) struct page *page void **freelist struct page *partial = NULL kmem_cache_node (node 0) nr_partial = 0 struct list_head partial nr_slabs = 1 total_objects = 64 list_lock struct list_head full kmem_cache_node (node 1) nr_partial = 1 struct list_head partial nr_slabs = 1 total_objects = 64 list_lock struct list_head full *node[MAX_NUMNODES] page struct (slab, slob or slub): node #1 Allocate a slab object from node #0 kmem_cache (kmem_cache) *node[] boot_kmem_cache static local variable Return this object address page struct (slab, slob or slub): node #0 struct page *next struct kmem_cache *slab_cache void *freelist (first free object) = NULL s_mem: first object - SLAB int pages int pobjects counters - SLUB inuse: 16 = 64 objects:15 = 64 frozen: 1 = 1 union struct (SLUB) union struct (Partial Page) struct list_head slab_list = invalid slab object slab object .. slab object Page slab object Allocate slab objects for node[0] & node[1] • node[0]: allocated Slowpath #1 – Get an object from node’s partial page
  • 42. kmem_cache (global variable: kmem_cache_node) __percpu *cpu_slab min_partial = 5 size = 64 object_size = 64 cpu_partial = 30 min = 64 inuse = 64 align = 64 name = “kmem_cache_node” list *node[MAX_NUMNODES] max = 64 oo (order objects) = 64 offset = 32 static local variable boot_kmem_cache_node kmem_cache_cpu (CPU #N) struct page *page void **freelist struct page *partial = NULL kmem_cache_node (node 0) nr_partial = 0 struct list_head partial nr_slabs = 1 total_objects = 64 list_lock struct list_head full kmem_cache_node (node 1) nr_partial = 1 struct list_head partial nr_slabs = 1 total_objects = 64 list_lock struct list_head full *node[MAX_NUMNODES] page struct (slab, slob or slub): node #1 Allocate a slab object from node #1 kmem_cache (kmem_cache) *node[] boot_kmem_cache static local variable Allocate slab objects for node[0] & node[1] • node[0]: allocated page struct (slab, slob or slub): node #0 struct page *next struct kmem_cache *slab_cache void *freelist (first free object) = NULL s_mem: first object - SLAB int pages int pobjects counters - SLUB inuse: 16 = 64 objects:15 = 64 frozen: 1 = 1 union struct (SLUB) union struct (Partial Page) struct list_head slab_list = invalid slab object slab object .. slab object Page slab object 1 2 percpu page’s node mismatches 3 deactivate_slab(): Remove percpu slab Slowpath #1 – Get an object from node’s partial page
  • 44. kmem_cache (global variable: kmem_cache_node) __percpu *cpu_slab min_partial = 5 size = 64 object_size = 64 cpu_partial = 30 min = 64 inuse = 64 align = 64 name = “kmem_cache_node” list *node[MAX_NUMNODES] max = 64 oo (order objects) = 64 offset = 32 static local variable boot_kmem_cache_node kmem_cache_cpu (CPU #N) struct page *page = NULL void **freelist = NULL struct page *partial = NULL kmem_cache_node (node 0) nr_partial = 1 struct list_head partial nr_slabs = 1 total_objects = 64 list_lock struct list_head full kmem_cache_node (node 1) nr_partial = 1 struct list_head partial nr_slabs = 1 total_objects = 64 list_lock struct list_head full *node[MAX_NUMNODES] Allocate a slab object from node #1 kmem_cache (kmem_cache) *node[] boot_kmem_cache static local variable Allocate slab objects for node[0] & node[1] • node[0]: allocated page struct (slab, slob or slub): node #0 struct page *next struct kmem_cache *slab_cache void *freelist (first free object) s_mem: first object - SLAB int pages int pobjects counters - SLUB inuse: 16 = 2 objects:15 = 64 frozen: 1 = 0 union struct (SLUB) union struct (Partial Page) struct list_head slab_list slab object slab object .. slab object Page slab object 1 page struct (slab, slob or slub): node #1 struct page *next struct kmem_cache *slab_cache void *freelist (first free object) s_mem: first object - SLAB int pages int pobjects counters - SLUB inuse: 16 = 1 objects:15 = 64 frozen: 1 = 0 union struct (SLUB) union struct (Partial Page) struct list_head slab_list slab object slab object .. slab object Page slab object freepointers: reverse! Slowpath #1 – Get an object from node’s partial page
  • 46. kmem_cache (global variable: kmem_cache_node) __percpu *cpu_slab min_partial = 5 size = 64 object_size = 64 cpu_partial = 30 min = 64 inuse = 64 align = 64 name = “kmem_cache_node” list *node[MAX_NUMNODES] max = 64 oo (order objects) = 64 offset = 32 static local variable boot_kmem_cache_node kmem_cache_cpu (CPU #N) struct page *page = NULL void **freelist = NULL struct page *partial = NULL kmem_cache_node (node 0) nr_partial = 1 struct list_head partial nr_slabs = 1 total_objects = 64 list_lock struct list_head full kmem_cache_node (node 1) nr_partial = 1 struct list_head partial nr_slabs = 1 total_objects = 64 list_lock struct list_head full *node[MAX_NUMNODES] Allocate a slab object from node #1 kmem_cache (kmem_cache) *node[] boot_kmem_cache static local variable Allocate slab objects for node[0] & node[1] • node[0]: allocated page struct (slab, slob or slub): node #0 struct page *next struct kmem_cache *slab_cache void *freelist (first free object) s_mem: first object - SLAB int pages int pobjects counters - SLUB inuse: 16 = 2 objects:15 = 64 frozen: 1 = 0 union struct (SLUB) union struct (Partial Page) struct list_head slab_list page struct (slab, slob or slub): node #1 struct page *next struct kmem_cache *slab_cache void *freelist (first free object) s_mem: first object - SLAB int pages int pobjects counters - SLUB inuse: 16 = 1 objects:15 = 64 frozen: 1 = 0 union struct (SLUB) union struct (Partial Page) struct list_head slab_list slab object slab object .. slab object Page slab object slab object slab object .. slab object Page slab object freepointers: reverse! 4 Move node’s partial page (slab) to percpu cache Slowpath #1 – Get an object from node’s partial page
  • 47. kmem_cache (global variable: kmem_cache_node) __percpu *cpu_slab min_partial = 5 size = 64 object_size = 64 cpu_partial = 30 min = 64 inuse = 64 align = 64 name = “kmem_cache_node” list *node[MAX_NUMNODES] max = 64 oo (order objects) = 64 offset = 32 static local variable boot_kmem_cache_node kmem_cache_cpu (CPU #N) struct page *page void **freelist struct page *partial = NULL kmem_cache_node (node 0) nr_partial = 1 struct list_head partial nr_slabs = 1 total_objects = 64 list_lock struct list_head full kmem_cache_node (node 1) nr_partial = 1 struct list_head partial nr_slabs = 1 total_objects = 64 list_lock struct list_head full *node[MAX_NUMNODES] Allocate a slab object from node #1 kmem_cache (kmem_cache) *node[] boot_kmem_cache static local variable Allocate slab objects for node[0] & node[1] • node[0]: allocated • node[1]: allocated page struct (slab, slob or slub): node #1 struct page *next struct kmem_cache *slab_cache void *freelist (first free object) = NULL s_mem: first object - SLAB int pages int pobjects counters - SLUB inuse: 16 = 64 objects:15 = 64 frozen: 1 = 1 union struct (SLUB) union struct (Partial Page) struct list_head slab_list = invalid slab object slab object .. slab object Page slab object page struct (slab, slob or slub): node #0 struct page *next struct kmem_cache *slab_cache void *freelist (first free object) s_mem: first object - SLAB int pages int pobjects counters - SLUB inuse: 16 = 2 objects:15 = 64 frozen: 1 = 0 union struct (SLUB) union struct (Partial Page) struct list_head slab_list slab object slab object .. slab object Page slab object freepointers: reverse! Return this object address Slowpath #1 – Get an object from node’s partial page
  • 48. kmem_cache_init() start_kernel 1. Declare two static variables ‘boot_kmem_cache’ and ‘boot_kmem_cache_node’ 2. Assign the addresses of two static variables to generic/global variables kmem_cache_node = &boot_kmem_cache_node kmem_cache = &boot_kmem_cache __kmem_cache_create init_kmem_cache_nodes calculate_sizes kmem_cache_open alloc_kmem_cache_cpus mm_init kmem_cache_init create_boot_cache(kmem_cache_node, …) create_boot_cache(kmem_cache, …) kmem_cache = bootstrap(&boot_kmem_cache) kmem_cache_node = bootstrap(&boot_kmem_cache_node) setup_kmalloc_cache_index_table create_kmalloc_caches __alloc_percpu init_kmem_cache_cpus Assign cpu id to kmem_cache_cpu->tid Allocate/init ‘kmem_cache_node’ structs for all CPU nodes (sockets) Allocate/init a ‘kmem_cache_cpu’ struct Let’s check data structures about kmem_cache & kmem_cache_node after function call “create_boot_cache”
  • 49. kmem_cache (global variable: kmem_cache_node) __percpu *cpu_slab min_partial = 5 size = 64 object_size = 64 cpu_partial = 30 min = 64 inuse = 64 align = 64 name = “kmem_cache_node” list *node[MAX_NUMNODES] max = 64 oo (order objects) = 64 create_boot_cache(): 1/2 offset = 32 static local variable boot_kmem_cache_node kmem_cache_cpu (CPU #N) struct page *page void **freelist struct page *partial = NULL kmem_cache_node (node 0) nr_partial = 1 struct list_head partial nr_slabs = 1 total_objects = 64 list_lock struct list_head full kmem_cache_node (node 1) nr_partial = 1 struct list_head partial nr_slabs = 1 total_objects = 64 list_lock struct list_head full *node[MAX_NUMNODES] page struct (slab, slob or slub): node #1 struct page *next struct kmem_cache *slab_cache void *freelist (first free object) = NULL s_mem: first object - SLAB int pages int pobjects counters - SLUB inuse: 16 = 64 objects:15 = 64 frozen: 1 = 1 union struct (SLUB) union struct (Partial Page) struct list_head slab_list = invalid slab object slab object .. slab object Page slab object page struct (slab, slob or slub): node #0 struct page *next struct kmem_cache *slab_cache void *freelist (first free object) s_mem: first object - SLAB int pages int pobjects counters - SLUB inuse: 16 = 2 objects:15 = 64 frozen: 1 = 0 union struct (SLUB) union struct (Partial Page) struct list_head slab_list slab object slab object .. slab object Page slab object freepointers: reverse! Return this object address
  • 50. kmem_cache (global variable: kmem_cache) __percpu *cpu_slab min_partial = 5 size = 256 object_size = 216 cpu_partial = 13 min = 0x10 inuse = 216 align = 64 name = “kmem_cache” list *node[MAX_NUMNODES] max = 0x10020 oo (order objects) = 0x10020 create_boot_cache(): 2/2 offset = 112 static local variable boot_kmem_cache kmem_cache_cpu (CPU #N) struct page *page = NULL void **freelist = NULL struct page *partial = NULL kmem_cache_node (node 0) nr_partial = 0 struct list_head partial nr_slabs = 0 total_objects = 0 list_lock struct list_head full kmem_cache_node (node 1) nr_partial = 0 struct list_head partial nr_slabs = 0 total_objects = 0 list_lock struct list_head full *node[MAX_NUMNODES]
  • 51. kmem_cache_init() start_kernel 1. Declare two static variables ‘boot_kmem_cache’ and ‘boot_kmem_cache_node’ 2. Assign the addresses of two static variables to generic/global variables kmem_cache_node = &boot_kmem_cache_node kmem_cache = &boot_kmem_cache __kmem_cache_create init_kmem_cache_nodes calculate_sizes kmem_cache_open alloc_kmem_cache_cpus mm_init kmem_cache_init create_boot_cache(kmem_cache_node, …) create_boot_cache(kmem_cache, …) kmem_cache = bootstrap(&boot_kmem_cache) kmem_cache_node = bootstrap(&boot_kmem_cache_node) setup_kmalloc_cache_index_table create_kmalloc_caches __alloc_percpu init_kmem_cache_cpus Assign cpu id to kmem_cache_cpu->tid Allocate/init ‘kmem_cache_node’ structs for all CPU nodes (sockets) Allocate/init a ‘kmem_cache_cpu’ struct
  • 52. bootstrap s = kmem_cache_zalloc(kmem_cache, …) Allocate a kmem_cache struct from global variable ‘kmem_cache’: Generic path __flush_cpu_slab list_add(&s->list, &slab_caches) for_each_kmem_cache_node(s, node, n) kmem_cache = bootstrap(&boot_kmem_cache) memcpy(s, static_cache, …) return s list_for_each_entry(p, &n->partial, slab_list) struct page *p; p->slab_cache = s
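For reference, the flow on this slide is bootstrap() in mm/slub.c (v5.11), essentially verbatim:

static struct kmem_cache * __init bootstrap(struct kmem_cache *static_cache)
{
	int node;
	struct kmem_cache *s = kmem_cache_zalloc(kmem_cache, GFP_NOWAIT);
	struct kmem_cache_node *n;

	memcpy(s, static_cache, kmem_cache->object_size);

	/*
	 * This runs very early, and only the boot processor is supposed to be
	 * up.  Even if it weren't true, IRQs are not up so we couldn't fire
	 * IPIs around.
	 */
	__flush_cpu_slab(s, smp_processor_id());
	for_each_kmem_cache_node(s, node, n) {
		struct page *p;

		list_for_each_entry(p, &n->partial, slab_list)
			p->slab_cache = s;

#ifdef CONFIG_SLUB_DEBUG
		list_for_each_entry(p, &n->full, slab_list)
			p->slab_cache = s;
#endif
	}
	list_add(&s->list, &slab_caches);
	return s;
}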
  • 53. Get a slab object from c->freelist kmem_cache_alloc_node !c->freelist or !c-> page slab_alloc_node(): slowpath #2 – Allocate a page from Buddy system __slab_alloc kmem_cache_alloc slab_alloc slab_alloc_node NUMA_NO_NODE __kmalloc_node kmalloc_node node_id node_id ___slab_alloc new_slab_objects c->page = slub_percpu_partial(c) c->page && !c->freelist !c->page && c->partial get_partial Get partial page from percpu cache new_slab allocate_slab alloc_slab_page Allocate pages from Buddy system !c->page && !c->partial Get partial page from node cache (kmem_cache_node) Y N fast path slow path get_freelist -> __cmpxchg_double_slab this_cpu_cmpxchg_double() bootstrap s = kmem_cache_zalloc(kmem_cache, …) Allocate a kmem_cache struct from global variable ‘kmem_cache’ Get a slab object from page->freelist
  • 54. kmem_cache (global variable: kmem_cache) __percpu *cpu_slab min_partial = 5 size = 256 object_size = 216 cpu_partial = 13 min = 0x10 inuse = 216 align = 64 name = “kmem_cache” list *node[MAX_NUMNODES] max = 0x10020 oo (order objects) = 0x10020 offset = 112 static local variable boot_kmem_cache kmem_cache_cpu (CPU #N) struct page *page = NULL void **freelist = NULL struct page *partial = NULL kmem_cache_node (node 0) nr_partial = 0 struct list_head partial nr_slabs = 0 total_objects = 0 list_lock struct list_head full kmem_cache_node (node 1) nr_partial = 0 struct list_head partial nr_slabs = 0 total_objects = 0 list_lock struct list_head full *node[MAX_NUMNODES] [slowpath #2] Before: bootstrap(&boot_kmem_cache) -> kmem_cache_zalloc()
  • 55. kmem_cache (global variable: kmem_cache) __percpu *cpu_slab min_partial = 5 size = 256 object_size = 216 cpu_partial = 13 min = 0x10 inuse = 216 align = 64 name = “kmem_cache” list *node[MAX_NUMNODES] max = 0x10020 oo (order objects) = 0x10020 offset = 112 static local variable boot_kmem_cache kmem_cache_cpu (CPU #N) struct page *page void **freelist struct page *partial = NULL kmem_cache_node (node 0) nr_partial = 0 struct list_head partial nr_slabs = 0 total_objects = 0 list_lock struct list_head full kmem_cache_node (node 1) nr_partial = 0 struct list_head partial nr_slabs = 0 total_objects = 0 list_lock struct list_head full *node[MAX_NUMNODES] page struct (slab, slob or slub): order =1 struct page *next struct kmem_cache *slab_cache void *freelist (first free object) = NULL s_mem: first object - SLAB int pages int pobjects counters - SLUB inuse: 16 = 32 objects:15 = 32 frozen: 1 = 1 union struct (SLUB) union struct (Partial Page) struct list_head slab_list = invalid slab object slab object .. slab object Slab Page(s) slab object Return this object address Page order objects per slab 0 15 16 oo (order objects) kmem_cache_zalloc(kmem_cache, …) 1 2 Allocate from Buddy system 3 slab_alloc_node(): slowpath #2 – Allocate a page from Buddy system [slowpath #2] After: bootstrap(&boot_kmem_cache) -> kmem_cache_zalloc()
  • 57. kmem_cache (global variable: kmem_cache) __percpu *cpu_slab min_partial = 5 size = 256 object_size = 216 cpu_partial = 13 min = 0x10 inuse = 216 align = 64 name = “kmem_cache” list *node[MAX_NUMNODES] max = 0x10020 oo (order objects) = 0x10020 offset = 112 static local variable boot_kmem_cache kmem_cache_cpu (CPU #N) struct page *page = NULL void **freelist = NULL struct page *partial = NULL kmem_cache_node (node 0) nr_partial = 1 struct list_head partial nr_slabs = 1 total_objects = 32 list_lock struct list_head full kmem_cache_node (node 1) nr_partial = 0 struct list_head partial nr_slabs = 0 total_objects = 0 list_lock struct list_head full *node[MAX_NUMNODES] page struct (slab, slob or slub): order =1 struct page *next struct kmem_cache *slab_cache void *freelist (first free object) = NULL s_mem: first object - SLAB int pages int pobjects counters - SLUB inuse: 16 = 32 objects:15 = 32 frozen: 1 = 1 union struct (SLUB) union struct (Partial Page) struct list_head slab_list = invalid slab object slab object .. slab object Slab Page(s) slab object Page order objects per slab 0 15 16 oo (order objects) kmem_cache_zalloc(kmem_cache, …) X X page struct (slab, slob or slub): order =1 struct page *next struct kmem_cache *slab_cache void *freelist (first free object) s_mem: first object - SLAB int pages int pobjects counters - SLUB inuse: 16 = 1 objects:15 = 32 frozen: 1 = 0 union struct (SLUB) union struct (Partial Page) struct list_head slab_list slab object slab object .. slab object Slab Page(s) slab object freepointers: reverse! deactivate_slab(): Remove percpu slab [slowpath #2] After: bootstrap(&boot_kmem_cache) -> __flush_cpu_slab()
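The __flush_cpu_slab() step that deactivates the percpu slab is small; from mm/slub.c (v5.11):

static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
{
	stat(s, CPUSLAB_FLUSH);
	deactivate_slab(s, c->page, c->freelist, c);

	c->tid = next_tid(c->tid);
}

static inline void __flush_cpu_slab(struct kmem_cache *s, int cpu)
{
	struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);

	if (c->page)
		flush_slab(s, c);

	unfreeze_partials(s, c);
}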
  • 58. kmem_cache (global variable: kmem_cache) __percpu *cpu_slab min_partial = 5 size = 256 object_size = 216 cpu_partial = 13 min = 0x10 inuse = 216 align = 64 name = “kmem_cache” list *node[MAX_NUMNODES] max = 0x10020 oo (order objects) = 0x10020 offset = 112 kmem_cache_cpu (CPU #N) struct page *page = NULL void **freelist = NULL struct page *partial = NULL kmem_cache_node (node 0) nr_partial = 1 struct list_head partial nr_slabs = 1 total_objects = 32 list_lock struct list_head full kmem_cache_node (node 1) nr_partial = 0 struct list_head partial nr_slabs = 0 total_objects = 0 list_lock struct list_head full *node[MAX_NUMNODES] Page order objects per slab 0 15 16 oo (order objects) page struct (slab, slob or slub): order =1 struct page *next struct kmem_cache *slab_cache void *freelist (first free object) s_mem: first object - SLAB int pages int pobjects counters - SLUB inuse: 16 = 1 objects:15 = 32 frozen: 1 = 0 union struct (SLUB) union struct (Partial Page) struct list_head slab_list slab object slab object .. slab object Slab Page(s) slab object freepointers: reverse! kmem_cache = bootstrap(&boot_kmem_cache): after function returns static local variable boot_kmem_cache kmem_cache global variable X
  • 60. kmem_cache_init() start_kernel 1. Declare two static variables ‘boot_kmem_cache’ and ‘boot_kmem_cache_node’ 2. Assign the addresses of two static variables to generic/global variables kmem_cache_node = &boot_kmem_cache_node kmem_cache = &boot_kmem_cache __kmem_cache_create init_kmem_cache_nodes calculate_sizes kmem_cache_open alloc_kmem_cache_cpus mm_init kmem_cache_init create_boot_cache(kmem_cache_node, …) create_boot_cache(kmem_cache, …) kmem_cache = bootstrap(&boot_kmem_cache) kmem_cache_node = bootstrap(&boot_kmem_cache_node) setup_kmalloc_cache_index_table create_kmalloc_caches __alloc_percpu init_kmem_cache_cpus Assign cpu id to kmem_cache_cpu->tid Allocate/init ‘kmem_cache_node’ structs for all CPU nodes (sockets) Allocate/init a ‘kmem_cache_cpu’ struct
  • 61. Get a slab object from c->freelist kmem_cache_alloc_node !c->freelist or !c-> page slab_alloc_node(): slowpath #1 – Get an object from node’s partial page __slab_alloc kmem_cache_alloc slab_alloc slab_alloc_node NUMA_NO_NODE __kmalloc_node kmalloc_node node_id node_id ___slab_alloc new_slab_objects c->page = slub_percpu_partial(c) c->page && !c->freelist !c->page && c->partial get_partial Get partial page from percpu cache new_slab allocate_slab alloc_slab_page Allocate pages from Buddy system !c->page && !c->partial Get partial page from node cache (kmem_cache_node) Y N fast path slow path get_freelist -> __cmpxchg_double_slab this_cpu_cmpxchg_double() bootstrap s = kmem_cache_zalloc(kmem_cache, …) Allocate a kmem_cache struct from global variable ‘kmem_cache’ kmem_cache_node = bootstrap(&boot_kmem_cache_node) Get a slab object from page->freelist
  • 62. kmem_cache (global variable: kmem_cache) __percpu *cpu_slab min_partial = 5 size = 256 object_size = 216 cpu_partial = 13 min = 0x10 inuse = 216 align = 64 name = “kmem_cache” list *node[MAX_NUMNODES] max = 0x10020 oo (order objects) = 0x10020 offset = 112 kmem_cache_cpu (CPU #N) struct page *page = NULL void **freelist = NULL struct page *partial = NULL kmem_cache_node (node 0) nr_partial = 0 struct list_head partial nr_slabs = 1 total_objects = 32 list_lock struct list_head full kmem_cache_node (node 1) nr_partial = 0 struct list_head partial nr_slabs = 0 total_objects = 0 list_lock struct list_head full *node[MAX_NUMNODES] Page order objects per slab 0 15 16 oo (order objects) page struct (slab, slob or slub): order =1 struct page *next struct kmem_cache *slab_cache void *freelist (first free object) s_mem: first object - SLAB int pages int pobjects counters - SLUB inuse: 16 = 1 objects:15 = 32 frozen: 1 = 0 union struct (SLUB) union struct (Partial Page) struct list_head slab_list slab object slab object .. slab object Slab Page(s) slab object freepointers: reverse! kmem_cache_node = bootstrap(&boot_kmem_cache_node) -> kmem_cache_zalloc(…) kmem_cache global variable page struct (slab, slob or slub): order =1 struct page *next struct kmem_cache *slab_cache void *freelist (first free object) = NULL s_mem: first object - SLAB int pages int pobjects counters - SLUB inuse: 16 = 32 objects:15 = 32 frozen: 1 = 1 union struct (SLUB) union struct (Partial Page) struct list_head slab_list = invalid slab object slab object .. slab object Slab Page(s) slab object slab object Return this object address get_partial_node() -> acquire_slab() kmem_cache_zalloc(kmem_cache, …) slab_alloc_node(): slowpath #1 – Get an object from node’s partial page
  • 63. kmem_cache (global variable: kmem_cache) __percpu *cpu_slab min_partial = 5 size = 256 object_size = 216 cpu_partial = 13 min = 0x10 inuse = 216 align = 64 name = “kmem_cache” list *node[MAX_NUMNODES] max = 0x10020 oo (order objects) = 0x10020 offset = 112 kmem_cache_cpu (CPU #N) struct page *page void **freelist struct page *partial = NULL kmem_cache_node (node 0) nr_partial = 0 struct list_head partial nr_slabs = 1 total_objects = 32 list_lock struct list_head full kmem_cache_node (node 1) nr_partial = 0 struct list_head partial nr_slabs = 0 total_objects = 0 list_lock struct list_head full *node[MAX_NUMNODES] oo (order objects) bit layout: bits 16+ = page order, bits 0-15 = objects per slab page struct (slab, slob or slub): order = 1 struct page *next struct kmem_cache *slab_cache void *freelist (first free object) s_mem: first object - SLAB int pages int pobjects counters - SLUB inuse: 16 = 1 objects: 15 = 32 frozen: 1 = 0 union struct (SLUB) union struct (Partial Page) struct list_head slab_list slab object slab object .. slab object Slab Page(s) slab object freepointers: reverse! kmem_cache global variable page struct (slab, slob or slub): order = 1 struct page *next struct kmem_cache *slab_cache void *freelist (first free object) = NULL s_mem: first object - SLAB int pages int pobjects counters - SLUB inuse: 16 = 32 objects: 15 = 32 frozen: 1 = 1 union struct (SLUB) union struct (Partial Page) struct list_head slab_list = invalid slab object slab object .. slab object Slab Page(s) slab object slab object Return this object address get_partial_node() -> acquire_slab() kmem_cache_node = bootstrap(&boot_kmem_cache_node) -> kmem_cache_zalloc(…)
  • 64. kmem_cache (global variable: kmem_cache_node) __percpu *cpu_slab min_partial = 5 size = 64 object_size = 64 cpu_partial = 30 min = 64 inuse = 64 align = 64 name = “kmem_cache_node” list *node[MAX_NUMNODES] max = 64 oo (order objects) = 64 offset = 32 static local variable boot_kmem_cache_node kmem_cache_cpu (CPU #N) struct page *page void **freelist struct page *partial = NULL kmem_cache_node (node 0) nr_partial = 1 struct list_head partial nr_slabs = 1 total_objects = 64 list_lock struct list_head full kmem_cache_node (node 1) nr_partial = 1 struct list_head partial nr_slabs = 1 total_objects = 64 list_lock struct list_head full *node[MAX_NUMNODES] page struct (slab, slob or slub): node #1 struct page *next struct kmem_cache *slab_cache void *freelist (first free object) = NULL s_mem: first object - SLAB int pages int pobjects counters - SLUB inuse: 16 = 64 objects:15 = 64 frozen: 1 = 1 union struct (SLUB) union struct (Partial Page) struct list_head slab_list = invalid slab object slab object .. slab object Page slab object page struct (slab, slob or slub): node #0 struct page *next struct kmem_cache *slab_cache void *freelist (first free object) s_mem: first object - SLAB int pages int pobjects counters - SLUB inuse: 16 = 2 objects:15 = 64 frozen: 1 = 0 union struct (SLUB) union struct (Partial Page) struct list_head slab_list slab object slab object .. slab object Page slab object freepointers: reverse! kmem_cache_node = bootstrap(&boot_kmem_cache_node) -> __flush_cpu_slab() deactivate_slab(): Remove percpu slab Reference from page #49
  • 65. kmem_cache (global variable: kmem_cache_node) __percpu *cpu_slab min_partial = 5 size = 64 object_size = 64 cpu_partial = 30 min = 64 inuse = 64 align = 64 name = “kmem_cache_node” list *node[MAX_NUMNODES] max = 64 oo (order objects) = 64 offset = 32 static local variable boot_kmem_cache_node kmem_cache_cpu (CPU #N) struct page *page = NULL void **freelist = NULL struct page *partial = NULL kmem_cache_node (node 0) nr_partial = 1 struct list_head partial nr_slabs = 1 total_objects = 64 list_lock struct list_head full kmem_cache_node (node 1) nr_partial = 1 struct list_head partial nr_slabs = 1 total_objects = 64 list_lock struct list_head full *node[MAX_NUMNODES] page struct (slab, slob or slub): node #1 struct page *next struct kmem_cache *slab_cache void *freelist (first free object) s_mem: first object - SLAB int pages int pobjects counters - SLUB inuse: 16 = 2 objects:15 = 64 frozen: 1 = 0 union struct (SLUB) union struct (Partial Page) struct list_head slab_list slab object slab object .. slab object Page slab object page struct (slab, slob or slub): node #0 struct page *next struct kmem_cache *slab_cache void *freelist (first free object) s_mem: first object - SLAB int pages int pobjects counters - SLUB inuse: 16 = 2 objects:15 = 64 frozen: 1 = 0 union struct (SLUB) union struct (Partial Page) struct list_head slab_list slab object slab object .. slab object Page slab object freepointers: reverse! kmem_cache_node = bootstrap(&boot_kmem_cache_node) -> __flush_cpu_slab() freepointers: reverse!
  • 67. kmem_cache __percpu *cpu_slab min_partial = 5 size = 64 object_size = 64 cpu_partial = 30 min = 64 inuse = 64 align = 64 name = “kmem_cache_node” list *node[MAX_NUMNODES] max = 64 oo (order objects) = 64 offset = 32 static struct kmem_cache *kmem_cache_node kmem_cache_cpu (CPU #N) struct page *page = NULL void **freelist = NULL struct page *partial = NULL kmem_cache_node (node 0) nr_partial = 1 struct list_head partial nr_slabs = 1 total_objects = 64 list_lock struct list_head full kmem_cache_node (node 1) nr_partial = 1 struct list_head partial nr_slabs = 1 total_objects = 64 list_lock struct list_head full *node[MAX_NUMNODES] page struct (slab, slob or slub): node #1 struct page *next struct kmem_cache *slab_cache void *freelist (first free object) s_mem: first object - SLAB int pages int pobjects counters - SLUB inuse: 16 = 2 objects:15 = 64 frozen: 1 = 0 union struct (SLUB) union struct (Partial Page) struct list_head slab_list slab object slab object .. slab object Page slab object page struct (slab, slob or slub): node #0 struct page *next struct kmem_cache *slab_cache void *freelist (first free object) s_mem: first object - SLAB int pages int pobjects counters - SLUB inuse: 16 = 2 objects:15 = 64 frozen: 1 = 0 union struct (SLUB) union struct (Partial Page) struct list_head slab_list slab object slab object .. slab object Page slab object freepointers: reverse! After ‘kmem_cache_node = bootstrap(&boot_kmem_cache_node)’ freepointers: reverse!
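  The whole sequence on the last few pages is driven by bootstrap(), which copies the static boot structure into a slab-allocated one, flushes the percpu slab (hence c->page/c->freelist = NULL above) and re-points every slab page at the new kmem_cache. Abridged from mm/slub.c (5.11; the SLUB_DEBUG full-list fixup is omitted):

	static struct kmem_cache * __init bootstrap(struct kmem_cache *static_cache)
	{
		int node;
		struct kmem_cache_node *n;
		struct kmem_cache *s = kmem_cache_zalloc(kmem_cache, GFP_NOWAIT);

		memcpy(s, static_cache, kmem_cache->object_size);

		/* Deactivate the percpu slab onto the node partial list */
		__flush_cpu_slab(s, smp_processor_id());

		for_each_kmem_cache_node(s, node, n) {
			struct page *p;

			/* Slab pages still point at the static struct: fix them up */
			list_for_each_entry(p, &n->partial, slab_list)
				p->slab_cache = s;
		}

		list_add(&s->list, &slab_caches);
		return s;
	}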
  • 68. kmem_cache_init() start_kernel 1. Declare two static variables ‘boot_kmem_cache’ and ‘boot_kmem_cache_node’ 2. Assign the addresses of two static variables to generic/global variables kmem_cache_node = &boot_kmem_cache_node kmem_cache = &boot_kmem_cache __kmem_cache_create init_kmem_cache_nodes calculate_sizes kmem_cache_open alloc_kmem_cache_cpus mm_init kmem_cache_init create_boot_cache(kmem_cache_node, …) create_boot_cache(kmem_cache, …) kmem_cache = bootstrap(&boot_kmem_cache) kmem_cache_node = bootstrap(&boot_kmem_cache_node) setup_kmalloc_cache_index_table create_kmalloc_caches __alloc_percpu init_kmem_cache_cpus Assign cpu id to kmem_cache_cpu->tid Allocate/init ‘kmem_cache_node’ structs for all CPU nodes (sockets) Allocate/init a ‘kmem_cache_cpu’ struct Will talk about this later
  • 69. Get a slab object from c->freelist kmem_cache_alloc_node !c->freelist or !c-> page slab_alloc_node(): slowpath #3 – Get a partial page from percpu cache __slab_alloc kmem_cache_alloc slab_alloc slab_alloc_node NUMA_NO_NODE __kmalloc_node kmalloc_node node_id node_id ___slab_alloc new_slab_objects c->page = slub_percpu_partial(c) c->page && !c->freelist !c->page && c->partial get_partial Get partial page from percpu cache new_slab allocate_slab alloc_slab_page Allocate pages from Buddy system !c->page && !c->partial Get partial page from node cache (kmem_cache_node) Y N fast path slow path get_freelist -> __cmpxchg_double_slab this_cpu_cmpxchg_double() Get a slab object from page->freelist Let’s talk about another slow path
  • 70. kmem_cache __percpu *cpu_slab *node[MAX_NUMNODES] kmem_cache_cpu (CPU #N) struct page *page = NULL or valid void **freelist = NULL or valid struct page *partial kmem_cache_node (node 0) struct list_head partial kmem_cache_node (node 1) struct list_head partial *node[MAX_NUMNODES] slowpath #3: Get a partial page from percpu cache slab object slab object .. slab object Page page->next Slowpath #3: Get a partial page from percpu cache slab object slab object .. slab object Page page->freelist page->freelist 1
  • 71. kmem_cache __percpu *cpu_slab *node[MAX_NUMNODES] kmem_cache_cpu (CPU #N) struct page *page = NULL void **freelist = NULL struct page *partial = NULL kmem_cache_node (node 0) struct list_head partial kmem_cache_node (node 1) struct list_head partial *node[MAX_NUMNODES] slab object slab object .. slab object Page slab object slab object .. slab object Page slab object slab object .. slab object Page Slowpath #1: Get a slab object from node’s partial (new_slab_objects->get_partial- >get_partial_node->put_cpu_partial) • [Just for a scenario] Acquire 3 pages because of the number of kmem_cache->cpu_partial 1 When to add a page to percpu partial page? – Case 1 (1/2) put_cpu_partial(): Put page(s) into a percpu partial page
  • 72. kmem_cache __percpu *cpu_slab *node[MAX_NUMNODES] kmem_cache_cpu (CPU #N) struct page *page void **freelist struct page *partial kmem_cache_node (node 0) struct list_head partial kmem_cache_node (node 1) struct list_head partial *node[MAX_NUMNODES] When to add a page to percpu partial page? – Case 1 (2/2) slab object slab object .. slab object Page slab object slab object .. slab object Page slab object slab object .. slab object Page slab object slab object .. slab object Page page->next slab object slab object .. slab object Page 1 slab object slab object .. slab object Page page->freelist page->freelist Slowpath #1: Get a slab object from node’s partial (new_slab_objects->get_partial- >get_partial_node->put_cpu_partial) • [Just for a scenario] Acquire 3 pages because of the number of kmem_cache->cpu_partial put_cpu_partial(): Put page(s) into a percpu partial page
  • 73. kmem_cache __percpu *cpu_slab *node[MAX_NUMNODES] kmem_cache_cpu (CPU #N) struct page *page void **freelist struct page *partial kmem_cache_node (node 0) struct list_head partial kmem_cache_node (node 1) struct list_head partial *node[MAX_NUMNODES] When to add a page to percpu partial page? – Case 2: slab_free() slab object slab object .. slab object Page slab object slab object .. slab object Page page->next slab object slab object .. slab object Page slab object slab object .. slab object Page page->freelist page->freelist slab_free(): Move page(s) to percpu partial and node partial (if possible)
  • 74. kmem_cache __percpu *cpu_slab *node[MAX_NUMNODES] kmem_cache_cpu (CPU #N) struct page *page void **freelist struct page *partial kmem_cache_node (node 0) struct list_head partial kmem_cache_node (node 1) struct list_head partial *node[MAX_NUMNODES] free this object slab object slab object .. slab object Page page->next put_cpu_partial(): Move this page to percpu partial 2 slab object slab object .. slab object Page slab object slab object .. slab object Page slab object slab object .. slab object Page .. slab object slab object .. slab object Page 1 Fully-allocated objects of pages When to add a page to percpu partial page? – Case 2: slab_free() slab_free(): Move page(s) to percpu partial and node partial (if possible) * Detail will be discussed in section “[Cache release/free] kmem_cache_free()”
  • 75. Get a slab object from c->freelist kmem_cache_alloc_node !c->freelist or !c->page slab_alloc_node(): slowpath #4 – Get a slab object from page->freelist __slab_alloc kmem_cache_alloc slab_alloc slab_alloc_node NUMA_NO_NODE __kmalloc_node kmalloc_node node_id node_id ___slab_alloc new_slab_objects c->page = slub_percpu_partial(c) c->page && !c->freelist !c->page && c->partial get_partial Get partial page from percpu cache new_slab allocate_slab alloc_slab_page Allocate pages from Buddy system !c->page && !c->partial Get partial page from node cache (kmem_cache_node) Y N fast path slow path get_freelist -> __cmpxchg_double_slab this_cpu_cmpxchg_double() Get a slab object from page->freelist
  • 76. kmem_cache __percpu *cpu_slab *node[MAX_NUMNODES] kmem_cache_cpu (CPU #N) struct page *page void **freelist = NULL struct page *partial kmem_cache_node (node 0) struct list_head partial kmem_cache_node (node 1) struct list_head partial *node[MAX_NUMNODES] slowpath #4: Get a slab object from page->freelist slab object slab object .. slab object Page page->next slab object slab object .. slab object Page page->freelist page->freelist page descriptor address = virtual memory map (vmemmap_base): see page #5 of “Decompressed vmlinux: Linux kernel initialization from page table configuration perspective”
  • 77. kmem_cache __percpu *cpu_slab *node[MAX_NUMNODES] kmem_cache_cpu (CPU #N) struct page *page void **freelist = NULL struct page *partial kmem_cache_node (node 0) struct list_head partial kmem_cache_node (node 1) struct list_head partial *node[MAX_NUMNODES] slowpath #4: Get a slab object from page->freelist slab object slab object .. slab object Page page->next slab object slab object .. slab object Page page->freelist page->freelist page = c->page = slub_percpu_partial(c) 1 slub_set_percpu_partial(c, page) 2
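  Steps 1 and 2 above are literally these two helpers (include/linux/slub_def.h, 5.11, with CONFIG_SLUB_CPU_PARTIAL=y): taking a page off the percpu partial list is just two pointer moves over the page->next chain:

	#define slub_percpu_partial(c)		((c)->partial)

	#define slub_set_percpu_partial(c, p)		\
	({						\
		slub_percpu_partial(c) = (p)->next;	\
	})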
  • 78. kmem_cache __percpu *cpu_slab *node[MAX_NUMNODES] kmem_cache_cpu (CPU #N) struct page *page void **freelist = NULL struct page *partial slowpath #4: Get a slab object from page->freelist slab object slab object .. slab object Page page->next slab object slab object .. slab object Page page->freelist page->freelist
  • 79. kmem_cache __percpu *cpu_slab *node[MAX_NUMNODES] kmem_cache_cpu (CPU #N) struct page *page void **freelist struct page *partial slowpath #4: Get a slab object from page->freelist slab object slab object .. slab object Page slab object slab object .. slab object Page page->freelist page->freelist = NULL slab object 1 2
  • 80. /proc/slabinfo * active_objs = num_objs - free_objs
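  For reference, the /proc/slabinfo header looks like this; the kmalloc-8k row below is illustrative (values depend on the running system), but the column layout is fixed. Note that the tunables columns are always 0 under SLUB:

	slabinfo - version: 2.1
	# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
	kmalloc-8k            96     96   8192    4    8 : tunables    0    0    0 : slabdata     24     24      0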
  • 82. Return the slab object to c->freelist kmem_cache_free page == c->page? kmem_cache_free() __slab_free cmpxchg_double_slab N fast path: percpu cache slow path: node cache this_cpu_cmpxchg_double() slab_free do_slab_free put_cpu_partial add_partial remove_partial discard_slab __free_slab __free_pages Return the slab object to page->freelist Slowpath #3: All slab objects in the page are free: release the page(s) to the Buddy system Y Slowpath #1: Move page(s) to percpu partial or node partial (if possible) Slowpath #2: Move page(s) to node partial
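  A sketch of the free fast path in the chart above, abridged from do_slab_free() in mm/slub.c (5.11; preemption handling, barriers and hooks dropped). If the object belongs to the current percpu slab, it is pushed back onto c->freelist locklessly; otherwise __slab_free() takes over and may touch the node partial list:

	static __always_inline void do_slab_free(struct kmem_cache *s,
			struct page *page, void *head, void *tail,
			int cnt, unsigned long addr)
	{
		void *tail_obj = tail ? : head;
		struct kmem_cache_cpu *c;
		unsigned long tid;

	redo:
		tid = this_cpu_read(s->cpu_slab->tid);
		c = raw_cpu_ptr(s->cpu_slab);

		if (likely(page == c->page)) {
			/* Fast path: chain the object in front of the percpu freelist */
			set_freepointer(s, tail_obj, c->freelist);

			if (unlikely(!this_cpu_cmpxchg_double(
					s->cpu_slab->freelist, s->cpu_slab->tid,
					c->freelist, tid,
					head, next_tid(tid))))
				goto redo;
		} else {
			/* Slow path: the object belongs to some other slab page */
			__slab_free(s, page, head, tail_obj, cnt, addr);
		}
	}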
  • 83. kmem_cache __percpu *cpu_slab *node[MAX_NUMNODES] kmem_cache_cpu (CPU #N) struct page *page void **freelist struct page *partial kmem_cache_node (node 0) struct list_head partial kmem_cache_node (node 1) struct list_head partial *node[MAX_NUMNODES] slowpath #1: put_cpu_partial() – Case 1: Move page(s) to percpu partial slab object slab object .. slab object Page slab object slab object .. slab object Page slab object slab object .. slab object Page .. slab object slab object .. slab object Page Fully-allocated objects of pages free this object slab object slab object .. slab object Page page->next put_cpu_partial(): Move this page to percpu partial 1 2 __slab_free cmpxchg_double_slab slow path: node cache put_cpu_partial add_partial remove_partial discard_slab __free_slab __free_pages Return the slab object to page->freelist Slowpath #3: All slab objects in the page are free: release the page(s) to the Buddy system Slowpath #1: Move page(s) to percpu partial and node partial (if possible) Slowpath #2: Move page(s) to node partial
  • 84. kmem_cache __percpu *cpu_slab *node[MAX_NUMNODES] kmem_cache_cpu (CPU #N) struct page *page void **freelist struct page *partial kmem_cache_node (node 0) struct list_head partial kmem_cache_node (node 1) struct list_head partial *node[MAX_NUMNODES] slowpath #1: put_cpu_partial() – Case 2 (1/2) slab object slab object .. slab object Page slab object slab object .. slab object Page slab object slab object .. slab object Page page->next free this object if available objects > kmem_cache->cpu_partial && want_to_drain -> call unfreeze_partials() to move percpu partial pages to node partial 2 slab object slab object .. slab object Page slab object slab object .. slab object Page slab object slab object .. slab object Page .. slab object slab object .. slab object Page 1 Fully-allocated objects of pages
  • 85. kmem_cache __percpu *cpu_slab *node[MAX_NUMNODES] kmem_cache_cpu (CPU #N) struct page *page void **freelist struct page *partial kmem_cache_node (node 0) struct list_head partial kmem_cache_node (node 1) struct list_head partial *node[MAX_NUMNODES] slowpath #1: put_cpu_partial() – Case 2 (2/2) slab object slab object .. slab object Page slab object slab object .. slab object Page slab object slab object .. slab object Page free this object slab object slab object .. slab object Page page->next put_cpu_partial(): Move this page to percpu partial 2 3 if available objects > kmem_cache->cpu_partial && want_to_drain -> call unfreeze_partials() to move percpu partial pages to node partial slab object slab object .. slab object Page slab object slab object .. slab object Page slab object slab object .. slab object Page .. slab object slab object .. slab object Page 1 Fully-allocated objects of pages
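  Both cases live in put_cpu_partial(). Abridged from mm/slub.c (5.11; the CONFIG_SLUB_CPU_PARTIAL guards, IRQ handling around the drain and statistics are omitted): the page is pushed onto the percpu partial list, and once the cached free-object count exceeds s->cpu_partial the whole list is drained to the node partial list first:

	static void put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
	{
		struct page *oldpage;
		int pages, pobjects;

		do {
			pages = 0;
			pobjects = 0;
			oldpage = this_cpu_read(s->cpu_slab->partial);

			if (oldpage) {
				pobjects = oldpage->pobjects;
				pages = oldpage->pages;
				if (drain && pobjects > slub_cpu_partial(s)) {
					/* Case 2: too many cached objects -> move all
					 * percpu partial pages to the node partial list */
					unfreeze_partials(s, this_cpu_ptr(s->cpu_slab));
					oldpage = NULL;
					pobjects = 0;
					pages = 0;
				}
			}

			pages++;
			pobjects += page->objects - page->inuse;

			/* Running totals are kept in the list's head page */
			page->pages = pages;
			page->pobjects = pobjects;
			page->next = oldpage;	/* singly linked via page->next */
		} while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page)
								!= oldpage);
	}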
  • 86. kmem_cache __percpu *cpu_slab *node[MAX_NUMNODES] kmem_cache_cpu (CPU #N) struct page *page void **freelist struct page *partial kmem_cache_node (node 0) struct list_head partial kmem_cache_node (node 1) struct list_head partial *node[MAX_NUMNODES] slowpath #2: add_partial() – Move a page to node partial free this object slab object slab object .. slab object Page If CONFIG_SLUB_DEBUG=y && CONFIG_SLUB_DEBUG_ON=y && !page->freelist (in percpu cache or the page is full of objects) o add_partial(): Move this page to node partial 2 slab object slab object .. slab object Page slab object slab object .. slab object Page slab object slab object .. slab object Page .. slab object slab object .. slab object Page 1 Fully-allocated objects of pages
  • 87. kmem_cache __percpu *cpu_slab *node[MAX_NUMNODES] kmem_cache_cpu (CPU #N) struct page *page void **freelist struct page *partial kmem_cache_node (node 0) struct list_head partial kmem_cache_node (node 1) struct list_head partial *node[MAX_NUMNODES] slowpath #3: remove_partial() and discard_slab() Fully-allocated objects of pages free this object If all objects in page(s) are available (free) && kmem_cache_node->nr_partial >= kmem_cache->min_partial o remove_partial() 1 slab object slab object .. slab object Page slab object slab object .. slab object Page slab object slab object .. slab object Page 2 Buddy system free page(s): discard_slab -> free_slab -> __free_slab -> __free_pages 3 slab object slab object .. slab object Page slab object slab object .. slab object Page slab object slab object .. slab object Page .. slab object slab object .. slab object Page
  • 88. kmalloc & slab • Call path • Compound page
  • 89. kmalloc & slab struct kmem_cache *kmalloc_caches[NR_KMALLOC_TYPES][KMALLOC_SHIFT_HIGH + 1] struct kmem_cache *kmalloc_caches[KMALLOC_NORMAL][] kmem_cache __percpu *cpu_slab *node[MAX_NUMNODES] kmem_cache __percpu *cpu_slab *node[MAX_NUMNODES] kmem_cache __percpu *cpu_slab *node[MAX_NUMNODES] kmem_cache __percpu *cpu_slab *node[MAX_NUMNODES] kmem_cache __percpu *cpu_slab *node[MAX_NUMNODES] NULL kmalloc-96 0 1 2 3 4 13 kmalloc-192 kmalloc-8 kmalloc-16 … kmalloc-8192 struct kmem_cache *kmalloc_caches[KMALLOC_RECLAIM][] NULL kmalloc-96 0 1 2 3 4 13 kmalloc-192 kmalloc-8 kmalloc-16 … kmalloc-8192 __GFP_RECLAIMABLE struct kmem_cache *kmalloc_caches[KMALLOC_DMA][] NULL kmalloc-96 0 1 2 3 4 13 kmalloc-192 kmalloc-8 kmalloc-16 … kmalloc-8192 __GFP_DMA Check create_kmalloc_caches() &kmalloc_info
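  Which of the three arrays a request uses is decided by kmalloc_type() (include/linux/slab.h, 5.11; this is the CONFIG_ZONE_DMA variant):

	static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags)
	{
		/* The overwhelmingly common case: neither DMA nor reclaimable */
		if (likely((flags & (__GFP_DMA | __GFP_RECLAIMABLE)) == 0))
			return KMALLOC_NORMAL;

		/* KMALLOC_DMA wins if both flags happen to be set */
		return flags & __GFP_DMA ? KMALLOC_DMA : KMALLOC_RECLAIM;
	}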
  • 90. kmalloc() & kfree() kmalloc kmem_cache_alloc kmalloc_large kmalloc_order_trace kmalloc_order kmem_cache_alloc_trace size > KMALLOC_MAX_CACHE_SIZE (8KB) size <= KMALLOC_MAX_CACHE_SIZE (8KB) kfree page = virt_to_head_page(x) slab_free compound_order A slab page (check PG_slab flag) Not a slab page alloc_pages __GFP_COMP (Compound page) __free_pages kmalloc_index • Get index based on size • if size < 8 -> return KMALLOC_SHIFT_LOW (3) • if size == 0 -> zero allocation
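  A condensed view of the dispatch shown above, abridged from include/linux/slab.h and mm/slub.c (5.11; tracing, hooks and the non-constant-size path through __kmalloc() are folded away):

	static __always_inline void *kmalloc(size_t size, gfp_t flags)
	{
		if (__builtin_constant_p(size)) {
			unsigned int index;

			if (size > KMALLOC_MAX_CACHE_SIZE)	/* > 8KB: bypass the slab */
				return kmalloc_large(size, flags);

			index = kmalloc_index(size);
			if (!index)
				return ZERO_SIZE_PTR;		/* size == 0 */

			return kmem_cache_alloc_trace(
					kmalloc_caches[kmalloc_type(flags)][index],
					flags, size);
		}
		return __kmalloc(size, flags);
	}

	void kfree(const void *x)
	{
		struct page *page;

		if (unlikely(ZERO_OR_NULL_PTR(x)))
			return;

		page = virt_to_head_page(x);
		if (unlikely(!PageSlab(page))) {
			/* kmalloc_large() allocation: a compound page from Buddy */
			__free_pages(page, compound_order(page));
			return;
		}
		slab_free(page->slab_cache, page, (void *)x, NULL, 1, _RET_IP_);
	}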
  • 91. Compound page Page (head page) flags |= PG_head Page (Tail page) compound_head compound_dtor compound_order compound_mapcount compound_nr First Tail Page only Page (Tail page) _compound_pad_1 (compound_head) hpage_pinned_refcount deferred_list 2nd Tail Page only . . . Compound Page Page (Tail page) _compound_pad_1 (compound_head) Page (Tail page) _compound_pad_1 (compound_head) Compound page – Use Cases • Mainly used in huge page • kmalloc: allocation size > 8192 bytes o Check kmalloc_order()
  • 92. Compound page Allocation size = 8195 order = 2 -> 4 Pages kmalloc kmem_cache_alloc kmalloc_large kmalloc_order_trace kmalloc_order kmem_cache_alloc_trace size > KMALLOC_MAX_CACHE_SIZE (8KB) size <= KMALLOC_MAX_CACHE_SIZE (8KB) alloc_pages __GFP_COMP (Compound page)
  • 93. Compound page Page (Head page) flags |= PG_head Page (Tail page) compound_head compound_dtor compound_order = 2 compound_mapcount compound_nr = 4 First Tail Page only Page (Tail page) _compound_pad_1 (compound_head) hpage_pinned_refcount deferred_list 2nd Tail Page only Compound Page Page (Tail page) _compound_pad_1 (compound_head) Head Page First Tail Page Second Tail Page
  • 94. Compound page Page (Head page) flags |= PG_head Page (Tail page) compound_head compound_dtor compound_order = 2 compound_mapcount compound_nr = 4 First Tail Page only Page (Tail page) _compound_pad_1 (compound_head) hpage_pinned_refcount deferred_list 2nd Tail Page only Compound Page Page (Tail page) _compound_pad_1 (compound_head) First Tail Page Second Tail Page bit 0: purpose
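  The “bit 0” annotation is the whole trick: a tail page's compound_head field stores the head page's address with bit 0 set, so any page pointer can be resolved to its head page (include/linux/page-flags.h, 5.11):

	static inline struct page *compound_head(struct page *page)
	{
		unsigned long head = READ_ONCE(page->compound_head);

		if (unlikely(head & 1))
			return (struct page *)(head - 1);	/* tail: strip bit 0 */
		return page;					/* already a head page */
	}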
  • 95. kmalloc kmem_cache_alloc kmalloc_large kmalloc_order_trace kmalloc_order kmem_cache_alloc_trace size > KMALLOC_MAX_CACHE_SIZE (8KB) size <= KMALLOC_MAX_CACHE_SIZE (8KB) alloc_pages __GFP_COMP (Compound page) kmalloc_index • Get index based on size • if size < 8 -> return KMALLOC_SHIFT_LOW (3) • if size == 0 -> zero allocation Q1: [SLUB] kmalloc: allocation size = 0, 1, 2, or 4 -> workable? struct kmem_cache *kmalloc_caches[NR_KMALLOC_TYPES][KMALLOC_SHIFT_HIGH + 1] kmem_cache __percpu *cpu_slab *node[MAX_NUMNODES] kmem_cache __percpu *cpu_slab *node[MAX_NUMNODES] struct kmem_cache *kmalloc_caches[KMALLOC_NORMAL][] NULL kmalloc-96 0 1 2 3 4 13 kmalloc-192 kmalloc-8 kmalloc-16 … kmalloc-8192 minimum kmalloc cache size = 8 bytes • if size < 8 -> return KMALLOC_SHIFT_LOW (3) • if size == 0 -> zero allocation
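  Answer to Q1: yes, all of them work. Sizes 1-8 round up to the kmalloc-8 cache via KMALLOC_SHIFT_LOW, and size 0 returns the special non-NULL ZERO_SIZE_PTR ((void *)16), which kfree() recognizes. A small hypothetical demo (the function name is made up):

	#include <linux/init.h>
	#include <linux/slab.h>

	static int __init kmalloc_small_demo(void)
	{
		void *p0 = kmalloc(0, GFP_KERNEL);  /* ZERO_SIZE_PTR, not a real object */
		void *p4 = kmalloc(4, GFP_KERNEL);  /* served from the kmalloc-8 cache */

		kfree(p0);	/* kfree() ignores ZERO_OR_NULL_PTR values */
		kfree(p4);
		return 0;
	}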
  • 98. Slab Cache Node-based cache: freelist concept is implemented in page struct (not shown) slab object slab object .. slab object Page . . kmem_cache_cpu (CPU #0) struct page *page void **freelist struct page *partial kmem_cache_cpu (CPU #N) struct page *page void **freelist struct page *partial slab object slab object .. slab object Page slab object slab object .. slab object Page kmem_cache_node (Node #0) struct list_head partial kmem_cache_node (Node #N) struct list_head partial . . slab object slab object .. slab object Page slab object slab object .. slab object Page kmem_cache (Slab cache) __percpu *cpu_slab *node[MAX_NUMNODES] percpu cache Node-based cache kmem_cache global variable kmem_cache_node global variable
  • 99. slab object slab object .. slab object Page . . kmem_cache_cpu (CPU #0) struct page *page void **freelist struct page *partial kmem_cache_cpu (CPU #N) struct page *page void **freelist struct page *partial slab object slab object .. slab object Page slab object slab object .. slab object Page kmem_cache_node (Node #0) struct list_head partial kmem_cache_node (Node #N) struct list_head partial . . slab object slab object .. slab object Page slab object slab object .. slab object Page kmem_cache (Slab cache) __percpu *cpu_slab *node[MAX_NUMNODES] percpu cache Node-based cache kmem_cache global variable kmem_cache_node global variable [Slab allocator calls this automatically] kmem_cache_alloc(kmem_cache, ..): Allocate a ‘kmem_cache’ object from global variable ‘kmem_cache’ [Slab allocator calls this automatically] kmem_cache_alloc_node(kmem_cache_node, …): Allocate ‘kmem_cache_node’ objects from global variable ‘kmem_cache_node’ 1 [User calls this function] k_cache = kmem_cache_create(): create/get a new cache 2 3 Who initializes global variables ‘kmem_cache’ and ‘kmem_cache_node’? Slab Allocator Initialization: “Chicken or the egg” problem
  • 100. Slab Allocator Initialization: “Chicken or the egg” problem slab object slab object .. slab object Page . . kmem_cache_cpu (CPU #0) struct page *page void **freelist struct page *partial kmem_cache_cpu (CPU #N) struct page *page void **freelist struct page *partial slab object slab object .. slab object Page slab object slab object .. slab object Page kmem_cache_node (Node #0) struct list_head partial kmem_cache_node (Node #N) struct list_head partial . . slab object slab object .. slab object Page slab object slab object .. slab object Page kmem_cache (Slab cache) __percpu *cpu_slab *node[MAX_NUMNODES] percpu cache Node-based cache kmem_cache global variable kmem_cache_node global variable [Slab allocator calls this automatically] kmem_cache_alloc(kmem_cache, ..): Allocate a ‘kmem_cache’ object from global variable ‘kmem_cache’ [Slab allocator calls this automatically] kmem_cache_alloc_node(kmem_cache_node, …): Allocate ‘kmem_cache_node’ objects from global variable ‘kmem_cache_node’ 1 [User calls this function] k_cache = kmem_cache_create(): create/get a new cache 2 3 Slab allocator: statically initialized ‘kmem_cache_boot’ Slob allocator: statically initialized ‘kmem_cache_boot’ Slub allocator: static local variables ‘boot_kmem_cache’ and ‘boot_kmem_cache_node’
  • 101. Slab Allocator Initialization: “Chicken or the egg” problem . . kmem_cache_cpu (CPU #0) struct page *page void **freelist struct page *partial kmem_cache_node (Node #0) struct list_head partial . . kmem_cache = NULL (Slab cache) __percpu *cpu_slab *node[MAX_NUMNODES] percpu cache Node-based cache [Slab Cache] struct kmem_cache *kmem_cache = NULL . . kmem_cache_cpu (CPU #0) struct page *page void **freelist struct page *partial kmem_cache_node (Node #0) struct list_head partial . . kmem_cache = NULL (Slab cache) __percpu *cpu_slab *node[MAX_NUMNODES] percpu cache Node-based cache [Slab Cache] struct kmem_cache *kmem_cache_node = NULL kmem_cache_cpu (CPU #0) struct page *page void **freelist struct page *partial kmem_cache_node (Node #0) struct list_head partial kmem_cache (Slab cache) __percpu *cpu_slab *node[MAX_NUMNODES] percpu cache Node-based cache [Slab Cache] struct kmem_cache *sighand_cachep . . . . 1 2 3 [Cannot work] Allocate from ‘kmem_cache_node’ slab cache
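  From a user's point of view the whole bootstrap dance is invisible; step 1 in the figure is the only call a driver makes. A minimal hypothetical module (my_struct, my_cachep and the function names are made up):

	#include <linux/init.h>
	#include <linux/module.h>
	#include <linux/slab.h>

	struct my_struct {
		int id;
		char payload[200];
	};

	static struct kmem_cache *my_cachep;

	static int __init my_cache_init(void)
	{
		struct my_struct *obj;

		/* Internally this allocates a 'kmem_cache' object from the global
		 * 'kmem_cache' cache (step 2) and its per-node structures from
		 * 'kmem_cache_node' (step 3) */
		my_cachep = kmem_cache_create("my_struct_cache",
					      sizeof(struct my_struct), 0,
					      SLAB_HWCACHE_ALIGN, NULL);
		if (!my_cachep)
			return -ENOMEM;

		obj = kmem_cache_alloc(my_cachep, GFP_KERNEL);	/* fast/slow paths above */
		if (obj)
			kmem_cache_free(my_cachep, obj);
		return 0;
	}

	static void __exit my_cache_exit(void)
	{
		kmem_cache_destroy(my_cachep);
	}

	module_init(my_cache_init);
	module_exit(my_cache_exit);
	MODULE_LICENSE("GPL");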
