06/21/11 UPDATES Isa Ansharullah
06/21/11 PROCESS in LINUX: A process is an executing program. It is described by a task_struct structure, and all task_structs are linked in a circular doubly linked list (the task list). The task_struct is the process descriptor, i.e. the Process Control Block (PCB). It is allocated via the slab allocator. (Linux Kernel Development, Robert Love, p.25)
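The circular task list can be pictured with a small user-space re-creation of the kernel's intrusive list, loosely modeled on include/linux/list.h. This is a sketch only: struct task, sum_pids() and this simplified container_of are hypothetical stand-ins, not kernel code. Each descriptor embeds a list node. Walking the ring from the head and stopping when it returns to the head is roughly what for_each_process does, starting from init_task.

```c
#include <stddef.h>

/* Minimal re-creation of the kernel's intrusive list node. */
struct list_head {
    struct list_head *next, *prev;
};

/* Recover the enclosing structure from a pointer to its embedded member
 * (simplified: no type checking, unlike the kernel macro). */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily trimmed stand-in for task_struct. */
struct task {
    int pid;
    struct list_head tasks;   /* links this task into the task list */
};

static void list_init(struct list_head *head)
{
    head->next = head;        /* an empty circular list points to itself */
    head->prev = head;
}

/* Insert node just before head, i.e. at the tail of the circular list. */
static void list_add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Walk the ring once (excluding the head) and sum the PIDs visited. */
static int sum_pids(struct list_head *head)
{
    int sum = 0;
    for (struct list_head *p = head->next; p != head; p = p->next)
        sum += container_of(p, struct task, tasks)->pid;
    return sum;
}
```

Because the list is intrusive, adding a task never allocates. The node lives inside the descriptor that the slab allocator already handed out.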
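06/21/11 <include/linux/sched.h> struct task_struct: about 1.6~2KB in size (on x86; varies per architecture). Contains: the PID (pid_t pid); open files; the process's address space (struct mm_struct *mm); the process's state (TASK_RUNNING covers both running and ready; a waiting process is TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE); the process's kernel stack address (void *stack); others (parent's PID,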
06/21/11 PROCESS’ STACKS: Each process has two kinds of stacks: a user-space stack, which can grow, and a kernel stack, which has a FIXED size. Mode switch: switching from the user stack to the kernel stack, triggered by a system call or an exception handler. Context switch: suspending one process and switching to another; this happens in kernel mode.
06/21/11 Process’ Kernel Stack (1): It is stored in the kernel's area of physical memory, so it is physically contiguous and non-swappable. It is kept as small as possible and fixed in size, to prevent fragmentation and the hazards of growing it.
06/21/11 Process’ Kernel Stack (2) (Professional Linux Kernel Architecture, Wolfgang Mauerer, p.71): the task_struct is reached through the thread_info structure at the bottom (lowest address) of the kernel stack, which provides fast access.
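The following is a user-space simulation of why a thread_info at the bottom of the stack is fast to find. If the stack is THREAD_SIZE bytes and aligned to THREAD_SIZE, clearing the low bits of any address inside it (such as the stack pointer) yields the stack base. This is roughly what older x86 kernels' current_thread_info() did. THREAD_SIZE = 8KB (two 4KB pages) is an assumption, and alloc_stack() and thread_info_from_sp() are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed kernel stack size: two 4KB pages. */
#define THREAD_SIZE 8192

struct task {                     /* stands in for struct task_struct */
    int pid;
};

/* Hypothetical trimmed thread_info: only the back-pointer to the task. */
struct thread_info {
    struct task *task;
};

/* Allocate a THREAD_SIZE-aligned "kernel stack" and place thread_info at
 * its lowest address. Returns NULL on allocation failure. */
static void *alloc_stack(struct task *owner)
{
    void *stack = aligned_alloc(THREAD_SIZE, THREAD_SIZE);
    if (stack)
        ((struct thread_info *)stack)->task = owner;
    return stack;
}

/* Given any address inside the stack (e.g. the current stack pointer),
 * masking off the low bits yields the base, where thread_info lives. */
static struct thread_info *thread_info_from_sp(uintptr_t sp)
{
    return (struct thread_info *)(sp & ~((uintptr_t)THREAD_SIZE - 1));
}
```

The lookup costs a single AND instruction, with no memory access. That is the reason for the fixed size and the alignment.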
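06/21/11 PROCESS DUPLICATION: There are actually 3 approaches. fork(): the "heavy-weight" call (copies everything), made cheap by copy-on-write. vfork(): the "light-weight" call; the child shares the parent's address space, and the parent is blocked until the child calls exec() or exits. Since fork() implements COW, vfork() gains little (it only avoids copying the page tables). clone(): lets the caller choose what to share. In Linux, fork() is implemented via clone(); clone() takes flags specifying which resources should be shared.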
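To make "fork() is implemented via clone()" concrete, Linux Kernel Development gives fork() as clone(SIGCHLD, 0) and vfork() as clone(CLONE_VFORK | CLONE_VM | SIGCHLD, 0). The sketch below decodes such flag words using the real constants from <sched.h>. describe_clone_flags() and the *_FLAGS names are hypothetical helpers. The low byte (SIGCHLD here) is the signal sent to the parent when the child exits, not a sharing flag.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Flag words for the classic calls, per Linux Kernel Development. */
static const unsigned long FORK_FLAGS   = SIGCHLD;
static const unsigned long VFORK_FLAGS  = CLONE_VFORK | CLONE_VM | SIGCHLD;
static const unsigned long THREAD_FLAGS =
    CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND;

/* Write a human-readable list of what a clone() flag word shares. */
static void describe_clone_flags(unsigned long flags, char *buf, size_t len)
{
    static const struct { unsigned long bit; const char *name; } table[] = {
        { CLONE_VM,      "address space" },
        { CLONE_FS,      "filesystem info" },
        { CLONE_FILES,   "open files" },
        { CLONE_SIGHAND, "signal handlers" },
        { CLONE_VFORK,   "parent sleeps until child execs/exits" },
    };
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (!(flags & table[i].bit))
            continue;
        if (buf[0] != '\0')
            strncat(buf, ", ", len - strlen(buf) - 1);
        strncat(buf, table[i].name, len - strlen(buf) - 1);
    }
    if (buf[0] == '\0')   /* no sharing flags: child gets private (COW) copies */
        snprintf(buf, len, "nothing (private copies, made lazily via COW)");
}
```

For FORK_FLAGS this reports that nothing is shared. For THREAD_FLAGS it lists the four resources that the PROCESS vs THREAD slide says a thread shares.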
06/21/11 Copy-on-write: Usually, right after forking, the child calls exec(), which replaces the resources copied from the parent, so copying them eagerly is wasted work. With copy-on-write, parent and child share the data (the pages are marked read-only), and a private copy is made only when either the parent or the child writes to it.
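COW is invisible from user space: the observable result is the same as a full copy, and only the timing of the copy differs. The small program below shows the semantics COW must preserve. A child writes to a variable it inherited, and the parent still sees the old value, because the write gave the child its own copy of that page. value_seen_by_parent_after_child_write() is a hypothetical name.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that overwrites an inherited variable. Returns the value
 * the parent sees afterwards, or -1 on error. */
static int value_seen_by_parent_after_child_write(void)
{
    int value = 1;                  /* on a page shared (read-only) after fork */
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                 /* child */
        value = 2;                  /* write fault: kernel copies the page */
        _exit(value == 2 ? 0 : 1);  /* report success via the exit status */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0)
        return -1;
    return value;                   /* still 1 in the parent */
}
```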
06/21/11 FORKING PROCESS: fork(), clone() → do_fork() (defined in <kernel/fork.c>, architecture-independent) → copy_process(), which does the actual work of duplicating the process and takes several flags about resource sharing, etc. If copy_process() returns the new child successfully, the child is woken up and run. In the common case the child calls exec() immediately, so, thanks to copy-on-write, there is no copying overhead.
06/21/11 <kernel/fork.c> copy_process(): creates a new kernel stack, thread_info and task_struct, initially identical to the parent's, then copies or shares the remaining resources depending on the flags. (Professional Linux Kernel Architecture, p.73; p.68)
06/21/11 PROCESS vs THREAD: In Linux, a THREAD is treated as a PROCESS (THREAD = PROCESS). A thread is simply a process that shares resources with its parent, e.g. one created with clone(CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND, 0).
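Here is a sketch of the slide's clone() call in user space, using the glibc clone() wrapper. Because CLONE_VM shares the address space, the child's write is visible to the parent, unlike in the fork() case. SIGCHLD is added to the flags as an assumption, so that the parent can reap the child with a plain waitpid(). The 64KB child stack size, child_fn() and run_thread_like_clone() are hypothetical choices. Real thread libraries pass more flags, such as CLONE_THREAD.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (64 * 1024)   /* assumed size for the child's stack */

static int shared_counter = 0;

/* Runs in the new task; with CLONE_VM it writes the parent's memory. */
static int child_fn(void *arg)
{
    (void)arg;
    shared_counter = 42;
    return 0;
}

/* Create a task with the slide's "thread" flags (plus SIGCHLD). Returns
 * shared_counter as the parent sees it after the child exits, or -1. */
static int run_thread_like_clone(void)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    if (!stack)
        return -1;
    int flags = CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD;
    /* Stacks grow down on common architectures: pass the top of the buffer. */
    pid_t pid = clone(child_fn, stack + CHILD_STACK_SIZE, flags, NULL);
    if (pid < 0 || waitpid(pid, NULL, 0) < 0) {
        free(stack);
        return -1;
    }
    free(stack);
    return shared_counter;            /* 42: the child wrote our memory */
}
```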
06/21/11 Questions (06/13/11): How is the user-space stack allocated? During fork it is duplicated along with the rest of the address space; see kernel/fork.c, dup_mmap() (line 288). A new program's stack is set up at exec() time. What are namespaces? Still no clear answer; the description is quite complex. Some clues: Professional Linux Kernel Architecture, p.47; <linux_source>/Documentation/unshare.txt
06/21/11 Allocating Process Descriptors (task_struct): Where are process descriptors stored? In the kernel's address space, linked into the task list, whose head is init_task (the descriptor of process 0, statically set up at boot). How are task_struct structures allocated? By using the SLAB ALLOCATOR. (Understanding the Linux Kernel, 3rd Edition)
06/21/11 Slab Allocator & Buddy Allocator: Linux uses both allocators. The buddy allocator manages physical memory in pages (4KB on x86; the page size depends on the architecture). The slab allocator speeds up allocation of small, frequently used data structures (smaller than a page), such as task_struct. (Takuo Watanabe, Operating System lecture slides, “Buddy System”)
07/04/11 The Slab Allocator: allocates memory for kernel objects and retains allocated memory that contains data objects of a certain type, for reuse. Kernel objects include the inode structure, task_struct, vm_area_struct, etc. Proposed by Jeff Bonwick (Sun Microsystems); see his paper “The Slab Allocator: An Object-Caching Kernel Memory Allocator”.
06/21/11 Not only task_struct... (from kernel/fork.c): many other structures are also allocated using the slab allocator.
07/04/11 Basis: The initialization and destruction of objects can outweigh the cost of allocating them. Object caching is used to mitigate the overhead of initializing objects. It also avoids internal fragmentation of memory (memory that is allocated but not used, which happens with the buddy allocator).
07/04/11 Overview, from the linux/mm/slab.c header comment: “The memory is organized in caches, one cache for each object type. (e.g. inode_cache, dentry_cache, buffer_head, vm_area_struct) Each cache consists out of many slabs (they are small (usually one page long) and always contiguous), and each slab contains multiple initialized objects.”
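The object-caching idea in the quoted comment can be sketched in user space. Objects are carved out of contiguous slabs, each object is constructed once when its slab is created, and freed objects are recycled without being reconstructed. This is a simplified model under stated assumptions, not the kernel's mm/slab.c. The names obj_cache, cache_alloc(), cache_free() and so on are hypothetical, malloc() stands in for the page allocator, and the sizes (8 objects per slab, at most 64 slabs) are arbitrary.

```c
#include <stddef.h>
#include <stdlib.h>

#define OBJS_PER_SLAB 8     /* assumed; the kernel derives this from sizes */
#define MAX_SLABS     64    /* arbitrary cap for the sketch */

struct obj_cache {
    size_t obj_size;                              /* padded for alignment */
    void (*ctor)(void *obj);                      /* runs once per object */
    void *slabs[MAX_SLABS];                       /* kept for destruction */
    size_t nslabs;
    void *free_objs[MAX_SLABS * OBJS_PER_SLAB];   /* constructed, unused */
    size_t nfree;
};

static void cache_init(struct obj_cache *c, size_t size, void (*ctor)(void *))
{
    size_t align = _Alignof(max_align_t);
    c->obj_size = (size + align - 1) / align * align;
    c->ctor = ctor;
    c->nslabs = 0;
    c->nfree = 0;
}

/* Grab one contiguous slab and construct every object in it. */
static int cache_grow(struct obj_cache *c)
{
    if (c->nslabs == MAX_SLABS)
        return -1;
    char *slab = malloc(c->obj_size * OBJS_PER_SLAB);
    if (!slab)
        return -1;
    c->slabs[c->nslabs++] = slab;
    for (size_t i = 0; i < OBJS_PER_SLAB; i++) {
        void *obj = slab + i * c->obj_size;
        if (c->ctor)
            c->ctor(obj);          /* initialized once, not per allocation */
        c->free_objs[c->nfree++] = obj;
    }
    return 0;
}

static void *cache_alloc(struct obj_cache *c)
{
    if (c->nfree == 0 && cache_grow(c) != 0)
        return NULL;
    return c->free_objs[--c->nfree];   /* already in constructed state */
}

/* As in Bonwick's design, the caller returns the object still constructed. */
static void cache_free(struct obj_cache *c, void *obj)
{
    c->free_objs[c->nfree++] = obj;
}

static void cache_destroy(struct obj_cache *c)
{
    for (size_t i = 0; i < c->nslabs; i++)
        free(c->slabs[i]);
    c->nslabs = 0;
    c->nfree = 0;
}
```

The free list is kept outside the objects so that freeing does not overwrite their constructed state.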
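07/04/11 include/linux/slab_def.h | struct kmem_cache
struct kmem_cache {
    struct array_cache *array[NR_CPUS];   /* per-CPU arrays of free objects */
    unsigned int batchcount;              /* tunables */
    unsigned int limit;
    unsigned int shared;
    unsigned int buffer_size;             /* object size */
    u32 reciprocal_buffer_size;           /* avoids a division when indexing objects */
    unsigned int flags;                   /* constant flags */
    unsigned int num;                     /* number of objects per slab */
    unsigned int gfporder;                /* order of pages per slab (2^gfporder) */
    gfp_t gfpflags;                       /* forced GFP flags, e.g. GFP_DMA */
    size_t colour;                        /* cache colouring range */
    unsigned int colour_off;              /* colour offset */
    struct kmem_cache *slabp_cache;       /* cache for off-slab slab descriptors */
    unsigned int slab_size;
    unsigned int dflags;                  /* dynamic flags */
    void (*ctor)(void *obj);              /* object constructor */
    const char *name;
    struct list_head next;                /* link in the list of all caches */
    struct kmem_list3 *nodelists[MAX_NUMNODES];  /* per-node slab lists */
};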
06/21/11 /proc/slabinfo: viewing the caches (screenshot from isa’s personal VPS @ webbynode.com)
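Each data line of /proc/slabinfo (format version 2.1) starts with the cache name followed by <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>. The parser below is a minimal sketch that reads those leading fields from one line. The struct and function names are hypothetical. On many systems the file is readable only by root, so the check below uses an invented sample line rather than real output.

```c
#include <stdio.h>

/* Leading fields of one /proc/slabinfo data line. */
struct slab_stats {
    char name[64];
    unsigned long active_objs, num_objs, objsize, objperslab, pagesperslab;
};

/* Returns 1 on success, 0 for lines that do not match (e.g. headers). */
static int parse_slabinfo_line(const char *line, struct slab_stats *s)
{
    int n = sscanf(line, "%63s %lu %lu %lu %lu %lu",
                   s->name, &s->active_objs, &s->num_objs,
                   &s->objsize, &s->objperslab, &s->pagesperslab);
    return n == 6;
}

/* Bytes held by all objects of the cache (slab overhead not included). */
static unsigned long slab_bytes(const struct slab_stats *s)
{
    return s->num_objs * s->objsize;
}
```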
06/21/11 From allocating to freeing (1) Allocating task_struct: copy_process() calls dup_task_struct(current). This function has the slab allocator allocate a new task_struct (a new thread_info and kernel stack are allocated alongside it) and makes it a direct copy of current, the parent process.
06/21/11 kernel/fork.c |  dup_task_struct() ( continues.. )
06/21/11 From allocating to freeing (2) Freeing task_struct: When a process calls the exit() syscall, it does the following. After all objects associated with the process (address space, open files, ...) are freed, the process enters the zombie state (exit_state = EXIT_ZOMBIE) and informs its parent that its life has ended. The task_struct itself is released later through release_task() (normally when the parent reaps the child), which calls put_task_struct() to return it to its slab cache.
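The descriptor lifecycle can be modeled as a reference count. Here is a sketch, assuming a usage counter that starts at 2 (one reference for the task itself, one for whoever reaps it). dup_task() stands in for dup_task_struct(), task_exit() for the exit path, and reap() for release_task(). All of these names are hypothetical, and malloc()/free() replace the slab cache.

```c
#include <stdlib.h>

/* Hypothetical trimmed descriptor with a usage (reference) count. */
struct task {
    int pid;
    int usage;          /* descriptor is freed when this reaches 0 */
    int exit_zombie;    /* exited but not yet reaped */
};

static int freed_tasks = 0;   /* demo bookkeeping only */

/* Analogue of dup_task_struct(): copy the parent wholesale, then adjust. */
static struct task *dup_task(const struct task *parent, int new_pid)
{
    struct task *t = malloc(sizeof *t);   /* a slab cache in the kernel */
    if (!t)
        return NULL;
    *t = *parent;                         /* direct copy of the parent */
    t->pid = new_pid;
    t->usage = 2;        /* assumed: one for the task, one for its reaper */
    t->exit_zombie = 0;
    return t;
}

/* Analogue of put_task_struct(): drop one reference, free on the last. */
static void put_task(struct task *t)
{
    if (--t->usage == 0) {
        free(t);
        freed_tasks++;
    }
}

/* The task exits: resources are gone, but the descriptor stays a zombie. */
static void task_exit(struct task *t)
{
    t->exit_zombie = 1;
    put_task(t);         /* drops the task's own reference */
}

/* The parent reaps the zombie: the last reference frees the descriptor. */
static void reap(struct task *t)
{
    put_task(t);
}
```

The count explains why a zombie's task_struct survives exit(): the parent still holds a reference until it collects the exit status.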
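06/21/11 SLOB & SLUB allocators (not yet covered): SLOB allocator: a simple list-of-blocks allocator, optimized for small and embedded systems with little memory. SLUB allocator: the "unqueued" slab allocator, designed for scalability on large systems (the default since 2.6.23).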
This week’s Updates The Buddy Allocator
The Buddy Allocator (mm/page_alloc.c): manages memory as "page frames" (physical pages). All page-level allocations go through this system. It is designed to prevent external fragmentation of memory, where free space is split into small fragments scattered here and there, so that a large contiguous request can fail even though enough memory is free in total.
Basics: All free page frames are grouped into lists. Each list contains blocks of 2^order contiguous page frames (cf. alloc_pages(gfp_mask, order)). There are 11 lists, for blocks of 1, 2, 4, 8, 16, 32, 64, 128, 256, 512 and 1024 page frames (orders 0 to 10).
cat /proc/buddyinfo: shows, for each zone, the number of free blocks available at each order (columns for orders 0 to 10).
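A /proc/buddyinfo line has the form "Node N, zone NAME" followed by one free-block count per order. Since an order-k block holds 2^k pages, a zone's total free pages is the sum of count[k] << k. The sketch below assumes 11 order columns, as on the slide (other kernel configurations can differ). The struct and function names are hypothetical, and the sample line in the check is invented.

```c
#include <stdio.h>
#include <stdlib.h>

#define NR_ORDERS 11   /* assumed: orders 0..10, as on the slide */

struct zone_free {
    int node;
    char zone[16];
    unsigned long count[NR_ORDERS];   /* free blocks of 2^order pages */
};

/* Parse "Node 0, zone   Normal  c0 c1 ... c10". Returns 1 on success. */
static int parse_buddyinfo_line(const char *line, struct zone_free *z)
{
    int consumed = 0;
    if (sscanf(line, "Node %d, zone %15s%n", &z->node, z->zone, &consumed) != 2)
        return 0;
    const char *p = line + consumed;
    for (int order = 0; order < NR_ORDERS; order++) {
        char *end;
        z->count[order] = strtoul(p, &end, 10);
        if (end == p)
            return 0;                 /* fewer columns than expected */
        p = end;
    }
    return 1;
}

/* Total free pages in the zone: each order-k block holds 2^k pages. */
static unsigned long free_pages(const struct zone_free *z)
{
    unsigned long total = 0;
    for (int order = 0; order < NR_ORDERS; order++)
        total += z->count[order] << order;
    return total;
}
```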
Example | Allocating 256 pages (1MB with 4KB pages): Look in the 256-page-frame list; if a block is available, allocate it. If not, look in the next larger list, the 512-page-frame list. If a block exists there, split it: allocate 256 pages and put the remaining 256 page frames in the 256-page-frame list. If not, look in the next larger list, the 1024-page-frame list. If a block exists there, allocate 256 pages, move the remaining 512 page frames to the 512-page-frame list and the remaining 256 page frames to the 256-page-frame list. If not, the allocation fails with an error (1024 is already the largest block).
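The splitting steps above can be traced with free-block counts alone. The sketch below does not track addresses, which real code must do. It takes the smallest non-empty list at or above the requested order, then halves the block repeatedly, returning each unused half to the list one order below. buddy_alloc() is a hypothetical name.

```c
#define NR_ORDERS 11   /* orders 0..10: blocks of 1 ... 1024 pages */

/* free_blocks[k] = number of free blocks of 2^k pages. Allocates one
 * 2^order-page block. Returns the order of the list it was taken from,
 * or -1 if no list at or above `order` has a free block. */
static int buddy_alloc(unsigned long free_blocks[NR_ORDERS], int order)
{
    int k = order;
    while (k < NR_ORDERS && free_blocks[k] == 0)
        k++;                          /* look into the next larger list */
    if (k == NR_ORDERS)
        return -1;                    /* even the 1024-page list is empty */
    free_blocks[k]--;                 /* remove the block to be split */
    for (int split = k; split > order; split--)
        free_blocks[split - 1]++;     /* unused half goes one list down */
    return k;
}
```

Requesting order 8 (256 pages) when only one 1024-page block is free leaves one block each in the 512 and 256 lists, as in the example.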
Freeing pages: The kernel attempts to merge pairs of free buddy blocks of size b into a single block of size 2b. Two blocks are considered buddies if: both have the same size b; they are located at contiguous physical addresses (neighbors); and the physical address of the first page frame of the first block is a multiple of 2 × b × 2^12 (for 4KB pages), so the merged block is properly aligned. The algorithm repeats until the block reaches the largest size (1024 pages) or the neighboring buddy is not free.
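With blocks identified by their first page index, the three buddy conditions reduce to a single bit flip. The buddy of the aligned 2^k-page block at index i is i ^ (1 << k), and the merged block starts at the lower of the two indices. The sketch below is a simplified model, assuming one 1024-page region and a per-page marker array instead of the kernel's real data structures. free_order, buddy_of() and buddy_free() are hypothetical names.

```c
#define NR_ORDERS 11
#define NR_PAGES  (1 << (NR_ORDERS - 1))   /* assumed: one 1024-page region */

/* free_order[i] = k if a free 2^k-page block starts at page i, else -1. */
static int free_order[NR_PAGES];

/* Buddy of the aligned 2^k-page block at idx: same size, adjacent, and
 * the pair is aligned to 2 * 2^k pages, so only bit k differs. */
static unsigned buddy_of(unsigned idx, int k)
{
    return idx ^ (1u << k);
}

static void init_all_allocated(void)
{
    for (int i = 0; i < NR_PAGES; i++)
        free_order[i] = -1;
}

/* Free the 2^k-page block at idx, merging upward while the buddy is free.
 * Returns the order of the resulting free block. */
static int buddy_free(unsigned idx, int k)
{
    while (k < NR_ORDERS - 1) {
        unsigned b = buddy_of(idx, k);
        if (free_order[b] != k)
            break;                   /* buddy in use or split: stop */
        free_order[b] = -1;          /* take the buddy off its free list */
        idx &= b;                    /* merged block starts at the lower one */
        k++;
    }
    free_order[idx] = k;
    return k;
}
```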
Disadvantage: It causes internal fragmentation, because a whole block must be allocated even when the requested size is smaller. E.g., allocating 275 pages uses a 512-page block, wasting 237 pages. For small objects (smaller than a page), this loss is reduced by the slab allocator (explained above).
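The waste figure in the example follows from rounding the request up to the next power of two. Here is a small sketch of that arithmetic. It resembles the kernel's get_order() but works in pages rather than bytes, and pages_to_order() and wasted_pages() are hypothetical names.

```c
/* Smallest order whose block (2^order pages) holds `pages` pages, or -1
 * if the request exceeds the largest block (order 10, 1024 pages). */
static int pages_to_order(unsigned long pages)
{
    int order = 0;
    while ((1UL << order) < pages) {
        order++;
        if (order > 10)
            return -1;
    }
    return order;
}

/* Pages lost to internal fragmentation for a request of `pages` pages. */
static long wasted_pages(unsigned long pages)
{
    int order = pages_to_order(pages);
    return order < 0 ? -1 : (long)((1UL << order) - pages);
}
```

For 275 pages this gives order 9 (512 pages) and 237 wasted pages, matching the slide.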
Process’s Address Space: Defined by the mm_struct structure. Every process descriptor holds a pointer to its mm_struct (mm). It can be shared among processes (which is what we call threads). After fork(), the child shares its parent's pages until copy-on-write creates private copies. It consists of memory areas (struct vm_area_struct), each a contiguous range of virtual addresses.
/bin/gonzo’s address space http://duartes.org/gustavo/blog/post/how-the-kernel-manages-your-memory
pmap <pid>: lists the memory areas of a process, such as those mapped from the ELF executable and from shared libraries (Linux Kernel Development, Robert Love, p.314).
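The memory areas that pmap prints can also be read directly from /proc/<pid>/maps, where each line describes one area as "start-end perms offset dev inode [path]". The sketch below counts the calling process's areas via /proc/self/maps and checks that each one is a non-empty address range. count_vmas() is a hypothetical name, and the 512-byte line buffer is an arbitrary choice; longer lines are drained rather than miscounted.

```c
#include <stdio.h>
#include <string.h>

/* Count the memory areas of the calling process. Returns -1 if the file
 * cannot be opened or a line does not parse as a valid "start-end" range. */
static int count_vmas(void)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f)
        return -1;
    char line[512];
    int count = 0;
    while (fgets(line, sizeof line, f)) {
        unsigned long start, end;
        if (sscanf(line, "%lx-%lx", &start, &end) != 2 || start >= end) {
            fclose(f);
            return -1;
        }
        if (!strchr(line, '\n')) {   /* very long path: skip the rest */
            int ch;
            while ((ch = fgetc(f)) != EOF && ch != '\n')
                ;
        }
        count++;
    }
    fclose(f);
    return count;
}
```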
