Published in: Technology
  1. 1. 06/21/11 UPDATES Isa Ansharullah
  2. 2. 06/21/11 PROCESS in LINUX <ul><li>An executing program </li></ul><ul><li>Described by a task_struct structure, which is stored in a circular doubly linked list (the task list) </li></ul><ul><ul><li>Also called a process descriptor or Process Control Block (PCB) </li></ul></ul><ul><li>Allocated by the slab allocator </li></ul>Linux Kernel Development, Robert Love p.25
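The circular list mentioned on this slide can be sketched in user space as follows. This is only a toy model: `struct toy_task` and the `toy_task_list_*` helpers are hypothetical names, and the real kernel links `task_struct` through its generic `struct list_head` rather than through raw `next`/`prev` pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Toy process descriptor: a hypothetical, heavily reduced stand-in for task_struct. */
struct toy_task {
    int pid;
    struct toy_task *next;  /* next descriptor in the circular list */
    struct toy_task *prev;  /* previous descriptor in the circular list */
};

/* Make a task the sole element of its own circular list. */
void toy_task_list_init(struct toy_task *head)
{
    head->next = head;
    head->prev = head;
}

/* Link a new task in just before the head, i.e. at the tail of the list. */
void toy_task_list_add_tail(struct toy_task *head, struct toy_task *t)
{
    t->prev = head->prev;
    t->next = head;
    head->prev->next = t;
    head->prev = t;
}

/* Unlink a task; its two neighbours are joined to each other. */
void toy_task_list_del(struct toy_task *t)
{
    t->prev->next = t->next;
    t->next->prev = t->prev;
    t->next = t->prev = t;
}

/* Walk the list once, starting and ending at head; returns the number of tasks. */
int toy_task_count(const struct toy_task *head)
{
    int n = 1;  /* count head itself */
    for (const struct toy_task *p = head->next; p != head; p = p->next)
        n++;
    return n;
}
```

Because the list is circular, a walk can start from any descriptor and stops when it comes back to where it began; there is no NULL terminator to check for.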
  3. 3. 06/21/11 <include/linux/sched.h> struct task_struct <ul><li>About 1.6 to 2 KB in size (on x86; varies per architecture) </li></ul><ul><li>Contains : </li></ul><ul><ul><li>pid_t PID </li></ul></ul><ul><ul><li>Open files </li></ul></ul><ul><ul><li>Process’ address space ( struct mm_struct *mm ) </li></ul></ul><ul><ul><li>Process’ state (Waiting, Running, Ready) </li></ul></ul><ul><ul><li>Process’ stack address ( void* stack ) </li></ul></ul><ul><ul><li>Others (parent’s PID, </li></ul></ul>
  4. 4. 06/21/11 PROCESS’ STACKS <ul><li>Each process has two kinds of stacks : </li></ul><ul><ul><li>User-space stack : This can expand </li></ul></ul><ul><ul><li>Kernel stack : This has a FIXED size </li></ul></ul><ul><li>Mode switch : Switching from the user stack to the kernel stack, triggered by a system call or an exception handler. </li></ul><ul><li>Context switch : Suspending the progress of one process and switching to another process; this happens in kernel mode </li></ul>
  5. 5. 06/21/11 Process’ Kernel Stack (1) <ul><li>It is stored in the kernel area of physical memory: physically contiguous, non-swappable </li></ul><ul><ul><li>It is kept as small as possible and fixed in size, to prevent fragmentation and the hazards an expanding stack would bring </li></ul></ul>
  6. 6. 06/21/11 Process’ Kernel Stack (2) task_struct is referenced via the thread_info structure at the bottom of the kernel stack (to provide fast access). Professional Linux Kernel Architecture, Wolfgang Mauerer p.71
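Why placing thread_info at the bottom of the kernel stack gives fast access: if the stack area is aligned to its own (power-of-two) size, the structure can be found from any stack address by clearing the low bits. The sketch below simulates this in user space. The 8 KB size and all `toy_*` names are assumptions for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_THREAD_SIZE 8192u  /* assumed kernel stack size: 8 KB, a power of two */

struct toy_task { int pid; };

/* Sits at the lowest address of the stack area and points back at the descriptor. */
struct toy_thread_info { struct toy_task *task; };

/* Allocate a TOY_THREAD_SIZE-aligned stack area and place thread_info at its base. */
struct toy_thread_info *toy_alloc_stack(struct toy_task *task)
{
    struct toy_thread_info *ti = aligned_alloc(TOY_THREAD_SIZE, TOY_THREAD_SIZE);
    if (ti != NULL)
        ti->task = task;
    return ti;
}

/* Recover thread_info from any address inside the stack by clearing the low bits. */
struct toy_thread_info *toy_thread_info_of(const void *sp)
{
    uintptr_t base = (uintptr_t)sp & ~((uintptr_t)TOY_THREAD_SIZE - 1);
    return (struct toy_thread_info *)base;
}
```

No lookup table or search is needed: one AND on the stack pointer leads to thread_info, and one pointer dereference from there leads to the descriptor.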
  7. 7. 06/21/11 PROCESS DUPLICATION <ul><li>There are actually 3 approaches : </li></ul><ul><ul><li>fork() : Heavy-weight call (copies everything) </li></ul></ul><ul><ul><ul><li>Made cheaper by Copy-on-write </li></ul></ul></ul><ul><ul><li>vfork() : Light-weight call (shares resources) </li></ul></ul><ul><ul><ul><li>Since fork() implements COW, vfork() has little advantage left </li></ul></ul></ul><ul><ul><li>clone() : Lets the caller choose what to share </li></ul></ul><ul><li>fork() in Linux is implemented via clone() </li></ul><ul><ul><li>clone() takes flags specifying which resources should be shared </li></ul></ul>
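From user space, the simplest of the three calls looks like this. A minimal sketch using only the POSIX `fork()` and `waitpid()` calls; the helper name `fork_and_wait` is made up for the example.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with `code`; return the status the parent collects, or -1. */
int fork_and_wait(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;              /* fork failed */
    if (pid == 0)
        _exit(code);            /* child: terminate immediately */

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

`fork()` returns twice: 0 in the child and the child's PID in the parent, which is how the two branches above tell themselves apart.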
  8. 8. 06/21/11 Copy-on-write <ul><li>Usually, after forking, the child calls exec(), which replaces the resources copied from the parent </li></ul><ul><ul><li>Copying them up front is therefore inefficient </li></ul></ul><ul><li>Copy-on-write: Parent and child share the data at first; a private copy is made only when the shared data is written to (by either parent or child) </li></ul>
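The visible effect of this can be checked from user space. The sketch below does not observe the page copy itself (that happens inside the kernel); it only shows the resulting semantics, namely that a write in the child is never seen by the parent. The helper name is hypothetical.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the child's write stayed private to the child, 0 if not, -1 on error. */
int child_write_is_private(void)
{
    volatile int value = 1;      /* lives in a page that is shared after fork() */

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        value = 2;               /* first write: the child gets its own copy of the page */
        _exit(value);            /* report what the child sees through its exit status */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;

    /* The child saw 2, while the parent still sees 1. */
    return WEXITSTATUS(status) == 2 && value == 1;
}
```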
  9. 9. 06/21/11 FORKING PROCESS <ul><li>Call chain: fork() → clone() → do_fork() → copy_process() </li></ul><ul><li>do_fork(): defined in <kernel/fork.c>, architecture-independent </li></ul><ul><li>copy_process(): does the actual work of duplicating the process; takes several flags about resource sharing, etc. </li></ul><ul><li>If the new child is returned successfully, it is woken up and run. </li></ul><ul><li>In the common case the child calls exec() immediately, so there is no copying overhead, thanks to Copy-on-write. </li></ul>
  10. 10. 06/21/11 <kernel/fork.c> copy_process() Creates a new kernel stack, task_struct, and thread_info structures similar to its parent’s. Copies or shares resources. Professional Linux Kernel Architecture, p68, p73
  11. 11. 06/21/11 PROCESS vs THREAD <ul><li>In Linux, a THREAD is treated as a PROCESS </li></ul><ul><ul><li>THREAD = PROCESS </li></ul></ul><ul><li>THREAD </li></ul><ul><ul><li>A process that shares resources with its parent </li></ul></ul>clone ( CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND, 0)
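The clone() call on this slide is the in-kernel form. From user space, the glibc wrapper additionally needs a function to run and a stack for the new task. The following is a Linux-only sketch under those assumptions: the 64 KB stack size and the `toy_`/`run_` names are arbitrary, and `SIGCHLD` is added to the flags only so that the parent can wait for the child with `waitpid()`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define TOY_STACK_SIZE (64 * 1024)  /* arbitrary stack size for the new task */

/* Runs in the new task; with CLONE_VM it writes directly into the parent's memory. */
static int toy_thread_fn(void *arg)
{
    *(volatile int *)arg = 42;
    return 0;
}

/* Start a task that shares address space, fs info, files and signal handlers
 * with its parent; return the value it wrote, or -1 on error. */
int run_shared_task(void)
{
    volatile int shared = 0;
    char *stack = malloc(TOY_STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* The stack grows downward on x86, so pass the top of the block. */
    pid_t pid = clone(toy_thread_fn, stack + TOY_STACK_SIZE,
                      CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD,
                      (void *)&shared);
    if (pid < 0) {
        free(stack);
        return -1;
    }

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    free(stack);
    if (waited != pid)
        return -1;
    return shared;  /* written by the child, visible here because of CLONE_VM */
}
```

Compare this with a plain fork(), where the child's write would stay private: the only difference between "process" and "thread" here is the set of flags.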
  12. 12. 06/21/11 Questions (06/13/11) <ul><li>How is the user-space stack allocated? </li></ul><ul><ul><li>It is created during the fork process, see kernel/fork.c, dup_mmap (line 288) </li></ul></ul><ul><li>What are namespaces? </li></ul><ul><ul><li>Still unclear; the description is quite complex. Some pointers : </li></ul></ul><ul><ul><ul><li>Professional Linux Kernel Architecture, p.47 </li></ul></ul></ul><ul><ul><ul><li><linux_source>/Documentation/unshare.txt </li></ul></ul></ul>
  13. 13. 06/21/11 Allocating Process Descriptors ( task_struct ) <ul><li>Where are process descriptors stored? </li></ul><ul><ul><li>Inside the kernel’s address space, in the task list, starting with init_task (the init process descriptor), which is created at boot </li></ul></ul><ul><li>How are task_struct structures allocated? </li></ul><ul><ul><li>By using the SLAB ALLOCATOR </li></ul></ul>Understanding the Linux Kernel (3rd Edition)
  14. 14. 06/21/11 Slab allocator & Buddy Allocator <ul><li>Linux uses both allocators </li></ul><ul><ul><li>The buddy allocator manages physical memory in pages (4KB on x86) </li></ul></ul><ul><ul><li>The slab allocator speeds up allocation of small, frequently used data structures (smaller than a page) </li></ul></ul><ul><ul><ul><li>task_struct </li></ul></ul></ul>Takuo Watanabe, Operating System Lecture slide, “Buddy System”
  15. 15. The Slab Allocator <ul><li>Memory allocation of kernel objects </li></ul><ul><li>Retains allocated memory that contains data objects of a certain type, for reuse </li></ul>07/04/11 Proposed by Jeff Bonwick (Sun Microsystems), read The Slab Allocator: An Object-Caching Kernel Memory Allocator (google it!) Kernel objects : inode structure, task_struct, vm_area_struct, etc.
  16. 16. 06/21/11 Not only task_struct ... ( from kernel/fork.c ) many structures are also allocated using the slab allocator
  17. 17. Basis <ul><li>The initialization and destruction of objects can outweigh the cost of allocating them </li></ul><ul><li>Object caching is used to mitigate the overhead cost of initializing objects </li></ul><ul><li>It also avoids internal fragmentation of memory (i.e. memory allocated but not used, which happens in the Buddy Allocator ) </li></ul>07/04/11
  18. 18. Overview 07/04/11 From the linux/mm/slab.c header comment : “The memory is organized in caches, one cache for each object type. (e.g. inode_cache, dentry_cache, buffer_head, vm_area_struct) Each cache consists out of many slabs (they are small (usually one page long) and always contiguous), and each slab contains multiple initialized objects.”
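The cache → slab → object layering, together with the object-caching idea from the previous slide, can be sketched as a small user-space model. Everything here is hypothetical (`toy_*` names, 4 objects per slab, `malloc` standing in for page allocation); it is not the kernel's `kmem_cache` API. The point it illustrates is that the constructor runs once per object when a slab is created, not on every allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_OBJS_PER_SLAB 4

struct toy_slab {
    struct toy_slab *next;
    unsigned free_count;
    unsigned free_idx[TOY_OBJS_PER_SLAB];  /* stack of free object indices */
    char *objs;                            /* contiguous storage for all objects */
};

struct toy_cache {
    size_t obj_size;
    void (*ctor)(void *);     /* run once per object, when its slab is created */
    struct toy_slab *slabs;
    unsigned ctor_calls;      /* kept only to make the caching effect visible */
};

/* Example object type and constructor for the cache. */
struct toy_obj { int ready; };
void toy_obj_ctor(void *obj) { ((struct toy_obj *)obj)->ready = 1; }

void toy_cache_init(struct toy_cache *c, size_t obj_size, void (*ctor)(void *))
{
    c->obj_size = obj_size;
    c->ctor = ctor;
    c->slabs = NULL;
    c->ctor_calls = 0;
}

/* Grow the cache by one slab; every object in it is constructed up front. */
static struct toy_slab *toy_cache_grow(struct toy_cache *c)
{
    struct toy_slab *s = malloc(sizeof(*s));
    if (s == NULL)
        return NULL;
    s->objs = malloc(c->obj_size * TOY_OBJS_PER_SLAB);
    if (s->objs == NULL) {
        free(s);
        return NULL;
    }
    for (unsigned i = 0; i < TOY_OBJS_PER_SLAB; i++) {
        if (c->ctor != NULL) {
            c->ctor(s->objs + i * c->obj_size);
            c->ctor_calls++;
        }
        s->free_idx[i] = i;
    }
    s->free_count = TOY_OBJS_PER_SLAB;
    s->next = c->slabs;
    c->slabs = s;
    return s;
}

/* Hand out an already-initialized object, growing the cache only if every slab is full. */
void *toy_cache_alloc(struct toy_cache *c)
{
    struct toy_slab *s = c->slabs;
    while (s != NULL && s->free_count == 0)
        s = s->next;
    if (s == NULL && (s = toy_cache_grow(c)) == NULL)
        return NULL;
    return s->objs + s->free_idx[--s->free_count] * c->obj_size;
}

/* Return an object to its slab without destroying it, so it can be reused as-is. */
void toy_cache_free(struct toy_cache *c, void *obj)
{
    uintptr_t p = (uintptr_t)obj;
    for (struct toy_slab *s = c->slabs; s != NULL; s = s->next) {
        uintptr_t base = (uintptr_t)s->objs;
        if (p >= base && p < base + c->obj_size * TOY_OBJS_PER_SLAB) {
            s->free_idx[s->free_count++] = (unsigned)((p - base) / c->obj_size);
            return;
        }
    }
}
```

Allocating, freeing and allocating again returns the same object without a second constructor call, which is the saving the slab allocator is after.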
  19. 19. include/linux/slab_def.h | struct kmem_cache 07/04/11 struct kmem_cache { struct array_cache *array[NR_CPUS]; unsigned int batchcount; unsigned int limit; unsigned int shared; unsigned int buffer_size; u32 reciprocal_buffer_size; unsigned int flags; unsigned int num; unsigned int gfporder; gfp_t gfpflags; size_t colour; unsigned int colour_off; struct kmem_cache *slabp_cache; unsigned int slab_size; unsigned int dflags; void (*ctor) (void *obj); const char *name; struct list_head next; struct kmem_list3 *nodelists[MAX_NUMNODES]; };
  20. 20. 06/21/11 /proc/slabinfo - seeing the caches from : isa’s personal VPS @ webbynode.com
  21. 21. 06/21/11 From allocating to freeing (1) <ul><li>Allocating task_struct </li></ul><ul><ul><li>copy_process() calls dup_task_struct(current). This function asks the slab allocator for a new task_struct (and a thread_info) and fills it with a direct copy of the current task, i.e. the parent process. </li></ul></ul>
  22. 22. 06/21/11 kernel/fork.c | dup_task_struct() ( continues.. )
  23. 23. 06/21/11 From allocating to freeing (2) <ul><li>Freeing task_struct </li></ul><ul><ul><li>When the process calls the exit() syscall, it does the following : </li></ul></ul><ul><ul><ul><li>after all objects associated with the process (address space, open files..) are freed, the process enters the zombie state (exit_state = EXIT_ZOMBIE) </li></ul></ul></ul><ul><ul><ul><li>it informs the parent that its life has ended </li></ul></ul></ul><ul><ul><ul><li>its task_struct is returned to its slab cache via release_task(), which calls put_task_struct() </li></ul></ul></ul>
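The "put" in put_task_struct() refers to reference counting: the descriptor goes back to the allocator only when the last user drops its reference. A minimal user-space sketch of that pattern, with hypothetical `toy_*` names, `malloc`/`free` standing in for the slab cache, and a plain (non-atomic) counter:

```c
#include <assert.h>
#include <stdlib.h>

struct toy_task {
    int usage;      /* number of references held on this descriptor */
    int pid;
};

/* Count of descriptors handed back to the allocator (illustration only). */
int toy_tasks_released = 0;

/* Create a descriptor holding one initial reference. */
struct toy_task *toy_task_new(int pid)
{
    struct toy_task *t = malloc(sizeof(*t));   /* stands in for a slab cache allocation */
    if (t != NULL) {
        t->usage = 1;
        t->pid = pid;
    }
    return t;
}

/* Take an extra reference, e.g. while another task inspects this one. */
void toy_get_task(struct toy_task *t)
{
    t->usage++;
}

/* Drop one reference; the last one returns the memory. Returns 1 if it was freed. */
int toy_put_task(struct toy_task *t)
{
    assert(t->usage > 0);
    if (--t->usage > 0)
        return 0;
    free(t);
    toy_tasks_released++;
    return 1;
}
```

This is why a zombie's descriptor can outlive the process itself: as long as someone (typically the parent, which has not yet collected the exit status) still holds a reference, the memory stays allocated.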
  24. 24. 06/21/11 Slob & Slub allocators (not yet covered) <ul><li>Slob allocator : Simple list of blocks, optimized for small embedded systems </li></ul><ul><li>Slub allocator : Optimized for large-scale systems </li></ul>
  25. 25. This week’s Updates <ul><li>The Buddy Allocator </li></ul>
  26. 26. The Buddy Allocator (mm/page_alloc.c) <ul><li>“Page frame” (physical page) memory management </li></ul><ul><li>All allocations must go through this system </li></ul><ul><li>Implemented to prevent external fragmentation of memory : </li></ul><ul><ul><ul><li>Free space becomes divided into small fragments scattered across memory </li></ul></ul></ul>
  27. 27. Basics <ul><li>All free page frames are grouped into lists </li></ul><ul><li>Each list contains blocks of 2^order contiguous page frames ( alloc_pages(gfp_mask, order) ) </li></ul><ul><li>There are 11 lists : </li></ul><ul><ul><li>1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024 page-frame lists (orders 0…10) </li></ul></ul>
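The relation between an order and its block size is just a shift. A small sketch, assuming 4 KB pages and the 11 orders listed above (the `TOY_*` constants and function names are made up for the example):

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u   /* assumed page size (x86) */
#define TOY_MAX_ORDER 11      /* orders 0..10 */

/* Number of page frames in one block of the given order: 2^order. */
unsigned long toy_pages_in_order(unsigned order)
{
    assert(order < TOY_MAX_ORDER);
    return 1UL << order;
}

/* Size in bytes of one block of the given order. */
unsigned long toy_bytes_in_order(unsigned order)
{
    return toy_pages_in_order(order) * TOY_PAGE_SIZE;
}
```

With these assumptions an order-8 block is 256 page frames, i.e. 1 MB, and the largest block (order 10) is 4 MB.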
  28. 28. cat /proc/buddyinfo Shows the number of free memory blocks of each order (columns 0 to 10) in each zone
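A line of this file can be turned into a free-page total by weighting each column by its block size. The sketch below assumes the usual line layout (`Node N, zone NAME` followed by 11 counts); the function name is hypothetical, and no real output is reproduced here.

```c
#include <assert.h>
#include <stdio.h>

#define TOY_MAX_ORDER 11

/* Parse one /proc/buddyinfo line: fill counts[order] with the number of free
 * blocks per order and return the total number of free pages, or -1 on a
 * malformed line. */
long toy_parse_buddyinfo_line(const char *line, unsigned long counts[TOY_MAX_ORDER])
{
    int node, consumed = 0;
    char zone[32];

    if (sscanf(line, "Node %d, zone %31s%n", &node, zone, &consumed) != 2)
        return -1;

    const char *p = line + consumed;
    long total = 0;
    for (int order = 0; order < TOY_MAX_ORDER; order++) {
        int used = 0;
        if (sscanf(p, "%lu%n", &counts[order], &used) != 1)
            return -1;
        p += used;
        total += (long)(counts[order] << order);  /* each block holds 2^order pages */
    }
    return total;
}
```

For example, an invented line with 3 free order-0 blocks, 2 order-1 blocks, 1 order-2 block and 1 order-10 block adds up to 3 + 4 + 4 + 1024 = 1035 free pages.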
  29. 29. Example| Allocating 256 pages (1MB) <ul><li>Look into the 256 page-frame list; if a block is available, allocate it. </li></ul><ul><li>If not, look for the next larger block, in the 512 page-frame list </li></ul><ul><ul><li>If one exists, split it: allocate 256 pages, and the remaining 256 page frames go to the 256 page-frame list </li></ul></ul><ul><ul><li>If not, look for the next larger block, in the 1024 page-frame list </li></ul></ul><ul><ul><ul><li>If one exists, allocate 256 pages, move the remaining 512 page frames to the 512 page-frame list, and the remaining 256 page frames to the 256 page-frame list </li></ul></ul></ul><ul><ul><ul><li>If not, the algorithm returns an error (1024 is already the largest block) </li></ul></ul></ul>
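The search-then-split procedure in this example generalizes to any order. Below is a toy version that starts with a single free block of 1024 page frames. The `toy_*` names and the fixed-size arrays used as free lists are simplifications for illustration; the kernel's implementation in mm/page_alloc.c is considerably more involved.

```c
#include <assert.h>

#define TOY_MAX_ORDER 11
#define TOY_MAX_BLOCKS 64

struct toy_buddy {
    /* free_list[o] holds the first page-frame number of each free block of order o */
    unsigned long free_list[TOY_MAX_ORDER][TOY_MAX_BLOCKS];
    int nr_free[TOY_MAX_ORDER];
};

/* Start with one free block of 1024 frames beginning at frame 0. */
void toy_buddy_init(struct toy_buddy *b)
{
    for (int o = 0; o < TOY_MAX_ORDER; o++)
        b->nr_free[o] = 0;
    b->free_list[TOY_MAX_ORDER - 1][0] = 0;
    b->nr_free[TOY_MAX_ORDER - 1] = 1;
}

/* Allocate 2^order frames; returns the first frame number, or -1 if no
 * block is large enough. */
long toy_buddy_alloc(struct toy_buddy *b, int order)
{
    int o = order;
    while (o < TOY_MAX_ORDER && b->nr_free[o] == 0)
        o++;                       /* nothing here: look in the next larger list */
    if (o == TOY_MAX_ORDER)
        return -1;                 /* even the largest list is empty */

    unsigned long start = b->free_list[o][--b->nr_free[o]];
    while (o > order) {
        o--;
        /* split: keep the lower half, put the upper half on the smaller list */
        b->free_list[o][b->nr_free[o]++] = start + (1UL << o);
    }
    return (long)start;
}
```

Requesting order 8 (256 frames) from the fresh allocator reproduces the slide: the 1024 block is split, frames 512 to 1023 land on the order-9 list, frames 256 to 511 on the order-8 list, and frames 0 to 255 are returned.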
  30. 30. Freeing pages <ul><li>The kernel attempts to merge pairs of free buddy blocks of size b into a single block of size 2b. Two blocks are considered buddies if : </li></ul><ul><ul><li>Both have the same size b </li></ul></ul><ul><ul><li>They are located at contiguous physical addresses (neighbors) </li></ul></ul><ul><li>The merging repeats until the block reaches the largest size (1024 page frames), or until its neighboring buddy is not free </li></ul>
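The merge loop can be sketched compactly if each block is identified by its first page-frame number: the buddy of a block of a given order is then found by flipping one bit of that number. This is a toy model with hypothetical names; the per-frame `free_order` array replaces the kernel's free lists and page flags.

```c
#include <assert.h>

#define TOY_MAX_ORDER 11
#define TOY_NR_FRAMES 1024

/* free_order[pfn] is the order of the free block starting at frame pfn,
 * or -1 if no free block starts there. */
struct toy_zone { signed char free_order[TOY_NR_FRAMES]; };

/* Mark every frame as allocated. */
void toy_zone_init_all_allocated(struct toy_zone *z)
{
    for (int i = 0; i < TOY_NR_FRAMES; i++)
        z->free_order[i] = -1;
}

/* Free the block of 2^order frames starting at pfn, merging with free buddies
 * as long as possible. Returns the order of the resulting free block. */
int toy_free_block(struct toy_zone *z, unsigned long pfn, int order)
{
    while (order < TOY_MAX_ORDER - 1) {
        unsigned long buddy = pfn ^ (1UL << order);  /* same size, adjacent neighbor */
        if (z->free_order[buddy] != order)
            break;                                   /* buddy in use or split: stop */
        z->free_order[buddy] = -1;                   /* take the buddy off its list */
        pfn &= buddy;                                /* merged block starts at the lower one */
        order++;
    }
    z->free_order[pfn] = (signed char)order;
    return order;
}
```

The XOR also encodes an alignment rule that the two bullet points leave implicit: two adjacent blocks of equal size are buddies only if together they form a block aligned to the doubled size.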
  31. 31. Disadvantage <ul><li>It creates internal fragmentation: a whole block must be allocated even when the required size is smaller than the block </li></ul><ul><ul><li>E.g. to allocate 275 pages, a 512-page block is used, wasting 237 pages. </li></ul></ul><ul><li>This loss can be reduced by using the Slab Allocator (explained earlier) </li></ul>
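The arithmetic behind the example: a request is rounded up to the next power of two, and the difference is lost. A short sketch with made-up function names:

```c
#include <assert.h>

/* Smallest order whose block (2^order frames) can hold nr_pages frames. */
int toy_order_for(unsigned long nr_pages)
{
    assert(nr_pages > 0);
    int order = 0;
    while ((1UL << order) < nr_pages)
        order++;
    return order;
}

/* Frames allocated but not needed when nr_pages is rounded up to a whole block. */
unsigned long toy_wasted_pages(unsigned long nr_pages)
{
    return (1UL << toy_order_for(nr_pages)) - nr_pages;
}
```

For 275 pages this gives order 9 (512 frames) and 512 - 275 = 237 wasted frames, matching the slide; a request that is already a power of two, such as 256, wastes nothing.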
  32. 32. Process’s Address Space <ul><li>Defined by the mm_struct structure </li></ul><ul><li>A pointer to the mm_struct is in every process descriptor </li></ul><ul><li>Can be shared among processes (thus creating what we call threads) </li></ul><ul><li>Is shared with the parent until Copy-on-Write takes place </li></ul><ul><li>Consists of contiguous virtual memory blocks </li></ul>
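"Contiguous virtual memory blocks" can be pictured as a sorted list of address ranges hanging off the address-space structure. The sketch below is a hypothetical reduction (`toy_mm`, `toy_vma`, `toy_find_vma`); the kernel's `mm_struct` and `vm_area_struct` carry far more state, and its own lookup helper has slightly different semantics.

```c
#include <assert.h>
#include <stddef.h>

/* One contiguous virtual memory block covering [start, end). */
struct toy_vma {
    unsigned long start, end;
    struct toy_vma *next;     /* next block, sorted by ascending address */
};

/* Toy address space: the sorted block list plus a count of tasks sharing it. */
struct toy_mm {
    struct toy_vma *mmap;     /* head of the sorted block list */
    int users;                /* more than 1 when threads share this address space */
};

/* Return the block containing addr, or NULL if addr is not mapped. */
struct toy_vma *toy_find_vma(const struct toy_mm *mm, unsigned long addr)
{
    for (struct toy_vma *v = mm->mmap; v != NULL && v->start <= addr; v = v->next)
        if (addr < v->end)
            return v;
    return NULL;
}
```

An address that falls in a gap between two blocks belongs to no block at all, which is the situation the kernel reports as a segmentation fault.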
  33. 33. /bin/gonzo’s address space http://duartes.org/gustavo/blog/post/how-the-kernel-manages-your-memory
  34. 34. pmap <pid> (figure: pmap output, with the ELF binary and library mappings labeled) Linux Kernel Development, Robert Love p.314