VM and IO Topics in Linux


A brief introduction to Linux memory management, with a focus on page reclamation. Swap and the I/O architecture are also covered.

Published in: Technology

  1. VM and I/O Topics in Linux: Page Replacement, Swap and I/O
     Jiannan Ouyang, Ph.D. Student, Computer Science Department, University of Pittsburgh, 05/05/2011
  2. Outline
     • Overview of Linux Memory Management
     • Page Reclamation
     • Swap & I/O
     Jiannan Ouyang, CS PhD@PITT
  3. Describing Physical Memory
     • Node: a NUMA memory region
     • Zone: a range of memory of one type
     • struct page: describes one page frame
  4. Physical Page Allocation
     Binary Buddy Allocator:
     • If a block of the desired size is not available, a larger block is split in half, and the two halves are buddies of each other. One half is used for the allocation and the other stays free. Blocks are halved repeatedly until a block of the desired size is available.
     • When a block is later freed, its buddy is examined, and the two are coalesced if the buddy is also free.
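
The halving and coalescing described on this slide rest on one piece of arithmetic: a block of 2^order pages and its buddy differ only in bit `order` of their page index. The following is a minimal sketch of that arithmetic, not the kernel's code; `buddy_index`, `order_for_pages` and the `MAX_ORDER` value are names and values assumed for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for this sketch: block orders run from 0 to 10. */
#define MAX_ORDER 10

/* Page index of the buddy of the block that starts at page_idx.
 * A block of 2^order pages and its buddy differ only in bit `order`. */
static size_t buddy_index(size_t page_idx, unsigned order)
{
    assert(order <= MAX_ORDER);
    /* A block of 2^order pages must start on a 2^order boundary. */
    assert((page_idx & (((size_t)1 << order) - 1)) == 0);
    return page_idx ^ ((size_t)1 << order);
}

/* Smallest order whose block (2^order pages) can hold `pages` pages. */
static unsigned order_for_pages(size_t pages)
{
    unsigned order = 0;

    assert(pages > 0);
    while (((size_t)1 << order) < pages)
        order++;
    return order;
}
```

For example, a request for 5 pages is rounded up to order 3 (8 pages), and when the order-3 block at page 8 is freed, the block to check for coalescing starts at page 0.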
  5. Page Table Management
     • Three-level mapping
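
A three-level mapping splits a linear address into a page global directory (PGD) index, a page middle directory (PMD) index, a page table (PTE) index and a page offset. The sketch below shows the split for one assumed layout, 32-bit x86 with PAE (2 + 9 + 9 + 12 bits); the slide does not say which layout it has in mind, and `struct va_parts` and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout (32-bit x86 with PAE): 2-bit PGD index, 9-bit PMD index,
 * 9-bit PTE index, 12-bit page offset. Other configurations differ. */
#define PAGE_SHIFT 12
#define PTE_BITS    9
#define PMD_BITS    9
#define PGD_BITS    2

struct va_parts {
    uint32_t pgd;    /* index into the page global directory */
    uint32_t pmd;    /* index into the page middle directory */
    uint32_t pte;    /* index into the page table */
    uint32_t offset; /* byte offset inside the page */
};

/* Split a linear address into its three table indices and page offset. */
static struct va_parts split_linear_address(uint32_t va)
{
    struct va_parts p;

    p.offset = va & ((1u << PAGE_SHIFT) - 1);
    p.pte = (va >> PAGE_SHIFT) & ((1u << PTE_BITS) - 1);
    p.pmd = (va >> (PAGE_SHIFT + PTE_BITS)) & ((1u << PMD_BITS) - 1);
    p.pgd = (va >> (PAGE_SHIFT + PTE_BITS + PMD_BITS)) & ((1u << PGD_BITS) - 1);
    return p;
}

/* Inverse of split_linear_address(); each field must fit its bit width. */
static uint32_t join_linear_address(struct va_parts p)
{
    assert(p.pgd < (1u << PGD_BITS) && p.pmd < (1u << PMD_BITS));
    assert(p.pte < (1u << PTE_BITS) && p.offset < (1u << PAGE_SHIFT));
    return (p.pgd << (PAGE_SHIFT + PTE_BITS + PMD_BITS)) |
           (p.pmd << (PAGE_SHIFT + PTE_BITS)) |
           (p.pte << PAGE_SHIFT) |
           p.offset;
}
```

Under this layout the address 0xC0000000 falls in PGD entry 3 with all lower indices zero.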
  6. Kernel Memory Mapping
     [Figure: physical memory (0x00000000 to 0x3FFFFFFF, 1 GB, with the 896-MB boundary, display memory and device memory marked) beside virtual memory (0x00000000 to 4 GB, with 0xC0000000 and 896 MB marked).]
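
The addresses on this slide match the classic 32-bit x86 layout, in which the first 896 MB of physical memory is mapped linearly into kernel space starting at 0xC0000000. Assuming that layout, converting between the two is a constant offset. The sketch below illustrates it; the function names are hypothetical (the kernel's own macros for this job are `__pa()` and `__va()`).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: kernel space starts at 3 GB and maps the first
 * 896 MB of physical memory linearly. */
#define PAGE_OFFSET  0xC0000000u
#define LOWMEM_BYTES (896u * 1024u * 1024u)

/* Kernel virtual address -> physical address, valid in the linear range only. */
static uint32_t virt_to_phys_lowmem(uint32_t kvaddr)
{
    assert(kvaddr >= PAGE_OFFSET && kvaddr - PAGE_OFFSET < LOWMEM_BYTES);
    return kvaddr - PAGE_OFFSET;
}

/* Physical address -> kernel virtual address, valid below 896 MB only. */
static uint32_t phys_to_virt_lowmem(uint32_t paddr)
{
    assert(paddr < LOWMEM_BYTES);
    return paddr + PAGE_OFFSET;
}
```

Physical memory above the 896-MB boundary has no such fixed mapping, which is why both helpers reject it.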
  7. User Memory Mapping
     [Figure: one process's virtual memory (kernel space above 3 GB; user space below with stack, mappings, data and text) mapped onto physical memory (stack, data, text).]
  8. User Memory Mapping
     [Figure: two virtual address spaces, each with kernel space and a user space holding stack, data and text, mapped onto one physical memory.]
  9. Outline
     • Overview of Linux Memory Management
     • Page Reclamation
     • Swap & I/O
  10. Memory Customers
     [Figure: kernel code & data, slab cache, icache & dcache, user code & data, and page cache around the buddy system, with Request and Reclaim arrows.]
     • All memory except “User Code & Data” is used by the kernel.
     • “User Code & Data” is managed in user space (e.g. malloc/free); the kernel can only swap user pages out.
  11. Slab Cache
     • A cache of commonly used objects, kept in an initialized state and ready for use by the kernel.
     • Saves the time of repeatedly allocating, initializing and freeing the same kind of object.
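
The saving comes from running an object's constructor once and then recycling the object. The following user-space sketch illustrates the idea with a simple free list; it is not the kernel slab allocator, and `struct obj`, `struct obj_cache`, `cache_alloc` and `cache_free` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object type; `initialized` stands in for expensive setup. */
struct obj {
    int initialized;
    struct obj *next_free;
};

/* A cache of free, already-initialized objects. */
struct obj_cache {
    struct obj *free_list;
    unsigned ctor_calls; /* how many times the constructor actually ran */
};

/* Hand out a cached object if one exists; otherwise allocate and construct one. */
static struct obj *cache_alloc(struct obj_cache *c)
{
    struct obj *o = c->free_list;

    if (o != NULL) {
        c->free_list = o->next_free; /* reuse: the constructor is skipped */
        return o;
    }
    o = malloc(sizeof *o);
    if (o == NULL)
        return NULL;
    o->initialized = 1; /* the "constructor" */
    o->next_free = NULL;
    c->ctor_calls++;
    return o;
}

/* Return an object to the cache, still initialized, instead of freeing it. */
static void cache_free(struct obj_cache *c, struct obj *o)
{
    assert(o->initialized);
    o->next_free = c->free_list;
    c->free_list = o;
}
```

Allocating, freeing and allocating again returns the same object, and the constructor counter stays at 1.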
  12. Disk-related caches
     • Dcache (metadata): dentry objects representing filesystem pathnames.
     • Icache (metadata): inode objects representing disk inodes.
     • Page cache (data): data pages from disk; this is the main disk cache.
  13. Memory Customers Review
     [Figure: kernel code & data, slab cache, icache & dcache, user code & data, and page cache around the buddy system, with Request and Reclaim arrows.]
     Next: when the kernel starts reclaiming pages, which pages it reclaims, and the replacement policy.
  14. Reclamation: When? Zone Watermarks
     • Pages Low: when the number of free pages drops to this level, the buddy allocator wakes kswapd to start freeing pages. By default it is twice the value of pages min.
     • Pages Min: when free pages drop to this level, the allocator does kswapd's work synchronously, sometimes referred to as the direct-reclaim path.
     • Pages High: once this many pages are free, kswapd goes back to sleep. By default it is three times the value of pages min.
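
The three watermarks can be read as a small decision procedure run at allocation time. The sketch below uses the default ratios from this slide (low = 2 × min, high = 3 × min). The type and function names are hypothetical, and the exact boundary comparisons (`<=` versus `<`) are an assumption of the sketch.

```c
#include <assert.h>

enum reclaim_action {
    RECLAIM_NONE,        /* enough free pages: just allocate */
    RECLAIM_WAKE_KSWAPD, /* at or below pages low: background reclaim */
    RECLAIM_DIRECT       /* at or below pages min: the allocator reclaims itself */
};

struct watermarks {
    unsigned long min, low, high;
};

/* Default ratios from the slide: low = 2 * min, high = 3 * min. */
static struct watermarks watermarks_from_min(unsigned long pages_min)
{
    struct watermarks w = { pages_min, 2 * pages_min, 3 * pages_min };
    return w;
}

/* What the allocator does for a zone with `free_pages` free pages. */
static enum reclaim_action on_allocation(const struct watermarks *w,
                                         unsigned long free_pages)
{
    assert(w->min <= w->low && w->low <= w->high);
    if (free_pages <= w->min)
        return RECLAIM_DIRECT;
    if (free_pages <= w->low)
        return RECLAIM_WAKE_KSWAPD;
    return RECLAIM_NONE;
}

/* kswapd keeps freeing pages until the zone is back at pages high. */
static int kswapd_may_sleep(const struct watermarks *w, unsigned long free_pages)
{
    return free_pages >= w->high;
}
```

With pages min set to 100, an allocation at 150 free pages wakes kswapd, one at 50 free pages reclaims directly, and kswapd sleeps again once 300 pages are free.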
  15. [Figure]
  16. Reclamation: Which?
  17. Reclamation: Which? (Cont.)
     • Mapped & Anonymous Pages
       – Mapped: backed by a file
       – Anonymous: belongs to an anonymous memory region of a process
     • Shared & Non-shared Pages
       – A shared page must be unmapped from all page table entries that reference it at once. This is done through reverse mapping, an important improvement in the Linux 2.6 kernel.
  18. Reclamation: Which? (Cont.)
     shrink_caches runs until the target number of pages is met, reclaiming from:
     1. slab cache (kmem_cache_reap)
     2. user pages & page cache (refill & shrink_cache)
     3. dcache and icache
  19. Replacement Policy
     • Page state (active, ref) = {11, 10, 01, 00}
     [Figure: active list (active=1) and inactive list (active=0); an access sets ref=1, a clear sets ref=0, and pages leave the inactive list through reclaim.]
  20. Moving pages across the lists
     mark_page_accessed():
         on each access, increment the (active, ref) counter;
         if active becomes 1, move the page inactive -> active;
     refill_inactive_zone():
         if (ref=1) { ref=0; move to head of active list; }
         else { move active -> inactive; }
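
The two rules on this slide form a small state machine over the (active, ref) pair. The sketch below models just that pair for one page; list handling is left out, and the `sketch_` function names are hypothetical stand-ins for the kernel functions named on the slide.

```c
#include <assert.h>
#include <stdbool.h>

/* The two bits from the slide, tracked for a single page. */
struct page_state {
    bool active;     /* true: on the active list, false: on the inactive list */
    bool referenced; /* the "ref" bit */
};

/* Access path: count (active, ref) upward, 00 -> 01 -> 10 -> 11.
 * The 01 -> 10 step is the move from the inactive to the active list. */
static void sketch_mark_page_accessed(struct page_state *p)
{
    if (!p->referenced) {
        p->referenced = true;
    } else if (!p->active) {
        p->active = true;
        p->referenced = false;
    }
    /* (1, 1) is the top of the counter and stays as it is. */
}

/* Scan of the active list: a referenced page gets a second chance,
 * an unreferenced page is demoted to the inactive list. */
static void sketch_refill_inactive(struct page_state *p)
{
    assert(p->active);
    if (p->referenced)
        p->referenced = false; /* stays active, moved to the list head */
    else
        p->active = false;     /* moved to the inactive list */
}
```

A page therefore needs two accesses to be promoted and survives one scan of the active list for each access it received since the previous scan.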
  21. Outline
     • Overview of Linux Memory Management
     • Page Reclamation
     • Swap & I/O
  22. Swap
     • Lets the kernel reclaim any page frame obtained by a process, not only those that have an image on disk:
       – Anonymous pages (user stack or heap)
       – Dirty pages that belong to a private memory mapping of a process
       – IPC shared pages
  23. Swap (Cont.)
     • Set up “swap areas” on disk.
     • Allocate and free “page slots” in swap areas.
     • Provide functions both to “swap out” pages from RAM into a swap area and to “swap in” pages from a swap area into RAM.
     • Mark page table entries to keep track of the positions of data in the swap areas.
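
The last bullet works because a page table entry whose Present bit is clear is ignored by the hardware, so the remaining bits are free to record where the page went. The sketch below packs a swap area number and a page slot index into such an entry. The bit layout (bit 0 Present, 7 bits of area number, 24 bits of slot index) is an assumption chosen for illustration, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative layout of a swapped-out entry in a 32-bit PTE (assumption):
 * bit 0 = Present (0), bits 1-7 = swap area number, bits 8-31 = page slot. */
#define SWP_AREA_BITS 7
#define SWP_SLOT_BITS 24

/* Build the value stored in the PTE when a page is swapped out. */
static uint32_t swp_encode(uint32_t area, uint32_t slot)
{
    assert(area < (1u << SWP_AREA_BITS));
    assert(slot < (1u << SWP_SLOT_BITS));
    return (slot << (1 + SWP_AREA_BITS)) | (area << 1); /* bit 0 stays 0 */
}

/* Swap area number recorded in a swapped-out entry. */
static uint32_t swp_area(uint32_t entry)
{
    return (entry >> 1) & ((1u << SWP_AREA_BITS) - 1);
}

/* Page slot index recorded in a swapped-out entry. */
static uint32_t swp_slot(uint32_t entry)
{
    return entry >> (1 + SWP_AREA_BITS);
}

/* The hardware only looks at the Present bit. */
static int pte_is_present(uint32_t pte)
{
    return (int)(pte & 1u);
}
```

On a page fault for a non-present entry, the fault handler can decode the area and slot from the entry and swap the page back in.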
  24. Example
         while (1) {
             p = malloc(N);
             memset(p, 0, N);   // demand paging
         }

     free -m output before:
                      total   used   free   shared   buffers   cached
         Mem:          2013   1811    201        0       157      872
         -/+ buffers/cache:    782   1231
         Swap:          397      0    397

     free -m output after (+/- mark the direction of change):
                      total   used      free    shared   buffers   cached
         Mem:          2013   1956(+)   56(-)        0      4(-)   109(-)
         -/+ buffers/cache:   1842(+)  170(-)
         Swap:          397      8      389
  25. Linux I/O Architecture
     • The default file I/O APIs, such as fwrite(), are buffered.
     • File system: (dir, name, offset) -> LBA
     • Device file: not a regular file
     • How can these layers be bypassed?
  26. I/O Bypassing
     • Disk cache: O_DIRECT
     • File system: device file
     • I/O scheduler: to be solved
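
For the first bullet, opening a file with `O_DIRECT` asks the kernel to transfer data without going through the page cache. Direct I/O generally requires the buffer address, the transfer size and the file offset to be aligned. The 4096-byte alignment below is an assumption, since the real requirement depends on the device and file system, and `write_block_direct` is a hypothetical helper. Some file systems reject `O_DIRECT`, in which case the helper returns -1.

```c
#define _GNU_SOURCE /* glibc exposes O_DIRECT only with this defined */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0 /* not exposed here: the sketch falls back to buffered I/O */
#endif

/* Assumed alignment for buffer address and transfer size. */
#define DIO_ALIGN 4096

/* Write one aligned block filled with `fill` to `path`, asking the kernel to
 * bypass the page cache. Returns the number of bytes written, or -1. */
static ssize_t write_block_direct(const char *path, int fill)
{
    void *buf = NULL;
    ssize_t n = -1;
    int fd;

    assert(path != NULL);
    if (posix_memalign(&buf, DIO_ALIGN, DIO_ALIGN) != 0)
        return -1;
    memset(buf, fill, DIO_ALIGN);

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0600);
    if (fd >= 0) {
        n = write(fd, buf, DIO_ALIGN); /* offset 0 and size are both aligned */
        close(fd);
    }
    free(buf);
    return n;
}
```

The second bullet follows the same pattern with a device file path in place of a regular file, which skips the file system's (dir, name, offset) -> LBA translation as well.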
  27. Thanks. Q&A
  28. References
     • Understanding the Linux Kernel, 3rd edition
     • Understanding the Linux Virtual Memory Manager
  29. Backup Slides
  30. Page Table Management
     • Three-level mapping
  31. Page Table Management (Cont.)
     [Figure: the MMU takes the PGD address and a linear address and produces a physical address.]