Memory management in Linux kernel

Main topics covered at the seminar:
Tasks and components of the memory management subsystem;
Hardware capabilities of the x86_64 platform;
How physical and virtual memory are described in the kernel;
The memory management subsystem API;
Reclaiming previously allocated memory;
Monitoring tools;
Memory Cgroups;
Compaction (defragmentation of physical memory).

Published in: Technology

  2. Memory management in Linux kernel
  3. Memory management tasks
     • Physical memory allocator
     • Physical memory management
     • Virtual memory allocator
     • PTE management
     • Memory allocator for kernel needs
  4. Memory management subsystem
     • >100K lines
     • Buddy allocator
     • Page replacement (“LRU” reclaim model)
     • PTE management
     • Slab/slob/slub kernel allocator
     • Pagecache/writeback/readahead/swap
     • Cgroup memory controller
     • Compaction
  5. Hardware
     • x86_64
     • Paging (MMU, TLB, ...)
     • 4KB, 2MB and 1GB pages
     • NUMA
     • 4-level page tables
     • Hardware referenced (accessed) bit in the PTE
  6. Physical memory description
     • Node (pg_data_t)
     • Zone (struct zone)
     • Page (struct page)

     $ cat /proc/zoneinfo | grep Node
     Node 0, zone DMA
     Node 0, zone DMA32
     Node 0, zone Normal
     Node 1, zone Normal
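The node and zone names printed by /proc/zoneinfo can be grouped programmatically. The following is a minimal Python sketch, assuming the `Node N, zone NAME` line format shown on the slide. `parse_zones` is a hypothetical helper name, and `SAMPLE` reuses the slide's output rather than reading a live /proc/zoneinfo.

```python
import re
from collections import defaultdict

# Sample taken from the slide; on a real system, read /proc/zoneinfo instead.
SAMPLE = """\
Node 0, zone DMA
Node 0, zone DMA32
Node 0, zone Normal
Node 1, zone Normal
"""


def parse_zones(text):
    """Map each NUMA node id to the list of zone names found in `text`."""
    zones = defaultdict(list)
    for line in text.splitlines():
        m = re.match(r"Node\s+(\d+),\s+zone\s+(\S+)", line)
        if m:
            zones[int(m.group(1))].append(m.group(2))
    return dict(zones)


if __name__ == "__main__":
    for node, names in parse_zones(SAMPLE).items():
        print(f"node {node}: {', '.join(names)}")
```

On the slide's two-node machine, only node 0 carries the DMA and DMA32 zones, while both nodes have a Normal zone.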
  7. Virtual memory description
     • Address space (struct mm_struct)
     • VM area (struct vm_area_struct)

     $ cat /proc/self/maps
     00400000-0040c000 r-xp 00000000 08:03 2359718 /usr/bin/cat
     0060b000-0060c000 r--p 0000b000 08:03 2359718 /usr/bin/cat
     0060c000-0060d000 rw-p 0000c000 08:03 2359718 /usr/bin/cat
     011a7000-011c8000 rw-p 00000000 00:00 0 [heap]
     7f4d072e5000-7f4d0d80e000 r--p 00000000 08:03 2369473 /usr/lib/locale/locale-archive
     7f4d0d80e000-7f4d0d9c2000 r-xp 00000000 08:03 2366682 /usr/lib64/libc-2.18.so
     7f4d0d9c2000-7f4d0dbc2000 ---p 001b4000 08:03 2366682 /usr/lib64/libc-2.18.so
     7f4d0dbc2000-7f4d0dbc6000 r--p 001b4000 08:03 2366682 /usr/lib64/libc-2.18.so
     ...
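Each line of /proc/self/maps describes one VM area: address range, permissions, file offset, device, inode and an optional path. A small Python sketch of splitting such a line is given below. `VMArea` and `parse_maps_line` are hypothetical names introduced for illustration, and the sample lines are copied from the slide.

```python
from typing import NamedTuple


class VMArea(NamedTuple):
    start: int
    end: int
    perms: str
    offset: int
    path: str


def parse_maps_line(line):
    """Split one /proc/<pid>/maps line into its fields."""
    # Fields: range, perms, offset, dev, inode, optional path.
    parts = line.split(None, 5)
    start, end = (int(x, 16) for x in parts[0].split("-"))
    path = parts[5].strip() if len(parts) > 5 else ""
    return VMArea(start, end, parts[1], int(parts[2], 16), path)


# Two lines copied from the slide's output.
SAMPLE = [
    "00400000-0040c000 r-xp 00000000 08:03 2359718 /usr/bin/cat",
    "011a7000-011c8000 rw-p 00000000 00:00 0 [heap]",
]

if __name__ == "__main__":
    for line in SAMPLE:
        area = parse_maps_line(line)
        size_kib = (area.end - area.start) // 1024
        print(f"{area.path or '[anon]'}: {size_kib} KiB, {area.perms}")
```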
  8. File mappings
     • File mappings (struct address_space)
     • Radix tree with all resident pages
     • Pagecache
     • Major/minor pagefault
  9. Kernel API
     • __get_free_page()
     • kmalloc()/kfree()
     • vmalloc()
     • ...
  10. Userspace API
     • pagefault
     • mmap()/munmap()
     • brk()
     • mlock()/munlock()
     • fadvise(), madvise()
     • ...
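To illustrate how mmap() and page faults interact from userspace, here is a hedged Python sketch using the standard `mmap` module. `touch_pages` is a hypothetical helper. The claim that the first write to each page causes a minor page fault reflects ordinary demand paging and is not measured by the code itself.

```python
import mmap


def touch_pages(length):
    """Map `length` bytes anonymously and write one byte into every page."""
    # fileno -1 requests an anonymous mapping (no backing file).
    m = mmap.mmap(-1, length)
    try:
        touched = 0
        for off in range(0, length, mmap.PAGESIZE):
            # The first write to a page is what makes the kernel back it
            # with physical memory (typically a minor page fault).
            m[off] = 1
            touched += 1
        return touched
    finally:
        m.close()  # releases the mapping, like munmap()


if __name__ == "__main__":
    print(touch_pages(16 * mmap.PAGESIZE), "pages touched")
```

Until a page is touched, the mapping only exists as a VM area; physical pages are allocated lazily on access.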
  11. Memory reclaim
     • Normal/direct reclaim (free pool)
     • Per-node kswapd
     • Working set
     • Memory pressure
     • File memory vs anonymous memory
     • Swap
     • OOM
  12. “LRU” model
     • 5 doubly linked lists: inactive file, active file, inactive anon, active anon, unevictable
     • Referenced flag in the struct page flags
  13. List transition rules
     • mark_page_accessed():
       – unreferenced -> referenced
       – inactive && referenced -> active
     • shrink_inactive_list():
       – if (ptes referenced)
         • anonymous -> active
         • referenced -> active
         • (ptes referenced > 1) -> active (3.2)
         • (vm_flags & VM_EXEC) -> active (3.2)
         • set referenced
         • rotate
       – else
         • reclaim
     • shrink_active_list():
       – if referenced
         • file & VM_EXEC -> rotate
       – otherwise -> inactive
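The two mark_page_accessed() rules can be modelled as a tiny state machine. The Python sketch below is a toy model, not kernel code: the `Page` class is hypothetical, and clearing the referenced flag on promotion is an assumption that is not stated on the slide.

```python
from dataclasses import dataclass


@dataclass
class Page:
    active: bool = False
    referenced: bool = False


def mark_page_accessed(page):
    """Toy model of the two transition rules listed on the slide."""
    if not page.active and page.referenced:
        # inactive && referenced -> active
        page.active = True
        # Assumption: the referenced flag is cleared on promotion.
        page.referenced = False
    elif not page.referenced:
        # unreferenced -> referenced
        page.referenced = True


if __name__ == "__main__":
    p = Page()
    for access in (1, 2):
        mark_page_accessed(p)
        print(f"access {access}: active={p.active} referenced={p.referenced}")
```

Under this model a page needs two accesses to move from the inactive list to the active list, so a single streaming read does not displace the working set.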
  14. Memory pressure balancing
     • nr_pages_to_scan = nr_pages/2^priority
     • priority = [12..0] 1/4096, 1/2048, 1/1024, ...
     • swappiness
     • active > inactive
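The scan-size formula is easy to work through numerically. A short Python sketch follows; the list size of 2^20 pages is a hypothetical figure chosen for illustration, and `pages_to_scan` is a hypothetical helper that applies nr_pages/2^priority as an integer shift.

```python
DEF_PRIORITY = 12  # reclaim starts here and counts down to 0


def pages_to_scan(nr_pages, priority):
    """nr_pages / 2^priority, rounded down."""
    return nr_pages >> priority


if __name__ == "__main__":
    lru_size = 1 << 20  # hypothetical LRU list: 1,048,576 pages
    for priority in range(DEF_PRIORITY, -1, -1):
        print(f"priority {priority:2d}: scan {pages_to_scan(lru_size, priority)} pages")
```

At priority 12 only 1/4096 of the list is scanned per pass; each step down doubles the scan window, until priority 0 covers the whole list.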
  15. Yasearch-specific problems & solutions
     • Working set > 1/2 available memory
     • Memory thrashing
     • promote_mapped_pages
     • file_inactive_ratio
  16. Monitoring & tools
     • top
     • vmtouch
     • /proc/vmstat
     • /proc/buddyinfo
     • /proc/slabinfo
     • perf top
     • oom-message in dmesg
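Of the files listed, /proc/buddyinfo is the most compact view of fragmentation: each line gives, per zone, the number of free blocks of order 0, 1, 2 and so on. The Python sketch below parses one such line. The sample line is invented for illustration (it is not from the talk), and `parse_buddyinfo_line` and `free_pages` are hypothetical helper names.

```python
# Invented sample in /proc/buddyinfo format; columns are free blocks
# of order 0, 1, 2, ... (a block of order n spans 2^n pages).
SAMPLE = "Node 0, zone   Normal   10   5   2   1   0   0   0   0   0   0   1"


def parse_buddyinfo_line(line):
    """Return (node id, zone name, list of free-block counts by order)."""
    head, _, rest = line.partition("zone")
    node = int(head.split()[1].rstrip(","))
    fields = rest.split()
    return node, fields[0], [int(x) for x in fields[1:]]


def free_pages(counts):
    """Total free pages represented by the per-order block counts."""
    return sum(n << order for order, n in enumerate(counts))


if __name__ == "__main__":
    node, zone, counts = parse_buddyinfo_line(SAMPLE)
    print(f"node {node} zone {zone}: {free_pages(counts)} free pages")
```

Many low-order blocks with few high-order ones indicates fragmentation, which is the situation compaction is meant to repair.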
  17. Demonstration
  18. Cgroups
     • Each cgroup has own LRU lists.
     • No common LRU (since 3.3)!
     • Common free pool(s)
     • Common kswapd thread(s)
     • Global reclaim vs target reclaim
  19. Memory controller
     • memory.limit_in_bytes
     • memory.soft_limit_in_bytes (will be deprecated)
     • memory.use_hierarchy
     • ...
  20. Monitoring
     • memory.usage_in_bytes
     • memory.max_usage_in_bytes
     • memory.stat
  21. Accounting
     • Each page belongs to one cgroup
     • The cgroup that first accesses a page becomes its owner
     • memory.move_charge_at_immigrate
  22. Yasearch-specific problems & solutions
     • memory.low_limit_in_bytes
     • First accessed – owner? mlock()? low_limit?
     • memory.recharge_on_pgfault
  23. Compaction
     • Migration of physical pages toward the top of the zone
     • https://lwn.net/Articles/368869
     • Broken in 3.3-3.7
     • Replacement for lumpy reclaim
     • Use perf top to diagnose problems
  24. Thank you for your attention!