Linux memory-management-kamal

A presentation on Linux memory management: concepts, tuning and troubleshooting.

Published in: Software

  1. Linux Memory Management: Kamal Maiti, Sr. Linux System Engineer, Amdocs DVCI, Pune, India
  2. AGENDA
     - Basic concepts of the computer
     - Hardware, firmware, drivers, software, applications
     - CPU, RAM, how RAM is used
     - Moving information within the computer
     - Primary and other memory
     - Segments of RAM
     - Memory mapping, process address space
     - Pages, frames, huge pages, MMU, etc.
     - Virtual memory, page cache
     - Memory nodes, zones, lowmem
     - NUMA
     - Kernel memory allocators
     - Page-fault handling, tools, memory leaks, memory-related issues
     - Hands-on troubleshooting: SysRq, backtrace analysis, OOM message investigation, etc.
  3. BASIC CONCEPTS OF COMPUTER HARDWARE
     - This model of the typical digital computer is often called the von Neumann computer.
     - Programs and data are stored in the same memory: primary memory.
     [Figure: CPU (Central Processing Unit), input units, output units, primary memory]
  4. HARDWARE, FIRMWARE, DRIVER, SOFTWARE, APPLICATION
     - Hardware: all physical computer devices, e.g. input and output devices, the motherboard, mouse, keyboard.
     - Firmware: vendor-provided low-level code that interacts with the hardware to carry out the instructions passed to the device.
     - Driver: sits on top of the firmware and interacts with the firmware or the hardware directly.
     - Software/Application: makes system calls into the kernel; the kernel in turn uses drivers to get the result.
  5. CPU
     - The three major components of the CPU are:
       1. Arithmetic unit (performs computations), with the accumulator (holds the results of computations)
       2. Control unit (has two locations where numbers are kept): the instruction register (holds the instruction being analyzed) and the program counter (which instruction will be performed next?)
       3. Instruction decoding unit (decodes the instruction)
     - Motherboard: the board on which most of the electronics, including the CPU, are mounted.
  6. RAM
     - Commonly known as random access memory, or just RAM
     - Holds the instructions and data needed by programs that are currently running
     - RAM is usually volatile memory: its contents are lost when the power is turned off
  7. HOW IS RAM USED?
     Memory is used to store:
     - i) instructions, to execute a program
     - ii) data: while the computer is doing a job, the data to be processed are stored in primary memory. This data may come from an input device such as a keyboard, or from a secondary storage device such as a floppy disk.
  8. MOVING INFORMATION WITHIN THE COMPUTER
     - How do binary numerals move into, out of, and within the computer?
     - Information is moved in bytes, or in multiple bytes called words.
     - Words are the fundamental units of information.
     - The number of bits per word varies from computer to computer.
     - A word on most large IBM computers is 32 bits.
  9. MOVING INFORMATION WITHIN THE COMPUTER (CONTINUED)
     - The bits that make up a word are passed in parallel from place to place.
     - Ribbon cables:
       - consist of several wires molded together
       - one wire for each bit of the word or byte
       - additional wires coordinate the activity of moving information
       - each wire sends information in the form of a voltage pulse
  10. MOVING INFORMATION WITHIN THE COMPUTER (CONTINUED)
     - Example: sending the word WOW over the ribbon cable
     - Voltage pulses corresponding to the ASCII codes of the letters pass through the cable.
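To make the WOW example concrete, here is a small Python sketch that prints the 8-bit ASCII pattern each character would put on eight parallel data wires. It assumes plain 8-bit ASCII and does not model the extra control wires; `ascii_bits` is a hypothetical helper name.

```python
from __future__ import annotations


def ascii_bits(word: str) -> list[str]:
    """Return the 8-bit ASCII pattern of each character in word."""
    return [format(ord(ch), "08b") for ch in word]


# Each '1' stands for a voltage pulse on one of the eight data wires.
for ch, bits in zip("WOW", ascii_bits("WOW")):
    print(ch, bits)
# W 01010111
# O 01001111
# W 01010111
```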
  11. PRIMARY MEMORY
     - Primary storage (memory): where the data and programs that are currently in operation or being accessed are stored during use.
     - Consists of electronic circuits: extremely fast and expensive.
     - Two types:
       - RAM (non-permanent): programs and data can be stored here for the computer's use. Volatile: all information is lost once the computer shuts down.
       - ROM (permanent): contents do not change.
  12. OTHER MEMORY
     - ROM: read-only memory whose contents are fixed (used, e.g., to store video game software and in electronic musical instruments). ROM mostly holds firmware; rewritable variants such as EEPROM allow firmware updates.
     - EPROM: Erasable Programmable Read-Only Memory
     - EEPROM: Electrically Erasable Programmable Read-Only Memory
     - Cache: fast memory (CPU caches, or the page cache kept in RAM) where data is held for a certain amount of time so that it can be reused.
     - Registers: built from various flip-flops (RS, D, JK, shift registers, etc.) that hold information.
     - Swap: disk space used to accommodate demand for more memory than the installed RAM provides.
  13. SEGMENTS OF RAM
     - Low memory, high memory, Normal memory, DMA, DMA32
     - On a 32-bit architecture (zones DMA, Normal and HighMem):
       - the address space for addressing RAM is 0x00000000 - 0xffffffff, i.e. 4,294,967,296 bytes (4 GB)
       - user space range: 0x00000000 - 0xbfffffff (3 GB)
       - kernel space range: 0xc0000000 - 0xffffffff (1 GB)
       - Linux uses the 1 GB kernel space in two ways: about 896 MB of it permanently maps the first 896 MB of physical RAM (LOWMEM); the rest is kept for vmalloc and for temporary mappings of physical RAM above 896 MB (HIGHMEM).
     - On a 64-bit machine (zones DMA, DMA32 and Normal): Normal memory is the memory beyond 4 GB.
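The 3 GB / 1 GB split and the lowmem boundary are just address arithmetic. A minimal Python sketch, assuming the default 32-bit x86 PAGE_OFFSET of 0xC0000000 and the usual 896 MB lowmem limit (kernels built with a different split or vmalloc size will differ); the helper names are hypothetical.

```python
GiB = 1 << 30
MiB = 1 << 20

ADDRESS_SPACE = 1 << 32      # 0x00000000 .. 0xFFFFFFFF inclusive
PAGE_OFFSET = 0xC0000000     # assumed: first kernel virtual address (3G/1G split)
LOWMEM_LIMIT = 896 * MiB     # assumed: physical RAM the kernel maps permanently

user_bytes = PAGE_OFFSET                    # 0x00000000 .. 0xBFFFFFFF
kernel_bytes = ADDRESS_SPACE - PAGE_OFFSET  # 0xC0000000 .. 0xFFFFFFFF


def is_kernel_address(vaddr: int) -> bool:
    """True if a 32-bit virtual address lies in the kernel's 1 GB."""
    return PAGE_OFFSET <= vaddr < ADDRESS_SPACE


def is_highmem(paddr: int) -> bool:
    """Physical RAM at or above ~896 MB has no permanent kernel mapping."""
    return paddr >= LOWMEM_LIMIT


print(ADDRESS_SPACE // GiB, user_bytes // GiB, kernel_bytes // GiB)  # 4 3 1
```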
  14. MEMORY MAPPING
     - Linux uses only 4 segments on a 32-bit architecture:
       - 2 segments (code and data/stack) for KERNEL SPACE, from 0xC0000000 (3 GB) to 0xFFFFFFFF (4 GB)
       - 2 segments (code and data/stack) for USER SPACE, from 0 (0 GB) to 0xBFFFFFFF (3 GB)
     - See the virtual memory map: $ pmap <PID>; see the stack: $ pstack <PID>
     - Segmentation and paging: paging overcomes the flaw of segmentation (external fragmentation) by giving each process small fixed-size virtual pages, so that they fit into RAM without wasting it.
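  15. PROCESS ADDRESS SPACE: 32-BIT ARCH
     [Figure: 32-bit process address space, 4 GB at the top, 0 GB at the bottom]
     - 4 GB to 3 GB (from 0xC0000000): kernel space
     - 3 GB to 0 GB: user space, from top to bottom: file name, environment, arguments; stack; unused memory; shared libs; heap; BSS (_bss_start to _end); data (up to _edata); text/code (up to _etext); header (0x84000000 on the slide; classic 32-bit x86 executables start at 0x08048000)
     - Text/code segment: contains the actual code
     - Data: contains initialized global variables
     - BSS (Block Started by Symbol): contains uninitialized global variables
     - Heap: dynamic memory
     - Stack: a collection of frames, one per active function call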
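pmap builds its view of this layout from `/proc/<pid>/maps`. A minimal Python sketch that parses that file into start/end/permissions/path records, assuming the standard Linux line format (range, perms, offset, dev, inode, optional path); `Mapping` and `parse_maps_line` are hypothetical names. Running it shows the text, heap, shared-library and stack regions of the Python process itself.

```python
from __future__ import annotations

import os
from dataclasses import dataclass


@dataclass
class Mapping:
    start: int
    end: int
    perms: str
    path: str  # empty for anonymous mappings

    @property
    def size_kb(self) -> int:
        return (self.end - self.start) // 1024


def parse_maps_line(line: str) -> Mapping:
    """Parse one /proc/<pid>/maps line: range, perms, offset, dev, inode, [path]."""
    fields = line.split(maxsplit=5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    path = fields[5].strip() if len(fields) == 6 else ""
    return Mapping(start, end, fields[1], path)


# Only runs where /proc exists (Linux).
if os.path.exists("/proc/self/maps"):
    with open("/proc/self/maps") as f:
        for m in map(parse_maps_line, f):
            print(f"{m.start:012x} {m.size_kb:>8}K {m.perms} {m.path}")
```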
  16. PAGE & FRAME
     - Paging, demand paging, swapping
     - Page tables (4 levels on 64-bit, 2 on 32-bit without PAE): Page Global Directory, Page Upper Directory, Page Middle Directory, Page Table
     - Base page size: $ getconf -a | grep -i page (or $ getconf PAGESIZE)
     - Life cycle of a page: active list --> inactive list --> dirty --> clean
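The page-table walk starts by slicing the virtual address into one index per level plus a page offset. A minimal Python sketch, assuming x86-64 4-level paging with 4 KB pages and 512-entry tables (9 index bits per level); 32-bit non-PAE paging would instead use two 10-bit indices. `split_vaddr` is a hypothetical helper.

```python
from __future__ import annotations

PAGE_SHIFT = 12                        # assumed 4 KB base pages
LEVEL_BITS = 9                         # 512 entries per table on x86-64
LEVELS = ("PGD", "PUD", "PMD", "PTE")  # top level first


def split_vaddr(vaddr: int) -> dict[str, int]:
    """Split a 48-bit x86-64 virtual address into page-table indices and offset."""
    parts = {"offset": vaddr & ((1 << PAGE_SHIFT) - 1)}
    # The PTE index sits just above the offset; each higher level is 9 bits further up.
    for i, name in enumerate(reversed(LEVELS)):
        shift = PAGE_SHIFT + i * LEVEL_BITS
        parts[name] = (vaddr >> shift) & ((1 << LEVEL_BITS) - 1)
    return parts


addr = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0x5
print(hex(addr), split_vaddr(addr))
```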
  17. SWAP, HUGE PAGES, MMU, TLB
     - Swap: when not all pages fit in RAM, data has to be moved between RAM and a storage disk.
     - Huge pages: the default page size is 4 KB, but large programs use big chunks of memory, so larger pages are allowed (typically 2 MB on x86-64). [$ sysctl -a | grep -i huge]
     - MMU/TLB: the MMU translates logical (virtual) addresses to physical addresses; the TLB is a buffer the MMU uses to cache recent translations.
     - Active/inactive memory: [$ cat /proc/meminfo]
     - Shmem: shared memory areas [$ ipcs -m]
     - Buddyinfo: view memory fragmentation/allocation [$ cat /proc/buddyinfo]
     - Cache: speeds things up. sync flushes dirty data and forces it to disk; in the background this is done by bdflush (the flush-253:0 threads on RHEL 6). A buffer's policy is first-in, first-out; the cache's policy is Least Recently Used (LRU). [$ vmstat -S M 1]
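The LRU policy mentioned above can be illustrated with a toy cache. This Python sketch is a simplification: the real Linux page cache approximates LRU with active and inactive lists rather than one ordered list, and `LRUCache` is a hypothetical class, not a kernel interface.

```python
from collections import OrderedDict


class LRUCache:
    """Toy LRU cache: evicts the least recently used key when full."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()  # oldest entry first

    def get(self, key):
        if key not in self.entries:
            return None                     # miss: data would be read from disk
        self.entries.move_to_end(key)       # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drop the least recently used


cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")               # "a" becomes most recently used
cache.put("c", 3)            # evicts "b"
print(list(cache.entries))   # ['a', 'c']
```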
  18. VIRTUAL MEMORY: HOW IS A PROGRAM MAPPED?
     - Executable text
     - Executable data
     - Heap space
     - Stack
     - Get the exact memory required by a process: $ pmap -x <pid>, $ cat /proc/<pid>/status
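The Vm* lines of `/proc/<pid>/status` give a quick per-process summary (VmSize is the virtual size, VmRSS the resident part). A minimal Python parsing sketch, assuming the usual "Key: value kB" layout; `parse_status` and the sample values are hypothetical.

```python
from __future__ import annotations

import os


def parse_status(text: str) -> dict[str, int]:
    """Return the Vm* fields of /proc/<pid>/status, in kB."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key.startswith("Vm") and rest.strip().endswith("kB"):
            fields[key] = int(rest.split()[0])
    return fields


# Made-up sample in the status file's format.
SAMPLE = "Name:\tbash\nVmPeak:\t  22904 kB\nVmSize:\t  22840 kB\nVmRSS:\t   5200 kB\nThreads:\t1\n"
print(parse_status(SAMPLE)["VmRSS"])  # 5200

if os.path.exists("/proc/self/status"):
    with open("/proc/self/status") as f:
        vm = parse_status(f.read())
    print("virtual:", vm["VmSize"], "kB  resident:", vm["VmRSS"], "kB")
```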
  19. PAGE CACHE MEMORY CONTROL
     - vm.dirty_expire_centisecs=2000 // dirty data older than this (20 s) is eligible for writeback
     - vm.dirty_writeback_centisecs=400 // how often (every 4 s) the flusher threads wake up to write
     - vm.dirty_background_ratio=5 // when dirty pages reach this percentage of available (free + reclaimable) memory, the pdflush/flush daemon starts writing dirty data to disk in the background
     - vm.dirty_ratio=20 // when dirty pages reach this percentage, the process generating the writes must itself write data to disk
     - vfs_cache_pressure [default 100]: controls the tendency of the kernel to reclaim the memory used for caching directory and inode objects
     - swappiness [default 60]: controls how aggressively the kernel uses swap space
     - To free the page cache: echo 1 > /proc/sys/vm/drop_caches
     - To free dentries and inodes: echo 2 > /proc/sys/vm/drop_caches
     - To free the page cache, dentries and inodes: echo 3 > /proc/sys/vm/drop_caches
       (only clean cache is dropped, so run sync first)
     - Cache writes are done by the kernel threads pdflush/bdflush; in RHEL 6 they are the flush threads.
     - Life cycle of pages: active list --> inactive list --> dirty --> clean
     - Link: https://www.kernel.org/doc/Documentation/sysctl/vm.txt
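To see what the example settings mean in practice, this Python sketch converts the centisecond values to seconds and turns the two ratios into approximate byte thresholds. It assumes a hypothetical 8 GiB of dirtyable (free + reclaimable) memory; the kernel's own accounting has more terms, so treat the numbers as rough.

```python
from __future__ import annotations


def centisecs_to_seconds(cs: int) -> float:
    return cs / 100


def dirty_thresholds_kb(dirtyable_kb: int, background_ratio: int, dirty_ratio: int) -> tuple[int, int]:
    """Approximate (background, blocking) dirty thresholds in kB.

    dirtyable_kb stands in for free + reclaimable memory (an assumption).
    """
    return dirtyable_kb * background_ratio // 100, dirtyable_kb * dirty_ratio // 100


print(centisecs_to_seconds(2000), centisecs_to_seconds(400))  # 20.0 4.0
bg_kb, hard_kb = dirty_thresholds_kb(8 * 1024 * 1024, 5, 20)  # assume 8 GiB dirtyable
print(bg_kb // 1024, hard_kb // 1024)  # 409 1638 (MiB)
```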
  20. PHYSICAL MEMORY ALLOCATION LIMIT
     - CommitLimit: total memory that can be committed (allocated), based on overcommit_ratio
     - Committed_AS: memory currently committed
     - overcommit_memory, from 0 to 2:
       0 = heuristic overcommit: obvious overcommits are refused (default)
       1 = always overcommit: no overcommit handling
       2 = strict accounting: allocations beyond CommitLimit (based on overcommit_ratio) are refused
     - overcommit_ratio: percentage of RAM counted in CommitLimit when overcommit_memory is set to 2; default value 50
     - Example: 4 GB RAM, 2 GB swap, overcommit_memory=2, overcommit_ratio=50, so CommitLimit = 2 + (4*50/100) = 2 + 2 = 4 GB
     - Issue: an application failed to start due to a shortage of memory; this setting needed to be disabled.
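The CommitLimit example can be checked with a few lines of Python. This sketch uses the formula from the slide (swap + RAM * overcommit_ratio / 100) and ignores huge pages, which the kernel subtracts from RAM; `commit_limit_gb` and `can_commit` are hypothetical helpers that show why a large allocation fails under overcommit_memory=2.

```python
def commit_limit_gb(ram_gb: float, swap_gb: float, overcommit_ratio: int = 50) -> float:
    """CommitLimit = swap + RAM * overcommit_ratio / 100 (huge pages ignored)."""
    return swap_gb + ram_gb * overcommit_ratio / 100


def can_commit(committed_as_gb: float, request_gb: float, limit_gb: float) -> bool:
    """Under overcommit_memory=2, a request fails if Committed_AS would exceed CommitLimit."""
    return committed_as_gb + request_gb <= limit_gb


limit = commit_limit_gb(ram_gb=4, swap_gb=2, overcommit_ratio=50)
print(limit)                         # 4.0
print(can_commit(3.5, 1.0, limit))   # False: 4.5 GB > 4 GB
```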
  21. WHY MEMORY CACHE IS REALLY REQUIRED
     Speeds up processing:
     $ cat > XYZ                                  // create a reasonably large file
     $ sync; echo 3 > /proc/sys/vm/drop_caches    // as root
     $ time cat XYZ                               // much time: read from disk
     $ time cat XYZ                               // less time: served from the page cache
  22. MEMORY NODES, ZONES IN 32-BIT & 64-BIT
     - Zones on 32-bit:
       - ZONE_DMA (0-16 MB)
       - ZONE_NORMAL (16 MB-896 MB)
       - ZONE_HIGHMEM (896 MB and above)
       HIGHMEM's lower zones are NORMAL and DMA; NORMAL's lower zone is DMA.
     - Zones on 64-bit:
       - DMA: up to 16 MB
       - DMA32: up to 4 GB
       - Normal: beyond 4 GB
     - $ cat /proc/zoneinfo
     - $ cat /proc/pagetypeinfo
     - $ cat /proc/<pid>/numa_maps
     - $ cat /proc/buddyinfo
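The zone boundaries listed for 32-bit and 64-bit can be expressed as a small lookup. A Python sketch, assuming the x86 boundaries from the slide (other architectures lay out zones differently); `zone_for` is a hypothetical helper.

```python
MiB = 1 << 20
GiB = 1 << 30


def zone_for(paddr: int, arch_bits: int) -> str:
    """Return the zone a physical address falls in (x86 boundaries from the slide)."""
    if paddr < 16 * MiB:
        return "DMA"
    if arch_bits == 32:
        return "Normal" if paddr < 896 * MiB else "HighMem"
    return "DMA32" if paddr < 4 * GiB else "Normal"


for addr in (8 * MiB, 512 * MiB, 2 * GiB, 6 * GiB):
    print(f"{addr // MiB:>5} MiB  32-bit: {zone_for(addr, 32):<7}  64-bit: {zone_for(addr, 64)}")
```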
  23. LOW MEMORY, ZONE_RECLAIM
     - "lowmem" usually means NORMAL + DMA.
     - 64-bit RHEL 6 has no separate "lowmem" (there is no HIGHMEM zone on 64-bit).
     - The reservation is controlled by lowmem_reserve_ratio [DMA NORMAL HIGHMEM]:
       $ cat /proc/sys/vm/lowmem_reserve_ratio
       256 256 32 // a ratio of 256 means a lower zone reserves about 1/256 (≈0.39%) of the size of the higher zones
     - zone_reclaim_mode: how aggressively memory is reclaimed when a zone runs out of memory (values can be combined):
       1 = zone reclaim on
       2 = zone reclaim writes dirty pages out
       4 = zone reclaim swaps pages
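Both tunables are easy to evaluate by hand. This Python sketch applies the kernel's documented rule (protected pages = managed pages of the higher zones / lowmem_reserve_ratio) and decodes zone_reclaim_mode as a bitmask. The 4 GiB node size is a hypothetical example, and the helper names are illustrative only.

```python
from __future__ import annotations

ZONE_RECLAIM_BITS = {1: "zone reclaim on", 2: "write dirty pages out", 4: "swap pages"}


def lowmem_protection(higher_zone_pages: int, ratio: int) -> int:
    """Pages a lower zone keeps back from allocations that could have used
    the higher zones: managed pages of those higher zones / lowmem_reserve_ratio."""
    return higher_zone_pages // ratio


def decode_zone_reclaim(mode: int) -> list[str]:
    """Expand a zone_reclaim_mode value into the behaviours it enables."""
    return [name for bit, name in ZONE_RECLAIM_BITS.items() if mode & bit]


# Hypothetical node: 4 GiB (1,048,576 pages of 4 KB) in the higher zones, ratio 256.
print(lowmem_protection(1_048_576, 256))  # 4096 pages, i.e. 16 MiB
print(decode_zone_reclaim(3))             # ['zone reclaim on', 'write dirty pages out']
```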
  24. NON-UNIFORM MEMORY ACCESS (NUMA)
     - NUMA concept: NUMA placement is the placement of processors and memory; it can be done manually, by the application, or via MPI (Message Passing Interface).
     - Place each application on the correct node.
     - Two memory policies: node local (after Linux boots) and interleave (during kernel boot)
     - $ cat /proc/<pid>/numa_maps
     - $ numactl -s // show the current policy
     - $ numactl --hardware
     - numactl [ --interleave nodes ] [ --preferred node ] [ --membind nodes ] [ --cpunodebind nodes ] [ --physcpubind cpus ] [ --localalloc ] [--] command {arguments ...}
     - Ref: http://www.redhat.com/summit/2012/pdf/2012-DevDay-Lab-NUMA-Hacker.pdf
  25. NUMA MANAGEMENT
     - numactl --physcpubind=0,1,2,3 example_process
     - numactl --physcpubind=0-3 example_process
     - numactl --cpunodebind=2 example_process // run only on the CPUs of node 2
     - numactl --physcpubind=0 --localalloc example_process
     - numactl --membind=4 example_process // allocate memory only from node 4
     - numactl --cpunodebind=0 example_process // execute the command only on the CPUs of node 0
     - numactl --cpunodebind=0 --membind=0,1 process // run the process on node 0 with memory allocated on nodes 0 and 1
     - numactl --hardware
     - cat /sys/devices/system/node/node*/numastat
     - Allocation: $ watch -n1 numastat
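The per-node numastat files contain one "counter value" pair per line (numa_hit, numa_miss, local_node, ...). A minimal Python sketch that reads them and reports the share of pages allocated on each node that were intended for that node; the hit-ratio metric is an illustrative choice, not something numastat prints itself.

```python
from __future__ import annotations

import glob


def parse_numastat(text: str) -> dict[str, int]:
    """Parse a per-node numastat file: one 'counter value' pair per line."""
    stats = {}
    for line in text.splitlines():
        if line.strip():
            name, value = line.split()
            stats[name] = int(value)
    return stats


def hit_ratio(stats: dict[str, int]) -> float:
    """Share of pages allocated on this node that were intended for it."""
    total = stats["numa_hit"] + stats["numa_miss"]
    return stats["numa_hit"] / total if total else 1.0


# Prints nothing on systems without the sysfs node directory.
for path in sorted(glob.glob("/sys/devices/system/node/node*/numastat")):
    with open(path) as f:
        print(path.split("/")[-2], f"hit ratio {hit_ratio(parse_numastat(f.read())):.3f}")
```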
  26. KERNEL MEMORY ALLOCATORS
     - Low-level page allocator:
       - buddy system for contiguous multi-page allocations
       - provides pages for:
         - in-kernel allocations (slab cache)
         - vmalloc areas (kernel modules, multi-page data areas)
         - the page cache and anonymous user pages
         - miscellaneous other users
     - Slab cache:
       - manages allocations of objects of the same type
       - large-scale users: inodes, dentries, block I/O, network ...
       - kmalloc (the generic allocator) is implemented on top of it
     - Tool: slabtop
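The buddy allocator's state is visible in `/proc/buddyinfo`: for each node and zone, the number of free blocks of order 0, 1, 2, ... (an order-n block is 2^n contiguous pages). A Python sketch that parses it and totals free memory, assuming 4 KB base pages; many free low-order blocks with zeros at high orders indicate fragmentation. The helper names are hypothetical.

```python
from __future__ import annotations

import os

PAGE_KB = 4  # assumption: 4 KB base pages


def parse_buddyinfo(text: str) -> dict[tuple[str, str], list[int]]:
    """Map (node, zone) to the free block counts per order from /proc/buddyinfo."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        head, rest = line.split("zone")
        node = head.replace("Node", "").strip(" ,")
        fields = rest.split()
        result[(node, fields[0])] = [int(c) for c in fields[1:]]
    return result


def free_kb(counts: list[int]) -> int:
    """Total free memory: an order-n block is 2**n contiguous pages."""
    return sum(c * (2 ** order) * PAGE_KB for order, c in enumerate(counts))


if os.path.exists("/proc/buddyinfo"):
    with open("/proc/buddyinfo") as f:
        for (node, zone), counts in parse_buddyinfo(f.read()).items():
            print(f"node {node} {zone:<8} free {free_kb(counts)} kB  per order {counts}")
```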
  27. PAGE FAULT HANDLING
     - Hardware support:
       - accessing an invalid page causes a 'page translation' exception
       - writing to a protected page causes a 'protection exception'
       - translation-exception identification provides the faulting address
       - the 'suppression on protection' facility is essential
     - Linux kernel page fault handler:
       - determines address/access validity according to the VMA
       - invalid accesses cause SIGSEGV delivery
       - valid accesses trigger page-in, swap-in or copy-on-write
       - extra support for the stack VMA: it grows automatically
       - out-of-memory when overcommitted causes SIGBUS
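Demand paging can be observed from user space: mapping memory does not allocate RAM, but the first write to each page takes a (minor) page fault. A Python sketch for Unix systems using the standard `mmap` and `resource` modules; the exact fault count depends on the kernel (e.g. huge pages reduce it), so only the fact that faults occur should be relied on.

```python
import mmap
import resource

PAGE = mmap.PAGESIZE


def minor_faults() -> int:
    """Minor page faults taken by this process so far."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_minflt


def touch_anonymous_memory(size: int) -> int:
    """Map anonymous memory, write one byte per page, return the minor faults seen."""
    buf = mmap.mmap(-1, size)      # address range reserved, not yet backed by RAM
    before = minor_faults()
    for offset in range(0, size, PAGE):
        buf[offset] = 1            # first write to each page faults it in
    after = minor_faults()
    buf.close()
    return after - before


print(touch_anonymous_memory(64 * 1024 * 1024), "minor faults")
```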
  28. TOOLS TO CHECK MEMORY USAGE
     - Report paging statistics: sar -B
     - Report memory utilization statistics: sar -r
     - Report memory statistics: sar -R
     - Report swap space utilization statistics: sar -S
     - Current memory usage: free -k | -m | -g; cat /proc/meminfo
     - Memory allocation: cat /proc/buddyinfo
     - VM memory allocation: pmap -x <PID>; cat /proc/<pid>/status
     - Display kernel slab cache information in real time: slabtop
     - Others: vmstat, ps, top, cat /proc/meminfo, strace, gcore
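Most of these tools read `/proc/meminfo`. A minimal Python sketch that parses it and prints a rough `free -m` style summary; note that recent versions of free also count reclaimable slab in buff/cache, so the figures will not match exactly. The helper names are hypothetical.

```python
from __future__ import annotations

import os


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo; most values are kB, HugePages_* are page counts."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            info[key] = int(rest.split()[0])
    return info


def summary_mb(info: dict[str, int]) -> dict[str, int]:
    """A rough, free -m style summary in MiB."""
    return {
        "total": info["MemTotal"] // 1024,
        "free": info["MemFree"] // 1024,
        "buff/cache": (info["Buffers"] + info["Cached"]) // 1024,
        "swap_used": (info["SwapTotal"] - info["SwapFree"]) // 1024,
    }


if os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        print(summary_mb(parse_meminfo(f.read())))
```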
  29. MEMORY LEAK CHECK
     - Usage check: historical sar reports
     - mtrace: a glibc function (mtrace(3)) that logs malloc/free calls; analyse the log with the mtrace script
     - Valgrind:
       valgrind --tool=memcheck --leak-check=full --show-reachable=yes snmpd -f -Lo
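One way to make the "historical sar report" check less subjective is to fit a trend line to memory usage samples. This Python sketch is a hypothetical heuristic, not a sar feature: it takes (hour, used MB) pairs, computes the least-squares slope, and flags steady growth above an arbitrary 10 MB/hour threshold. System-wide usage also grows with cache, so per-process RSS samples give a cleaner signal.

```python
from __future__ import annotations


def growth_per_hour(samples: list[tuple[float, float]]) -> float:
    """Least-squares slope of (hour, used_MB) samples, e.g. collected from sar -r."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return num / den if den else 0.0


def looks_like_leak(samples: list[tuple[float, float]], threshold_mb_per_hour: float = 10.0) -> bool:
    """Hypothetical rule of thumb: sustained growth above the threshold."""
    return growth_per_hour(samples) > threshold_mb_per_hour


history = [(0, 1000), (1, 1050), (2, 1100), (3, 1150)]
print(growth_per_hour(history), looks_like_leak(history))  # 50.0 True
```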
  30. ISSUES RELATED TO MEMORY
     - TCP/IP communication delay: RH cluster broken
     - High cache usage: slows down the application/system
     - Memory pressure: memory leak, or the application is not tuned properly
     - Memory fragmentation: huge pages not used
     - OOM killer kills an application: memory pressure; the OOM killer is enabled by default and chooses its victim by badness score (/proc/<pid>/oom_score)
     - Segmentation fault: the kernel reclaims in the normal/low memory region, leaving no room for the kernel, which then encounters a segmentation fault
     - Faulty memory: hardware or circuit failure in a chip; needs diagnosis and replacement of the chip
  31. TROUBLESHOOTING MEMORY ISSUES
     - Memory & swap usage test:
       swap_tendency = mapped_ratio/2 + distress + vm_swappiness
       mapped_ratio = % of physical memory mapped by processes (in use)
       distress = how much trouble the kernel has freeing memory
       vm_swappiness = default 60
       swap_tendency >= 100: mapped pages are eligible for swapping
       swap_tendency < 100: reclaim from the page cache
     - SysRq:
       echo 1 > /proc/sys/kernel/sysrq
       echo m > /proc/sysrq-trigger // dump current memory information to the kernel log
     - Backtrace analysis
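The swap_tendency test can be evaluated directly. This Python sketch uses integer arithmetic like the kernel code it comes from; the formula belongs to older 2.6 kernels, and newer kernels balance anonymous and file reclaim differently. The example inputs are made up.

```python
def swap_tendency(mapped_ratio: int, distress: int, swappiness: int = 60) -> int:
    """Older 2.6 heuristic: mapped_ratio/2 + distress + vm_swappiness."""
    return mapped_ratio // 2 + distress + swappiness


def reclaim_target(tendency: int) -> str:
    """At 100 or above, mapped pages become candidates for swapping."""
    return "swap out mapped pages" if tendency >= 100 else "reclaim page cache only"


print(swap_tendency(50, 0), reclaim_target(swap_tendency(50, 0)))    # 85 reclaim page cache only
print(swap_tendency(80, 10), reclaim_target(swap_tendency(80, 10)))  # 110 swap out mapped pages
```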
  32. TROUBLESHOOTING
     - OOM messages investigation. Messages:
       Oct 25 07:28:34 nldedip4k031 kernel: [87976.461588] [] oom_kill_process+0x5c/0x80
       Oct 25 07:28:34 nldedip4k031 kernel: [87976.461591] [] out_of_memory+0xc5/0x1c0
       Oct 25 07:28:34 nldedip4k031 kernel: [87976.461595] [] __alloc_pages_nodemask+0x72c/0x740
       Oct 25 07:28:34 nldedip4k031 kernel: [87976.461599] [] __get_free_pages+0x1c/0x30
       Oct 25 07:28:34 nldedip4k031 kernel: [87976.461602] [] get_zeroed_page+0x12/0x20
       Oct 25 07:28:34 nldedip4k031 kernel: [87976.461606] [] fill_read_buffer.isra.8+0xaa/0xd0
       Oct 25 07:28:34 nldedip4k031 kernel: [87976.461609] [] sysfs_read_file+0x7d/0x90
       Oct 25 07:28:34 nldedip4k031 kernel: [87976.461613] [] vfs_read+0x8c/0x160
       Oct 25 07:28:34 nldedip4k031 kernel: [87976.461616] [] ? fill_read_buffer.isra.8+0xd0/0xd0
       Oct 25 07:28:34 nldedip4k031 kernel: [87976.461619] [] sys_read+0x3d/0x70
       Oct 25 07:28:34 nldedip4k031 kernel: [87976.461624] [] sysenter_do_call+0x12/0x28
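A call trace is read bottom-up: the last frame is the entry point, the first is where the kernel was when it logged. A Python sketch that extracts function names from lines like the ones in this log and prints the chain in call order, skipping frames marked "?" (which the kernel prints when it is not sure they are part of the real call chain). The regex is an assumption fitted to this log format. For this log it reads as: a read() on a sysfs file needed one zeroed page, the allocation failed, and the OOM killer was invoked.

```python
from __future__ import annotations

import re

# "[timestamp] [<address>] name+0xoff/0xlen"; a "?" before the name marks an unreliable frame.
FRAME = re.compile(r"\]\s*\[[^\]]*\]\s*(\?\s*)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")

TRACE = """\
Oct 25 07:28:34 nldedip4k031 kernel: [87976.461588] [] oom_kill_process+0x5c/0x80
Oct 25 07:28:34 nldedip4k031 kernel: [87976.461591] [] out_of_memory+0xc5/0x1c0
Oct 25 07:28:34 nldedip4k031 kernel: [87976.461595] [] __alloc_pages_nodemask+0x72c/0x740
Oct 25 07:28:34 nldedip4k031 kernel: [87976.461599] [] __get_free_pages+0x1c/0x30
Oct 25 07:28:34 nldedip4k031 kernel: [87976.461602] [] get_zeroed_page+0x12/0x20
Oct 25 07:28:34 nldedip4k031 kernel: [87976.461606] [] fill_read_buffer.isra.8+0xaa/0xd0
Oct 25 07:28:34 nldedip4k031 kernel: [87976.461609] [] sysfs_read_file+0x7d/0x90
Oct 25 07:28:34 nldedip4k031 kernel: [87976.461613] [] vfs_read+0x8c/0x160
Oct 25 07:28:34 nldedip4k031 kernel: [87976.461616] [] ? fill_read_buffer.isra.8+0xd0/0xd0
Oct 25 07:28:34 nldedip4k031 kernel: [87976.461619] [] sys_read+0x3d/0x70
Oct 25 07:28:34 nldedip4k031 kernel: [87976.461624] [] sysenter_do_call+0x12/0x28
"""


def call_chain(log_text: str, include_unreliable: bool = False) -> list[str]:
    """Function names in log order (innermost frame first)."""
    chain = []
    for line in log_text.splitlines():
        m = FRAME.search(line)
        if m and (include_unreliable or not m.group(1)):
            chain.append(m.group(2))
    return chain


# Outermost call first: system call entry down to the OOM killer.
print(" -> ".join(reversed(call_chain(TRACE))))
```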
  33. Q/A
     Ref:
     - https://www.kernel.org/
     - https://www.redhat.com/en
     - http://www.tldp.org/LDP/tlk/mm/memory.html
     - https://en.wikipedia.org/wiki/Virtual_memory
     - https://lwn.net/
