Linux Memory Management
Sr. Linux System Engineer
Amdocs DVCI, Pune, India
Basic concept of computer
Hardware, firmware, driver, software, application
CPU, RAM, How RAM is used
Moving Information within Computer
Primary & Other Memory
Segment of RAM
Memory Mapping, Process Address Space
Page, Frame, Hugepage, MMU etc.
Virtual Memory, PageCache
Memory nodes, zones, lowmem
Kernel Memory allocator
Pagefault Handling, Tools, Memory leak, Memory related issues
Hands-on Troubleshooting : sysrq, backtrace analysis, OOM messages investigation etc
BASIC CONCEPTS OF COMPUTER HARDWARE
This model of the typical digital computer is often called the von Neumann architecture.
Programs and data are stored in the same memory: primary memory
(Central Processing Unit)
HARDWARE, FIRMWARE, DRIVER, SOFTWARE, APPLICATION
Hardware : All computer devices like - Input, Output
devices, Motherboard, mouse, keyboard
Firmware : Vendor-provided low-level code that
interacts with the hardware to produce the output of instructions
passed to the device.
Driver : On top of firmware, a driver interacts with the
firmware or with the hardware directly.
Software/Application: which interacts with system calls
to call kernel and kernel interacts with driver to get the
The three major components of the CPU are:
1. Arithmetic Unit (Computations performed)
Accumulator (Results of computations kept here)
2. Control Unit (Has two locations where numbers are kept)
Instruction Register (Instruction placed here for
Program Counter (Which instruction will be
3. Instruction Decoding Unit (Decodes the instruction)
Motherboard: The place where most of the electronics including
the CPU are mounted.
Commonly known as random access memory, or just RAM
Holds instructions and data needed for programs that
are currently running
RAM is usually a volatile type of memory
Contents of RAM are lost when power is turned off
HOW IS RAM USED?
Memory is used to store:
i) instructions -> to execute a program
ii) data -> When the computer is doing any job, the data that
have to be processed are stored in the primary memory. This
data may come from an input device like keyboard or from a
secondary storage device like a floppy disk.
MOVING INFORMATION WITHIN THE COMPUTER
How do binary numerals move into, out of, and within the computer?
Information is moved about in bytes, or multiple bytes called words.
Words are the fundamental units of information.
The number of bits per word may vary per computer.
A word length for most large IBM computers is 32 bits:
Bits that compose a word are passed in parallel from place to
Consist of several wires, molded together.
One wire for each bit of the word or byte.
Additional wires coordinate the activity of moving
Each wire sends information in the form of a voltage
Example of sending the word WOW over the ribbon cable
Voltage pulses corresponding to the ASCII codes would pass
through the cable.
Primary storage or memory: Where the data & program that are
currently in operation or being accessed are stored during use.
Consists of electronic circuits: Extremely fast and expensive.
Programs and data can be stored here for the
Volatile: All information will be lost once the computer
Contents do not change.
ROM : Read-only memory [used for storing video game software, electronic musical
instruments]. ROM is mostly used to hold firmware.
EPROM : Erasable Programmable Read-Only Memory
EEPROM : Electrically Erasable Programmable Read-Only Memory
Cache : Location in RAM where data is stored for a certain amount of time so
that it can be reused.
Registers : Built from flip-flops [RS, D, JK, shift etc.]; hold information
Swap : Disk space used to accommodate demand for more memory than the RAM available.
SEGMENT OF RAM
Low mem, high mem, Normal mem, DMA, DMA32
On a 32-bit architecture[DMA, Normal & HighMem] : the
address space range for addressing RAM is:
0x00000000 - 0xffffffff or 4'294'967'295 (4 GB).
The user space range: 0x00000000 - 0xbfffffff or 3 GB
The kernel space range: 0xc0000000 - 0xffffffff or 1 GB
Linux splits the 1GB kernel space into 2 pieces: LOWMEM and HIGHMEM.
On 64 bit machine[DMA, DMA32 & Normal] : Normal
memory available beyond 4 GB
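The 3 GB / 1 GB split described above can be checked with a little arithmetic. The following is a minimal sketch, assuming the classic 32-bit boundaries quoted on this slide; the names `region_of` and `size_gib` are made up for illustration and are not kernel APIs.

```python
# Boundaries copied from the slide (classic 3 GB user / 1 GB kernel split).
USER_START, USER_END = 0x00000000, 0xBFFFFFFF
KERNEL_START, KERNEL_END = 0xC0000000, 0xFFFFFFFF
GIB = 1024 ** 3


def region_of(addr: int) -> str:
    """Say whether a 32-bit virtual address is in user or kernel space."""
    if not 0 <= addr <= KERNEL_END:
        raise ValueError("address outside the 32-bit range")
    return "kernel" if addr >= KERNEL_START else "user"


def size_gib(start: int, end: int) -> float:
    """Size of an inclusive address range in GiB."""
    return (end - start + 1) / GIB


if __name__ == "__main__":
    print(region_of(0xBFFFFFFF), size_gib(USER_START, USER_END))
    print(region_of(0xC0000000), size_gib(KERNEL_START, KERNEL_END))
```

The last user address is 0xBFFFFFFF, and the very next address, 0xC0000000, is the first one owned by the kernel.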
Linux uses only 4 segments in 32 bit arch:
2 segments (code and data/stack) for KERNEL SPACE from [0xC000 0000] (3 GB) to [0xFFFF FFFF] (4 GB)
2 segments (code and data/stack) for USER SPACE from [0x0000 0000] (0 GB) to [0xBFFF FFFF] (3 GB)
See virtual Map : $ pmap <PID> , see stack : $ pstack <PID>
Segmentation, Paging [to overcome the flaws of segmentation]:
memory is allocated to each process in small virtual pages so that they fit in RAM without wasting it.
PROCESS ADDRESS SPACE – 32 BIT ARCH
File name, Environment
Bss[Block started by Symbol]
Text/Code Segment: contains the actual
Data: contains global variables
BSS: contains uninitialized global variables
Heap: dynamic memory
Stack: collection of frames/functions
[Figure markers: 4 GB --> top of kernel space, 3 GB --> user/kernel boundary, 0 GB --> start of user space]
PAGE & FRAME
Paging, Demand Paging, Swapping
Page Tables [4 levels on 64 bit, 2 levels on 32 bit]: Page Global Directory, Page Upper Directory,
Page Middle Directory, Page Table Entry
Min page size : getconf -a|grep -i page
Life cycle of page: active----> inactive list --> dirty --> clean
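To make the page concept concrete, the sketch below splits a virtual address into a page number and an offset within that page. It is an illustration only: `split_address` is a hypothetical helper, and the 4096-byte default is an assumption that matches common x86 systems; `os.sysconf("SC_PAGE_SIZE")` reports the real value on the running machine.

```python
import os


def split_address(addr: int, page_size: int = 4096) -> tuple[int, int]:
    """Return (page number, offset inside the page) for an address."""
    return divmod(addr, page_size)


if __name__ == "__main__":
    # Page size of the running system, the same value getconf reports.
    print("system page size:", os.sysconf("SC_PAGE_SIZE"))
    page, offset = split_address(0x12345)
    print(hex(page), hex(offset))
```

With 4 KB pages the low 12 bits of an address are the offset, and the remaining bits select the page.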
SWAP, HUGE PAGE, MMU,TLB
SWAP : Not all pages fit in RAM, so data has to be moved to and from storage
Hugepage : the default page size is 4 KB, but large programs use big chunks of memory.
Hence, large pages are allowed. [sysctl -a|grep -i huge]
MMU/TLB : The MMU is responsible for translating virtual addresses to physical addresses. The TLB is
a cache of recent translations that is used by the MMU.
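The benefit of huge pages can be estimated by counting how many pages, and therefore how many translations, a memory region needs. This is a rough sketch assuming a 4 KiB base page and a 2 MiB huge page (the usual x86_64 default, but an assumption here); `pages_needed` is a hypothetical helper.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def pages_needed(region_bytes: int, page_bytes: int) -> int:
    """Number of pages required to cover a region (rounded up)."""
    return -(-region_bytes // page_bytes)


if __name__ == "__main__":
    print(pages_needed(1 * GIB, 4 * KIB))   # base pages for 1 GiB
    print(pages_needed(1 * GIB, 2 * MIB))   # huge pages for 1 GiB
```

Fewer pages for the same region means fewer page table entries and less pressure on the TLB.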
Active/Inactive regions [cat /proc/meminfo]
Shmem : shared memory area[ipcs -m]
Buddyinfo : view memory fragmentation/ allocation[cat /proc/buddyinfo]
Cache : Speeds up I/O; sync flushes the cache and forces writes to disk, while bdflush does
this in the background [flush-253:0 in RHEL 6]
buffer's policy is first-in, first-out
cache's policy is Least Recently Used[LRU] [$ vmstat -S M 1]
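The fields listed above come from /proc/meminfo, which is easy to read from a script. The sketch below parses text in that format into a dictionary; `parse_meminfo` is a hypothetical helper and the numbers in `SAMPLE` are made up for illustration, not taken from a real system.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Map each /proc/meminfo field to its numeric value.

    Most values are in kB; a few (such as HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info


# Illustrative values only.
SAMPLE = """\
MemTotal:        4046572 kB
Cached:          1200000 kB
Active:           900000 kB
Inactive:         700000 kB
Shmem:             12000 kB
HugePages_Total:       0
"""

if __name__ == "__main__":
    info = parse_meminfo(SAMPLE)
    print(info["Active"], info["Inactive"], info["Shmem"])
```

On a live system the same function can be fed `open("/proc/meminfo").read()`.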
VIRTUAL MEMORY, HOW PROGRAM MAPS?
Get the exact memory used by a process :
$ pmap -x <pid>
PAGE CACHE MEMORY CONTROL
vm.dirty_writeback_centisecs=400 // interval between wakeups of the flusher threads, in hundredths of a second (here 4 s)
vm.dirty_background_ratio=5 // when dirty pages fill this percentage of total RAM, the pdflush/flush daemon
starts writing dirty data to disk in the background
vm.dirty_ratio=20 // when dirty pages fill this percentage of total RAM, the writing process itself starts writing data to disk
vfs_cache_pressure  : controls the tendency of the kernel to reclaim the memory which is
used for caching of directory and inode objects
Swappiness : controls how aggressively the kernel uses swap space.
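The two dirty ratios translate into concrete amounts of memory. The sketch below follows the slide's simplification that the percentages apply to total RAM (the kernel actually bases them on available memory); `dirty_thresholds_mb` is a hypothetical helper.

```python
def dirty_thresholds_mb(total_mb: float,
                        background_ratio: int = 5,
                        dirty_ratio: int = 20) -> tuple[float, float]:
    """Return (background threshold, blocking threshold) in MB."""
    background = total_mb * background_ratio / 100
    blocking = total_mb * dirty_ratio / 100
    return background, blocking


if __name__ == "__main__":
    background, blocking = dirty_thresholds_mb(8000)
    print(f"background writeback starts at about {background:.0f} MB of dirty data")
    print(f"writers are throttled at about {blocking:.0f} MB of dirty data")
```

Lowering both ratios makes writeback start earlier and keeps less dirty data in the page cache at any moment.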
To free pagecache: echo 1 > /proc/sys/vm/drop_caches
To free dentries and inodes : echo 2 > /proc/sys/vm/drop_caches
To free pagecache, dentries and inodes: echo 3 > /proc/sys/vm/drop_caches
cache writes done by : kernel thread pdflush/bdflush, now in rhel 6 it is flush.
Life cycle of pages :
active---->inactive list -->dirty > clean
Link : https://www.kernel.org/doc/Documentation/sysctl/vm.txt
PHYSICAL MEMORY ALLOCATION LIMIT
CommitLimit : total memory that may be allocated, based on overcommit_ratio
Committed_AS : memory currently allocated
overcommit_memory : takes values 0 to 2
0 = heuristic overcommit: available memory may be overcommitted, obvious overcommits are refused //default
1 = always overcommit, no overcommit checking
2 = do not overcommit: allocations are limited to CommitLimit, which is based on overcommit_ratio
Overcommit_ratio: % of RAM counted towards CommitLimit when overcommit_memory is set to 2, default value 50
Example : 4 GB RAM, 2 GB Swap, overcommit_memory=2, Overcommit_ratio=50 , so
commitLimit = 2+ (4*50/100)=2+2= 4 GB
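The worked example above can be reproduced in a few lines. This is a simplified sketch of the formula as given on the slide (swap plus a percentage of RAM); it ignores memory reserved for huge pages, which the kernel subtracts from RAM first. `commit_limit_gb` is a hypothetical helper.

```python
def commit_limit_gb(ram_gb: float, swap_gb: float,
                    overcommit_ratio: int = 50) -> float:
    """CommitLimit = swap + RAM * overcommit_ratio / 100 (simplified)."""
    return swap_gb + ram_gb * overcommit_ratio / 100


if __name__ == "__main__":
    # 4 GB RAM, 2 GB swap, overcommit_ratio=50, as in the example.
    print(commit_limit_gb(4, 2, 50))
```

On a running system the result can be compared with the CommitLimit line in /proc/meminfo.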
Issue : Application failed to start due to shortage of memory, Needed to disable
WHY MEMORY CACHE IS REALLY REQUIRED
Speed up processing :
$ cat > XYZ
$ echo 3 > /proc/sys/vm/drop_caches
$ time cat XYZ //much time
$ time cat XYZ //less time
MEMORY NODES, ZONES IN 32 BIT & 64 BIT
Below zones are in 32 bits :
HIGHMEM's lower zone is NORMAL+DMA , NORMAL's lower zone is DMA.
Below zones are in 64 bits :
Normal : Beyond 4 GB
DMA : till 16 MB
DMA32 : till 4GB
$ cat /proc/zoneinfo
$ cat /proc/pagetypeinfo
$ cat /proc/buddyinfo
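Each line of /proc/buddyinfo lists, per zone, how many free blocks exist for each order, where an order-n block is 2^n contiguous pages. The sketch below totals the free pages on one such line; `free_pages` is a hypothetical helper and the sample line uses made-up counts.

```python
def free_pages(line: str) -> tuple[str, int]:
    """Return (zone name, total free pages) for one /proc/buddyinfo line."""
    _, _, tail = line.partition("zone")
    fields = tail.split()
    zone, counts = fields[0], [int(c) for c in fields[1:]]
    # Column i holds the number of free blocks of 2**i pages.
    total = sum(count << order for order, count in enumerate(counts))
    return zone, total


# Illustrative counts only, orders 0 through 10.
SAMPLE_LINE = "Node 0, zone   Normal      4      3      2      1      0      0      0      0      0      0      1"

if __name__ == "__main__":
    print(free_pages(SAMPLE_LINE))
```

Many free blocks at low orders and few at high orders is the typical signature of fragmentation.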
LOW MEMORY, ZONE_RECLAIM
"lowmem" often means NORMAL+DMA
“lowmem” is not present in RHEL 6, 64bit
Reservation is controlled by : lowmem_reserve_ratio [DMA NORMAL HIGHMEM]
256 256 32 // (1/256)*100 % = 0.39% of nearest zone is reserved
zone_reclaim_mode : selects more or less aggressive approaches to reclaiming
memory when a zone runs out of memory
1 = Zone reclaim on
2 = Zone reclaim writes dirty pages out
4 = Zone reclaim swaps pages
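The zone_reclaim_mode values listed above are bit flags that can be ORed together, with 0 meaning zone reclaim is off. The following sketch decodes a value into the options it enables; `decode_zone_reclaim_mode` is a hypothetical helper.

```python
# Bit values and meanings as listed on the slide.
FLAGS = {
    1: "zone reclaim on",
    2: "zone reclaim writes dirty pages out",
    4: "zone reclaim swaps pages",
}


def decode_zone_reclaim_mode(mode: int) -> list[str]:
    """List the options enabled by a zone_reclaim_mode value."""
    return [name for bit, name in FLAGS.items() if mode & bit]


if __name__ == "__main__":
    for mode in (0, 1, 3, 7):
        print(mode, decode_zone_reclaim_mode(mode))
```

For example, a value of 3 turns zone reclaim on and also lets it write dirty pages out.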
ISSUES RELATED TO MEMORY
TCP/IP communication delay – RH cluster broken
High cache usage : slowdown application / system
Memory pressure : Memory leak, App is not tuned properly
Memory fragmentation : hugepage not used
OOM killer kills application: Memory pressure, OOM is enabled
by default, kills based on badness value.
Segmentation fault : Kernel reclaims in normal/low memory
region, hence no room for kernel, encounters segmentation
Faulty Memory : Hardware failure or circuit failure in chip, need
a diagnosis and replace chip
TROUBLESHOOTING MEMORY ISSUE
Memory & swap usage test :
swap_tendency = mapped_ratio/2 + distress + vm_swappiness
mapped_ratio = % of physical memory in use
distress = how much trouble the kernel has freeing memory
vm_swappiness = default 60
swap_tendency >= 100 : pages are eligible for swap
swap_tendency < 100 : reclaim from page cache
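The swap_tendency rule above is simple enough to evaluate by hand or in code. This sketch only restates the formula from the slide; the function names are hypothetical and the input values are examples, not measurements.

```python
def swap_tendency(mapped_ratio: float, distress: float,
                  swappiness: int = 60) -> float:
    """swap_tendency = mapped_ratio/2 + distress + vm_swappiness."""
    return mapped_ratio / 2 + distress + swappiness


def reclaim_target(mapped_ratio: float, distress: float,
                   swappiness: int = 60) -> str:
    """Apply the threshold of 100 from the slide."""
    if swap_tendency(mapped_ratio, distress, swappiness) >= 100:
        return "swap"
    return "page cache"


if __name__ == "__main__":
    print(swap_tendency(80, 0), reclaim_target(80, 0))
    print(swap_tendency(40, 0), reclaim_target(40, 0))
```

With the default swappiness of 60 and no distress, swapping becomes eligible once mapped_ratio reaches 80.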
echo 1 > /proc/sys/kernel/sysrq
echo m > /proc/sysrq-trigger
OOM messages investigation :
Oct 25 07:28:34 nldedip4k031 kernel: [87976.461588]  oom_kill_process+0x5c/0x80
Oct 25 07:28:34 nldedip4k031 kernel: [87976.461591]  out_of_memory+0xc5/0x1c0
Oct 25 07:28:34 nldedip4k031 kernel: [87976.461595]  __alloc_pages_nodemask+0x72c/0x740
Oct 25 07:28:34 nldedip4k031 kernel: [87976.461599]  __get_free_pages+0x1c/0x30
Oct 25 07:28:34 nldedip4k031 kernel: [87976.461602]  get_zeroed_page+0x12/0x20
Oct 25 07:28:34 nldedip4k031 kernel: [87976.461606]  fill_read_buffer.isra.8+0xaa/0xd0
Oct 25 07:28:34 nldedip4k031 kernel: [87976.461609]  sysfs_read_file+0x7d/0x90
Oct 25 07:28:34 nldedip4k031 kernel: [87976.461613]  vfs_read+0x8c/0x160
Oct 25 07:28:34 nldedip4k031 kernel: [87976.461616]  ? fill_read_buffer.isra.8+0xd0/0xd0
Oct 25 07:28:34 nldedip4k031 kernel: [87976.461619]  sys_read+0x3d/0x70
Oct 25 07:28:34 nldedip4k031 kernel: [87976.461624]  sysenter_do_call+0x12/0x28
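When investigating a backtrace like the one above, it helps to reduce it to the chain of function names, read from the bottom (entry point) to the top (where the OOM killer fired). The sketch below is one possible way to do that; `parse_frames` and the regular expression are assumptions written for this log format, and the sample lines are copied from the trace above. Frames prefixed with `?` are ones the kernel marks as unreliable.

```python
import re

# Matches "] [? ]function+0xoffset/0xsize" at the end of a log line.
FRAME_RE = re.compile(r"\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+\s*$")


def parse_frames(log_text: str) -> list[tuple[str, bool]]:
    """Return (function name, reliable) for each stack frame line."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append((match.group(2), match.group(1) is None))
    return frames


SAMPLE = """\
Oct 25 07:28:34 nldedip4k031 kernel: [87976.461588]  oom_kill_process+0x5c/0x80
Oct 25 07:28:34 nldedip4k031 kernel: [87976.461591]  out_of_memory+0xc5/0x1c0
Oct 25 07:28:34 nldedip4k031 kernel: [87976.461616]  ? fill_read_buffer.isra.8+0xd0/0xd0
"""

if __name__ == "__main__":
    for name, reliable in parse_frames(SAMPLE):
        print(name, "" if reliable else "(unreliable)")
```

In this trace the chain runs from sys_read through sysfs_read_file and the page allocator up to out_of_memory, which shows that a read of a sysfs file triggered the allocation that could not be satisfied.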