Name: Lin Yang, Department of EE&CS,

Ohio University, Stocker Center, Athens, OH 45701,

linyang@bobcat.ent.ohiou.edu.




                                             CS 558

                                 OPERATING SYSTEM 2

                                             Spring 2003

                                   Instructor: Dr. Frank Drews

                                          Due: 05/31/2003

                                           (Final Version)




                    The Virtual Memory Management of Linux

                                      (Research Report)

                                              Author:
                                              Lin Yang




1. Introduction

Linux is outstanding in the area of memory management; it uses every scrap of memory in a system to
its full potential. For example: (1) The Linux kernel itself is smaller and more efficient than the NT
kernel. NT typically occupies more memory than the Linux kernel, leaving extra memory for applications
instead of just the operating system. (2) Linux uses a copy-on-write scheme. If two or more programs use
the same block of memory, only one copy is actually kept in RAM, and all the programs read that same
block. If one program writes to the block, a copy is made for that program alone; all the other
programs still share the original. When loading things like shared objects, this is a major memory
saver. (3) Demand loading is very useful as well. Linux loads into RAM only the portions of a program
that are actually being used, which reduces overall RAM requirements significantly. Likewise, when
swapping is necessary, only portions of programs are swapped out to disk, not entire processes, which
greatly improves multiprocessing performance. (4) Finally, any RAM not being used by the kernel or
applications is automatically used as a disk cache, which speeds access to the disk as long as there is
unused memory.

The Linux virtual memory system is responsible for maintaining the address space visible to each process.
It creates pages of virtual memory on demand and manages the loading and swapping of those pages.
Virtual memory provides a way of running more processes than can fit within a computer's physical
address space at once. Each process that is a candidate for running on a processor is allocated its own
virtual memory area, which defines the logical set of addresses the process can access to carry out its
task. Because this total virtual address space is very large (constrained mainly by the number of
address bits the processor has and the maximum number of processes it supports), each process can be
allocated a large logical address space (typically 3 GB) in which to operate. It is the job of the virtual
memory manager to ensure that active processes, and the areas they wish to access, are remapped to
physical memory as required. This is achieved by swapping or paging the required sections (pages) into
and out of physical memory. Swapping replaces a complete process in memory with another, whereas
paging removes a 'page' (typically 2-4 KB) of one process's mapped memory and replaces it with a page
from another process. As this can be a compute-intensive and time-consuming task, care is taken to
minimize its overhead. This is done by using algorithms designed to exploit the locality of related
sections of code, and by carrying out operations such as memory duplication or reading only when
absolutely required (techniques known as copy-on-write, lazy paging, and demand paging).
The virtual memory owned by a process may contain code and data from many sources. Executable code
may be shared between processes in the form of shared libraries; as these areas are read-only, there is
little chance of them becoming corrupted. Processes can also allocate and link virtual memory to use
during their processing. The memory management techniques used by Linux include the following:

      Page-based protection        Each virtual page has a set of flags which determine the
      mechanism                    types of access allowed in user mode or kernel mode.

      Demand paging / lazy         The virtual memory of a process is brought into physical
      reading                      memory only when the process attempts to use it.

      Kernel and user modes of     A process has unrestricted access to memory in kernel mode,
      operation                    but access only to its own memory in user mode.

      Mapped files                 Memory is extended by allowing disk files to be used as a
                                   staging area for pages swapped out of physical memory.

      Copy-on-write memory         When two processes require access to a common area of
                                   code, the virtual memory manager does not copy the section
                                   immediately; if only read access is required, the section
                                   may be used safely by both processes. Only when a write is
                                   requested does the copy take place.

      Shared memory                An area of memory may be mapped into the address space of
                                   more than one process by calling privileged operations.

      Memory locking               To ensure a critical page can never be swapped out of
                                   memory, it may be locked in; the virtual memory manager
                                   will then not remove it.

In this research report, we focus on the virtual memory management of Linux, especially its page
replacement and swapping technology. The rest of the report is organized as follows: section 2
introduces the page replacement algorithm in Linux; section 3 introduces the swapping and caching
technology in Linux; section 4 discusses some problems of virtual memory management; section 5
concludes the report.



                                  2. Page replacement algorithm in Linux

Before we introduce the algorithm used in Linux, we need to introduce the concept of the PTE cache. All
modern computers designed for virtual memory incorporate a special hardware cache called a PTE cache
or TLB (Translation Lookaside Buffer), which caches page table entries in the CPU, so that the CPU
usually does not have to probe the page table to find the PTE that lets it translate an address.


The PTE cache is the magic gadget that makes virtual memory practical. Without it the CPU would have
to do extra main memory reads for every read or write instruction executed by the running program, just
to look up the PTE that lets it translate a virtual address into a physical one.


Rather than looking up a PTE in the page table each time it needs to translate an address, the CPU looks
in its page table entry cache to find the right page table entry. If it is there already, the CPU reuses
it without actually traversing the page table. Occasionally the PTE cache does not hold the PTE it
needs, so the CPU loads the needed entry from the page table and caches it. Note that a PTE cache does
not cache normal data---it caches only address translation information from the page table. A page table
entry is very small, and the PTE cache holds a relatively small number of them (depending on the CPU,
usually somewhere from 32 to 1024). This means that PTE cache misses are a couple of orders of
magnitude more common than page faults---any time you touch a page you have not touched fairly
recently, you are likely to miss the PTE cache. This is not usually a big deal, because PTE cache misses
are many orders of magnitude cheaper than page faults---you only need to fetch a PTE from main memory,
not fetch a page from disk.


A PTE cache is very fast on a hit, and is able to translate addresses in a fraction of an instruction cycle.
This translation can generally be overlapped with other parts of instruction setup, so the PTE hardware
gives you virtual memory support at essentially zero time cost.

Having introduced the PTE cache, we can now describe the core page replacement algorithm used in Linux.
The main component of the VM replacement mechanism is a clock algorithm. Clock algorithms are
commonly used because they provide a reasonable approximation of LRU replacement and are cheap to
implement. (All common general-purpose CPUs have hardware support for clock algorithms, in the form
of the reference bit maintained by the PTE cache. This hardware support is very simple and fast, which is
why all designers of modern general-purpose CPUs include it.)


A little refresher on the general idea of clock algorithms: a clock algorithm cycles slowly through the
pages that are in RAM, checking whether they have been touched (and perhaps dirtied) lately. For this,
the hardware-supported reference and dirty bits of the page table entries are used. The reference bit is
automatically set by the PTE cache hardware whenever the page is touched: a flag bit is set in the page
table entry, and if the PTE is evicted from the PTE cache, it is written back to its home position in the
page table. The clock algorithm can therefore examine the reference bits in page table entries to
"examine" the corresponding pages.


The basic idea of the clock algorithm is that a slow incremental sweep repeatedly cycles through all of
the cached (in-RAM) pages, noticing whether each page has been touched (and perhaps dirtied) since the
last time it was examined. If a page's reference bit is set, the clock algorithm does not consider it for
eviction on this cycle, and continues its sweep looking for a better candidate for eviction. Before
continuing its sweep, however, it resets the reference bit in the page table entry. Resetting the reference
bit ensures that the next time the page is reached in the cyclic sweep, the bit will indicate whether the
page was touched since this time. Visiting all of the pages cyclically ensures that a page is only
considered for eviction if it has not been touched for at least a whole cycle.


The clock algorithm proceeds in increments, usually sweeping a small fraction of the in-memory pages
at a time, and keeps a record of its current position between increments of sweeping. This allows it to
resume its sweep from that page at the next increment. Technically, this simple clock scheme is known
as the "second chance" algorithm, because it gives a page a second chance to stay in memory---one more
sweep cycle.


More refined versions of the clock algorithm may keep multiple bits, recording whether a page has been
touched in the last two cycles, or even three or four. Only one hardware-supported bit is needed for this,
however: rather than just testing the hardware-supported bit, the clock hand records the current value of
the bit before resetting it, for use the next time around. Intuitively, it would seem that the more bits are
used, the more precise an approximation of LRU we would get, but that is usually not the case. Once two
bits are used, clock algorithms do not generally get much better, due to fundamental weaknesses of clock
algorithms. Linux uses a simple second-chance (one-bit clock) algorithm, more or less, but with several
elaborations and complications.


The main clock algorithm is implemented by the kernel swap daemon, a kernel thread that runs the
procedure kswapd(). kswapd runs an infinite loop which incrementally scans all the normal VM pages
subject to paging, then starts over. kswapd generally does its clock sweeping in increments, and sleeps
in between increments so that normal processes may run. The page-out daemon should usually be able to
keep enough memory free, but if it cannot, programs end up calling the page-out code themselves. The
following is the Linux kernel source code that implements this algorithm:


static int swap_out(unsigned int priority, int gfp_mask)
{
	int counter;
	int __ret = 0;

	counter = (nr_threads << SWAP_SHIFT) >> priority;
	if (counter < 1)
		counter = 1;

	for (; counter >= 0; counter--) {
		struct list_head *p;
		unsigned long max_cnt = 0;
		struct mm_struct *best = NULL;
		int assign = 0;
		int found_task = 0;
	select:
		spin_lock(&mmlist_lock);
		p = init_mm.mmlist.next;
		/* Pick the mm with the largest remaining swap quota. */
		for (; p != &init_mm.mmlist; p = p->next) {
			struct mm_struct *mm = list_entry(p, struct mm_struct, mmlist);
			if (mm->rss <= 0)
				continue;
			found_task++;
			/* Refresh each mm's quota, proportional to its RSS. */
			if (assign == 1) {
				mm->swap_cnt = (mm->rss >> SWAP_SHIFT);
				if (mm->swap_cnt < SWAP_MIN)
					mm->swap_cnt = SWAP_MIN;
			}
			if (mm->swap_cnt > max_cnt) {
				max_cnt = mm->swap_cnt;
				best = mm;
			}
		}

		/* Make sure the chosen mm doesn't disappear under us. */
		if (best)
			atomic_inc(&best->mm_users);
		spin_unlock(&mmlist_lock);

		if (!best) {
			/* No quota left anywhere: assign fresh quotas and retry. */
			if (!assign && found_task > 0) {
				assign = 1;
				goto select;
			}
			break;
		} else {
			__ret = swap_out_mm(best, gfp_mask);
			mmput(best);
			break;
		}
	}
	return __ret;
}


                                  3. Swapping and caching technology in Linux

Linux performs a clock sweep over the *virtual* pages, by cycling through each process's pages in
address order. For this it uses the vm_area mappings and page tables of the processes, so that it can scan
the pages of each process sequentially.


Rather than sweeping through all of the pages of an entire process before switching to another, the main
clock tries to evict a batch of pages from one process and then moves on to another process. It visits all
of the (pageable) processes and then repeats. The effect is a large number of distinct clock sweeps, one
per pageable process, with the overall clock sweep advancing each of these smaller sweeps periodically.


The following considerations led to this design:
    •      Related pages should be paged out together, to increase locality in the paging store (so-called
           swap files or swap partitions). By evicting a moderate number of virtual pages from a given
           process, in virtual address order, the sweep through virtual address space tends to group related
           pages together in the paging store.
    •      By alternating between processes at a coarser granularity, it avoids evicting a large number of
           pages from a given victim process---after it's evicted a reasonable number of pages from a
           particular victim, it moves on to another to provide some semblance of fairness between the
           processes.
    •      The use of a main clock over processes and virtual address pages and a secondary clock over page
           frames provides a way of combining the hardware-supported virtual page reference bits to get
           recency-of-touch information about logical pages stored in page frames.



    •      The secondary clock (and the use of a separate per-page-frame PG_referenced bit maintained in
           software) can act as an additional "aging" period for pages that are evicted from the main clock.
           A page can be held in the "swap cache" after being evicted from the main clock, and allowed to
           age a while before being evicted from RAM.
The swap cache is just a set of page frames holding logical pages that have been evicted from the main
clock, but whose contents have not yet been discarded. The contents of page frames need not be
copied to "move" them into the swap cache---rather, the page frame is simply marked as "swap cached"
by the main clock algorithm, and linked into a hash table that holds all of the page frames currently
constituting the swap cache. The following is part of the source code used in Linux to operate the swap
cache.


#ifdef SWAP_CACHE_INFO
void show_swap_cache_info(void)
{
	printk("Swap cache: add %ld, delete %ld, find %ld/%ld\n",
		swap_cache_add_total,
		swap_cache_del_total,
		swap_cache_find_success, swap_cache_find_total);
}
#endif

void add_to_swap_cache(struct page *page, swp_entry_t entry)
{
	unsigned long flags;

#ifdef SWAP_CACHE_INFO
	swap_cache_add_total++;
#endif
	if (!PageLocked(page))
		BUG();
	if (PageTestandSetSwapCache(page))
		BUG();
	if (page->mapping)
		BUG();
	flags = page->flags & ~((1 << PG_error) | (1 << PG_arch_1));
	page->flags = flags | (1 << PG_uptodate);
	add_to_page_cache_locked(page, &swapper_space, entry.val);
}

static inline void remove_from_swap_cache(struct page *page)
{
	struct address_space *mapping = page->mapping;

	if (mapping != &swapper_space)
		BUG();
	if (!PageSwapCache(page) || !PageLocked(page))
		PAGE_BUG(page);

	PageClearSwapCache(page);
	ClearPageDirty(page);
	remove_inode_page(page);
}

                              4. Problems of virtual memory management in Linux

There are several possible problems with the page replacement algorithm in Linux, in my opinion, which
can be listed as follows:
    •   The system may react badly to variable VM load or to load spikes after a period of no VM activity.
        Since kswapd, the page-out daemon, only scans when the system is low on memory, the
        system can end up in a state where some pages have reference bits from the last 5 seconds, while
        other pages have reference bits from 20 minutes ago. This means that on a load spike the system
        has no clue which are the right pages to evict from memory. This can lead to a swapping storm,
        where the wrong pages are evicted and almost immediately afterwards faulted back in, leading
        to the page-out of another random page, and so on.
    •   There is no method to prevent possible memory deadlock. With the arrival of journaling and
        delayed-allocation file systems, it is possible that the system will need to allocate memory in order
        to free memory, that is, to write out data so memory can become free. It may be useful to
        introduce some algorithm to prevent this possible deadlock under extremely low memory situations.


                                                    5. Conclusion
The virtual memory management system of Linux, especially its paging and swapping technologies, has
been introduced in this report, and some problems with these strategies have been discussed.



                                                                                                                     9

More Related Content

What's hot

Process management
Process managementProcess management
Process managementMohd Arif
 
Memory Management in OS
Memory Management in OSMemory Management in OS
Memory Management in OSvampugani
 
Memory Management in OS
Memory Management in OSMemory Management in OS
Memory Management in OSKumar Pritam
 
Virtual Memory
Virtual MemoryVirtual Memory
Virtual Memoryvampugani
 
Operating Systems 1 (9/12) - Memory Management Concepts
Operating Systems 1 (9/12) - Memory Management ConceptsOperating Systems 1 (9/12) - Memory Management Concepts
Operating Systems 1 (9/12) - Memory Management ConceptsPeter Tröger
 
Processes and Threads in Windows Vista
Processes and Threads in Windows VistaProcesses and Threads in Windows Vista
Processes and Threads in Windows VistaTrinh Phuc Tho
 
Linux Memory Management with CMA (Contiguous Memory Allocator)
Linux Memory Management with CMA (Contiguous Memory Allocator)Linux Memory Management with CMA (Contiguous Memory Allocator)
Linux Memory Management with CMA (Contiguous Memory Allocator)Pankaj Suryawanshi
 
Operating Systems Part III-Memory Management
Operating Systems Part III-Memory ManagementOperating Systems Part III-Memory Management
Operating Systems Part III-Memory ManagementAjit Nayak
 
Chapter 9 OS
Chapter 9 OSChapter 9 OS
Chapter 9 OSC.U
 
8 memory management strategies
8 memory management strategies8 memory management strategies
8 memory management strategiesDr. Loganathan R
 
Driver development – memory management
Driver development – memory managementDriver development – memory management
Driver development – memory managementVandana Salve
 

What's hot (20)

Process management
Process managementProcess management
Process management
 
Os unit 3
Os unit 3Os unit 3
Os unit 3
 
Memory Management in OS
Memory Management in OSMemory Management in OS
Memory Management in OS
 
Linux Memory Management
Linux Memory ManagementLinux Memory Management
Linux Memory Management
 
Memory management
Memory managementMemory management
Memory management
 
Cs8493 unit 3
Cs8493 unit 3Cs8493 unit 3
Cs8493 unit 3
 
OSCh14
OSCh14OSCh14
OSCh14
 
Memory Management in OS
Memory Management in OSMemory Management in OS
Memory Management in OS
 
Virtual Memory
Virtual MemoryVirtual Memory
Virtual Memory
 
Operating Systems 1 (9/12) - Memory Management Concepts
Operating Systems 1 (9/12) - Memory Management ConceptsOperating Systems 1 (9/12) - Memory Management Concepts
Operating Systems 1 (9/12) - Memory Management Concepts
 
Processes and Threads in Windows Vista
Processes and Threads in Windows VistaProcesses and Threads in Windows Vista
Processes and Threads in Windows Vista
 
Linux Memory Management with CMA (Contiguous Memory Allocator)
Linux Memory Management with CMA (Contiguous Memory Allocator)Linux Memory Management with CMA (Contiguous Memory Allocator)
Linux Memory Management with CMA (Contiguous Memory Allocator)
 
virtual memory
virtual memoryvirtual memory
virtual memory
 
Ch4 memory management
Ch4 memory managementCh4 memory management
Ch4 memory management
 
Operating Systems Part III-Memory Management
Operating Systems Part III-Memory ManagementOperating Systems Part III-Memory Management
Operating Systems Part III-Memory Management
 
Chapter 9 OS
Chapter 9 OSChapter 9 OS
Chapter 9 OS
 
8 memory management strategies
8 memory management strategies8 memory management strategies
8 memory management strategies
 
Driver development – memory management
Driver development – memory managementDriver development – memory management
Driver development – memory management
 
Cs8493 unit 4
Cs8493 unit 4Cs8493 unit 4
Cs8493 unit 4
 
Cs8493 unit 2
Cs8493 unit 2Cs8493 unit 2
Cs8493 unit 2
 

Viewers also liked

Data Breaches Preparedness (Credit Union Conference Session)
Data Breaches Preparedness (Credit Union Conference Session)Data Breaches Preparedness (Credit Union Conference Session)
Data Breaches Preparedness (Credit Union Conference Session)NAFCU Services Corporation
 
Cidade da cultura
Cidade da culturaCidade da cultura
Cidade da culturamercerey84
 
1 hdc de thi thu truong thpt chuyen le quy don quang tri nam 2015
1 hdc de thi thu truong thpt chuyen le quy don quang tri nam 20151 hdc de thi thu truong thpt chuyen le quy don quang tri nam 2015
1 hdc de thi thu truong thpt chuyen le quy don quang tri nam 2015Giang Hồ Tiếu Ngạo
 
Taller de redaccion 2da semana
Taller de redaccion 2da semanaTaller de redaccion 2da semana
Taller de redaccion 2da semanaCarlos Mendez
 
Taller para cartografos de suelos
Taller para cartografos de suelosTaller para cartografos de suelos
Taller para cartografos de suelosCarlos Mendez
 
«Как учиться эффективно?»
«Как учиться эффективно?»«Как учиться эффективно?»
«Как учиться эффективно?»Dmitrii Morovov
 
Modulo maestria fisico parte 3
Modulo maestria fisico parte 3Modulo maestria fisico parte 3
Modulo maestria fisico parte 3Carlos Mendez
 
Stephen Moker Creative Snapshot
Stephen Moker Creative SnapshotStephen Moker Creative Snapshot
Stephen Moker Creative Snapshotstephenmoker
 
Market challenge & opportunities in construction sector in ksa
Market challenge & opportunities in construction sector in ksaMarket challenge & opportunities in construction sector in ksa
Market challenge & opportunities in construction sector in ksaSamer MOBAYED
 
Spanish 2 tema 5 pronouns
Spanish 2 tema 5 pronounsSpanish 2 tema 5 pronouns
Spanish 2 tema 5 pronounssc12276405mhs
 
Двор 2.0 (презентация) ЗШ 2012
Двор 2.0 (презентация) ЗШ 2012Двор 2.0 (презентация) ЗШ 2012
Двор 2.0 (презентация) ЗШ 2012Dmitrii Morovov
 
Daily Affirmations
Daily AffirmationsDaily Affirmations
Daily Affirmationsmartyncgreen
 
Niet meer toekijken vanaf de zijlijn - TOPdesk on Tour 2010
Niet meer toekijken vanaf de zijlijn - TOPdesk on Tour 2010Niet meer toekijken vanaf de zijlijn - TOPdesk on Tour 2010
Niet meer toekijken vanaf de zijlijn - TOPdesk on Tour 2010Jordi Recasens
 
2005 cpr 之修訂1
2005 cpr 之修訂12005 cpr 之修訂1
2005 cpr 之修訂1u001072
 

Viewers also liked (20)

Jenis jenis batuk
Jenis jenis batukJenis jenis batuk
Jenis jenis batuk
 
Data Breaches Preparedness (Credit Union Conference Session)
Data Breaches Preparedness (Credit Union Conference Session)Data Breaches Preparedness (Credit Union Conference Session)
Data Breaches Preparedness (Credit Union Conference Session)
 
Cidade da cultura
Cidade da culturaCidade da cultura
Cidade da cultura
 
1 hdc de thi thu truong thpt chuyen le quy don quang tri nam 2015
1 hdc de thi thu truong thpt chuyen le quy don quang tri nam 20151 hdc de thi thu truong thpt chuyen le quy don quang tri nam 2015
1 hdc de thi thu truong thpt chuyen le quy don quang tri nam 2015
 
Taller de redaccion 2da semana
Taller de redaccion 2da semanaTaller de redaccion 2da semana
Taller de redaccion 2da semana
 
Taller para cartografos de suelos
Taller para cartografos de suelosTaller para cartografos de suelos
Taller para cartografos de suelos
 
«Как учиться эффективно?»
«Как учиться эффективно?»«Как учиться эффективно?»
«Как учиться эффективно?»
 
Modulo maestria fisico parte 3
Modulo maestria fisico parte 3Modulo maestria fisico parte 3
Modulo maestria fisico parte 3
 
Update 1
Update 1Update 1
Update 1
 
Stephen Moker Creative Snapshot
Stephen Moker Creative SnapshotStephen Moker Creative Snapshot
Stephen Moker Creative Snapshot
 
Update 1
Update 1Update 1
Update 1
 
Market challenge & opportunities in construction sector in ksa
Market challenge & opportunities in construction sector in ksaMarket challenge & opportunities in construction sector in ksa
Market challenge & opportunities in construction sector in ksa
 
soap
soapsoap
soap
 
Tuyen tap 410 cau he phuong trinh
Tuyen tap 410 cau he phuong trinh Tuyen tap 410 cau he phuong trinh
Tuyen tap 410 cau he phuong trinh
 
Spanish 2 tema 5 pronouns
Spanish 2 tema 5 pronounsSpanish 2 tema 5 pronouns
Spanish 2 tema 5 pronouns
 
Двор 2.0 (презентация) ЗШ 2012
Двор 2.0 (презентация) ЗШ 2012Двор 2.0 (презентация) ЗШ 2012
Двор 2.0 (презентация) ЗШ 2012
 
Daily Affirmations
Daily AffirmationsDaily Affirmations
Daily Affirmations
 
Niet meer toekijken vanaf de zijlijn - TOPdesk on Tour 2010
Niet meer toekijken vanaf de zijlijn - TOPdesk on Tour 2010Niet meer toekijken vanaf de zijlijn - TOPdesk on Tour 2010
Niet meer toekijken vanaf de zijlijn - TOPdesk on Tour 2010
 
2005 cpr 之修訂1
2005 cpr 之修訂12005 cpr 之修訂1
2005 cpr 之修訂1
 
Diagonisma fisikis g kat
Diagonisma fisikis g katDiagonisma fisikis g kat
Diagonisma fisikis g kat
 

Similar to Vmreport

Similar to Vmreport (20)

Opetating System Memory management
Opetating System Memory managementOpetating System Memory management
Opetating System Memory management
 
Paging +Algorithem+Segmentation+memory management
Paging +Algorithem+Segmentation+memory managementPaging +Algorithem+Segmentation+memory management
Paging +Algorithem+Segmentation+memory management
 
Operating system
Operating systemOperating system
Operating system
 
Power Point Presentation on Virtual Memory.ppt
Power Point Presentation on Virtual Memory.pptPower Point Presentation on Virtual Memory.ppt
Power Point Presentation on Virtual Memory.ppt
 
operating system
operating systemoperating system
operating system
 
Operating system Memory management
Operating system Memory management Operating system Memory management
Operating system Memory management
 
Chapter 2 part 1
Chapter 2 part 1Chapter 2 part 1
Chapter 2 part 1
 
ppt
pptppt
ppt
 
Linux%20 memory%20management
Linux%20 memory%20managementLinux%20 memory%20management
Linux%20 memory%20management
 
UNIT-2 OS.pptx
UNIT-2 OS.pptxUNIT-2 OS.pptx
UNIT-2 OS.pptx
 
OSCh9
OSCh9OSCh9
OSCh9
 
Ch9 OS
Ch9 OSCh9 OS
Ch9 OS
 
OS_Ch9
OS_Ch9OS_Ch9
OS_Ch9
 
I/O System and Case Study
I/O System and Case StudyI/O System and Case Study
I/O System and Case Study
 
Linux Internals - Interview essentials 3.0
Linux Internals - Interview essentials 3.0Linux Internals - Interview essentials 3.0
Linux Internals - Interview essentials 3.0
 
Bab 4
Bab 4Bab 4
Bab 4
 
How many total bits are required for a direct-mapped cache with 2048 .pdf
How many total bits are required for a direct-mapped cache with 2048 .pdfHow many total bits are required for a direct-mapped cache with 2048 .pdf
How many total bits are required for a direct-mapped cache with 2048 .pdf
 
Memory Management
Memory ManagementMemory Management
Memory Management
 
Virtual memory 20070222-en
Virtual memory 20070222-enVirtual memory 20070222-en
Virtual memory 20070222-en
 
virtual memory - Computer operating system
virtual memory - Computer operating systemvirtual memory - Computer operating system
virtual memory - Computer operating system
 

Vmreport

  • 1. Name: Lin Yang, Department of EE&CS, Ohio University, Stocker Center, Athens, OH 45701, linyang@bobcat.ent.ohiou.edu. CS 558 OPERATING SYSTEM 2 Spring 2003 Instructor: Dr. Frank Drews Due: 05/31/2003 (Final Version) The Virtual Memory Management of Linux (Research Report) Author: Lin Yang 1
  • 2. 1. Introduction Linux is outstanding in the area of memory management. Linux will use every scrap of memory in a system to its full potential. For example: (1) The Linux kernel itself is much smaller and more efficient than the NT kernel. NT typically takes up more memory than Linux kernel, which means extra memory can be used by applications instead of just holding the OS. (2) Linux uses a copy-on-write scheme. If two or more programs are using the same block of memory, only one copy is actually in RAM, and all the programs read the same block. If one program writes to that block, then a copy is made for just that program. All other programs still share the same memory. When loading things like shared objects, this is a major memory saver. (3) Demand-loading is very useful, as well. Linux only loads into RAM the portions of a program that are actually being used, which reduces overall RAM requirements significantly. At the same time, when swapping is necessary, only portions of programs are swapped out to disk, not entire processes. This helps to greatly enhance multiprocessing performance. (4) Finally, any RAM not being used by the kernel or applications is automatically used as a disk cache. This speeds access to the disk so long as there is unused memory. The Linux virtual memory system is responsible for maintaining the address space visible to each process. It creates pages of virtual memory on demand and it needs to manage the loading and swapping operation of the pages. Virtual memory provides a way of running more processes than can physically fit within a computer's physical address space. Each process that is a candidate for running on a processor is allocated it's own virtual memory area which defines the logical set of addresses that a process can access to carry out it's required task. 
As this total virtual memory area is very large (typically constrained by the number of address bits the processor has and the maximum number of processes it supports), each process can be allocated a large logical address space (typically 3 GB) in which to operate. It is the job of the virtual memory manager to ensure that active processes and the areas they wish to access are remapped to physical memory as required. This is achieved by swapping or paging the required sections (pages) into and out of physical memory. Swapping involves replacing a complete process with another in memory, whereas paging involves removing a page (typically 2-4 KB) of the process's mapped memory and replacing it with a page from another process. As this may be a compute-intensive and time-consuming task, care is taken to minimize its overhead. This is done by using a number of algorithms designed to take advantage of the locality of related sections of code, and by carrying out some operations, such as memory duplication or reading, only when absolutely required (techniques known as copy-on-write, lazy paging, and demand paging).

The virtual memory owned by a process may contain code and data from many sources. Executable code may be shared between processes in the form of shared libraries; as these areas are read-only, there is little chance of them becoming corrupted. Processes can also allocate and link virtual memory to use during their processing.

Some of the memory management techniques used by Linux include the following:

• Page-based protection: each virtual page has a set of flags which determine the types of access allowed in user mode or kernel mode.
• Demand paging / lazy reading: the virtual memory of a process is brought into physical memory only when the process attempts to use it.
• Kernel and user modes of operation: a process has unrestricted access to memory in kernel mode, but access only to its own memory in user mode.
• Mapped files: memory is extended by allowing disk files to be used as a staging area for pages swapped out of physical memory.
• Copy-on-write memory: when two processes require access to a common area of code, the virtual memory manager does not copy the section immediately; as long as only read access is required, the section may be used safely by both processes. Only when a write is requested does the copy take place.
• Shared memory: an area of memory may be mapped into the address space of more than one process by calling privileged operations.
• Memory locking: to ensure a critical page can never be swapped out of memory, it may be locked in; the virtual memory manager will then not remove it.

In this research report, we focus on the virtual memory management of Linux, especially its page replacement and swapping technology. The rest of the report is organized as follows: section 2 introduces the page replacement algorithm in Linux; section 3 introduces the swapping and caching technology in Linux; section 4 discusses some problems of virtual memory management; section 5 concludes the report.

2. Page replacement algorithm in Linux

Before we introduce the algorithm used in Linux, we need to introduce the concept of the PTE cache.
All modern computers designed for virtual memory incorporate a special hardware cache called a PTE cache or TLB (Translation Lookaside Buffer), which caches page table entries in the CPU so that the CPU usually doesn't have to probe the page table to find a PTE that lets it translate an address.

The PTE cache is the magic gadget that makes virtual memory practical. Without it, the CPU would have to do extra main-memory reads for every read or write instruction executed by the running program, just to look up the PTE that lets it translate a virtual address into a physical one. Rather than looking up a PTE in the page table each time it needs to translate an address, the CPU looks in its page table entry cache to find the right page table entry. If it's there already, the CPU reuses it without actually traversing the page table. Occasionally the PTE cache doesn't hold the PTE it needs, so the CPU loads the needed entry from the page table and caches that.

Note that a PTE cache does not cache normal data; it only caches address translation information from the page table. A page table entry is very small, and the PTE cache only holds a relatively small number of them (depending on the CPU, usually somewhere between 32 and 1024). This means that PTE cache misses are a couple of orders of magnitude more common than page faults: any time you touch a page you haven't touched fairly recently, you're likely to miss the PTE cache. This isn't usually a big deal, because PTE cache misses are many orders of magnitude cheaper than page faults: you only need to fetch a PTE from main memory, not fetch a page from disk. A PTE cache is very fast on a hit, and is able to translate addresses in a fraction of an instruction cycle. This translation can generally be overlapped with other parts of instruction setup, so the PTE hardware gives you virtual memory support at essentially zero time cost.

Having introduced the PTE cache, we can now describe the core page replacement algorithm used in Linux. The main component of the VM replacement mechanism is a clock algorithm.
Clock algorithms are commonly used because they provide a passable approximation of LRU replacement and are cheap to implement. (All common general-purpose CPUs have hardware support for clock algorithms, in the form of the reference bit maintained by the PTE cache. This hardware support is very simple and fast, which is why all designers of modern general-purpose CPUs include it.)

A little refresher on the general idea of clock algorithms: a clock algorithm cycles slowly through the pages that are in RAM, checking whether they have been touched (and perhaps dirtied) lately. For this, the hardware-supported reference and dirty bits of the page table entries are used. The reference bit is automatically set by the PTE cache hardware whenever the page is touched: a flag bit is set in the page table entry, and if the PTE is evicted from the PTE cache, it is written back to its home position in the page table. The clock algorithm can therefore examine the reference bits in page table entries to "examine" the corresponding pages.

The basic idea of the clock algorithm is that a slow incremental sweep repeatedly cycles through all of the cached (in-RAM) pages, noticing whether each page has been touched (and perhaps dirtied) since the last time it was examined. If a page's reference bit is set, the clock algorithm doesn't consider it for eviction on this cycle, and continues its sweep, looking for a better candidate for eviction. Before continuing its sweep, however, it resets the reference bit in the page table entry. Resetting the reference bit ensures that the next time the page is reached in the cyclic sweep, the bit will indicate whether the page was touched since this time. Visiting all of the pages cyclically ensures that a page is only considered for eviction if it hasn't been touched for at least a whole cycle.

The clock algorithm proceeds in increments, usually sweeping a small fraction of the in-memory pages at a time, and keeps a record of its current position between increments of sweeping. This allows it to resume its sweep from that page at the next increment. Technically, this simple clock scheme is known as the "second chance" algorithm, because it gives a page a second chance to stay in memory: one more sweep cycle.

More refined versions of the clock algorithm may keep multiple bits, recording whether a page has been touched in the last two cycles, or even three or four. Only one hardware-supported bit is needed for this, however: rather than just testing the hardware-supported bit, the clock hand records the current value of the bit before resetting it, for use the next time around. Intuitively, it would seem that the more bits are used, the more precise an approximation of LRU we'd get, but that's usually not the case.
Once two bits are used, clock algorithms don't generally get much better, due to fundamental weaknesses of clock algorithms. Linux uses a simple second-chance (one-bit clock) algorithm, more or less, but with several elaborations and complications. The main clock algorithm is implemented by the kernel swap daemon, a kernel thread that runs the procedure kswapd(). kswapd is an infinite loop which incrementally scans all the normal VM pages subject to paging, then starts over. kswapd generally does its clock sweeping in increments, and sleeps in between increments so that normal processes may run. The page-out daemon should usually be able to keep enough memory free, but if it can't, programs will end up calling the page-out code themselves (the following is the source code in Linux that implements this algorithm):
static int swap_out(unsigned int priority, int gfp_mask)
{
	int counter;
	int __ret = 0;

	counter = (nr_threads << SWAP_SHIFT) >> priority;
	if (counter < 1)
		counter = 1;

	for (; counter >= 0; counter--) {
		struct list_head *p;
		unsigned long max_cnt = 0;
		struct mm_struct *best = NULL;
		int assign = 0;
		int found_task = 0;
	select:
		spin_lock(&mmlist_lock);
		p = init_mm.mmlist.next;
		for (; p != &init_mm.mmlist; p = p->next) {
			struct mm_struct *mm = list_entry(p, struct mm_struct, mmlist);
			if (mm->rss <= 0)
				continue;
			found_task++;
			if (assign == 1) {
				mm->swap_cnt = (mm->rss >> SWAP_SHIFT);
				if (mm->swap_cnt < SWAP_MIN)
					mm->swap_cnt = SWAP_MIN;
			}
			if (mm->swap_cnt > max_cnt) {
				max_cnt = mm->swap_cnt;
				best = mm;
			}
		}
		if (best)
			atomic_inc(&best->mm_users);
		spin_unlock(&mmlist_lock);

		if (!best) {
			if (!assign && found_task > 0) {
				assign = 1;
				goto select;
			}
			break;
		} else {
			__ret = swap_out_mm(best, gfp_mask);
			mmput(best);
			break;
		}
	}
	return __ret;
}

3. Swapping and caching technology in Linux

Linux performs a clock sweep over the *virtual* pages, by cycling through each process's pages in address order. For this it uses the vm_area mappings and page tables of the processes, so that it can scan the pages of each process sequentially. Rather than sweeping through all of the pages of an entire process before switching to another, the main clock tries to evict a batch of pages from a process, and then moves on to another process. It visits all of the (pageable) processes and then repeats. The effect of this is that there are a large number of distinct clock sweeps, one per pageable process, and the overall clock sweep advances each of these smaller sweeps periodically. The following considerations led to this design:

• Related pages should be paged out together, to increase locality in the paging store (the so-called swap files or swap partitions). By evicting a moderate number of virtual pages from a given process, in virtual address order, the sweep through virtual address space tends to group related pages together in the paging store.

• By alternating between processes at a coarser granularity, the sweep avoids evicting a large number of pages from a single victim process: after it has evicted a reasonable number of pages from a particular victim, it moves on to another to provide some semblance of fairness between the processes.

• The use of a main clock over processes and virtual address pages, plus a secondary clock over page frames, provides a way of combining the hardware-supported virtual page reference bits to get recency-of-touch information about the logical pages stored in page frames.
The secondary clock (and the use of a separate per-page-frame PG_referenced bit maintained in software) can act as an additional "aging" period for pages that are evicted from the main clock. A page can be held in the "swap cache" after being evicted from the main clock, and allowed to age a while before being evicted from RAM. The swap cache is just a set of page frames holding logical pages that have been evicted from the main clock, but whose contents have not yet been discarded. The contents of page frames need not be copied to "move" them into the swap cache; rather, the page frame is simply marked as "swap cached" by the main clock algorithm, and linked into a hash table that holds all of the page frames that currently constitute the swap cache. The following is the part of the source code used in Linux to operate the swap cache:

#ifdef SWAP_CACHE_INFO
void show_swap_cache_info(void)
{
	printk("Swap cache: add %ld, delete %ld, find %ld/%ld\n",
	       swap_cache_add_total, swap_cache_del_total,
	       swap_cache_find_success, swap_cache_find_total);
}
#endif

void add_to_swap_cache(struct page *page, swp_entry_t entry)
{
	unsigned long flags;

#ifdef SWAP_CACHE_INFO
	swap_cache_add_total++;
#endif
	if (!PageLocked(page))
		BUG();
	if (PageTestandSetSwapCache(page))
		BUG();
	if (page->mapping)
		BUG();
	flags = page->flags & ~((1 << PG_error) | (1 << PG_arch_1));
	page->flags = flags | (1 << PG_uptodate);
	add_to_page_cache_locked(page, &swapper_space, entry.val);
}

static inline void remove_from_swap_cache(struct page *page)
{
	struct address_space *mapping = page->mapping;

	if (mapping != &swapper_space)
		BUG();
	if (!PageSwapCache(page) || !PageLocked(page))
		PAGE_BUG(page);

	PageClearSwapCache(page);
	ClearPageDirty(page);
	remove_inode_page(page);
}

4. Problems of virtual memory management in Linux

There are, in my opinion, several possible problems with the page replacement algorithm in Linux, which can be listed as follows:

• The system may react badly to variable VM load or to load spikes after a period of no VM activity. Since kswapd, the page-out daemon, only scans when the system is low on memory, the system can end up in a state where some pages have reference bits from the last 5 seconds, while other pages have reference bits from 20 minutes ago. This means that on a load spike the system has no clue which are the right pages to evict from memory. This can lead to a swapping storm, where the wrong pages are evicted and almost immediately afterwards faulted back in, leading to the page-out of another random page, and so on.

• There is no method to prevent a possible memory deadlock. With the arrival of journaling and delayed-allocation file systems, it is possible that the system will need to allocate memory in order to free memory, that is, to write out data so memory can become free. It may be useful to introduce some algorithm to prevent this possible deadlock under extremely low memory situations.

5. Conclusion

The virtual memory management system of Linux, especially its paging and swapping technologies, has been introduced in this paper, and some problems with these strategies have been discussed.

6. References

[1] Rodrigo S. de Castro, Linux 2.4 Virtual Memory Overview (2001); http://linuxcompressed.sourceforge.net/vm24
[2] Matthew Dillon, Design Elements of the FreeBSD VM System (2000); http://www.daemonews.org/200001/freebsd_vm.html
[3] Kernelnewbies; http://kernelnewbies.org/
[4] The Linux Memory Management home page; http://linux-mm.org
[5] Yannis Smaragdakis, Scott F. Kaplan and Paul R. Wilson; EELRU: Simple and Effective Adaptive Page Replacement, SIGMETRICS '99; http://www.cs.amherst.edu/~sfkaplan/papers/index.html