Main Memory (Galvin Notes, 9th Ed.)
Chapter 8: Main Memory
 BACKGROUND
o Basic Hardware
o Address Binding (Compile Time, Load Time, Execution Time)
o Logical Versus Physical Address Space
o Dynamic Loading
o Dynamic Linking and Shared Libraries
 SWAPPING
o Standard Swapping
o Swapping on Mobile Systems
 CONTIGUOUS MEMORY ALLOCATION
o Memory Protection (was Memory Mapping and Protection)
o Memory Allocation
o Fragmentation
 SEGMENTATION
o Basic Method
o Segmentation Hardware
 PAGING
o Basic Method
o Hardware Support
o Protection
o Shared Pages
 STRUCTURE OF THE PAGE TABLE
o Hierarchical Paging
o Hashed Page Tables
o Inverted Page Tables
 EXAMPLE: INTEL 32 AND 64-BIT ARCHITECTURES (Optional)
o IA-32 Segmentation
o IA-32 Paging
 LINUX ON PENTIUM SYSTEMS - (Optional)
Content
BACKGROUND
Every instruction has to be fetched from memory before it can be executed, and most instructions involve retrieving data from memory, storing data in memory, or both. As processes are swapped in and out of the CPU, so must their code and data be swapped in and out of memory, all at high speeds and without interfering with any other processes.
Basic Hardware:
 The CPU can only access its registers AND main memory. It cannot make direct access to the hard drive, for example.
 Registers that are built into the CPU are generally accessible within one cycle of the CPU clock. Most CPUs can decode instructions and perform simple operations on register contents at the rate of one or more operations per clock tick. The same cannot be said of main memory, which is accessed via a transaction on the memory bus. Completing a memory access may take many cycles of the CPU clock. In such cases, the processor normally needs to stall, since it does not have the data required to complete the instruction that it is executing. This situation is intolerable because of the frequency of memory accesses. The remedy is to add fast memory between the CPU and main memory, typically on the CPU chip for fast access. Such a cache was described in Section 1.8.3.
 The base register holds the smallest legal physical memory address; the limit register specifies the size of the range. For example, if the base register holds 300040 and the limit register is 120900, then the program can legally access all addresses from 300040 through 420939 (inclusive). Protection of memory space is accomplished by having the CPU hardware compare every address generated in user mode with the registers. Any attempt by a program executing in user mode to access operating-system memory or other users' memory results in a trap to the operating system, which treats the attempt as a fatal error (Figure 8.2). This scheme prevents a user program from (accidentally or deliberately) modifying the code or data structures of either the operating system or other users. The base and limit registers can be loaded only by the operating system, which uses a special privileged instruction. Since privileged instructions can be executed only in kernel mode, and since only the operating system executes in kernel mode, only the operating system can load the base and limit registers. This scheme allows the operating system to change the value of the registers but prevents user programs from changing the registers' contents.
 The operating system, executing in kernel mode, is given unrestricted access to both operating-system memory and users' memory. This provision allows the operating system to load users' programs into users' memory, to dump out those programs in case of errors, to access and modify parameters of system calls, to perform I/O to and from user memory, and to provide many other services. Consider, for example, that an operating system for a multiprocessing system must execute context switches, storing the state of one process from the registers into main memory before loading the next process's context from main memory into the registers.
 User processes must be restricted so that they only access memory locations that "belong" to that particular process. This is usually implemented using a base register and a limit register for each process. Every memory access made by a user process is checked against these two registers, and if a memory access is attempted outside the valid range, then a fatal error is generated. The OS obviously has access to all existing memory locations, as this is necessary to swap users' code and data in and out of memory. (Fig. 8.2)
Address Binding
 User programs typically refer to memory addresses with symbolic names such as "i", "count", etc. These symbolic names must be mapped or bound to physical memory addresses, which typically occurs in several stages:
o Compile Time - If it is known at compile time where a program will reside in physical memory, then absolute code can be generated by the compiler, containing actual physical addresses. However, if the load address changes at some later time, then the program will have to be recompiled. DOS .COM programs use compile-time binding.
o Load Time - If the location at which a program will be loaded is not known at compile time, then the compiler must generate relocatable code, which references addresses relative to the start of the program. If that starting address changes, then the program must be reloaded but not recompiled. (If it is not known at compile time where the process will reside in memory, then the compiler must generate relocatable code. In this case, final binding is delayed until load time.)
o Execution Time - If a program can be moved around in memory during the course of its execution, then binding must be delayed until execution time. This requires special hardware, and is the method implemented by most modern OSes.
 The processes on the disk that are waiting to be brought into memory for execution form the input queue.
 A compiler typically binds these symbolic addresses to relocatable addresses (such as "14 bytes from the beginning of this module"). The linkage editor or loader in turn binds the relocatable addresses to absolute addresses (such as 74014). Each binding is a mapping from one address space to another.
 Although the address space of the computer may start at 00000, the first address of the user process need not be 00000. You will see later how a user program actually places a process in physical memory.
Logical Versus Physical Address Space
 An address generated by the CPU is commonly referred to as a logical address, whereas an address seen by the memory unit—that is, the one loaded into the memory-address register of the memory—is commonly referred to as a physical address.
 The compile-time and load-time address-binding methods generate identical logical and physical addresses. However, the execution-time address-binding scheme results in differing logical and physical addresses. In this case, we usually refer to the logical address as a virtual address. We use logical address and virtual address interchangeably in this text. The set of all logical addresses generated by a program is a logical address space. The set of all physical addresses corresponding to these logical addresses is a physical address space. Thus, in the execution-time address-binding scheme, the logical and physical address spaces differ.
 The run-time mapping from virtual to physical addresses is done by a hardware device called the memory-management unit (MMU). We can choose from many different methods to accomplish such mapping, as we discuss in Section 8.3 through Section 8.5.
 The MMU can take on many forms. One of the simplest is a modification of the base-register scheme described earlier. The base register is now called a relocation register. The value in the relocation register is added to every address generated by a user process at the time the address is sent to memory (see Figure 8.4). For example, if the base is at 14000, then an attempt by the user to address location 0 is dynamically relocated to location 14000; an access to location 346 is mapped to location 14346.
 The address generated by the CPU is a logical address, whereas the address actually seen by the memory hardware is a physical address. The physical address is the one loaded into the memory-address register of the memory. In a computer, the Memory Address Register (MAR) is a CPU register that either stores the memory address from which data will be fetched to the CPU or the address to which data will be sent and stored. When reading from memory, data addressed by the MAR is fed into the MDR (memory data register) and then used by the CPU. The run-time mapping of logical to physical addresses is handled by the memory-management unit, MMU. In the MMU, the base register is now termed a relocation register, whose value is added to every memory request at the hardware level.
 User programs work entirely in logical address space, and any memory references or manipulations are done using purely logical addresses. Only when the address gets sent to the physical memory chips is the physical memory address generated. (The user program never sees the real physical addresses.)
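The relocation-register MMU of Figure 8.4 can be sketched as follows. This is an illustrative model (class and method names are mine), using the book's base value of 14000:

```python
class MMU:
    """Simplest form of the MMU: a single relocation (base) register whose
    value is added to every logical address on its way to memory."""

    def __init__(self, relocation_register):
        self.relocation_register = relocation_register

    def translate(self, logical_address):
        # The user process only ever sees the logical address; the physical
        # address exists only on the memory bus.
        return logical_address + self.relocation_register

mmu = MMU(14000)
```

With this base, logical address 0 is dynamically relocated to 14000 and logical address 346 to 14346, matching the text.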
Dynamic Loading
 Dynamic loading: Rather than loading an entire program into memory at once, dynamic loading loads up each routine as it is called. The advantage is that unused routines need never be loaded, reducing total memory usage and generating faster program startup times. The downside is the added complexity and overhead of checking to see whether a routine is loaded every time it is called, and then loading it if it is not already loaded.
Dynamic Linking and Shared Libraries
 Dynamic and static linking: Benefits of the former (More: Reentrant code section of library routines that can be shared)
 With static linking, library modules get fully included in executable modules, wasting both disk space and main memory, because every program that includes a certain routine from the library has to have its own copy of that routine linked into its executable code. With dynamic linking, however, only a stub is linked into the executable module, containing references to the actual library module linked in at run time. An added benefit of dynamically linked libraries (DLLs, also known as shared libraries or shared objects on UNIX systems) involves easy upgrades and updates. When a program uses a routine from a standard library and the routine changes, then the program must be re-built (re-linked) in order to incorporate the changes. However, if DLLs are used, then as long as the stub doesn't change, the program can be updated merely by loading new versions of the DLLs onto the system.
SWAPPING
A process must be in memory to be executed. A process, however, can be swapped temporarily out of memory to a backing store and then brought back into memory for continued execution (Figure 8.5). Swapping makes it possible for the total physical address space of all processes to exceed the real physical memory of the system, thus increasing the degree of multiprogramming in a system.
Standard Swapping
 If there is not enough memory available to keep all running processes in memory at the same time, then some processes that are not currently using the CPU may have their memory swapped out to a fast local disk called the backing store. If compile-time or load-time address binding is used, then processes must be swapped back into the same memory location from which they were swapped out. If execution-time binding is used, then processes can be swapped back into any available location. (Fig. 8.5)
 Swapping is a very slow process compared to other operations. For example, if a user process occupied 10 MB and the transfer rate for the backing store were 40 MB per second, then it would take 1/4 second (250 milliseconds) just to do the data transfer. Swapping involves moving old data out as well as new data in.
 It is important to swap processes out of memory only when they are idle, or more to the point, only when there are no pending I/O operations. (Otherwise the pending I/O operation could write into the wrong process's memory space.) The solution is to either swap only totally idle processes, or do I/O operations only into and out of OS buffers.
 Most modern OSes no longer use standard swapping, because it is too slow and there are faster alternatives available (e.g., paging). However, some UNIX systems will still invoke swapping if the system gets extremely full, and then discontinue swapping when the load reduces again. Windows 3.1 used a modified version of swapping that was somewhat controlled by the user, swapping processes out if necessary and then only swapping them back in when the user focused on that particular window.
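The transfer-time figure quoted above is simple arithmetic; a quick sketch (function name is mine) makes the 250 ms number concrete:

```python
def transfer_time_ms(process_size_mb, transfer_rate_mb_per_s):
    """Time to move a process image to or from the backing store, in ms."""
    return process_size_mb / transfer_rate_mb_per_s * 1000

# 10 MB at 40 MB/s -> 250 ms, as in the text. A full swap moves old data
# out AND new data in, so it costs roughly twice this per context switch.
```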
CONTIGUOUS MEMORY ALLOCATION
One approach to memory management is to load each process into a contiguous space. The operating system is allocated space first, usually at either low or high memory locations, and then the remaining available memory is allocated to processes as needed. (The OS is usually loaded low, because that is where the interrupt vectors are located.)
Memory Protection
 The system shown in Figure 8.6 below allows protection against user programs accessing areas that they should not, allows programs to be relocated to different memory starting addresses as needed, and allows the memory space devoted to the OS to grow or shrink dynamically as needs change.
Memory Allocation
 One method of allocating contiguous memory is to divide all available memory into equal-sized partitions, and to assign each process to its own partition. This restricts both the number of simultaneous processes and the maximum size of each process, and is no longer used. An alternate approach is to keep a list of unused (free) memory blocks (holes), and to find a hole of a suitable size whenever a process needs to be loaded into memory. In this approach, strategies for finding the "best" allocation of memory to processes include: first fit, best fit, worst fit.
o First fit - Search the list of holes until one is found that is big enough to satisfy the request, and assign a portion of that hole to that process. Whatever fraction of the hole is not needed by the request is left on the free list as a smaller hole. Subsequent requests may start looking either from the beginning of the list or from the point at which this search ended.
o Best fit - Allocate the smallest hole that is big enough to satisfy the request. This saves large holes for other process requests that may need them later, but the resulting unused portions of holes may be too small to be of any use, and will therefore be wasted. Keeping the free list sorted can speed up the process of finding the right hole.
o Worst fit - Allocate the largest hole available, thereby increasing the likelihood that the remaining portion will be usable for satisfying future requests.
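The three placement strategies can be sketched over a simple list of hole sizes. This is an illustrative model (function names and the example hole list are mine); each function returns the index of the chosen hole, or None if no hole is large enough:

```python
def first_fit(holes, request):
    """Pick the first hole big enough to satisfy the request."""
    for i, size in enumerate(holes):
        if size >= request:
            return i
    return None

def best_fit(holes, request):
    """Pick the smallest hole that is still big enough."""
    candidates = [(size, i) for i, size in enumerate(holes) if size >= request]
    return min(candidates)[1] if candidates else None

def worst_fit(holes, request):
    """Pick the largest available hole."""
    candidates = [(size, i) for i, size in enumerate(holes) if size >= request]
    return max(candidates)[1] if candidates else None
```

For example, with holes of sizes [100, 500, 200, 300, 600] and a request of 212, first fit chooses the 500-byte hole, best fit the 300-byte hole, and worst fit the 600-byte hole.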
Fragmentation
 External fragmentation means that the available memory is broken up into lots of little pieces, none of which is big enough to satisfy the next memory requirement, although the sum total could be. All the memory allocation strategies suffer from external fragmentation, though first and best fits experience the problem more so than worst fit. Statistical analysis of first fit, for example, shows that for N blocks of allocated memory, another 0.5 N will be lost to fragmentation. That is, one-third of memory may be unusable! This property is known as the 50-percent rule.
 Internal fragmentation also occurs with all memory allocation strategies. It is caused by the fact that memory is allocated in blocks of a fixed size, whereas the actual memory needed will rarely be exactly that size. For a random distribution of memory requests, on average half a block will be wasted per memory request, because on average the last allocated block will be only half full. Note that the same effect happens with hard drives, and that modern hardware gives us increasingly larger drives and memory at the expense of ever larger block sizes, which translates to more memory lost to internal fragmentation (some systems use variable-sized blocks to deal with this).
 One solution to the problem of external fragmentation is compaction. The goal is to shuffle the memory contents so as to place all free memory together in one large block. Compaction is not always possible, however. If relocation is static and is done at assembly or load time, compaction cannot be done. It is possible only if relocation is dynamic and is done at execution time. If addresses are relocated dynamically, relocation requires only moving the program and data and then changing the base register to reflect the new base address. When compaction is possible, we must determine its cost. The simplest compaction algorithm is to move all processes toward one end of memory; all holes move in the other direction, producing one large hole of available memory. This scheme can be expensive.
 Another possible solution to the external-fragmentation problem is to permit the logical address space of the processes to be noncontiguous, thus allowing a process to be allocated physical memory wherever such memory is available. Two complementary techniques achieve this solution: segmentation (Section 8.4) and paging (Section 8.5). These techniques can also be combined.
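The simple compaction algorithm described above — slide every allocated block toward one end so all holes coalesce at the other — can be sketched as follows. This is an illustrative model (the block representation is mine): each block is a (process_name, base, size) tuple, and in real hardware only the relocation register would change per process:

```python
def compact(blocks, total_memory):
    """Slide allocated blocks toward low memory, producing one large hole
    at the top. Returns the relocated blocks and the size of that hole."""
    relocated, next_free = [], 0
    for name, _, size in sorted(blocks, key=lambda b: b[1]):
        relocated.append((name, next_free, size))  # move block down
        next_free += size                          # pack blocks end to end
    return relocated, total_memory - next_free
```

For instance, processes at bases 0, 300, and 700 in a 1000-byte memory compact to bases 0, 100, and 300, leaving a single 600-byte hole.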
SEGMENTATION
Do programmers think of memory as a linear array of bytes, some containing instructions and others containing data? Most programmers would say "no." Rather, they prefer to view memory as a collection of variable-sized segments, with no necessary ordering among the segments (Figure 8.7). What if the hardware could provide a memory mechanism that mapped the programmer's view to the actual physical memory? The system would have more freedom to manage memory, while the programmer would have a more natural programming environment. Segmentation provides such a mechanism. Segmentation is a memory-management scheme that supports this programmer view of memory. A logical address space is a collection of segments.
Each segment has a name and a length. The addresses specify both the segment name and the offset within the segment. The programmer therefore specifies each address by two quantities: a segment name and an offset. For simplicity of implementation, segments are numbered and are referred to by a segment number, rather than by a segment name. Thus, a logical address consists of a two tuple: <segment-number, offset>.
 Segmentation: Most users (programmers) do not think of their programs as existing in one continuous linear address space. Rather, they tend to think of their memory in multiple segments, each dedicated to a particular use, such as code, data, the stack, the heap, etc. Memory segmentation supports this view by providing addresses with a segment number (mapped to a segment base address) and an offset from the beginning of that segment. For example, a C compiler might generate 5 segments for the user code, library code, global (static) variables, the stack, and the heap, as shown in Figure 8.7. (Elements within a segment are identified by their offset from the beginning of the segment.)
 A segment table maps segment-offset addresses to physical addresses, and simultaneously checks for invalid addresses, using a system similar to the page tables and relocation base registers discussed previously. (Note that at this point in the discussion of segmentation, each segment is kept in contiguous memory and may be of a different size, but segmentation can also be combined with paging, as we shall see shortly.) (Fig. 8.8 shows the segment table as part of the segmentation hardware; Fig. 8.9 clarifies it completely.)
 Fig. 8.4, 8.6, 8.8, 8.9 - Note these carefully: what the CPU sends to the MMU is the offset, which is compared against the limit and, if valid, added to the value in the relocation (base) register. In the case of segmentation, Fig. 8.9 shows that the CPU sends a <segment-number, offset> pair, where the segment number is used to index into the segment table.
 Normally, when a program is compiled, the compiler automatically constructs segments reflecting the input program. A C compiler might create separate segments for the following: 1. The code 2. Global variables 3. The heap, from which memory is allocated 4. The stacks used by each thread 5. The standard C library. Libraries that are linked in during compile time might be assigned separate segments. The loader would take all these segments and assign them segment numbers.
 Segmentation Hardware: Although the programmer can now refer to objects in the program by a two-dimensional address, the actual physical memory is still, of course, a one-dimensional sequence of bytes. Thus, we must define an implementation to map two-dimensional user-defined addresses into one-dimensional physical addresses. This mapping is effected by a segment table. Each entry in the segment table has a segment base and a segment limit. The segment base contains the starting physical address where the segment resides in memory, and the segment limit specifies the length of the segment.
The use of a segment table is illustrated in Figure 8.8. A logical address consists of two parts: a segment number, s, and an offset into that segment, d. The segment number is used as an index into the segment table. The offset d of the logical address must be between 0 and the segment limit. If it is not, we trap to the operating system (logical addressing attempt beyond end of segment). When an offset is legal, it is added to the segment base to produce the address in physical memory of the desired byte. The segment table is thus essentially an array of base–limit register pairs. As an example, consider the situation shown in Figure 8.9. We have five segments numbered from 0 through 4. The segments are stored in physical memory as shown. The segment table has a separate entry for each segment, giving the beginning address of the segment in physical memory (the base) and the length of that segment (the limit). For example, segment 2 is 400 bytes long and begins at location 4300. Thus, a reference to byte 53 of segment 2 is mapped onto location 4300 + 53 = 4353. A reference to segment 3, byte 852, is mapped to 3200 (the base of segment 3) + 852 = 4052. A reference to byte 1222 of segment 0 would result in a trap to the operating system, as this segment is only 1,000 bytes long.
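The segment-table lookup above can be sketched directly. The bases and limits for segments 0, 2, and 3 come from the worked example in the text; the values for segments 1 and 4 are my assumption of the standard Figure 8.9 values and are only there to fill out the table:

```python
# Segment table: entry = (base, limit).
SEGMENT_TABLE = [
    (1400, 1000),  # segment 0 (1,000 bytes, per the text)
    (6300, 400),   # segment 1 (assumed from Fig. 8.9)
    (4300, 400),   # segment 2 (400 bytes at 4300, per the text)
    (3200, 1100),  # segment 3 (base 3200, per the text)
    (4700, 1000),  # segment 4 (assumed from Fig. 8.9)
]

def translate(segment, offset, table=SEGMENT_TABLE):
    """Map a <segment-number, offset> pair to a physical address,
    trapping if the offset exceeds the segment limit."""
    base, limit = table[segment]
    if not 0 <= offset < limit:
        raise MemoryError("trap: addressing attempt beyond end of segment")
    return base + offset
```

This reproduces the text's examples: segment 2, byte 53 maps to 4353; segment 3, byte 852 maps to 4052; and segment 0, byte 1222 traps.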
PAGING
Segmentation permits the physical address space of a process to be non-contiguous. Paging is another memory-management scheme that offers this advantage. However, paging avoids external fragmentation and the need for compaction, whereas segmentation does not. It also solves the considerable problem of fitting memory chunks of varying sizes onto the backing store. Most memory-management schemes used before the introduction of paging suffered from this problem. The problem arises because, when code fragments or data residing in main memory need to be swapped out, space must be found on the backing store. The backing store has the same fragmentation problems discussed in connection with main memory, but access is much slower, so compaction is impossible. Because of its advantages over earlier methods, paging in its various forms is used in most operating systems, from those for mainframes through those for smartphones. Paging is implemented through cooperation between the operating system and the computer hardware.
 Paging is a memory management scheme that allows a process's physical memory to be discontinuous, and which eliminates problems with fragmentation by allocating memory in equal-sized blocks known as pages. (When a process is to be executed, its pages are loaded into any available memory frames from their source (a file system or the backing store). The backing store is divided into fixed-sized blocks that are the same size as the memory frames or clusters of multiple frames.) Paging eliminates most of the problems of the other methods discussed previously, and is the predominant memory management technique used today.
 The basic idea behind paging is to divide physical memory into a number of equal-sized blocks called frames, and to divide a program's LOGICAL memory space into blocks of the same size called pages. Any page (from any process) can be placed into any available frame. The page table is used to look up what frame a particular page is stored in at the moment. In the following example, for instance, page 2 of the program's logical memory is currently stored in frame 3 of physical memory (Fig. 8.10, 8.11).
 The hardware support for paging is illustrated in Figure 8.10. Every address generated by the CPU is divided into two parts: a page number (p) and a page offset (d). The page number is used as an index into a page table. The page table contains the base address of each page in physical memory. This base address is combined with the page offset to define the physical memory address that is sent to the memory unit. The paging model of memory is shown in Figure 8.11.
 **Each process has its own page table (and the page size, like the frame size, is defined by the hardware).
 As a concrete (although minuscule) example, consider the memory in Figure 8.12. Here, in the logical address, n = 2 and m = 4. Using a page size of 4 bytes and a physical memory of 32 bytes (8 pages), we show how the programmer's view of memory can be mapped into physical memory. Logical address 0 is page 0, offset 0. Indexing into the page table, we find that page 0 is in frame 5. Thus, logical address 0 maps to physical address 20 [= (5 × 4) + 0]. Logical address 3 (page 0, offset 3) maps to physical address 23 [= (5 × 4) + 3]. Logical address 4 is page 1, offset 0; according to the page table, page 1 is mapped to frame 6. Thus, logical address 4 maps to physical address 24 [= (6 × 4) + 0]. Logical address 13 maps to physical address 9.
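The Figure 8.12 example can be reproduced in a few lines. The page-to-frame mapping [5, 6, 1, 2] follows from the worked addresses above (pages 0 and 1 map to frames 5 and 6 per the text; frames 1 and 2 for pages 2 and 3 are inferred from "logical address 13 maps to physical address 9"):

```python
PAGE_SIZE = 4                # 2^n bytes, with n = 2
PAGE_TABLE = [5, 6, 1, 2]    # page number -> frame number

def translate(logical_address, page_table=PAGE_TABLE, page_size=PAGE_SIZE):
    """Split a logical address into (p, d) and relocate through the page table."""
    page, offset = divmod(logical_address, page_size)  # high bits = p, low = d
    return page_table[page] * page_size + offset       # frame base + offset
```

Logical addresses 0, 3, 4, and 13 map to physical addresses 20, 23, 24, and 9 respectively, exactly as worked out in the text.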
 When we use a paging scheme, we have no external fragmentation: any free frame can be allocated to a process that needs it. However, we may have some internal fragmentation. Notice that frames are allocated as units. If the memory requirements of a process do not happen to coincide with page boundaries, the last frame allocated may not be completely full. For example, if the page size is 2,048 bytes, a process of 72,766 bytes will need 35 pages plus 1,086 bytes. It will be allocated 36 frames, resulting in internal fragmentation of 2,048 − 1,086 = 962 bytes. In the worst case, a process would need n pages plus 1 byte. It would be allocated n + 1 frames, resulting in internal fragmentation of almost an entire frame.
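The frame count and internal fragmentation in the example above follow from a two-line calculation (the function name is mine):

```python
import math

def frames_and_internal_fragmentation(process_bytes, page_size):
    """Frames allocated to a process, and bytes wasted in its last frame."""
    frames = math.ceil(process_bytes / page_size)   # partial last page still costs a frame
    waste = frames * page_size - process_bytes
    return frames, waste

# 72,766 bytes with 2,048-byte pages -> 36 frames, 962 bytes wasted,
# matching the text. A process of n pages plus 1 byte wastes page_size - 1.
```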
 You may have noticed that paging itself is a form of dynamic relocation. Every logical address is bound by the paging hardware to some physical address. Using paging is similar to using a table of base (or relocation) registers, one for each frame of memory.
 A logical address consists of two parts: a page number in which the address resides, and an offset from the beginning of that page. (The number of bits in the page number limits how many pages a single process can address. The number of bits in the offset determines the maximum size of each page, and should correspond to the system frame size.)
 The page table maps the page number to a frame number, to yield a physical address which also has two parts: the frame number and the offset within that frame. The number of bits in the frame number determines how many frames the system can address, and the number of bits in the offset determines the size of each frame. (Won't the offset be the same as the page offset mentioned above? And won't the number of bits in the frame number be the same as the number of bits in the page number? Analyze—the answer to the former is yes; to the second, no.)
 Page numbers, frame numbers, and frame sizes are determined by the architecture, but are typically powers of two, allowing addresses to be split at a certain number of bits. For example, if the logical address size is 2^m and the page size is 2^n, then the high-order m−n bits of a logical address designate the page number and the remaining n bits represent the offset. (Apply this to the 16-byte process logical space in Fig. 8.12.)
 Overhead is involved in each page-table entry, and this overhead is reduced as the size of the pages increases. Also, disk I/O is more efficient when the amount of data being transferred is larger (Chapter 10). Generally, page sizes have grown over time as processes, data sets, and main memory have become larger. Today, pages are typically between 4 KB and 8 KB in size, and some systems support even larger page sizes. Some CPUs and kernels even support multiple page sizes. For instance, Solaris uses page sizes of 8 KB and 4 MB, depending on the data stored by the pages. Researchers are now developing support for variable on-the-fly page size.
 Frequently, on a 32-bit CPU, each page-table entry is 4 bytes long, but that size can vary as well. A 32-bit entry can point to one of 2^32 physical page frames. If the frame size is 4 KB (2^12), then a system with 4-byte entries can address 2^44 bytes (or 16 TB) of physical memory.
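The 16 TB figure above is just the product of the frame count and the frame size; a quick check (variable names are mine):

```python
frame_bits = 32                 # a 32-bit entry can name 2^32 frames
frame_size = 4 * 1024           # 4 KB = 2^12 bytes per frame
addressable = 2**frame_bits * frame_size   # 2^32 * 2^12 = 2^44 bytes
terabytes = addressable // 2**40           # 2^44 / 2^40 = 16 TB
```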
 Note also that the number of bits in the page number and the number of bits in the frame number do not have to be identical. The former determines the address range of the logical address space, and the latter relates to the physical address space. (p = m−n, d = n)
 (DOS used to use an addressing scheme with 16-bit frame numbers and 16-bit offsets, on hardware that only supported 24-bit hardware addresses. The result was a resolution of starting frame addresses finer than the size of a single frame, and multiple frame-offset combinations that mapped to the same physical hardware address.)
 In the illustrative example of Fig. 8.12, a process has 16 bytes of logical memory, mapped in 4-byte pages into 32 bytes of physical memory. (Presumably some other processes would be consuming the remaining 16 bytes of physical memory.)
 Note that paging is like having a table of relocation registers, one FOR EACH page of the logical memory.
 There is no external fragmentation with paging. All blocks of physical memory are used, and there are no gaps in between and no problems with finding the right-sized hole for a particular chunk of memory. There is, however, internal fragmentation. Memory is allocated in chunks the size of a page, and on average, the last page will only be half full, wasting on average half a page of memory per process. (Possibly more, if processes keep their code and data in separate pages.)
 Larger page sizes waste more memory, but are more efficient in terms of overhead. Modern trends have been to increase page sizes, and some systems even have multiple page sizes to try to make the best of both worlds.
 Page table entries (frame numbers) are typically 32-bit numbers, allowing access to 2^32 physical page frames. If those frames are 4 KB each, that translates to 16 TB of addressable physical memory. (32 + 12 = 44 bits of physical address space. Note that log2 4 KB = 12.)
 When a process requests memory (e.g., when its code is loaded in from disk), free frames are allocated from a free-frame list, and inserted into that process's page table (Fig. 8.13). Processes are blocked from accessing anyone else's memory because all of their memory requests are mapped through their page table. There is no way for them to generate an address that maps into any other process's memory space.
 The operating system must keep track of each individual process's page table, updating it whenever the process's pages get moved in and out of memory, and applying the correct page table when processing system calls for a particular process. This all increases the overhead involved when swapping processes in and out of the CPU. (The currently active page table must be updated to reflect the process that is currently running.)
 When a process arrives in the system to be executed, its size, expressed in pages, is examined. Each page of the process needs one frame.
Thus, if the process requires n pages, at least n frames must be available in memory. If n frames are available, they are allocated to this
arriving process. The first page of the process is loaded into one of the allocated frames, and the frame number is put in the page table for
this process. The next page is loaded into another frame, its frame number is put into the page table, and so on (Figure 8.13).
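The allocation step just described can be sketched as follows. This is a behavioral toy, not a real allocator; the function name and data shapes are illustrative:

```python
# Frames are taken from a free-frame list and recorded, page by page,
# in the process's page table (page number -> frame number).
def allocate_pages(n_pages, free_frames):
    """Allocate n_pages frames; return a page table, or None if short."""
    if len(free_frames) < n_pages:
        return None                            # not enough free frames
    page_table = {}
    for page in range(n_pages):
        page_table[page] = free_frames.pop(0)  # take the first free frame
    return page_table

free = [3, 7, 2, 9, 5]
pt = allocate_pages(3, free)
print(pt)    # {0: 3, 1: 7, 2: 2}
print(free)  # [9, 5]
```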
 An important aspect of paging is the clear separation between the programmer's view of memory and the actual physical memory. The
programmer views memory as one single space, containing only this one program. In fact, the user program is scattered throughout physical
memory, which also holds other programs. The difference between the programmer's view of memory and the actual physical memory is
reconciled by the address-translation hardware. The logical addresses are translated into physical addresses. This mapping is hidden from
the programmer and is controlled by the operating system. Notice that the user process by definition is unable to access memory it does not
own. It has no way of addressing memory outside of its page table, and the table includes only those pages that the process owns.
 Since the operating system is managing physical memory, it must be aware of the allocation details of physical memory: which frames are
allocated, which frames are available, how many total frames there are, and so on. This information is generally kept in a data structure
called a frame table. The frame table has one entry for each physical page frame, indicating whether the latter is free or allocated and, if it is
allocated, to which page of which process or processes.
 The operating system maintains a copy of the page table for each process, just as it maintains a copy of the instruction counter and register
contents. This copy is used to translate logical addresses to physical addresses whenever the operating system must map a logical address to
a physical address manually. It is also used by the CPU dispatcher to define the hardware page table when a process is to be allocated the
CPU. Paging therefore increases the context-switch time.
Hardware Support
 Page lookups must be done for every memory reference, and whenever
a process gets swapped in or out of the CPU, its page table must be
swapped in and out too, along with the instruction registers, etc. (Why
not the PC alone?)
 One option is to use a set of registers for the page table. For example,
the DEC PDP-11 uses 16-bit addressing and 8 KB pages, resulting in only
8 pages per process. (It takes 13 bits to address 8 KB of offset, leaving
only 3 bits to define a page number.)
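The PDP-11 split above can be written out as bit arithmetic. A small sketch, assuming only the numbers given in the text (16-bit addresses, 8 KB pages):

```python
# A 16-bit address with 8 KB pages: 13 offset bits, 3 page-number bits.
PAGE_SIZE = 8 * 1024                        # 8 KB
OFFSET_BITS = PAGE_SIZE.bit_length() - 1    # 13

def split(addr):
    page = addr >> OFFSET_BITS       # top 3 bits: page number
    offset = addr & (PAGE_SIZE - 1)  # low 13 bits: offset within page
    return page, offset

print(split(0xFFFF))  # (7, 8191) -> at most 8 pages per process
```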
 An alternate option is to store the page table in main memory, and to
use a single register (called the page-table base register, PTBR) to
record where in memory the page table is located. Process switching is
fast, because only the single register needs to be changed. However,
memory access is now half as fast, because every memory access
requires two memory accesses: one to fetch the frame number from
memory and then another to access the desired memory location.
The solution to this problem is to use a very special high-speed memory
device called the translation look-aside buffer, TLB. The benefit of the
TLB is that it can search an entire table for a key value in parallel, and if
the key is found anywhere in the table, the corresponding lookup value
is returned. Fig. 8.14 illustrates paging hardware with a TLB. In case of
a TLB miss, the page table is searched as usual.
 The TLB is very expensive, however, and therefore very small (not large enough to hold the entire page table). It is therefore used as a cache
device. Addresses are first checked against the TLB, and if the info is not there (a TLB miss), then the frame is looked up from main memory
and the TLB is updated. If the TLB is full, then replacement strategies range from least recently used (LRU) to random. Some TLBs allow some
entries to be wired down, which means that they cannot be removed from the TLB. Typically these would be kernel frames. Some TLBs store
address-space identifiers (ASIDs) to keep track of which process "owns" a particular entry in the TLB. This allows entries from multiple
processes to be stored simultaneously in the TLB without granting one process access to some other process's memory location. Without this
feature the TLB has to be flushed clean with every process switch.
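The TLB-as-cache behavior above (hit, miss with page-table fallback, LRU eviction when full) can be modeled in a few lines. A real TLB searches all entries in parallel in hardware; this is only a behavioral sketch with invented names:

```python
from collections import OrderedDict

class TLB:
    """Toy LRU cache sitting in front of a page table."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # page -> frame, in LRU order

    def lookup(self, page, page_table):
        if page in self.entries:               # TLB hit
            self.entries.move_to_end(page)     # mark as most recently used
            return self.entries[page], True
        frame = page_table[page]               # TLB miss: walk page table
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)   # evict least recently used
        self.entries[page] = frame
        return frame, False

tlb = TLB(capacity=2)
pt = {0: 5, 1: 6, 2: 7}
print(tlb.lookup(0, pt))  # (5, False)  miss, now cached
print(tlb.lookup(0, pt))  # (5, True)   hit
```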
 The percentage of time that the desired information is found in the TLB is termed the hit ratio.
 Each operating system has its own methods for storing page tables. Some allocate a page table for each process. A pointer to the page table
is stored with the other register values (like the instruction counter) in the process control block. When the dispatcher is told to start a
process, it must reload the user registers and define the correct hardware page-table values from the stored user page table. Other
operating systems provide one or at most a few page tables, which decreases the overhead involved when processes are context-switched.
 The hardware implementation of the page table can be done in several ways. In the simplest case, the page table is implemented as a set of
dedicated registers. These registers should be built with very high-speed logic to make the paging-address translation efficient. Every access
to memory must go through the paging map, so efficiency is a major consideration. The CPU dispatcher reloads these registers, just as it
reloads the other registers. Instructions to load or modify the page-table registers are, of course, privileged, so that only the operating
system can change the memory map. The DEC PDP-11 is an example of such an architecture. The address consists of 16 bits, and the page
size is 8 KB. The page table thus consists of eight entries that are kept in fast registers.
 The use of registers for the page table is satisfactory if the page table is reasonably small (for example, 256 entries). Most contemporary
computers, however, allow the page table to be very large (for example, 1 million entries). For these machines, the use of fast registers to
implement the page table is not feasible. Rather, the page table is kept in main memory, and a page-table base register (PTBR) points to the
page table. Changing page tables requires changing only this one register, substantially reducing context-switch time.
 The problem with this approach is the time required to access a user memory location. If we want to access location i, we must first index
into the page table, using the value in the PTBR offset by the page number for i. This task requires a memory access. It provides us with the
frame number, which is combined with the page offset to produce the actual address. We can then access the desired place in memory. With
this scheme, two memory accesses are needed to access a byte (one for the page-table entry, one for the byte). Thus, memory access is
slowed by a factor of 2. This delay would be intolerable under most circumstances. We might as well resort to swapping!
 The standard solution to this problem is to use a special, small, fast lookup hardware cache called a translation look-aside buffer (TLB). The
TLB is associative, high-speed memory. Each entry in the TLB consists of two parts: a key (or tag) and a value. When the associative memory
is presented with an item, the item is compared with all keys simultaneously. If the item is found, the corresponding value field is returned.
The search is fast; a TLB lookup in modern hardware is part of the instruction pipeline, essentially adding no performance penalty. To be able
to execute the search within a pipeline step, however, the TLB must be kept small. It is typically between 32 and 1,024 entries in size. Some
CPUs implement separate instruction and data address TLBs. That can double the number of TLB entries available, because those lookups
occur in different pipeline steps. We can see in this development an example of the evolution of CPU technology: systems have evolved from
having no TLBs to having multiple levels of TLBs, just as they have multiple levels of caches.
 The TLB is used with page tables in the following way. The TLB contains only a few of the page-table entries. When a logical address is
generated by the CPU, its page number is presented to the TLB. If the page number is found, its frame number is immediately available and is
used to access memory. As just mentioned, these steps are executed as part of the instruction pipeline within the CPU, adding no
performance penalty compared with a system that does not implement paging. If the page number is not in the TLB (known as a TLB miss), a
memory reference to the page table must be made. Depending on the CPU, this may be done automatically in hardware or via an interrupt
to the operating system. When the frame number is obtained, we can use it to access memory (Figure 8.14). In addition, we add the page
number and frame number to the TLB, so that they will be found quickly on the next reference. If the TLB is already full of entries, an existing
entry must be selected for replacement. Replacement policies range from least recently used (LRU) through round-robin to random. Some
CPUs allow the operating system to participate in LRU entry replacement, while others handle the matter themselves. Furthermore, some
TLBs allow certain entries to be wired down, meaning that they cannot be removed from the TLB. Typically, TLB entries for key kernel code
are wired down.
 The percentage of times that the page number of interest is found in the TLB is called the hit ratio. An 80-percent hit ratio, for example,
means that we find the desired page number in the TLB 80 percent of the time. If it takes 100 nanoseconds to access memory, then a
mapped-memory access takes 100 nanoseconds when the page number is in the TLB. If we fail to find the page number in the TLB, then we
must first access memory for the page table and frame number (100 nanoseconds) and then access the desired byte in memory (100
nanoseconds), for a total of 200 nanoseconds. (We are assuming that a page-table lookup takes only one memory access, but it can take
more, as we shall see.) To find the effective memory-access time, we weight each case by its probability:
effective access time = 0.80 × 100 + 0.20 × 200 = 120 nanoseconds
In this example, we suffer a 20-percent slowdown in average memory-access time (from 100 to 120 nanoseconds). For a 99-percent hit ratio,
which is much more realistic, we have
effective access time = 0.99 × 100 + 0.01 × 200 = 101 nanoseconds
This increased hit rate produces only a 1-percent slowdown in access time.
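The effective-access-time formula above generalizes to any hit ratio. A minimal sketch, assuming (as in the example) one 100 ns memory access on a hit and two on a miss:

```python
# effective access time = hit_ratio * t + (1 - hit_ratio) * 2t,
# where t is one memory access and a miss costs one extra page-table walk.
def effective_access_time(hit_ratio, mem_ns=100):
    return hit_ratio * mem_ns + (1 - hit_ratio) * 2 * mem_ns

print(round(effective_access_time(0.80)))  # 120
print(round(effective_access_time(0.99)))  # 101
```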
 As we noted earlier, CPUs today may provide multiple levels of TLBs. Calculating memory access times in modern CPUs is therefore much
more complicated than shown in the example above. For instance, the Intel Core i7 CPU has a 128-entry L1 instruction TLB and a 64-entry L1
data TLB. In the case of a miss at L1, it takes the CPU six cycles to check for the entry in the 512-entry L2 TLB. A miss in L2 means that the CPU
must either walk through the page-table entries in memory to find the associated frame address, which can take hundreds of cycles, or
interrupt to the operating system to have it do the work.
Protection
The page table can also help to protect processes from accessing memory that they shouldn't, or their own memory in ways that they shouldn't. A bit
or bits can be added to the page table to classify a page as read-write, read-only, read-write-execute, or some combination of these sorts of things.
Then each memory reference can be checked to ensure it is accessing the memory in the appropriate mode.
 Valid/invalid bits can be added to "mask off" entries in the page table that are not in use by the current process, as shown in Figure 8.15
below.
 Note that the valid/invalid bits described above cannot block all illegal memory accesses, due to internal fragmentation. (Areas of
memory in the last page that are not entirely filled by the process may contain data left over by whoever used that frame last.)
 Many processes do not use all of the page table available to them, particularly in modern systems with very large potential page tables.
Rather than waste memory by creating a full-size page table for every process, some systems use a page-table length register (PTLR) to
specify the length of the page table.
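The per-reference checks described above can be sketched as a function that consults a valid bit and protection bits on every access. The field names and exception types here are illustrative; real hardware raises a trap, not a Python exception:

```python
# Each PTE carries a valid bit, a frame number, and protection bits.
def check_access(page_table, page, mode):
    """mode is 'r', 'w', or 'x'; raise on an illegal reference (a trap)."""
    entry = page_table.get(page)
    if entry is None or not entry["valid"]:
        raise MemoryError("invalid page reference: trap to OS")
    if mode not in entry["prot"]:
        raise PermissionError(f"protection violation: {mode} on page {page}")
    return entry["frame"]

pt = {0: {"valid": True, "frame": 9, "prot": "rx"},   # code: read/execute
      1: {"valid": True, "frame": 4, "prot": "rw"}}   # data: read/write
print(check_access(pt, 0, "x"))  # 9
```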
Shared Pages
 Paging systems make it easy to share blocks of memory: multiple processes' page tables simply map some of their pages to the same
physical frame (i.e. the frame numbers are duplicated across page tables). This may be done with either code or data.
 If code is reentrant, that means that it does not write to or change itself in any way (it is non-self-modifying), and it is therefore safe to
re-enter it. More importantly, it means the code can be shared by multiple processes, so long as each has its own copy of the data and
registers, including the instruction register.
 In the example given below in Fig. 8.16, three different users are running the editor simultaneously, but the code is only loaded into memory
(in the page frames) one time.
STRUCTURE OF THE PAGE TABLE
(Hierarchical Page Tables, Hashed Page Tables, Inverted Page Tables)
Hierarchical Page Tables
 Most modern computer systems support logical address spaces of 2^32 to 2^64. With a
2^32 address space and 4K (2^12) page sizes, this leaves 2^20 entries in the page table. At 4
bytes per entry, this amounts to a 4 MB page table, which is too large to reasonably keep in
contiguous memory. (And to swap in and out of memory with each process switch.) Note
that with 4K pages, it would take 1024 pages just to hold the page table!
 One option is to use a two-tier paging system, i.e. to page the page table. For example, the
20 bits described above could be broken down into two 10-bit page numbers. The first
identifies an entry in the outer page table, which identifies where in memory to find one
page of an inner page table. The second 10 bits find a specific entry in that inner page
table, which in turn identifies a particular frame in physical memory. (The remaining 12 bits
of the 32-bit logical address are the offset within the 4K frame.) Note how frame size is the
same as page size.
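The 10/10/12 decomposition just described can be expressed as bit arithmetic. A small sketch for a 32-bit address with 4K pages:

```python
# 32-bit address -> 10-bit outer index, 10-bit inner index, 12-bit offset.
def split_two_level(addr):
    outer = (addr >> 22) & 0x3FF   # top 10 bits: outer page-table index
    inner = (addr >> 12) & 0x3FF   # next 10 bits: inner page-table index
    offset = addr & 0xFFF          # low 12 bits: offset within 4K page
    return outer, inner, offset

print(split_two_level(0x00403005))  # (1, 3, 5)
```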
 VAX Architecture divides 32-bit addresses into 4 equal-sized sections, and each page is 512 bytes, yielding an address form of:
 With a 64-bit logical address space and 4K pages, there are 52 bits worth of page numbers, which is still too many even for two-level paging.
One could increase the paging levels, but with 10-bit page tables it would take 7 levels of indirection, which would make memory access
prohibitively slow. So some other approach must be used.
Hashed Page Tables: One common data structure for accessing data that is sparsely
distributed over a broad range of possible values is the hash table. Figure 8.16 below
illustrates a hashed page table using chain-and-bucket hashing:
Inverted Page Tables: In this approach, instead of a table listing all of the pages for a particular process, an inverted page table lists all of the pages
currently loaded in memory, for all processes. (I.e. there is one entry per frame instead of one entry per page.) Access to an inverted page table can be
slow, as it may be necessary to search the entire table in order to find the desired page (or to discover that it is not there). Hashing the table can help
speed up the search process. Inverted page tables prohibit the normal method of implementing shared memory, which is to map multiple logical pages
to a common physical frame. (Because each frame is now mapped to one and only one process.)
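The inverted-table lookup above can be sketched in a few lines: one entry per physical frame, keyed by (pid, page), with the entry's index serving as the frame number. The names and shapes here are illustrative, and the linear scan is exactly the slowness the text mentions:

```python
# inverted_table[frame] == (pid, page) for the page occupying that frame.
def lookup(inverted_table, pid, page):
    for frame, entry in enumerate(inverted_table):
        if entry == (pid, page):
            return frame
    return None   # not in memory -> page fault

table = [(1, 0), (2, 5), (1, 3)]  # frame i holds (pid, page)
print(lookup(table, 1, 3))  # 2
print(lookup(table, 2, 0))  # None
```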
EXAMPLE: INTEL 32 AND 64-BIT ARCHITECTURES (Optional)
IA-32 Segmentation
 The Pentium CPU provides both pure segmentation and segmentation with paging. In the
latter case, the CPU generates a logical address (segment-offset pair), which the
segmentation unit converts into a linear address, which in turn is mapped to a
physical frame by the paging unit, as shown in Figure 8.21.
 Reading the part on IA-32 segmentation in the "Read Later" section will be helpful. It has
the details.
 Specifics:
 Linux-specific stuff
o Others (in "Further Reading")
 Hardware-specific:
o Device drivers communicate with their hardware via interrupts and "memory" accesses, sending short instructions, for example, to
transfer data from the hard drive to a specified location in main memory. The disk controller monitors the bus for such
instructions, transfers the data, and then notifies the CPU that the data is there with another interrupt, but the CPU never gets
direct access to the disk.
o A memory management unit (MMU), sometimes called a paged memory management unit (PMMU), is a computer hardware unit
having all memory references passed through itself, primarily performing the translation of virtual memory addresses to physical
addresses. It is usually implemented as part of the central processing unit (CPU), but it can also be a separate
integrated circuit.
o Memory accesses to registers are very fast, generally one clock tick, and a CPU may be able to execute more than one machine
instruction per clock tick. Memory accesses to main memory are comparatively slow, and may take a number of clock ticks to
complete. This would require intolerable waiting by the CPU if it were not for an intermediary fast memory cache built into most
modern CPUs.
o The basic idea of the cache is to transfer chunks of memory at a time from the main memory to the cache, and then to access
individual memory locations one at a time from the cache.
o Flash memory can only be written to a limited number of times before it becomes unreliable. Bandwidth to flash memory is also
lower than that to hard disks.
To be cleared
 Shared memory implementation, copy-on-write forking
 Okay, a base and a limit register define a logical address space, but how is that translated to a physical address? Which chapter?
 Addresses bound at compile time or load time have identical logical and physical addresses. Addresses created at execution time, however,
have different logical and physical addresses (logical address = virtual address).
 Swapping is a very slow process compared to other operations. For efficient processor scheduling the CPU time slice should be significantly
longer than this lost transfer time.
 How can segmentation be combined with paging? (Sec. 8.4.2)
 What are s and d in Fig. 8.8?
Q's Later
 Is the address in the MAR translated by the MMU to a physical address?
 If processes are loaded contiguously in memory, what if they grow in size?
Glossary
Copy-on-write forking, MAR, MDR, logical/virtual address, logical and physical address space of a program, dynamic loading (loads only required
routines of a program), dynamic linking and static linking (stubs/DLLs), backing store (disk where x are swapped out to), relocation register, limit
register, fragmentation (internal, external), segmentation, segment table, trap to operating system, host controller (=host adapter, NIC is one),
reentrant code, memory allocation (first fit, best fit and worst fit), external and internal fragmentation, frames, pages, page-table base register (PTBR),
TLB
Read Later
IA-32 Segmentation
 The Pentium architecture allows segments to be as large as 4 GB (32 bits of offset).
 Processes can have as many as 16K segments, divided into two 8K groups:
o 8K private to that particular process, stored in the Local Descriptor Table
(LDT).
o 8K shared among all processes, stored in the Global Descriptor Table (GDT).
 Logical addresses are (selector, offset) pairs, where the selector is made up of 16 bits:
o A 13-bit segment number (up to 8K)
o A 1-bit flag for LDT vs. GDT
o 2 bits for protection codes
 The descriptor tables contain 8-byte
descriptions of each segment, including base and limit registers.
 Linear addresses are generated by looking the selector up in the descriptor table and adding the appropriate base address to the
offset, as shown in Figure 8.22:
IA-32 Paging
 Pentium paging normally uses a two-tier paging scheme, with the first 10 bits being a
page number for an outer page table (a.k.a. page directory), and the next 10 bits
being a page number within one of the 1024 inner page tables, leaving the remaining
12 bits as an offset into a 4K page.
 A special bit in the page directory can indicate that this page is a 4 MB page, in which
case the remaining 22 bits are all used as offset and the inner tier of page tables is
not used.
 The CR3 register points to the page directory for the current process, as shown in
Figure 8.23 below.
 If the inner page table is currently swapped out to disk, then the page directory will
have an "invalid bit" set, and the remaining 31 bits provide information on where to
find the swapped-out page table on the disk (8.25 in "Further Reading" is next to the
below images).
Further Reading
 Do I/O operations only into and out of OS buffers
 8.2.2 Swapping in mobile systems -- TYPICALLY not supported (9th Edition). Touches on Apple iOS and Android! Prior to
terminating a process, Android writes its application state to flash memory for quick restarting.
 http://www.cs.columbia.edu/~junfeng/10sp-w4118/lectures/l19-mem.pdf (Good concise content)
 https://en.wikipedia.org/wiki/Host_adapter


Grey Areas
 The amount of memory lost to fragmentation may vary with algorithm, usage patterns, and some design decisions such as which end of a
hole to allocate and which end to save on the free list.
CHEW
 Whether the logical page size is equal to the physical frame size (Yes!)
 Note that paging is like having a table of relocation registers, one for each page of the logical memory
 Page table entries (frame numbers) are typically 32-bit numbers, allowing access to 2^32 physical page frames. If those frames are 4 KB in
size each, that translates to 16 TB of addressable physical memory. (32 + 12 = 44 bits of physical address space.)
 One option is to use a set of registers for the page table. For example, the DEC PDP-11 uses 16-bit addressing and 8 KB pages, resulting in
only 8 pages per process. (It takes 13 bits to address 8 KB of offset, leaving only 3 bits to define a page number.)
 On page 12 of the lecture, do the TLB math under "(Eighth Edition Version)". Required.
 More on TLB
 Apropos page 10, second bullet point of the lecture: does it implicitly mean that the offset for both page number and frame number should
be the same?
 Page 15: VAX Architecture divides 32-bit addresses into 4 equal-sized sections, and each page is 512 bytes, yielding an address form of:
 What are the segmentation unit and paging unit?
 Can parts of a page table / page directory be swapped out too?

More Related Content

What's hot

Operating Systems - memory management
Operating Systems - memory managementOperating Systems - memory management
Operating Systems - memory management
Mukesh Chinta
 
Tdt4260 miniproject report_group_3
Tdt4260 miniproject report_group_3Tdt4260 miniproject report_group_3
Tdt4260 miniproject report_group_3
Yulong Bai
 
Memory Management
Memory ManagementMemory Management
Memory Management
DEDE IRYAWAN
 
Bab 4
Bab 4Bab 4
Bab 4
n k
 
Operating Systems Part III-Memory Management
Operating Systems Part III-Memory ManagementOperating Systems Part III-Memory Management
Operating Systems Part III-Memory Management
Ajit Nayak
 
CS6401 OPERATING SYSTEMS Unit 3
CS6401 OPERATING SYSTEMS Unit 3CS6401 OPERATING SYSTEMS Unit 3
CS6401 OPERATING SYSTEMS Unit 3
Kathirvel Ayyaswamy
 
Overview of Distributed Systems
Overview of Distributed SystemsOverview of Distributed Systems
Overview of Distributed Systems
vampugani
 
Deterministic Memory Abstraction and Supporting Multicore System Architecture
Deterministic Memory Abstraction and Supporting Multicore System ArchitectureDeterministic Memory Abstraction and Supporting Multicore System Architecture
Deterministic Memory Abstraction and Supporting Multicore System Architecture
Heechul Yun
 
Memory organization (Computer architecture)
Memory organization (Computer architecture)Memory organization (Computer architecture)
Memory organization (Computer architecture)
Sandesh Jonchhe
 
Memory management
Memory managementMemory management
Memory management
cpjcollege
 
Operating Systems 1 (9/12) - Memory Management Concepts
Operating Systems 1 (9/12) - Memory Management ConceptsOperating Systems 1 (9/12) - Memory Management Concepts
Operating Systems 1 (9/12) - Memory Management Concepts
Peter Tröger
 
Chapter 3
Chapter 3Chapter 3
Sucet os module_4_notes
Sucet os module_4_notesSucet os module_4_notes
Sucet os module_4_notes
SRINIVASUNIVERSITYEN
 
VTU 4TH SEM CSE COMPUTER ORGANIZATION SOLVED PAPERS OF JUNE-2013 JUNE-2014 & ...
VTU 4TH SEM CSE COMPUTER ORGANIZATION SOLVED PAPERS OF JUNE-2013 JUNE-2014 & ...VTU 4TH SEM CSE COMPUTER ORGANIZATION SOLVED PAPERS OF JUNE-2013 JUNE-2014 & ...
VTU 4TH SEM CSE COMPUTER ORGANIZATION SOLVED PAPERS OF JUNE-2013 JUNE-2014 & ...
vtunotesbysree
 
Memory managment
Memory managmentMemory managment
Memory managment
Shahbaz Khan
 
Address Binding Scheme
Address Binding SchemeAddress Binding Scheme
Address Binding Scheme
Rajesh Piryani
 
Memory management ppt
Memory management pptMemory management ppt
Memory management ppt
ManishaJha43
 
Chapter 8 - Main Memory
Chapter 8 - Main MemoryChapter 8 - Main Memory
Chapter 8 - Main Memory
Wayne Jones Jnr
 

What's hot (20)

Operating Systems - memory management
Operating Systems - memory managementOperating Systems - memory management
Operating Systems - memory management
 
Tdt4260 miniproject report_group_3
Tdt4260 miniproject report_group_3Tdt4260 miniproject report_group_3
Tdt4260 miniproject report_group_3
 
Memory management
Memory managementMemory management
Memory management
 
Memory Management
Memory ManagementMemory Management
Memory Management
 
Bab 4
Bab 4Bab 4
Bab 4
 
Operating Systems Part III-Memory Management
Operating Systems Part III-Memory ManagementOperating Systems Part III-Memory Management
Operating Systems Part III-Memory Management
 
CS6401 OPERATING SYSTEMS Unit 3
CS6401 OPERATING SYSTEMS Unit 3CS6401 OPERATING SYSTEMS Unit 3
CS6401 OPERATING SYSTEMS Unit 3
 
Overview of Distributed Systems
Overview of Distributed SystemsOverview of Distributed Systems
Overview of Distributed Systems
 
Deterministic Memory Abstraction and Supporting Multicore System Architecture
Deterministic Memory Abstraction and Supporting Multicore System ArchitectureDeterministic Memory Abstraction and Supporting Multicore System Architecture
Deterministic Memory Abstraction and Supporting Multicore System Architecture
 
Memory organization (Computer architecture)
Memory organization (Computer architecture)Memory organization (Computer architecture)
Memory organization (Computer architecture)
 
Memory management
Memory managementMemory management
Memory management
 
Operating Systems 1 (9/12) - Memory Management Concepts
Operating Systems 1 (9/12) - Memory Management ConceptsOperating Systems 1 (9/12) - Memory Management Concepts
Operating Systems 1 (9/12) - Memory Management Concepts
 
Chapter 3
Chapter 3Chapter 3
Chapter 3
 
Sucet os module_4_notes
Sucet os module_4_notesSucet os module_4_notes
Sucet os module_4_notes
 
VTU 4TH SEM CSE COMPUTER ORGANIZATION SOLVED PAPERS OF JUNE-2013 JUNE-2014 & ...
VTU 4TH SEM CSE COMPUTER ORGANIZATION SOLVED PAPERS OF JUNE-2013 JUNE-2014 & ...VTU 4TH SEM CSE COMPUTER ORGANIZATION SOLVED PAPERS OF JUNE-2013 JUNE-2014 & ...
VTU 4TH SEM CSE COMPUTER ORGANIZATION SOLVED PAPERS OF JUNE-2013 JUNE-2014 & ...
 
Memory managment
Memory managmentMemory managment
Memory managment
 
OSCh14
OSCh14OSCh14
OSCh14
 
Address Binding Scheme
Address Binding SchemeAddress Binding Scheme
Address Binding Scheme
 
Memory management ppt
Memory management pptMemory management ppt
Memory management ppt
 
Chapter 8 - Main Memory
Chapter 8 - Main MemoryChapter 8 - Main Memory
Chapter 8 - Main Memory
 

Similar to Main memoryfinal

Opetating System Memory management
Opetating System Memory managementOpetating System Memory management
Opetating System Memory management
Johan Granados Montero
 
Lecture20-21-22.ppt
Lecture20-21-22.pptLecture20-21-22.ppt
Lecture20-21-22.ppt
ssuserf67e3a
 
Operating system Memory management
Operating system Memory management Operating system Memory management
Operating system Memory management
Shashank Asthana
 
Memory Management
Memory ManagementMemory Management
Memory Management
sangrampatil81
 
Module 4 memory management
Module 4 memory managementModule 4 memory management
Module 4 memory management
Sweta Kumari Barnwal
 
Chapter 9 OS
Chapter 9 OSChapter 9 OS
Chapter 9 OSC.U
 
Cs8493 unit 3
Cs8493 unit 3Cs8493 unit 3
Cs8493 unit 3
Kathirvel Ayyaswamy
 
Cs8493 unit 3
Cs8493 unit 3Cs8493 unit 3
Cs8493 unit 3
Kathirvel Ayyaswamy
 
os mod1 notes
 os mod1 notes os mod1 notes
os mod1 notes
SRINIVASUNIVERSITYEN
 
03. top level view of computer function &amp; interconnection
03. top level view of computer function &amp; interconnection03. top level view of computer function &amp; interconnection
03. top level view of computer function &amp; interconnection
noman yasin
 
CH08.pdf
CH08.pdfCH08.pdf
CH08.pdf
ImranKhan880955
 
Linux Internals - Interview essentials 3.0
Linux Internals - Interview essentials 3.0Linux Internals - Interview essentials 3.0
Linux Internals - Interview essentials 3.0
Emertxe Information Technologies Pvt Ltd
 
Operating system - Process and its concepts
Operating system - Process and its conceptsOperating system - Process and its concepts
Operating system - Process and its concepts
Karan Thakkar
 
Operating system memory management
Operating system memory managementOperating system memory management
Operating system memory management
rprajat007
 
Virtual Memory
Virtual MemoryVirtual Memory
Virtual Memory
vampugani
 
Unit iiios Storage Management
Unit iiios Storage ManagementUnit iiios Storage Management
Unit iiios Storage Management
donny101
 
Paging +Algorithem+Segmentation+memory management
Paging +Algorithem+Segmentation+memory managementPaging +Algorithem+Segmentation+memory management
Paging +Algorithem+Segmentation+memory management
kazim Hussain
 
virtual memory
virtual memoryvirtual memory
virtual memory
Abeer Naskar
 

Similar to Main memoryfinal (20)

Opetating System Memory management
Opetating System Memory managementOpetating System Memory management
Opetating System Memory management
 
Lecture20-21-22.ppt
Lecture20-21-22.pptLecture20-21-22.ppt
Lecture20-21-22.ppt
 
Operating system Memory management
Operating system Memory management Operating system Memory management
Operating system Memory management
 
Memory Management
Memory ManagementMemory Management
Memory Management
 
Module 4 memory management
Module 4 memory managementModule 4 memory management
Module 4 memory management
 
Chapter 9 OS
Chapter 9 OSChapter 9 OS
Chapter 9 OS
 
Cs8493 unit 3
Cs8493 unit 3Cs8493 unit 3
Cs8493 unit 3
 
Cs8493 unit 3
Cs8493 unit 3Cs8493 unit 3
Cs8493 unit 3
 
os mod1 notes
 os mod1 notes os mod1 notes
os mod1 notes
 
03. top level view of computer function &amp; interconnection
03. top level view of computer function &amp; interconnection03. top level view of computer function &amp; interconnection
03. top level view of computer function &amp; interconnection
 
Ch8
Ch8Ch8
Ch8
 
CH08.pdf
CH08.pdfCH08.pdf
CH08.pdf
 
Linux Internals - Interview essentials 3.0
Linux Internals - Interview essentials 3.0Linux Internals - Interview essentials 3.0
Linux Internals - Interview essentials 3.0
 
Operating system - Process and its concepts
Operating system - Process and its conceptsOperating system - Process and its concepts
Operating system - Process and its concepts
 
Operating system memory management
Operating system memory managementOperating system memory management
Operating system memory management
 
Co question 2006
Co question 2006Co question 2006
Co question 2006
 
Virtual Memory
Virtual MemoryVirtual Memory
Virtual Memory
 
Unit iiios Storage Management
Unit iiios Storage ManagementUnit iiios Storage Management
Unit iiios Storage Management
 
Paging +Algorithem+Segmentation+memory management
Paging +Algorithem+Segmentation+memory managementPaging +Algorithem+Segmentation+memory management
Paging +Algorithem+Segmentation+memory management
 
virtual memory
virtual memoryvirtual memory
virtual memory
 

More from marangburu42

Hol
HolHol
Write miss
Write missWrite miss
Write miss
marangburu42
 
Hennchthree 161102111515
Hennchthree 161102111515Hennchthree 161102111515
Hennchthree 161102111515
marangburu42
 
Hennchthree
HennchthreeHennchthree
Hennchthree
marangburu42
 
Hennchthree
HennchthreeHennchthree
Hennchthree
marangburu42
 
Sequential circuits
Sequential circuitsSequential circuits
Sequential circuits
marangburu42
 
Combinational circuits
Combinational circuitsCombinational circuits
Combinational circuits
marangburu42
 
Hennchthree 160912095304
Hennchthree 160912095304Hennchthree 160912095304
Hennchthree 160912095304
marangburu42
 
Sequential circuits
Sequential circuitsSequential circuits
Sequential circuits
marangburu42
 
Combinational circuits
Combinational circuitsCombinational circuits
Combinational circuits
marangburu42
 
Karnaugh mapping allaboutcircuits
Karnaugh mapping allaboutcircuitsKarnaugh mapping allaboutcircuits
Karnaugh mapping allaboutcircuits
marangburu42
 
Aac boolean formulae
Aac   boolean formulaeAac   boolean formulae
Aac boolean formulae
marangburu42
 
Virtualmemoryfinal 161019175858
Virtualmemoryfinal 161019175858Virtualmemoryfinal 161019175858
Virtualmemoryfinal 161019175858
marangburu42
 
Io systems final
Io systems finalIo systems final
Io systems final
marangburu42
 
File system interfacefinal
File system interfacefinalFile system interfacefinal
File system interfacefinal
marangburu42
 
File systemimplementationfinal
File systemimplementationfinalFile systemimplementationfinal
File systemimplementationfinal
marangburu42
 
Mass storage structurefinal
Mass storage structurefinalMass storage structurefinal
Mass storage structurefinal
marangburu42
 
All aboutcircuits karnaugh maps
All aboutcircuits karnaugh mapsAll aboutcircuits karnaugh maps
All aboutcircuits karnaugh maps
marangburu42
 
Virtual memoryfinal
Virtual memoryfinalVirtual memoryfinal
Virtual memoryfinal
marangburu42
 
Mainmemoryfinal 161019122029
Mainmemoryfinal 161019122029Mainmemoryfinal 161019122029
Mainmemoryfinal 161019122029
marangburu42
 

More from marangburu42 (20)

Hol
HolHol
Hol
 
Write miss
Write missWrite miss
Write miss
 
Hennchthree 161102111515
Hennchthree 161102111515Hennchthree 161102111515
Hennchthree 161102111515
 
Hennchthree
HennchthreeHennchthree
Hennchthree
 
Hennchthree
HennchthreeHennchthree
Hennchthree
 
Sequential circuits
Sequential circuitsSequential circuits
Sequential circuits
 
Combinational circuits
Combinational circuitsCombinational circuits
Combinational circuits
 
Hennchthree 160912095304
Hennchthree 160912095304Hennchthree 160912095304
Hennchthree 160912095304
 
Sequential circuits
Sequential circuitsSequential circuits
Sequential circuits
 
Combinational circuits
Combinational circuitsCombinational circuits
Combinational circuits
 
Karnaugh mapping allaboutcircuits
Karnaugh mapping allaboutcircuitsKarnaugh mapping allaboutcircuits
Karnaugh mapping allaboutcircuits
 
Aac boolean formulae
Aac   boolean formulaeAac   boolean formulae
Aac boolean formulae
 
Virtualmemoryfinal 161019175858
Virtualmemoryfinal 161019175858Virtualmemoryfinal 161019175858
Virtualmemoryfinal 161019175858
 
Io systems final
Io systems finalIo systems final
Io systems final
 
File system interfacefinal
File system interfacefinalFile system interfacefinal
File system interfacefinal
 
File systemimplementationfinal
File systemimplementationfinalFile systemimplementationfinal
File systemimplementationfinal
 
Mass storage structurefinal
Mass storage structurefinalMass storage structurefinal
Mass storage structurefinal
 
All aboutcircuits karnaugh maps
All aboutcircuits karnaugh mapsAll aboutcircuits karnaugh maps
All aboutcircuits karnaugh maps
 
Virtual memoryfinal
Virtual memoryfinalVirtual memoryfinal
Virtual memoryfinal
 
Mainmemoryfinal 161019122029
Mainmemoryfinal 161019122029Mainmemoryfinal 161019122029
Mainmemoryfinal 161019122029
 

Recently uploaded

2137ad - Characters that live in Merindol and are at the center of main stories
2137ad - Characters that live in Merindol and are at the center of main stories2137ad - Characters that live in Merindol and are at the center of main stories
2137ad - Characters that live in Merindol and are at the center of main stories
luforfor
 
The Last Polymath: Muntadher Saleh‎‎‎‎‎‎‎‎‎‎‎‎
The Last Polymath: Muntadher Saleh‎‎‎‎‎‎‎‎‎‎‎‎The Last Polymath: Muntadher Saleh‎‎‎‎‎‎‎‎‎‎‎‎
The Last Polymath: Muntadher Saleh‎‎‎‎‎‎‎‎‎‎‎‎
iraqartsandculture
 
Caffeinated Pitch Bible- developed by Claire Wilson
Caffeinated Pitch Bible- developed by Claire WilsonCaffeinated Pitch Bible- developed by Claire Wilson
Caffeinated Pitch Bible- developed by Claire Wilson
ClaireWilson398082
 
Memory Rental Store - The Ending(Storyboard)
Memory Rental Store - The Ending(Storyboard)Memory Rental Store - The Ending(Storyboard)
Memory Rental Store - The Ending(Storyboard)
SuryaKalyan3
 
ART FORMS OF KERALA: TRADITIONAL AND OTHERS
ART FORMS OF KERALA: TRADITIONAL AND OTHERSART FORMS OF KERALA: TRADITIONAL AND OTHERS
ART FORMS OF KERALA: TRADITIONAL AND OTHERS
Sandhya J.Nair
 
一比一原版(UniSA毕业证)南澳大学毕业证成绩单如何办理
一比一原版(UniSA毕业证)南澳大学毕业证成绩单如何办理一比一原版(UniSA毕业证)南澳大学毕业证成绩单如何办理
一比一原版(UniSA毕业证)南澳大学毕业证成绩单如何办理
zeyhe
 
一比一原版(qut毕业证)昆士兰科技大学毕业证如何办理
一比一原版(qut毕业证)昆士兰科技大学毕业证如何办理一比一原版(qut毕业证)昆士兰科技大学毕业证如何办理
一比一原版(qut毕业证)昆士兰科技大学毕业证如何办理
taqyed
 
ashokathegreat project class 12 presentation
ashokathegreat project class 12 presentationashokathegreat project class 12 presentation
ashokathegreat project class 12 presentation
aditiyad2020
 
Codes n Conventionss copy (2).pptx new new
Codes n Conventionss copy (2).pptx new newCodes n Conventionss copy (2).pptx new new
Codes n Conventionss copy (2).pptx new new
ZackSpencer3
 
一比一原版(DU毕业证)迪肯大学毕业证成绩单
一比一原版(DU毕业证)迪肯大学毕业证成绩单一比一原版(DU毕业证)迪肯大学毕业证成绩单
一比一原版(DU毕业证)迪肯大学毕业证成绩单
zvaywau
 
Fed by curiosity and beauty - Remembering Myrsine Zorba
Fed by curiosity and beauty - Remembering Myrsine ZorbaFed by curiosity and beauty - Remembering Myrsine Zorba
Fed by curiosity and beauty - Remembering Myrsine Zorba
mariavlachoupt
 
一比一原版(GU毕业证)格里菲斯大学毕业证成绩单
一比一原版(GU毕业证)格里菲斯大学毕业证成绩单一比一原版(GU毕业证)格里菲斯大学毕业证成绩单
一比一原版(GU毕业证)格里菲斯大学毕业证成绩单
zvaywau
 
Inter-Dimensional Girl Boards Segment (Act 3)
Inter-Dimensional Girl Boards Segment (Act 3)Inter-Dimensional Girl Boards Segment (Act 3)
Inter-Dimensional Girl Boards Segment (Act 3)
CristianMestre
 
A Brief Introduction About Hadj Ounis
A Brief  Introduction  About  Hadj OunisA Brief  Introduction  About  Hadj Ounis
A Brief Introduction About Hadj Ounis
Hadj Ounis
 
acting board rough title here lolaaaaaaa
acting board rough title here lolaaaaaaaacting board rough title here lolaaaaaaa
acting board rough title here lolaaaaaaa
angelicafronda7
 
2137ad Merindol Colony Interiors where refugee try to build a seemengly norm...
2137ad  Merindol Colony Interiors where refugee try to build a seemengly norm...2137ad  Merindol Colony Interiors where refugee try to build a seemengly norm...
2137ad Merindol Colony Interiors where refugee try to build a seemengly norm...
luforfor
 
IrishWritersCtrsPersonalEssaysMay29.pptx
IrishWritersCtrsPersonalEssaysMay29.pptxIrishWritersCtrsPersonalEssaysMay29.pptx
IrishWritersCtrsPersonalEssaysMay29.pptx
Aine Greaney Ellrott
 
一比一原版(QUT毕业证)昆士兰科技大学毕业证成绩单如何办理
一比一原版(QUT毕业证)昆士兰科技大学毕业证成绩单如何办理一比一原版(QUT毕业证)昆士兰科技大学毕业证成绩单如何办理
一比一原版(QUT毕业证)昆士兰科技大学毕业证成绩单如何办理
zeyhe
 
Memory Rental Store - The Chase (Storyboard)
Memory Rental Store - The Chase (Storyboard)Memory Rental Store - The Chase (Storyboard)
Memory Rental Store - The Chase (Storyboard)
SuryaKalyan3
 

Recently uploaded (19)

2137ad - Characters that live in Merindol and are at the center of main stories
2137ad - Characters that live in Merindol and are at the center of main stories2137ad - Characters that live in Merindol and are at the center of main stories
2137ad - Characters that live in Merindol and are at the center of main stories
 
The Last Polymath: Muntadher Saleh‎‎‎‎‎‎‎‎‎‎‎‎
The Last Polymath: Muntadher Saleh‎‎‎‎‎‎‎‎‎‎‎‎The Last Polymath: Muntadher Saleh‎‎‎‎‎‎‎‎‎‎‎‎
The Last Polymath: Muntadher Saleh‎‎‎‎‎‎‎‎‎‎‎‎
 
Caffeinated Pitch Bible- developed by Claire Wilson
Caffeinated Pitch Bible- developed by Claire WilsonCaffeinated Pitch Bible- developed by Claire Wilson
Caffeinated Pitch Bible- developed by Claire Wilson
 
Memory Rental Store - The Ending(Storyboard)
Memory Rental Store - The Ending(Storyboard)Memory Rental Store - The Ending(Storyboard)
Memory Rental Store - The Ending(Storyboard)
 
ART FORMS OF KERALA: TRADITIONAL AND OTHERS
ART FORMS OF KERALA: TRADITIONAL AND OTHERSART FORMS OF KERALA: TRADITIONAL AND OTHERS
ART FORMS OF KERALA: TRADITIONAL AND OTHERS
 
一比一原版(UniSA毕业证)南澳大学毕业证成绩单如何办理
一比一原版(UniSA毕业证)南澳大学毕业证成绩单如何办理一比一原版(UniSA毕业证)南澳大学毕业证成绩单如何办理
一比一原版(UniSA毕业证)南澳大学毕业证成绩单如何办理
 
一比一原版(qut毕业证)昆士兰科技大学毕业证如何办理
一比一原版(qut毕业证)昆士兰科技大学毕业证如何办理一比一原版(qut毕业证)昆士兰科技大学毕业证如何办理
一比一原版(qut毕业证)昆士兰科技大学毕业证如何办理
 
ashokathegreat project class 12 presentation
ashokathegreat project class 12 presentationashokathegreat project class 12 presentation
ashokathegreat project class 12 presentation
 
Codes n Conventionss copy (2).pptx new new
Codes n Conventionss copy (2).pptx new newCodes n Conventionss copy (2).pptx new new
Codes n Conventionss copy (2).pptx new new
 
一比一原版(DU毕业证)迪肯大学毕业证成绩单
一比一原版(DU毕业证)迪肯大学毕业证成绩单一比一原版(DU毕业证)迪肯大学毕业证成绩单
一比一原版(DU毕业证)迪肯大学毕业证成绩单
 
Fed by curiosity and beauty - Remembering Myrsine Zorba
Fed by curiosity and beauty - Remembering Myrsine ZorbaFed by curiosity and beauty - Remembering Myrsine Zorba
Fed by curiosity and beauty - Remembering Myrsine Zorba
 
一比一原版(GU毕业证)格里菲斯大学毕业证成绩单
一比一原版(GU毕业证)格里菲斯大学毕业证成绩单一比一原版(GU毕业证)格里菲斯大学毕业证成绩单
一比一原版(GU毕业证)格里菲斯大学毕业证成绩单
 
Inter-Dimensional Girl Boards Segment (Act 3)
Inter-Dimensional Girl Boards Segment (Act 3)Inter-Dimensional Girl Boards Segment (Act 3)
Inter-Dimensional Girl Boards Segment (Act 3)
 
A Brief Introduction About Hadj Ounis
A Brief  Introduction  About  Hadj OunisA Brief  Introduction  About  Hadj Ounis
A Brief Introduction About Hadj Ounis
 
acting board rough title here lolaaaaaaa
acting board rough title here lolaaaaaaaacting board rough title here lolaaaaaaa
acting board rough title here lolaaaaaaa
 
2137ad Merindol Colony Interiors where refugee try to build a seemengly norm...
2137ad  Merindol Colony Interiors where refugee try to build a seemengly norm...2137ad  Merindol Colony Interiors where refugee try to build a seemengly norm...
2137ad Merindol Colony Interiors where refugee try to build a seemengly norm...
 
IrishWritersCtrsPersonalEssaysMay29.pptx
IrishWritersCtrsPersonalEssaysMay29.pptxIrishWritersCtrsPersonalEssaysMay29.pptx
IrishWritersCtrsPersonalEssaysMay29.pptx
 
一比一原版(QUT毕业证)昆士兰科技大学毕业证成绩单如何办理
一比一原版(QUT毕业证)昆士兰科技大学毕业证成绩单如何办理一比一原版(QUT毕业证)昆士兰科技大学毕业证成绩单如何办理
一比一原版(QUT毕业证)昆士兰科技大学毕业证成绩单如何办理
 
Memory Rental Store - The Chase (Storyboard)
Memory Rental Store - The Chase (Storyboard)Memory Rental Store - The Chase (Storyboard)
Memory Rental Store - The Chase (Storyboard)
 

Main memoryfinal

 The base register holds the smallest legal physical memory address; the limit
register specifies the size of the range. For example, if the base register holds 300040 and the limit register is 120900, then the program can legally access all addresses from 300040 through 420939 (inclusive). Protection of memory space is accomplished by having the CPU hardware compare every address generated in user mode with the registers. Any attempt by a program executing in user mode to access operating-system memory or other users' memory results in a trap to the operating system, which treats the attempt as a fatal error (Figure 8.2). This scheme prevents a user program from (accidentally or deliberately) modifying the code or data structures of either the operating system or other users. The base and limit registers can be loaded only by the operating system, which uses a special privileged instruction. Since privileged instructions can be executed only in kernel mode, and since only the operating system executes in kernel mode, only the operating system can load the base and limit registers. This scheme allows the operating system to change the value of the registers but prevents user programs from changing the registers' contents.
 The operating system, executing in kernel mode, is given unrestricted access to both operating-system memory and users' memory. This provision allows the operating system to load users' programs into users' memory, to dump out those programs in case of errors, to access and modify parameters of system calls, to perform I/O to and from user memory, and to provide many other services. Consider, for example, that an operating system for a multiprocessing system must execute context switches, storing the state of one process from the registers into main memory before loading the next process's context from main memory into the registers.
 User processes must be restricted so that they only access memory locations that "belong" to that particular process. This is usually implemented using a base register and a limit register for each process.
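The base/limit check described above can be sketched in a few lines. This is an illustrative simulation of the hardware comparison, not real hardware; `check_access` is a hypothetical helper name, and the figures are the 300040/120900 ones from the example:

```python
# Sketch of the hardware base/limit check done on every user-mode access.
def check_access(address, base, limit):
    """True if the access is legal; False stands in for a trap to the OS,
    which treats the attempt as a fatal error."""
    return base <= address < base + limit

BASE, LIMIT = 300040, 120900               # figures from the example above
print(check_access(300040, BASE, LIMIT))   # True  (first legal address)
print(check_access(420939, BASE, LIMIT))   # True  (last legal address)
print(check_access(420940, BASE, LIMIT))   # False (one past the end: trap)
```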
 Every memory access made by a user process is checked against these two registers, and if a memory access is attempted outside the valid range, then a fatal error is generated. The OS obviously has access to all existing memory locations, as this is necessary to swap users' code and data in and out of memory. (Fig. 8.2)
Address Binding
 User programs typically refer to memory addresses with symbolic names such as "i", "count", etc. These symbolic names must be mapped or bound to physical memory addresses, which typically occurs in several stages:
o Compile Time - If it is known at compile time where a program will reside in physical memory, then absolute code can be generated by the compiler, containing actual physical addresses. However, if the load address changes at some later time, then the program will have to be recompiled. DOS .COM programs use compile-time binding.
o Load Time - If the location at which a program will be loaded is not known at compile time, then the compiler must generate relocatable code, which references addresses relative to the start of the program. If that starting address changes, then the program must be reloaded but not recompiled. (If it is not known at compile time where the process will reside in memory, then the compiler must generate relocatable code. In this case, final binding is delayed until load time.)
o Execution Time - If a program can be moved around in memory during the course of its execution, then binding must be delayed until execution time. This requires special hardware, and is the method implemented by most modern OSes.
 The processes on the disk that are waiting to be brought into memory for execution form the input queue.
 A compiler typically binds these symbolic addresses to relocatable addresses (such as "14 bytes from the beginning of this module"). The linkage editor or loader in turn binds the relocatable addresses to absolute addresses (such as 74014). Each binding is a mapping from one address space to another.
 The address space of the computer may start at 00000, but the first address of the user process need not be 00000. You will see later how a user program actually places a process in physical memory.
Logical Versus Physical Address Space
 An address generated by the CPU is commonly referred to as a logical address, whereas an address seen by the memory unit (that is, the one loaded into the memory-address register of the memory) is commonly referred to as a physical address.
 The compile-time and load-time address-binding methods generate identical logical and physical addresses. However, the execution-time address-binding scheme results in differing logical and physical addresses. In this case, we usually refer to the logical address as a virtual address. We use logical address and virtual address interchangeably in this text. The set of all logical addresses generated by a program is a logical address space. The set of all physical addresses corresponding to these logical addresses is a physical address space. Thus, in the execution-time address-binding scheme, the logical and physical address spaces differ.
 The run-time mapping from virtual to physical addresses is done by a hardware device called the memory-management unit (MMU). We can choose from many different methods to accomplish such mapping, as we discuss in Sections 8.3 through 8.5.
 The MMU can take on many forms. One of the simplest is a modification of the base-register scheme described earlier. The base register is now called a relocation register. The value in the relocation register is added to every address generated by a user process at the time the address is sent to memory (see Figure 8.4). For example, if the base is at 14000, then an attempt by the user to address location 0 is dynamically relocated to location 14000; an access to location 346 is mapped to location 14346.
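As a minimal sketch of that relocation-register scheme (the function name is ours, not the book's; the base value 14000 is from the example above):

```python
# Sketch of dynamic relocation: the MMU adds the relocation register to
# every logical address the CPU generates (as in Figure 8.4).
RELOCATION_REGISTER = 14000

def mmu_translate(logical_address):
    # The user program only ever sees logical addresses (0, 346, ...);
    # the physical address is formed on the way to memory.
    return logical_address + RELOCATION_REGISTER

print(mmu_translate(0))    # 14000
print(mmu_translate(346))  # 14346
```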
 The address generated by the CPU is a logical address, whereas the address actually seen by the memory hardware is a physical address. The physical address is the one loaded into the memory-address register of the memory. In a computer, the Memory Address Register (MAR) is a CPU register that either stores the memory address from which data will be fetched to the CPU or the address to which data will be sent and stored. When reading from memory, data addressed by the MAR is fed into the MDR (memory data register) and then used by the CPU. The run-time mapping of logical to physical addresses is handled by the memory-management unit, MMU. In the MMU, the base register is now termed a relocation register, whose value is added to every memory request at the hardware level.
 User programs work entirely in logical address space, and any memory references or manipulations are done using purely logical addresses. Only when the address gets sent to the physical memory chips is the physical memory address generated. The user program never sees the real physical addresses.
Dynamic Loading
 Dynamic loading: Rather than loading an entire program into memory at once, dynamic loading loads up each routine as it is called. The advantage is that unused routines need never be loaded, reducing total memory usage and generating faster program startup times. The downside is the added complexity and overhead of checking to see if a routine is loaded every time it is called and then loading it up if it is not already loaded.
Dynamic Linking and Shared Libraries
 Dynamic and static linking: Benefits of the former (More: reentrant code section of library routines that can be shared)
 With static linking, library modules get fully included in executable modules, wasting both disk space and main memory, because every program that included a certain routine from the library would have to have its own copy of that routine linked into its executable code. With dynamic linking, however, only a stub is linked into the executable module, containing references to the actual library module linked in at run time. An added benefit of dynamically linked libraries (DLLs, also known as shared libraries or shared objects on UNIX systems) involves easy upgrades and updates. When a program uses a routine from a standard library and the routine changes, then the program must be re-built (re-linked) in order to incorporate the changes. However, if DLLs are used, then as long as the stub doesn't change, the program can be updated merely by loading new versions of the DLLs onto the system.
SWAPPING
A process must be in memory to be executed. A process, however, can be swapped temporarily out of memory to a backing store and then brought back into memory for continued execution (Figure 8.5). Swapping makes it possible for the total physical address space of all processes to exceed the real physical memory of the system, thus increasing the degree of multiprogramming in a system.
Standard Swapping
 If there is not enough memory available to keep all running processes in memory at the same time, then some processes that are not currently using the CPU may have their memory swapped out to a fast local disk called the backing store. If compile-time or load-time address binding is used, then processes must be swapped back into the same memory location from which they were swapped out. If execution-time binding is used, then the processes can be swapped back into any available location. (Fig. 8.5)
 Swapping is a very slow process compared to other operations.
For example, if a user process occupied 10 MB and the transfer rate for the backing store were 40 MB per second, then it would take 1/4 second (250 milliseconds) just to do the data transfer. Swapping involves moving old data out as well as new data in.
 It is important to swap processes out of memory only when they are idle, or more to the point, only when there are no pending I/O operations. (Otherwise the pending I/O operation could write into the wrong process's memory space.) The solution is to either swap only totally idle processes, or do I/O operations only into and out of OS buffers.
 Most modern OSes no longer use swapping, because it is too slow and there are faster alternatives available (e.g., paging). However, some UNIX systems will still invoke swapping if the system gets extremely full, and then discontinue swapping when the load reduces again. Windows 3.1 used a modified version of swapping that was somewhat controlled by the user, swapping processes out if necessary and then only swapping them back in when the user focused on that particular window.
CONTIGUOUS MEMORY ALLOCATION
One approach to memory management is to load each process into a contiguous space. The operating system is allocated space first, usually at either low or high memory locations, and then the remaining available memory is allocated to processes as needed. (The OS is usually loaded low, because that is where the interrupt vectors are located.)
Memory Protection
 The system shown in Figure 8.6 allows protection against user programs accessing areas that they should not, allows programs to be relocated to different memory starting addresses as needed, and allows the memory space devoted to the OS to grow or shrink dynamically as needs change.
Memory Allocation
 One method of allocating contiguous memory is to divide all available memory into equal-sized partitions, and to assign each process to its own partition. This restricts both the number of simultaneous processes and the maximum size of each process, and is no longer used. An alternate approach is to keep a list of unused (free) memory blocks (holes), and to find a hole of a suitable size whenever a process needs to be loaded into memory. In this approach, strategies for finding the "best" allocation of memory to processes include first fit, best fit, and worst fit.
o First fit - Search the list of holes until one is found that is big enough to satisfy the request, and assign a portion of that hole to that process. Whatever fraction of the hole is not needed by the request is left on the free list as a smaller hole. Subsequent requests may start looking either from the beginning of the list or from the point at which this search ended.
o Best fit - Allocate the smallest hole that is big enough to satisfy the request. This saves large holes for other process requests that may need them later, but the resulting unused portions of holes may be too small to be of any use, and will therefore be wasted. Keeping the free list sorted can speed up the process of finding the right hole.
o Worst fit - Allocate the largest hole available, thereby increasing the likelihood that the remaining portion will be usable for satisfying future requests.
Fragmentation
 External fragmentation means that the available memory is broken up into lots of little pieces, none of which is big enough to satisfy the next memory requirement, although the sum total could. All the memory allocation strategies suffer from external fragmentation, though first and best fits experience the problems more so than worst fit. Statistical analysis of first fit, for example, shows that for N blocks of allocated memory, another 0.5 N will be lost to fragmentation. That is, one-third of memory may be unusable! This property is known as the 50-percent rule.
 Internal fragmentation also occurs with all memory allocation strategies. This is caused by the fact that memory is allocated in blocks of a fixed size, whereas the actual memory needed will rarely be that exact size. For a random distribution of memory requests, on average 1/2 block will be wasted per memory request, because on average the last allocated block will be only half full. Note that the same effect happens with hard drives, and that modern hardware gives us increasingly larger drives and memory at the expense of ever larger block sizes, which translates to more memory lost to internal fragmentation (some systems use variable-sized blocks to deal with this).
 One solution to the problem of external fragmentation is compaction. The goal is to shuffle the memory contents so as to place all free memory together in one large block. Compaction is not always possible, however. If relocation is static and is done at assembly or load time, compaction cannot be done. It is possible only if relocation is dynamic and is done at execution time. If addresses are relocated dynamically, relocation requires only moving the program and data and then changing the base register to reflect the new base address. When compaction is possible, we must determine its cost. The simplest compaction algorithm is to move all processes toward one end of memory; all holes move in the other direction, producing one large hole of available memory. This scheme can be expensive.
 Another possible solution to the external-fragmentation problem is to permit the logical address space of the processes to be noncontiguous, thus allowing a process to be allocated physical memory wherever such memory is available. Two complementary techniques achieve this solution: segmentation (Section 8.4) and paging (Section 8.5). These techniques can also be combined.
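The three placement strategies described above can be sketched over a simple free list. `pick_hole` and the sample hole list are hypothetical, for illustration only:

```python
# Sketch: choosing a hole from a free list of (start, size) blocks under
# the first-fit, best-fit, and worst-fit strategies.
def pick_hole(holes, request, strategy):
    fits = [h for h in holes if h[1] >= request]   # holes big enough
    if not fits:
        return None                                # request cannot be satisfied
    if strategy == "first":
        return fits[0]                             # first hole big enough
    if strategy == "best":
        return min(fits, key=lambda h: h[1])       # smallest adequate hole
    if strategy == "worst":
        return max(fits, key=lambda h: h[1])       # largest hole available
    raise ValueError("unknown strategy")

holes = [(0, 100), (300, 50), (600, 400)]          # hypothetical free list
print(pick_hole(holes, 40, "first"))  # (0, 100)
print(pick_hole(holes, 40, "best"))   # (300, 50)
print(pick_hole(holes, 40, "worst"))  # (600, 400)
```

In a real allocator the chosen hole would then be split, with the unused remainder returned to the free list as a smaller hole.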
SEGMENTATION
Do programmers think of memory as a linear array of bytes, some containing instructions and others containing data? Most programmers would say "no." Rather, they prefer to view memory as a collection of variable-sized segments, with no necessary ordering among the segments (Figure 8.7). What if the hardware could provide a memory mechanism that mapped the programmer's view to the actual physical memory? The system would have more freedom to manage memory, while the programmer would have a more natural programming environment. Segmentation provides such a mechanism. Segmentation is a memory-management scheme that supports this programmer view of memory. A logical address space is a collection of segments. Each segment has a name and a length. The addresses specify both the segment name and the offset within the segment. The programmer therefore specifies each address by two quantities: a segment name and an offset. For simplicity of implementation, segments are numbered and are referred to by a segment number, rather than by a segment name. Thus, a logical address consists of a two-tuple: <segment-number, offset>.
 Segmentation: Most users (programmers) do not think of their programs as existing in one continuous linear address space. Rather, they tend to think of their memory in multiple segments, each dedicated to a particular use, such as code, data, the stack, the heap, etc. Memory segmentation supports this view by providing addresses with a segment number (mapped to a segment base address) and an offset from the beginning of that segment. For example, a C compiler might generate 5 segments for the user code, library code, global (static) variables, the stack, and the heap, as shown in Figure 8.7. (Elements within a segment are identified by their offset from the beginning of the segment.)
 A segment table maps segment-offset addresses to physical addresses, and simultaneously checks for invalid addresses, using a system similar to the page tables and relocation base registers discussed previously. (Note that at this point in the discussion of segmentation, each segment is kept in contiguous memory and may be of different sizes, but segmentation can also be combined with paging, as we shall see shortly.) (Fig. 8.8 shows that the segment table is part of the segmentation hardware; Fig. 8.9 completely clarifies it.)
 Fig. 8.4, 8.6, 8.8, 8.9 - Note these carefully: what the CPU sends to the MMU is the logical address, which, if within the limit, is added to the value in the relocation register. In the case of segmentation, Fig. 8.9 shows that the CPU sends a <segment, offset> pair, where the segment number is looked up in the segment table.
 Normally, when a program is compiled, the compiler automatically constructs segments reflecting the input program. A C compiler might create separate segments for the following: 1. The code 2. Global variables 3. The heap, from which memory is allocated 4. The stacks used by each thread 5. The standard C library. Libraries that are linked in during compile time might be assigned separate segments. The loader would take all these segments and assign them segment numbers.
 Segmentation Hardware: Although the programmer can now refer to objects in the program by a two-dimensional address, the actual physical memory is still, of course, a one-dimensional sequence of bytes.
Thus, we must define an implementation to map two-dimensional user-defined addresses into one-dimensional physical addresses. This mapping is effected by a segment table. Each entry in the segment table has a segment base and a segment limit. The segment base contains the starting physical address where the segment resides in memory, and the segment limit specifies the length of the segment. The use of a segment table is illustrated in Figure 8.8. A logical address consists of two parts: a segment number, s, and an offset into that segment, d. The segment number is used as an index to the segment table. The offset d of the logical address must be between 0 and the segment limit. If it is not, we trap to the operating system (logical addressing attempt beyond end of segment). When an offset is legal, it is added to the segment base to produce the address in physical memory of the desired byte. The segment table is thus essentially an array of base–limit register pairs. As an example, consider the situation shown in Figure 8.9. We have five segments numbered from 0 through 4. The segments are stored in physical memory as shown. The segment table has a separate entry for each segment, giving the beginning address of the segment in physical memory (the base) and the length of that segment (the limit). For example, segment 2 is 400 bytes long and begins at location 4300. Thus, a reference to byte 53 of segment 2 is mapped onto location 4300 + 53 = 4353. A reference to segment 3, byte 852, is mapped to 3200 (the base of segment 3) + 852 = 4052. A reference to byte 1222 of segment 0 would result in a trap to the operating system, as this segment is only 1,000 bytes long.

PAGING

Segmentation permits the physical address space of a process to be non-contiguous. Paging is another memory-management scheme that offers this advantage. However, paging avoids external fragmentation and the need for compaction, whereas segmentation does not.
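Before moving on, the segment-table lookup worked through above (Fig. 8.9) can be sketched in a few lines. This is a minimal sketch: the table layout and the function name are mine, but the base/limit values are the ones used in the figure's worked example.

```python
# Sketch of segment-table translation using the Fig. 8.9 numbers.
# Each entry is (base, limit): e.g. segment 2 starts at physical
# address 4300 and is 400 bytes long.
SEGMENT_TABLE = {
    0: (1400, 1000),
    2: (4300, 400),
    3: (3200, 1100),
}

def translate(segment: int, offset: int) -> int:
    """Map a <segment-number, offset> pair to a physical address."""
    base, limit = SEGMENT_TABLE[segment]
    if not (0 <= offset < limit):
        # In hardware this is a trap to the operating system.
        raise MemoryError(f"offset {offset} beyond end of segment {segment}")
    return base + offset

print(translate(2, 53))    # 4300 + 53 = 4353
print(translate(3, 852))   # 3200 + 852 = 4052
```

Referencing byte 1222 of segment 0 raises the error, mirroring the trap described in the text, since that segment is only 1,000 bytes long.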
It also solves the considerable problem of fitting memory chunks of varying sizes onto the backing store. Most memory-management schemes used before the introduction of paging suffered from this problem. The problem arises because, when code fragments or data residing in main memory need to be swapped out, space must be found on the backing store. The backing store has the same fragmentation problems discussed in connection with main memory, but access is much slower, so compaction is impossible. Because of its advantages over earlier methods, paging in its various forms is used in most operating systems, from those for mainframes through those for smartphones. Paging is implemented through cooperation between the operating system and the computer hardware.

 Paging is a memory management scheme that allows a process's physical memory to be discontinuous, and which eliminates problems with fragmentation by allocating memory in equal-sized blocks known as pages. (When a process is to be executed, its pages are loaded into any available memory frames from their source (a file system or the backing store). The backing store is divided into fixed-sized blocks that are the same size as the memory frames or clusters of multiple frames.) Paging eliminates most of the problems of the other methods discussed previously, and is the predominant memory management technique used today.

 The basic idea behind paging is to divide physical memory into a number of equal-sized blocks called frames, and to divide a program's LOGICAL memory space into blocks of the same size called pages. Any page (from any process) can be placed into any available frame. The page table is used to look up what frame a particular page is stored in at the moment. In the following example, for instance, page 2 of the program's logical memory is currently stored in frame 3 of physical memory (Fig. 8.10, 8.11).
 The hardware support for paging is illustrated in Figure 8.10. Every address generated by the CPU is divided into two parts: a page number (p) and a page offset (d). The page number is used as an index into a page table. The page table contains the base address of each page in physical memory. This base address is combined with the page offset to define the physical memory address that is sent to the memory unit. The paging model of memory is shown in Figure 8.11.

 **Each process has its own page table (and the page size, like the frame size, is defined by the hardware).

 As a concrete (although minuscule) example, consider the memory in Figure 8.12. Here, in the logical address, n = 2 and m = 4. Using a page size of 4 bytes and a physical memory of 32 bytes (8 pages), we show how the programmer's view of memory can be mapped into physical memory. Logical address 0 is page 0, offset 0. Indexing into the page table, we find that page 0 is in frame 5. Thus, logical address 0 maps to physical address 20 [= (5 × 4) + 0]. Logical address 3 (page 0, offset 3) maps to physical address 23 [= (5 × 4) + 3]. Logical address 4 is page 1, offset 0; according to the page table, page 1 is mapped to frame 6. Thus, logical address 4 maps to physical address 24 [= (6 × 4) + 0]. Logical address 13 maps to physical address 9.

 When we use a paging scheme, we have no external fragmentation: any free frame can be allocated to a process that needs it. However, we may have some internal fragmentation. Notice that frames are allocated as units. If the memory requirements of a process do not happen to coincide with page boundaries, the last frame allocated may not be completely full. For example, if page size is 2,048 bytes, a process of 72,766 bytes will need 35 pages plus 1,086 bytes. It will be allocated 36 frames, resulting in internal fragmentation of 2,048 − 1,086 = 962 bytes. In the worst case, a process would need n pages plus 1 byte.
It would be allocated n + 1 frames, resulting in internal fragmentation of almost an entire frame.

 You may have noticed that paging itself is a form of dynamic relocation. Every logical address is bound by the paging hardware to some physical address. Using paging is similar to using a table of base (or relocation) registers, one for each frame of memory.

 A logical address consists of two parts: a page number in which the address resides, and an offset from the beginning of that page. (The number of bits in the page number limits how many pages a single process can address. The number of bits in the offset determines the maximum size of each page, and should correspond to the system frame size.)

 The page table maps the page number to a frame number, to yield a physical address which also has two parts: the frame number and the offset within that frame. The number of bits in the frame number determines how many frames the system can address, and the number of bits in the offset determines the size of each frame. (Won't the offset be the same as the page offset mentioned above? And won't the number of bits in the frame number be the same as the number of bits in the page number? Analyze – the answer to the former is yes, to the second is no.)

 Page numbers, frame numbers, and frame sizes are determined by the architecture, but are typically powers of two, allowing addresses to be split at a certain number of bits. For example, if the logical address size is 2^m and the page size is 2^n, then the high-order m−n bits of a logical address designate the page number and the remaining n bits represent the offset. (Apply this to the 16-byte process logical space in Fig. 8.12.)

 Overhead is involved in each page-table entry, and this overhead is reduced as the size of the pages increases. Also, disk I/O is more efficient when the amount of data being transferred is larger (Chapter 10). Generally, page sizes have grown over time as processes, data sets, and main memory have become larger.
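The m/n bit split and the Fig. 8.12 worked example can be sketched together. This is a minimal sketch assuming the figure's page table (page 0 → frame 5, 1 → 6, 2 → 1, 3 → 2); the function name is mine.

```python
# Fig. 8.12: 4-byte pages (n = 2 offset bits), 16-byte logical space (m = 4),
# 32-byte physical memory. The page table below is taken from the figure.
PAGE_SIZE = 4                  # 2^n with n = 2
PAGE_TABLE = [5, 6, 1, 2]      # page number -> frame number

def translate(logical: int) -> int:
    page = logical // PAGE_SIZE     # high-order m - n bits of the address
    offset = logical % PAGE_SIZE    # low-order n bits
    frame = PAGE_TABLE[page]
    return frame * PAGE_SIZE + offset

for addr in (0, 3, 4, 13):
    print(addr, "->", translate(addr))
# 0 -> 20, 3 -> 23, 4 -> 24, 13 -> 9, matching the worked example above.
```

Note that logical address 13 is page 3, offset 1; page 3 lives in frame 2, giving (2 × 4) + 1 = 9, exactly as the text states.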
Today, pages typically are between 4 KB and 8 KB in size, and some systems support even larger page sizes. Some CPUs and kernels even support multiple page sizes. For instance, Solaris uses page sizes of 8 KB and 4 MB, depending on the data stored by the pages. Researchers are now developing support for variable on-the-fly page size.

 Frequently, on a 32-bit CPU, each page-table entry is 4 bytes long, but that size can vary as well. A 32-bit entry can point to one of 2^32 physical page frames. If frame size is 4 KB (2^12), then a system with 4-byte entries can address 2^44 bytes (or 16 TB) of physical memory.

 Note also that the number of bits in the page number and the number of bits in the frame number do not have to be identical. The former determines the address range of the logical address space, and the latter relates to the physical address space. (p = m − n, d = n)

 DOS used to use an addressing scheme with 16-bit frame numbers and 16-bit offsets, on hardware that only supported 24-bit hardware addresses. The result was a resolution of starting frame addresses finer than the size of a single frame, and multiple frame/offset combinations that mapped to the same physical hardware address.

 In the illustrative example of Fig. 8.12, a process has 16 bytes of logical memory, mapped in 4-byte pages into 32 bytes of physical memory. (Presumably some other processes would be consuming the remaining 16 bytes of physical memory.)

 Note that paging is like having a table of relocation registers, one FOR EACH page of the logical memory.

 There is no external fragmentation with paging. All blocks of physical memory are used, and there are no gaps in between and no problems with finding the right-sized hole for a particular chunk of memory. There is, however, internal fragmentation. Memory is allocated in chunks the size of a page, and on the average, the last page will only be half full, wasting on the average half a page of memory per process. (Possibly more, if processes keep their code and data in separate pages.)
 Larger page sizes waste more memory, but are more efficient in terms of overhead. Modern trends have been to increase page sizes, and some systems even have multiple page sizes to try and make the best of both worlds.

 Page table entries (frame numbers) are typically 32-bit numbers, allowing access to 2^32 physical page frames. If those frames are 4 KB in size each, that translates to 16 TB of addressable physical memory. (32 + 12 = 44 bits of physical address space. Note that log₂(4 KB) = 12.)

 When a process requests memory (e.g. when its code is loaded in from disk), free frames are allocated from a free-frame list, and inserted into that process's page table (Fig. 8.13). Processes are blocked from accessing anyone else's memory because all of their memory requests are mapped through their page table. There is no way for them to generate an address that maps into any other process's memory space.

 The operating system must keep track of each individual process's page table, updating it whenever the process's pages get moved in and out of memory, and applying the correct page table when processing system calls for a particular process. This all increases the overhead involved when swapping processes in and out of the CPU. (The currently active page table must be updated to reflect the process that is currently running.)

 When a process arrives in the system to be executed, its size, expressed in pages, is examined. Each page of the process needs one frame. Thus, if the process requires n pages, at least n frames must be available in memory. If n frames are available, they are allocated to this arriving process. The first page of the process is loaded into one of the allocated frames, and the frame number is put in the page table for this process. The next page is loaded into another frame, its frame number is put into the page table, and so on (Figure 8.13).
 An important aspect of paging is the clear separation between the programmer's view of memory and the actual physical memory. The programmer views memory as one single space, containing only this one program. In fact, the user program is scattered throughout physical memory, which also holds other programs. The difference between the programmer's view of memory and the actual physical memory is reconciled by the address-translation hardware. The logical addresses are translated into physical addresses. This mapping is hidden from the programmer and is controlled by the operating system. Notice that the user process by definition is unable to access memory it does not own. It has no way of addressing memory outside of its page table, and the table includes only those pages that the process owns.

 Since the operating system is managing physical memory, it must be aware of the allocation details of physical memory—which frames are allocated, which frames are available, how many total frames there are, and so on. This information is generally kept in a data structure called a frame table. The frame table has one entry for each physical page frame, indicating whether the latter is free or allocated and, if it is allocated, to which page of which process or processes.

 The operating system maintains a copy of the page table for each process, just as it maintains a copy of the instruction counter and register contents. This copy is used to translate logical addresses to physical addresses whenever the operating system must map a logical address to a physical address manually. It is also used by the CPU dispatcher to define the hardware page table when a process is to be allocated the CPU. Paging therefore increases the context-switch time.

Hardware Support

 Page lookups must be done for every memory reference, and whenever a process gets swapped in or out of the CPU, its page table must be swapped in and out too, along with the instruction registers, etc. (Why not the PC alone?)
 One option is to use a set of registers for the page table. For example, the DEC PDP-11 uses 16-bit addressing and 8 KB pages, resulting in only 8 pages per process. (It takes 13 bits to address 8 KB of offset, leaving only 3 bits to define a page number.)

 An alternate option is to store the page table in main memory, and to use a single register (called the page-table base register, PTBR) to record where in memory the page table is located. Process switching is fast, because only the single register needs to be changed. However, memory access just got half as fast, because every memory access now requires two memory accesses – one to fetch the frame number from memory and then another one to access the desired memory location. The solution to this problem is to use a very special high-speed memory device called the translation look-aside buffer, TLB. The benefit of the TLB is that it can search an entire table for a key value in parallel, and if it is found anywhere in the table, then the corresponding lookup value is returned. Fig. 8.14 illustrates paging hardware with TLB. In case of a TLB miss, the page table is searched as usual.

 The TLB is very expensive, however, and therefore very small. (Not large enough to hold the entire page table.) It is therefore used as a cache device. Addresses are first checked against the TLB, and if the info is not there (a TLB miss), then the frame is looked up from main memory and the TLB is updated. If the TLB is full, then replacement strategies range from least-recently used (LRU) to random. Some TLBs allow some entries to be wired down, which means that they cannot be removed from the TLB. Typically these would be kernel frames. Some TLBs store address-space identifiers (ASIDs) to keep track of which process "owns" a particular entry in the TLB. This allows entries from multiple processes to be stored simultaneously in the TLB without granting one process access to some other process's memory location.
Without this feature, the TLB has to be flushed clean with every process switch.

 The percentage of time that the desired information is found in the TLB is termed the hit ratio.
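The TLB-as-cache behavior just described can be sketched as a small LRU cache sitting in front of the in-memory page table. This is a minimal sketch: the capacity, the page-table contents, and the `lookup` helper are made up for illustration (real TLBs search all entries in parallel in hardware).

```python
# Sketch of a TLB as a tiny LRU cache in front of the page table.
from collections import OrderedDict

PAGE_TABLE = {0: 5, 1: 6, 2: 1, 3: 2}   # page -> frame, lives in "main memory"
TLB_CAPACITY = 2                         # real TLBs: roughly 32-1,024 entries

tlb = OrderedDict()                      # page -> frame, kept in LRU order

def lookup(page: int) -> tuple[int, bool]:
    """Return (frame, hit?) for a page number."""
    if page in tlb:                      # TLB hit: no extra memory access
        tlb.move_to_end(page)
        return tlb[page], True
    frame = PAGE_TABLE[page]             # TLB miss: walk the page table
    if len(tlb) >= TLB_CAPACITY:         # full: evict the least-recently-used
        tlb.popitem(last=False)
    tlb[page] = frame
    return frame, False

print(lookup(0))   # (5, False) -- cold miss
print(lookup(0))   # (5, True)  -- now cached
```

Flushing on a context switch corresponds to `tlb.clear()`; ASIDs avoid that by making the key a (pid, page) pair instead.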
 Each operating system has its own methods for storing page tables. Some allocate a page table for each process. A pointer to the page table is stored with the other register values (like the instruction counter) in the process control block. When the dispatcher is told to start a process, it must reload the user registers and define the correct hardware page-table values from the stored user page table. Other operating systems provide one or at most a few page tables, which decreases the overhead involved when processes are context-switched.

 The hardware implementation of the page table can be done in several ways. In the simplest case, the page table is implemented as a set of dedicated registers. These registers should be built with very high-speed logic to make the paging-address translation efficient. Every access to memory must go through the paging map, so efficiency is a major consideration. The CPU dispatcher reloads these registers, just as it reloads the other registers. Instructions to load or modify the page-table registers are, of course, privileged, so that only the operating system can change the memory map. The DEC PDP-11 is an example of such an architecture. The address consists of 16 bits, and the page size is 8 KB. The page table thus consists of eight entries that are kept in fast registers.

 The use of registers for the page table is satisfactory if the page table is reasonably small (for example, 256 entries). Most contemporary computers, however, allow the page table to be very large (for example, 1 million entries). For these machines, the use of fast registers to implement the page table is not feasible. Rather, the page table is kept in main memory, and a page-table base register (PTBR) points to the page table. Changing page tables requires changing only this one register, substantially reducing context-switch time.

 The problem with this approach is the time required to access a user memory location.
If we want to access location i, we must first index into the page table, using the value in the PTBR offset by the page number for i. This task requires a memory access. It provides us with the frame number, which is combined with the page offset to produce the actual address. We can then access the desired place in memory. With this scheme, two memory accesses are needed to access a byte (one for the page-table entry, one for the byte). Thus, memory access is slowed by a factor of 2. This delay would be intolerable under most circumstances. We might as well resort to swapping!

 The standard solution to this problem is to use a special, small, fast lookup hardware cache called a translation look-aside buffer (TLB). The TLB is associative, high-speed memory. Each entry in the TLB consists of two parts: a key (or tag) and a value. When the associative memory is presented with an item, the item is compared with all keys simultaneously. If the item is found, the corresponding value field is returned. The search is fast; a TLB lookup in modern hardware is part of the instruction pipeline, essentially adding no performance penalty. To be able to execute the search within a pipeline step, however, the TLB must be kept small. It is typically between 32 and 1,024 entries in size. Some CPUs implement separate instruction and data address TLBs. That can double the number of TLB entries available, because those lookups occur in different pipeline steps. We can see in this development an example of the evolution of CPU technology: systems have evolved from having no TLBs to having multiple levels of TLBs, just as they have multiple levels of caches.

 The TLB is used with page tables in the following way. The TLB contains only a few of the page-table entries. When a logical address is generated by the CPU, its page number is presented to the TLB. If the page number is found, its frame number is immediately available and is used to access memory.
As just mentioned, these steps are executed as part of the instruction pipeline within the CPU, adding no performance penalty compared with a system that does not implement paging. If the page number is not in the TLB (known as a TLB miss), a memory reference to the page table must be made. Depending on the CPU, this may be done automatically in hardware or via an interrupt to the operating system. When the frame number is obtained, we can use it to access memory (Figure 8.14). In addition, we add the page number and frame number to the TLB, so that they will be found quickly on the next reference. If the TLB is already full of entries, an existing entry must be selected for replacement. Replacement policies range from least recently used (LRU) through round-robin to random. Some CPUs allow the operating system to participate in LRU entry replacement, while others handle the matter themselves. Furthermore, some TLBs allow certain entries to be wired down, meaning that they cannot be removed from the TLB. Typically, TLB entries for key kernel code are wired down.

 The percentage of times that the page number of interest is found in the TLB is called the hit ratio. An 80-percent hit ratio, for example, means that we find the desired page number in the TLB 80 percent of the time. If it takes 100 nanoseconds to access memory, then a mapped-memory access takes 100 nanoseconds when the page number is in the TLB. If we fail to find the page number in the TLB, then we must first access memory for the page table and frame number (100 nanoseconds) and then access the desired byte in memory (100 nanoseconds), for a total of 200 nanoseconds. (We are assuming that a page-table lookup takes only one memory access, but it can take more, as we shall see.) To find the effective memory-access time, we weight each case by its probability:

effective access time = 0.80 × 100 + 0.20 × 200 = 120 nanoseconds

In this example, we suffer a 20-percent slowdown in average memory-access time (from 100 to 120 nanoseconds).
For a 99-percent hit ratio, which is much more realistic, we have

effective access time = 0.99 × 100 + 0.01 × 200 = 101 nanoseconds

This increased hit rate produces only a 1 percent slowdown in access time.

 As we noted earlier, CPUs today may provide multiple levels of TLBs. Calculating memory access times in modern CPUs is therefore much more complicated than shown in the example above. For instance, the Intel Core i7 CPU has a 128-entry L1 instruction TLB and a 64-entry L1 data TLB. In the case of a miss at L1, it takes the CPU six cycles to check for the entry in the L2 512-entry TLB. A miss in L2 means that the CPU must either walk through the page-table entries in memory to find the associated frame address, which can take hundreds of cycles, or interrupt to the operating system to have it do the work.
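The effective-access-time formula above is easy to package as a helper. The function name and parameters are mine; the numbers are the ones from the text (100 ns memory access, one-access page-table walk on a miss).

```python
# Effective access time: weight the hit case (one memory access) and the
# miss case (one extra access for the page-table walk) by probability.
def effective_access_time(hit_ratio: float, mem_ns: float = 100.0) -> float:
    hit_cost = mem_ns          # frame number came straight from the TLB
    miss_cost = 2 * mem_ns     # page-table access, then the actual access
    return hit_ratio * hit_cost + (1 - hit_ratio) * miss_cost

print(effective_access_time(0.80))   # 120.0 ns, the 20-percent slowdown
print(effective_access_time(0.99))   # 101.0 ns, only a 1-percent slowdown
```

With multi-level page tables, `miss_cost` grows to one access per level plus the final access, which is why the hit ratio matters even more there.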
Protection

 The page table can also help to protect processes from accessing memory that they shouldn't, or their own memory in ways that they shouldn't. A bit or bits can be added to the page table to classify a page as read-write, read-only, read-write-execute, or some combination of these sorts of things. Then each memory reference can be checked to ensure it is accessing the memory in the appropriate mode.

 Valid/invalid bits can be added to "mask off" entries in the page table that are not in use by the current process, as shown in Figure 8.15.

 Note that the valid/invalid bits described above cannot block all illegal memory accesses, due to internal fragmentation. (Areas of memory in the last page that are not entirely filled by the process may contain data left over by whoever used that frame last.)

 Many processes do not use all of the page table available to them, particularly in modern systems with very large potential page tables. Rather than waste memory by creating a full-size page table for every process, some systems use a page-table length register (PTLR) to specify the length of the page table.

Shared Pages

 Paging systems can make it very easy to share blocks of memory, by simply duplicating frame numbers across the page tables of multiple processes (the diagram makes this clear: it is the frame numbers that are duplicated). This may be done with either code or data.

 If code is reentrant, that means that it does not write to or change the code in any way (it is non-self-modifying), and it is therefore safe to re-enter it. More importantly, it means the code can be shared by multiple processes, so long as each has its own copy of the data and registers, including the instruction register.

 In the example given in Fig. 8.16, three different users are running the editor simultaneously, but the code is only loaded into memory (in the page frames) one time.
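The protection and valid/invalid bits described in the Protection subsection above can be sketched as extra fields on each page-table entry. This is a minimal sketch: the entry layout, field names, and table contents are made up for illustration.

```python
# Sketch of per-page protection plus valid/invalid bits in a page table.
from dataclasses import dataclass

@dataclass
class PageTableEntry:
    frame: int
    valid: bool = False
    writable: bool = False

PAGE_TABLE = [
    PageTableEntry(frame=3, valid=True, writable=False),  # read-only code page
    PageTableEntry(frame=7, valid=True, writable=True),   # read-write data page
    PageTableEntry(frame=0, valid=False),                 # not owned by process
]

def check_access(page: int, write: bool) -> int:
    """Return the frame if the access is legal; otherwise raise (i.e. trap)."""
    entry = PAGE_TABLE[page]
    if not entry.valid:
        raise MemoryError("invalid page: trap to the operating system")
    if write and not entry.writable:
        raise PermissionError("write to a read-only page: trap")
    return entry.frame

print(check_access(1, write=True))   # 7: a legal write to the data page
```

A write to page 0 or any access to page 2 raises, which models the hardware trap; a shared reentrant code page would simply appear with the same frame number, marked read-only, in several processes' tables.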
STRUCTURE OF THE PAGE TABLE (Hierarchical Page Tables, Hashed Page Tables, Inverted Page Tables)

Hierarchical Page Tables

 Most modern computer systems support logical address spaces of 2^32 to 2^64. With a 2^32 address space and 4K (2^12) page sizes, this leaves 2^20 entries in the page table. At 4 bytes per entry, this amounts to a 4 MB page table, which is too large to reasonably keep in contiguous memory. (And to swap in and out of memory with each process switch.) Note that with 4K pages, this would take 1024 pages just to hold the page table!

 One option is to use a two-tier paging system, i.e. to page the page table. For example, the 20 bits described above could be broken down into two 10-bit page numbers. The first identifies an entry in the outer page table, which identifies where in memory to find one page of an inner page table. The second 10 bits find a specific entry in that inner page table, which in turn identifies a particular frame in physical memory. (The remaining 12 bits of the 32-bit logical address are the offset within the 4K frame.) Note how frame size is the same as page size.
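The 10/10/12 split just described can be sketched with a few bit operations. This is a minimal sketch; the helper name and the sample address are mine.

```python
# Splitting a 32-bit logical address for two-level paging:
# 10 bits of outer-table index, 10 bits of inner-table index, 12 bits of offset.
def split_address(logical: int) -> tuple[int, int, int]:
    offset = logical & 0xFFF           # low 12 bits: offset within the 4K page
    inner = (logical >> 12) & 0x3FF    # next 10 bits: index into the inner table
    outer = (logical >> 22) & 0x3FF    # top 10 bits: index into the outer table
    return outer, inner, offset

# 0x00403004 splits into outer index 1, inner index 3, offset 4.
print(split_address(0x00403004))
```

Translation then takes two table lookups: the outer entry locates one page of the inner table, and the inner entry supplies the frame that the 12-bit offset is applied to.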
 VAX architecture divides 32-bit addresses into 4 equal-sized sections, and each page is 512 bytes, yielding an address form of:

 With a 64-bit logical address space and 4K pages, there are 52 bits worth of page numbers, which is still too many even for two-level paging. One could increase the paging level, but with 10-bit page tables it would take 7 levels of indirection, which would be prohibitively slow memory access. So some other approach must be used.

Hashed Page Tables: One common data structure for accessing data that is sparsely distributed over a broad range of possible values is the hash table. The figure below illustrates a hashed page table using chain-and-bucket hashing.

Inverted Page Tables: In this approach, instead of a table listing all of the pages for a particular process, an inverted page table lists all of the pages currently loaded in memory, for all processes. (i.e. there is one entry per frame instead of one entry per page.) Access to an inverted page table can be slow, as it may be necessary to search the entire table in order to find the desired page (or to discover that it is not there). Hashing the table can help speed up the search process. Inverted page tables prohibit the normal method of implementing shared memory, which is to map multiple logical pages to a common physical frame. (Because each frame is now mapped to one and only one process.)

EXAMPLE: INTEL 32 AND 64-BIT ARCHITECTURES (Optional)

IA-32 Segmentation

 The Pentium CPU provides both pure segmentation and segmentation with paging. In the latter case, the CPU generates a logical address (segment-offset pair), which the segmentation unit converts into a logical linear address, which in turn is mapped to a physical frame by the paging unit, as shown in Figure 8.21.

 Reading the part on IA-32 segmentation in the "Read Later" section will be helpful. It has the details.
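The inverted-page-table lookup described earlier (one entry per frame, searched by process and page) can be sketched as follows. This is a minimal sketch; the table contents and the `find_frame` helper are made up for illustration.

```python
# Sketch of an inverted page table: one entry per physical frame, recording
# which (pid, page) currently occupies it; None marks a free frame.
INVERTED_TABLE = [(1, 0), None, (2, 0), (1, 1)]

def find_frame(pid: int, page: int) -> int:
    """Search the whole table for (pid, page); the frame number is the index
    at which the match is found."""
    for frame, entry in enumerate(INVERTED_TABLE):
        if entry == (pid, page):
            return frame
    raise KeyError("page not in memory (page fault)")

print(find_frame(1, 1))   # 3: process 1's page 1 occupies frame 3
```

The linear scan is exactly why access can be slow and why real systems hash on (pid, page) instead; and since each frame records exactly one (pid, page), two processes cannot both map a page to the same frame, which is the sharing restriction noted above.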
 Specifics:

 Linux-specific stuff
o Others (in "Further Reading")

 Hardware-specific:
o Device drivers communicate with their hardware via interrupts and "memory" accesses, sending short instructions for example to transfer data from the hard drive to a specified location in main memory. The disk controller monitors the bus for such instructions, transfers the data, and then notifies the CPU that the data is there with another interrupt, but the CPU never gets direct access to the disk.
o A memory management unit (MMU), sometimes called a paged memory management unit (PMMU), is a computer hardware unit having all memory references passed through itself, primarily performing the translation of virtual memory addresses to physical addresses. It is usually implemented as part of the central processing unit (CPU), but it also can be in the form of a separate integrated circuit.
o Memory accesses to registers are very fast, generally one clock tick, and a CPU may be able to execute more than one machine instruction per clock tick. Memory accesses to main memory are comparatively slow, and may take a number of clock ticks to complete. This would require intolerable waiting by the CPU if it were not for an intermediary fast memory cache built into most modern CPUs.
o The basic idea of the cache is to transfer chunks of memory at a time from the main memory to the cache, and then to access individual memory locations one at a time from the cache.
o Flash memory can only be written to a limited number of times before it becomes unreliable. Bandwidth to flash memory is also lower than that to hard disks.

To be cleared

 Shared memory implementation, copy-on-write forking
 Okay, a base and a limit register define a logical address space, but how is that translated to a physical address? Which chapter?
 Addresses bound at compile time or load time have identical logical and physical addresses. Addresses created at execution time, however, have different logical and physical addresses. (Logical address = virtual address)
 Swapping is a very slow process compared to other operations. For efficient processor scheduling, the CPU time slice should be significantly longer than this lost transfer time.
 How can segmentation be combined with paging? (Sec. 8.4.2)
 What are s and d in Fig. 8.8?

Q's Later

 Is the address in the MAR translated by the MMU to a physical address?
 If processes are loaded contiguously in memory, what if they grow in size?
Glossary

Copy-on-write forking, MAR, MDR, Logical/Virtual address, logical and physical address space of a program, Dynamic loading (loads only required routines of a program), Dynamic linking and static linking (stubs/DLLs), backing store (disk where processes are swapped out to), relocation register, limit register, fragmentation (internal, external), segmentation, segment table, trap to operating system, host controller (= host adapter, a NIC is one), reentrant code, memory allocation (first fit, best fit and worst fit), external and internal fragmentation, frames, pages, page-table base register (PTBR), TLB

Read Later

IA-32 Segmentation

 The Pentium architecture allows segments to be as large as 4 GB (32 bits of offset).
 Processes can have as many as 16K segments, divided into two 8K groups:
o 8K private to that particular process, stored in the Local Descriptor Table (LDT).
o 8K shared among all processes, stored in the Global Descriptor Table (GDT).
 Logical addresses are (selector, offset) pairs, where the selector is made up of 16 bits:
o A 13-bit segment number (up to 8K)
o A 1-bit flag for LDT vs. GDT.
o 2 bits for protection codes.
 The descriptor tables contain 8-byte descriptions of each segment, including base and limit registers.
 Logical linear addresses are generated by looking the selector up in the descriptor table and adding the appropriate base address to the offset, as shown in Figure 8.22.

IA-32 Paging

 Pentium paging normally uses a two-tier paging scheme, with the first 10 bits being a page number for an outer page table (a.k.a. page directory), and the next 10 bits being a page number within one of the 1024 inner page tables, leaving the remaining 12 bits as an offset into a 4K page.
 A special bit in the page directory can indicate that this page is a 4MB page, in which case the remaining 22 bits are all used as offset and the inner tier of page tables is not used.
 The CR3 register points to the page directory for the current process, as shown in Figure 8.23.
 If the inner page table is currently swapped out to disk, then the page directory will have an "invalid bit" set, and the remaining 31 bits provide information on where to find the swapped-out page table on the disk (Fig. 8.25 in "Further Reading" is next to the images below).
Further Reading

 Do I/O operations only into and out of OS buffers
 8.2.2 Swapping in mobile systems – TYPICALLY not supported (9th Edition). Touches on Apple iOS and ANDROID!! Prior to terminating a process, Android writes its application state to flash memory for quick restarting.
 http://www.cs.columbia.edu/~junfeng/10sp-w4118/lectures/l19-mem.pdf (Good concise content)
 https://en.wikipedia.org/wiki/Host_adapter

Grey Areas

 The amount of memory lost to fragmentation may vary with algorithm, usage patterns, and some design decisions such as which end of a hole to allocate and which end to save on the free list. CHEW
 Whether the logical page size is equal to the physical frame size (Yes!)
 Note that paging is like having a table of relocation registers, one for each page of the logical memory
 Page table entries (frame numbers) are typically 32-bit numbers, allowing access to 2^32 physical page frames. If those frames are 4 KB in size each, that translates to 16 TB of addressable physical memory. (32 + 12 = 44 bits of physical address space.)
 One option is to use a set of registers for the page table. For example, the DEC PDP-11 uses 16-bit addressing and 8 KB pages, resulting in only 8 pages per process. (It takes 13 bits to address 8 KB of offset, leaving only 3 bits to define a page number.)
 On page 12 of the lecture, do the TLB math under "(Eighth Edition Version)". Required.
 More on TLB
 Apropos page 10, second bullet point of the lecture: does it implicitly mean that the offset for both the page number and the frame number should be the same?
 Page 15: VAX architecture divides 32-bit addresses into 4 equal-sized sections, and each page is 512 bytes, yielding an address form of:
 What are the segmentation unit and paging unit?
 Can parts of a page table / page directory be swapped out too?