3. • Location :
• The term location refers to whether memory is internal or external to the
computer. Internal memory is often equated with main memory.
• But there are other forms of internal memory. The processor requires its own local
memory, in the form of registers.
• Further, the control unit portion of the processor may also require its own internal
memory.
• Cache is another form of internal memory.
• External memory consists of peripheral storage devices that are accessible to the
processor via I/O controllers.
• Unit of Transfer : For internal memory, the unit of transfer is equal to the number of
electrical lines into and out of the memory module. This may be equal to the word
length, but is often larger, such as 64, 128, or 256 bits.
4. • Sequential Access : Memory is organized into units of data, called records. Access must be
made in a specific linear sequence, e.g., tape units.
• Direct Access : As with sequential access, direct access involves a shared read–write
mechanism. However, individual blocks or records have a unique address based on physical
location. Access is accomplished by direct access to reach a general vicinity plus sequential
searching, counting, or waiting to reach the final location. Again, access time is variable,
e.g., disk units.
• Random Access : Each addressable location in memory has a unique, physically wired-in
addressing mechanism.
• Access time : The time from the instant that an address is presented to the memory to the
instant that data have been stored or made available for use. For non-random-access
memory, access time is the time it takes to position the read–write mechanism at the desired
location.
5. • Transfer rate : This is the rate at which data can be transferred into or out of a
memory unit. For random-access memory, it is equal to 1/(cycle time).
For non-random-access memory, the following relationship holds:
Tn = TA + n/R
where
Tn = Average time to read or write n bits
TA = Average access time
n = Number of bits
R = Transfer rate, in bits per second (bps)
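The relationship above amounts to a one-line calculation; a minimal sketch (the numbers in the example are illustrative, not from any particular device):

```python
def avg_transfer_time(t_a: float, n: int, r: float) -> float:
    """Average time to read or write n bits: Tn = TA + n/R."""
    return t_a + n / r

# Illustrative numbers: TA = 0.1 ms, n = 4096 bits, R = 1 Mbps
# Tn = 0.0001 + 4096/1e6 = 0.004196 s (about 4.2 ms)
print(avg_transfer_time(1e-4, 4096, 1e6))
```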
6. • As one goes down the hierarchy, the following occur:
• Decreasing cost per bit
• Increasing capacity
• Increasing access time
• Decreasing frequency of access of the memory
by the processor
Memory Hierarchy
7. • A dynamic RAM (DRAM) is made with cells that store data as charge on capacitors.
The presence or absence of charge in a capacitor is interpreted as a binary 1 or 0.
Because capacitors have a natural tendency to discharge, dynamic
• RAMs require periodic charge refreshing to maintain data storage. The term dynamic
refers to this tendency of the stored charge to leak away, even with power
continuously applied.
Dynamic RAM
8. • The address line is activated when the bit value from this cell is to be read or written.
• The transistor acts as a switch that is closed (allowing current to flow) if a voltage is
applied to the address line and open (no current flows) if no voltage is present on the
address line.
• For the write operation, a voltage signal is applied to the bit line; a high voltage
represents 1, and a low voltage represents 0. A signal is then applied to the address
line, allowing a charge to be transferred to the capacitor.
• For the read operation, when the address line is selected, the transistor turns on and
the charge stored on the capacitor is fed out onto a bit line and to a sense amplifier.
• The sense amplifier compares the capacitor voltage to a reference value and
determines if the cell contains a logic 1 or a logic 0.
9. • The readout from the cell discharges the capacitor, which must be restored to
complete the operation.
• Although the DRAM cell is used to store a single bit (0 or 1), it is essentially an analog
device. The capacitor can store any charge value within a range; a threshold value
determines whether the charge is interpreted as 1 or 0.
10. • In a SRAM, binary values are stored using traditional flip-flop logic-gate
configurations. A static RAM will hold its data as long as power is supplied to it.
Static RAM
11. • Four transistors (T1, T2, T3, T4) are cross connected in an arrangement that
produces a stable logic state.
• In logic state 1, point C1 is high and point C2 is low; in this state, T1 and T4 are off
and T2 and T3 are on.
• In logic state 0, point C1 is low and point C2 is high; in this state, T1 and T4 are on
and T2 and T3 are off.
• Both states are stable as long as the direct current (dc) voltage is applied. Unlike the
DRAM, no refresh is needed to retain data.
• As in the DRAM, the SRAM address line is used to open or close a switch. The
address line controls two transistors (T5 and T6).
• When a signal is applied to this line, the two transistors are switched on, allowing a
read or write operation. For a write operation, the desired bit value is applied to line B,
while its complement is applied to line B̄.
• For a read operation, the bit value is read from line B.
12. SRAM vs DRAM
Sr. No | SRAM | DRAM
1 | Consists of a number of flip-flops to store data. | Stores data as charge on a capacitor.
2 | Contains fewer memory cells per unit area. | Contains more memory cells per unit area.
3 | Faster. | Slower.
4 | Refreshing circuitry is not required. | Refreshing circuitry is required.
5 | Cost and size are greater. | Cost and size are smaller.
6 | Consumes more power. | Consumes less power.
7 | Generally used in smaller applications such as CPU cache memory and hard drive buffers. | Commonly used as the main memory in personal computers.
8 | Access is simpler. | Access is more complex (refresh must be interleaved with accesses).
13. • A ROM is created like any other integrated circuit chip, with the data actually wired
into the chip as part of the fabrication process. This presents two problems:
• The data insertion step includes a relatively large fixed cost, whether one or
thousands of copies of a particular ROM are fabricated.
• There is no room for error. If one bit is wrong, the whole batch of ROMs must be
thrown out.
ROM
14. • PROM
• EPROM (read-mostly memory)
• EEPROM (read-mostly memory)
• Flash (read-mostly memory)
• A variation on read-only memory is the read-mostly memory, which is useful for
applications in which read operations are far more frequent than write operations but
for which nonvolatile storage is required.
Types of ROM
15. • When only a small number of ROMs with a particular memory content is needed, a
less expensive alternative is the programmable ROM (PROM).
• Like the ROM, the PROM is nonvolatile and may be written into only once.
• For the PROM, the writing process is performed electrically and may be performed by
a supplier or customer at a time later than the original chip fabrication.
• Special equipment is required for the writing or “programming” process.
• PROMs provide flexibility and convenience.
PROM
16. • The erasable programmable read-only memory (EPROM) is read and written
electrically, as with PROM.
• However, before a write operation, all the storage cells must be erased to the same
initial state by exposure of the packaged chip to ultraviolet radiation.
• Erasure is performed by shining an intense ultraviolet light through a window that is
designed into the memory chip.
• This erasure process can be performed repeatedly; each erasure can take as much
as 20 minutes to perform. Thus, the EPROM can be altered multiple times and, like
the ROM and PROM, holds its data virtually indefinitely.
• For comparable amounts of storage, the EPROM is more expensive than PROM, but
it has the advantage of the multiple update capability.
EPROM
17. • A more attractive form of read-mostly memory is electrically erasable programmable
read-only memory (EEPROM).
• This is a read-mostly memory that can be written into at any time without erasing prior
contents; only the byte or bytes addressed are updated.
• The write operation takes considerably longer than the read operation, on the order of
several hundred microseconds per byte.
• The EEPROM combines the advantage of nonvolatility with the flexibility of being
updatable in place.
• EEPROM is more expensive than EPROM and also is less dense, supporting fewer
bits per chip.
EEPROM
18. • Another form of semiconductor memory is flash memory (so named because of the speed
with which it can be reprogrammed).
• Flash memory is intermediate between EPROM and EEPROM in both cost and
functionality.
• Like EEPROM, flash memory uses an electrical erasing technology. An entire flash
memory can be erased in one or a few seconds, which is much faster than EPROM.
• In addition, it is possible to erase just blocks of memory rather than an entire chip.
• Flash memory gets its name because the microchip is organized so that a section of
memory cells are erased in a single action or “flash.” However, flash memory does not
provide byte-level erasure.
Flash Memory
19. • It is a technique for compensating for the relatively slow speed of DRAM (dynamic RAM).
• In this technique, the main memory is divided into memory banks which can be
accessed individually without any dependency on the other.
• For example: If we have 4 memory banks (4-way interleaved memory), each
containing 256 bytes, then the block-oriented scheme (no interleaving) will assign
virtual addresses 0 to 255 to the first bank, 256 to 511 to the second bank, and so on.
• But in Interleaved memory, virtual address 0 will be with the first bank, 1 with the
second memory bank, 2 with the third bank and 3 with the fourth, and then 4 with the
first memory bank again.
Interleaved Memory
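The address-to-bank mapping described above can be sketched in a few lines (a minimal illustration; the function names are ours, not standard):

```python
N_BANKS = 4  # 4-way interleaved memory, as in the example above

def bank_of(addr: int) -> int:
    """Low-order interleaving: consecutive addresses fall in consecutive banks."""
    return addr % N_BANKS

def offset_in_bank(addr: int) -> int:
    """Position of the address within its bank."""
    return addr // N_BANKS

# Addresses 0..3 land in banks 0..3; address 4 wraps back to bank 0.
print([bank_of(a) for a in range(6)])   # [0, 1, 2, 3, 0, 1]
```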
21. • The CPU can access alternate sections immediately, without waiting for the previous
memory access to complete.
• Memory interleaving is a technique for increasing memory speed. It is a process that
makes the system more efficient, fast and reliable.
• For example: In the above example of 4 memory banks, data with virtual address 0, 1,
2 and 3 can be accessed simultaneously as they reside in separate memory banks,
hence we do not have to wait for completion of a data fetch, to begin with the next.
• An interleaved memory with n banks is said to be n-way interleaved. In a two-way
interleaved memory system, there are still two banks of DRAM, but logically the
system appears to have one bank of memory that is twice as large.
22. • Locality of reference : The phenomenon of locality of reference states that, when a
block of data is fetched into the cache to satisfy a single memory reference, it is likely
that there will be future references to that same memory location or to other words in
the block.
• Temporal Locality : The concept that a resource that is referenced at one point in time
will be referenced again sometime in the near future
• Spatial locality : The concept that likelihood of referencing a resource is higher if a
resource near it was just referenced.
Cache Memory Principles
23. • Main memory consists of up to 2^n addressable words, with each word having a unique
n-bit address.
• For mapping purposes, this memory is considered to consist of a number of fixed-length
blocks of K words each. That is, there are M = 2^n / K blocks in main memory.
• The cache consists of m blocks, called lines. Each line contains K words, plus a tag of
a few bits.
• Each line also includes control bits, such as a bit to indicate whether the line has been
modified since being loaded into the cache.
• The length of a line, not including tag and control bits, is the line size.
Cache Organization
25. • If a word in a block of memory is read, that block is transferred to one of the lines of
the cache.
• Because there are more blocks than lines, an individual line cannot be uniquely and
permanently dedicated to a particular block.
• Thus, each line includes a tag that identifies which particular block is currently being
stored.
• The tag is usually a portion of the main memory address.
26. • Cache Addresses :
• When virtual addresses are used, the system designer may choose to place the
cache between the processor and the MMU or between the MMU and main memory
Elements of Cache Design
27. • One obvious advantage of the logical cache is that cache access speed is faster than
for a physical cache, because the cache can respond before the MMU performs an
address translation.
• The disadvantage has to do with the fact that most virtual memory systems supply
each application with the same virtual memory address space.
• Cache Size : The larger the cache, the larger the number of gates involved in
addressing the cache. The result is that large caches tend to be slightly slower than
small ones
• Mapping Function : A means is needed for determining which main memory block
currently occupies a cache line. The choice of the mapping function dictates how the
cache is organized.
28. • The simplest technique, known as direct mapping, maps each block of main memory
into only one possible cache line. The mapping is expressed as
• i = j % m
• where i = cache line number
• j = main memory block number
• m = number of lines in the cache
Direct Mapping
29. • The least significant w bits identify a unique word or byte within a block of main
memory.
• The remaining s bits specify one of the 2^s blocks of main memory.
• The cache logic interprets these s bits as a tag of s - r bits (most significant portion)
and a line field of r bits.
• This latter field identifies one of the m = 2^r lines of the cache.
• 2^w = Block size = K
• 2^s = No. of blocks = M
• 2^r = No. of lines in cache = m
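The field extraction described above can be sketched as follows (a hypothetical illustration; the parameter values in the example are arbitrary):

```python
def split_address(addr: int, w: int, r: int):
    """Split a main-memory address for a direct-mapped cache into
    (tag, line, word): the block size is 2**w words, the cache has
    2**r lines, and the tag is whatever remains above the line field."""
    word = addr & ((1 << w) - 1)     # least significant w bits
    block = addr >> w                # block number j
    line = block & ((1 << r) - 1)    # cache line i = j mod 2**r
    tag = block >> r                 # most significant s - r bits
    return tag, line, word

# With 4-word blocks (w = 2) and 16 lines (r = 4):
print(split_address(0x3A7, 2, 4))   # (14, 9, 3)
```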
31. • The direct mapping technique is simple and inexpensive to implement.
• Its main disadvantage is that there is a fixed cache location for any given block.
• Thus, if a program happens to reference words repeatedly from two different blocks
that map into the same line, then the blocks will be continually swapped in the cache,
and the hit ratio will be low (a phenomenon known as thrashing).
• One approach to lower the miss penalty is to remember what was discarded in case it
is needed again. Since the discarded data has already been fetched, it can be used
again at a small cost. Such recycling is possible using a victim cache.
32. • A main memory block can load into any line of the cache.
• Hence, in this case, the cache control logic interprets a memory address simply as a
Tag and a Word field.
• The Tag field uniquely identifies a block of main memory.
• To determine whether a block is in the cache, the cache control logic must
simultaneously examine every line’s tag for a match.
Associative Mapping
33. • With associative mapping, there is flexibility as to which block to replace when a new
block is read into the cache.
• Replacement algorithms are designed to maximize the hit ratio.
• The principal disadvantage of associative mapping is the complex circuitry required to
examine the tags of all cache lines in parallel.
• s = tag bits
• w = word bits
35. • Set-associative mapping is a compromise that exhibits the strengths of both the direct
and associative approaches while reducing their disadvantages.
• In this case, the cache consists of a number of sets, each of which consists of a number
of lines. The relationships are
m = v * k
• i = j % v
• i = cache set number
• j = main memory block number
• m = number of lines in the cache
• v = number of sets
• k = number of lines in each set
Set Associative Mapping
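The relationships above can be checked with a short sketch (illustrative numbers; the helper name is ours):

```python
def cache_set(j: int, v: int) -> int:
    """Set-associative placement: block j maps to set i = j mod v."""
    return j % v

# A 2-way set-associative cache with m = 16 lines: v = 8 sets of k = 2 lines.
v, k = 8, 2
m = v * k                                       # m = v * k = 16
# Blocks 3, 11, and 19 all map to set 3 and contend for its k = 2 lines.
print([cache_set(j, v) for j in (3, 11, 19)])   # [3, 3, 3]
```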
36. • With set-associative mapping, block Bj can be mapped into any of the lines of set i,
where i = j mod v.
• With fully associative mapping, the tag in a memory address is quite large and must
be compared to the tag of every line in the cache.
• With k-way set-associative mapping, the tag in a memory address is much smaller
and is only compared to the k tags within a single set.
39. • If at least one write operation has been performed on a word in a line of the cache,
then main memory must be updated by writing the line of cache out to the block of
memory before bringing in the new block.
• If a word has been altered only in the cache, then the corresponding memory word is
invalid. Further, if the I/O device has altered main memory, then the cache word is
invalid.
• A more complex problem occurs when multiple processors are attached to the same
bus and each processor has its own local cache. Then, if a word is altered in one
cache, it could conceivably invalidate a word in other caches.
Write Policy
40. • Write through : The simplest technique is called write through.
• Using this technique, all write operations are made to main memory as well as to the
cache, ensuring that main memory is always valid.
• Any other processor–cache module can monitor traffic to main memory to maintain
consistency within its own cache.
• The main disadvantage of this technique is that it generates substantial memory traffic
and may create a bottleneck.
• Write Back : An alternative technique, known as write back, minimizes memory writes.
• With write back, updates are made only in the cache.
• When an update occurs, a dirty bit, or use bit, associated with the line is set. Then, when
a block is replaced, it is written back to main memory if and only if the dirty bit is set.
• The problem with write back is that portions of main memory are invalid, and hence
accesses by I/O modules can be allowed only through the cache.
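The dirty-bit mechanism described under write back can be sketched as follows (a minimal, hypothetical model of a single line; real caches track tags and data in hardware):

```python
class WriteBackLine:
    """One cache line under a write-back policy (illustrative sketch)."""
    def __init__(self):
        self.tag, self.data, self.dirty = None, None, False

    def write(self, tag, data):
        # Updates are made only in the cache; the dirty bit is set.
        self.tag, self.data, self.dirty = tag, data, True

    def evict(self, memory: dict):
        # On replacement, write back if and only if the dirty bit is set.
        if self.dirty:
            memory[self.tag] = self.data
        self.tag, self.data, self.dirty = None, None, False

memory = {}
line = WriteBackLine()
line.write(7, "value-A")
line.write(7, "value-B")   # a second update still touches only the cache
line.evict(memory)         # a single write-back carries the latest value
print(memory)              # {7: 'value-B'}
```

Note that main memory never sees "value-A" at all, which is exactly why, until eviction, the memory copy is invalid.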
41. • In contemporary multiprocessor systems, it is customary to have one or two levels of
cache associated with each processor.
• This organization is essential to achieve reasonable performance. It does, however,
create a problem known as the cache coherence problem.
• The essence of the problem is this: Multiple copies of the same data can exist in
different caches simultaneously, and if processors are allowed to update their own
copies freely, an inconsistent view of memory can result.
• It is clear that a write-back policy can result in inconsistency. If two caches contain the
same line, and the line is updated in one cache, the other cache will unknowingly
have an invalid value. Subsequent reads to that invalid line produce invalid results.
Cache Coherency
42. • Software cache coherence schemes rely on the compiler and operating system to
deal with the problem.
• Compiler-based coherence mechanisms perform an analysis on the code to
determine which data items may become unsafe for caching, and they mark those
items accordingly.
• The operating system or hardware then prevents noncacheable items from being
cached. The simplest approach is to prevent any shared data variables from being
cached.
Software Solutions
43. • Directory protocols collect and maintain information about where copies of lines
reside. Typically, there is a centralized controller that is part of the main memory
controller, and a directory that is stored in main memory.
• Before a processor can write to a local copy of a line, it must request exclusive access
to the line from the controller.
• Before granting this exclusive access, the controller sends a message to all
processors with a cached copy of this line, forcing each processor to invalidate its
copy.
• After receiving acknowledgments back from each such processor, the controller
grants exclusive access to the requesting processor.
Directory Protocols
44. • When another processor tries to read a line that is exclusively granted to another
processor, it will send a miss notification to the controller.
• The controller then issues a command to the processor holding that line that requires
the processor to do a write back to main memory. The line may now be shared for
reading by the original processor and the requesting processor.
• Directory schemes suffer from the drawbacks of a central bottleneck and the
overhead of communication between the various cache controllers and the central
controller.
45. • Snoopy protocols distribute the responsibility for maintaining cache coherence among
all of the cache controllers in a multiprocessor.
• When an update action is performed on a shared cache line, it must be announced to
all other caches by a broadcast mechanism. Each cache controller is able to “snoop”
on the network to observe these broadcasted notifications, and react accordingly.
• Snoopy protocols are ideally suited to a bus-based multiprocessor, because the
shared bus provides a simple means for broadcasting and snooping. However,
because one of the objectives of the use of local caches is to avoid bus accesses,
care must be taken that the increased bus traffic required for broadcasting and
snooping does not cancel out the gains from the use of local caches.
Snoopy Protocol
46. • Two basic approaches to the snoopy protocol have been explored: write invalidate
and write update (or write broadcast).
• With a write-invalidate protocol, there can be multiple readers but only one writer at a
time.
• Initially, a line may be shared among several caches for reading purposes.
• When one of the caches wants to perform a write to the line, it first issues a notice
that invalidates that line in the other caches, making the line exclusive to the writing
cache. Once the line is exclusive, the owning processor can make cheap local writes
until some other processor requires the same line.
• With a write-update protocol, there can be multiple writers as well as multiple readers.
When a processor wishes to update a shared line, the word to be updated is
distributed to all others, and caches containing that line can update it.
47. • To provide cache consistency on an SMP, the data cache often supports a protocol
known as MESI. For MESI, the data cache includes two status bits per tag, so that
each line can be in one of four states:
• Modified : The line in the cache has been modified (different from main memory) and
is available only in this cache.
• Exclusive : The line in the cache is the same as that in main memory and is not
present in any other cache.
• Shared : The line in the cache is the same as that in main memory and may be
present in another cache.
• Invalid : The line in the cache does not contain valid data.
MESI Protocol
48. • When a read miss occurs in the local cache, the processor initiates a memory read to
read the line of main memory containing the missing address.
• The processor inserts a signal on the bus that alerts all other processor/cache units to
snoop the transaction.
• Line is in Exclusive State : The responding processor transitions the state of its copy
from exclusive to shared, and the initiating processor reads the line from main memory
and transitions the line in its cache from invalid to shared.
• Line is in Shared State : The initiating processor reads the line and transitions the line in
its cache from invalid to shared. The responding processor does not witness any
transition.
Read Miss (MESI Protocol)
49. • Line is in Modified State : If one other cache has a modified copy of the line, then
that cache blocks the memory read and provides the line to the requesting cache over
the shared bus. The responding cache then changes its line from modified to shared.
The initiating cache then changes the line’s state from invalid to shared.
• Line is in Invalid State : The initiating processor simply reads the line and transitions
the line in its cache from invalid to exclusive.
50. • When a read hit occurs on a line currently in the local cache, the processor simply
reads the required item. There is no state change
Read Hit (MESI Protocol)
51. • Line is in Modified State : The responding processor writes the line back to main
memory(since the line is in modified state) and transitions the state of the line from
modified to invalid since the initiating processor is going to modify this line.
• Line is in Shared or Exclusive State : Each cache which has a clean copy of the line
invalidates the line since it is going to be modified by the initiating processor.
Write Miss (MESI Protocol)
52. • Line is in Shared State : Each responding cache invalidates its copy of the line since
the line will be modified. The initiating processor then changes the state of the line
from shared to modified.
• Line is in Exclusive State : The processor already has exclusive control of this line,
and so it simply performs the update and transitions its copy of the line from exclusive
to modified.
• Line is in Modified State : The processor already has exclusive control of this line
and has the line marked as modified, and so it simply performs the update.
Write Hit (MESI Protocol)
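The transitions of the initiating processor’s line, as described in the read/write miss and hit cases above, can be collected into a small table (a simplified sketch; it omits the responding caches’ side and any corner cases):

```python
def next_state(state: str, event: str) -> str:
    """Next MESI state of the line in the *initiating* cache.

    States: 'M', 'E', 'S', 'I'. Events are named for the cases above."""
    if event == "read_hit":
        return state            # read hit: no state change
    if event == "read_miss_shared":
        return "S"              # another cache held the line -> shared
    if event == "read_miss_alone":
        return "E"              # no other cache held the line -> exclusive
    if event in ("write_hit", "write_miss"):
        return "M"              # the line is modified locally
    raise ValueError(f"unknown event: {event}")

print(next_state("I", "read_miss_alone"))   # E
print(next_state("S", "write_hit"))         # M
```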
53. • All L2 caches are engaged in the snoopy protocol.
• The L1 caches, however, do not connect to the bus and so cannot take part in the
snoopy protocol directly.
• For any line that is present in both an L2 cache and its corresponding L1 cache, the
L1 line state should track the state of the L2 line.
• A simple means of doing this is to adopt the write-through policy in the L1 cache; in
this case the write through is to the L2 cache and not to the memory.
• If the L1 cache has a write-back policy, the relationship between the two caches is
more complex.
L1-L2 Cache Consistency
54. • The basic method for implementing paging involves breaking physical memory into fixed
sized blocks called frames and breaking logical memory into blocks of the same size
called pages.
• When a process is to be executed, its pages are loaded into any available memory frames
from their source.
• Every address generated by the CPU is divided into two parts : page number and page
offset.
• For paging there is no need for limit register as in segmentation since page size is fixed.
• The page number is used as an index into the page table, which contains the base address of
each page in memory.
• This address is combined with the offset to construct a physical address.
Paging
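The page-number/offset translation above can be sketched as follows (the page size and page-table contents are hypothetical):

```python
PAGE_SIZE = 4096   # bytes; an illustrative page size

def translate(logical_addr: int, page_table: dict) -> int:
    """Map a logical address to a physical address via the page table."""
    page_number = logical_addr // PAGE_SIZE   # index into the page table
    offset = logical_addr % PAGE_SIZE         # unchanged by translation
    frame = page_table[page_number]           # frame holding this page
    return frame * PAGE_SIZE + offset

page_table = {0: 5, 1: 2}                     # page -> frame (hypothetical)
print(translate(4100, page_table))            # page 1, offset 4 -> 8196
```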
56. • Because the page table is of variable length, depending on the size of the process, we cannot
expect to hold it in registers. Instead, it must be in main memory to be accessed.
• We can have a register that holds the starting address of the page table for a particular
process.
• Clearly, the amount of memory devoted to page tables alone could be unacceptably high. To
overcome this problem, most virtual memory schemes store page tables in virtual memory
rather than real memory. This means that page tables are subject to paging just as other
pages are.
• Some processors make use of a two-level scheme to organize large page tables. In this
scheme, there is a page directory, in which each entry points to a page table. Thus, if the
length of the page directory is X, and if the maximum length of a page table is Y, then a
process can consist of up to X * Y pages.
Page Table Structure
57. • If each process has its own page table, then the amount of memory consumed will be
large. To overcome this, one approach is to use Inverted page table.
• In this approach, the page number portion of a virtual address is mapped into a hash
value using a simple hashing function. The hash value is a pointer to the inverted
page table, which contains the page table entries.
• There is one entry in the inverted page table for each real memory page frame rather
than one per virtual page. Thus a fixed proportion of real memory is required for the
tables regardless of the number of processes or virtual pages supported.
• The page table’s structure is called inverted because it indexes page table entries by
frame number rather than by virtual page number.
Inverted Page Table
58. • A hash function maps numbers in the range 0 through M into numbers in the range 0
through N, where M > N .
• The output of the hash function is used as an index into the hash table.
• Since more than one input maps into the same output, it is possible for an input item
to map to a hash table entry that is already occupied.
• In that case, the new item must overflow into another hash table location. Typically,
the new item is placed in the first succeeding empty space, and a pointer from the
original location is provided to chain the entries together.
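The overflow rule above (take the first succeeding empty space) can be sketched without the chaining pointers as simple linear probing:

```python
def insert(table: list, key: int) -> int:
    """Place key at its hashed slot; on collision, overflow into the
    first succeeding empty location (wrapping around). Returns the slot."""
    n = len(table)
    i = key % n                     # a simple hash function: key mod n
    while table[i] is not None:     # slot already occupied
        i = (i + 1) % n
    table[i] = key
    return i

table = [None] * 8
print(insert(table, 10))    # 10 % 8 = 2 -> slot 2
print(insert(table, 18))    # also hashes to 2 -> overflows to slot 3
```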
59. • The TLB is associative, high-speed memory. Each entry in the TLB consists of two
parts, a key and a value.
• When the associative memory is presented with an item, the item is compared with all
keys simultaneously.
• If the item is found, the corresponding value field is returned.
• The search is fast; a TLB lookup in modern hardware is a part of the instruction
pipeline, essentially adding no performance penalty.
• It is typically between 32 and 1,024 entries in size.
Translation Lookaside Buffer
60. • The TLB contains only a few of the page table entries (similar to a cache).
• When a logical address is generated by the CPU, its page number is presented to the
TLB.
• If the page number is found, its frame number is immediately available and is used to
access memory.
• If the page number is not found (i.e., a TLB miss), a memory reference to the page table
must be made.
• When the frame number is obtained, we can use it to access memory. In addition we
add the page number and frame number to the TLB, so that they will be found quickly
on next reference.
• If TLB is full of entries then replacement must be done.
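The hit/miss flow above can be sketched with a dictionary standing in for the associative memory (a hypothetical model; the eviction choice here is arbitrary, where a real TLB uses a proper replacement policy):

```python
def tlb_lookup(tlb: dict, page_table: dict, page: int, capacity: int = 64):
    """Return (frame, hit). On a miss, walk the page table and install
    the translation, evicting an arbitrary entry if the TLB is full."""
    if page in tlb:
        return tlb[page], True            # TLB hit
    frame = page_table[page]              # TLB miss: reference the page table
    if len(tlb) >= capacity:
        tlb.pop(next(iter(tlb)))          # replacement (placeholder policy)
    tlb[page] = frame                     # cache for the next reference
    return frame, False

tlb, page_table = {}, {3: 9}
print(tlb_lookup(tlb, page_table, 3))     # (9, False): miss, then installed
print(tlb_lookup(tlb, page_table, 3))     # (9, True): hit
```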
61. • Some TLBs store Address Space Identifiers (ASIDs) in each TLB entry. An ASID
uniquely identifies each process and is used to provide address space protection for
that process.
• If the TLB doesn't support separate ASIDs then every time a new page table is
selected, the TLB must be flushed to ensure that the next executing process doesn't
use the wrong translation information.
62. • Segmentation is a memory management scheme that supports programmers view of
memory.
• Segments vary in length. Elements within a segment are identified by their offset from
the beginning of the segment.
• Each segment has a segment number and a length. Thus a logical address specifies
the segment number and offset
• This logical address is presented to the segmentation table which consists of limit
(length of segment) and base (starting physical address) of the segment.
• If the offset is less than the limit, it is legal and is added to the base to produce the
physical address of the referenced location.
• If the offset is not legal, then a trap occurs.
Segmentation
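The limit check and base addition above can be sketched as follows (the segment-table contents are hypothetical, and the trap is modeled as an exception):

```python
def translate_segment(seg: int, offset: int, seg_table: dict) -> int:
    """Translate (segment, offset) using a table of (base, limit) pairs."""
    base, limit = seg_table[seg]
    if offset >= limit:                 # offset outside the segment: trap
        raise IndexError("segmentation trap")
    return base + offset

seg_table = {0: (1000, 200), 1: (5000, 100)}   # seg -> (base, limit)
print(translate_segment(0, 50, seg_table))     # 1000 + 50 = 1050
```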
63. • Advantages of Segmentation :
• It simplifies the handling of growing data structures. The data structure can be
assigned its own segment, and the OS will expand or shrink the segment as needed.
It allows programs to be altered and recompiled independently without requiring that
an entire set of programs be relinked and reloaded. Again, this is accomplished using
multiple segments.
• It lends itself to sharing among processes. A programmer can place a utility program
or a useful table of data in a segment that can be addressed by other processes.
64. • In most computer systems, the physical main memory is not as large as the address space of the
processor.
• Suppose a user tries to run a program.
• If the program does not completely fit into main memory, then the parts of it currently being
executed are kept in main memory and the remaining portion is kept on a secondary storage device
such as an HDD.
• When a new part of the program is to be brought into main memory for execution and the memory is full, it
must replace another part which is already in main memory.
• As this secondary memory is not actually part of system memory, from the CPU’s point of view it is
treated as virtual memory.
• Virtual memory is a memory management technique that is implemented using both hardware and software.
• It maps memory addresses used by a program, called virtual addresses, into physical addresses in
computer memory.
Virtual Memory
65. • Benefits :
• Large programs can be written, as virtual space available is huge compared to
physical memory.
• Less I/O is required, which leads to faster and easier swapping of processes.
• More physical memory is available, as programs are stored in virtual memory, so
they occupy less space in actual physical memory.
• Disadvantages :
• Applications run slower if the system is using virtual memory.
• It takes more time to switch between applications.
• Less hard drive space for your use.
• It reduces system stability.