SlideShare a Scribd company logo
1 of 106
Memory
Programs and the data they operate on are held in
the main memory of the computer during execution
The execution speed of programs is highly dependent
on the speed at which instructions and data
transferred between the CPU and the main memory
Ideally the memory would be very fast, large and
inexpensive
But technology do not permit such single memory, so
hirechay of memory is used
First we discuss main memory (RAM and ROM)
Basic Concepts
 The maximum size of the memory that can be used in any computer is
determined by the addressing scheme.
16-bit addresses = 216 = 64K memory locations
 Most modern computers are byte addressable.: successive addresses refer to
successive byte locations
2
k
4
- 2
k
3
- 2
k
2
- 2
k
1
- 2
k
4
-
2
k
4
-
0 1 2 3
4 5 6 7
0
0
4
2
k
1
- 2
k
2
- 2
k
3
- 2
k
4
-
3 2 1 0
7 6 5 4
Byte address
Byte address
(a) Big-endian assignment (b) Little-endian assignment
4
Word
address
•
•
•
•
•
•
memeory
 Let address is 32 bit. When 32 bit address is
sent from the CPU to memory unit, the higher
order 30 bit determine which word will be
accessed.
 If a byte quantity is specified the lower order 2
bit of the address specify which byte location
is specified.
 In byte addressable computers another control
line may be added to indicate when only a
byte rather than a full word of n bits is
transferred.
Traditional Architecture
Up to 2k
addressable
MDR
MAR
Figure 5.1. Connection of the memory to the processor.
k-bit
address bus
n-bit
data bus
Control lines
( , MFC, etc.)
Processor Memory
locations
Word length =n bits
W
R /
Basic Concepts
 Memory access time : time that elapses between the initiation of an
operation and the completion of that operation e.g. the time between
the read and MFC signals
 Memory cycle time: minimum time delay required between the
initiation of two successive memory operations. E.g. time between
two successive read operation. This cycle time is longer than access
time
 RAM – any location can be accessed for a Read or Write operation in
some fixed amount of time that is independent of the location’s
address.
 The basic technology for implementing main memories uses a
semiconductor integrated circuits.
 The CPU process instructions and data faster than they can be
fetched form a main memory. Memory cycle time is the bottleneck in
the system
RAM
 The memory cycle time is bottleneck in the
sytem
 One way to reduce the memory access
time is Cache memory and interleaving
 Cache is small and faster memory inserted
between CPU and larger, slower main
memory
 Virtual memory increase apparent size of
the main memory, memory management
unit
Semiconductor RAM
Memories
Semiconductor memories are available in a
wide range of speeds
Their cycle time range from few hundred ns
to less than 10nsec
RAM
 Memory cells are usually organized in the form of an array
in which each cell is capable of soring one bit of
information
 Each row of cells constitutes a memory word and all cell
of a row are connected to a common line referred to as
the word line, which is driven by the address decoder on
the chip.
 The cells in each column are connected to a Sense/Write
circuit by two bit lines
 The Sense/Write circuits are connected to the data
input/output lines of the chip
 During a read operation these circuits sense to read the
information stored in the cells selected by a word line
Internal Organization of
Memory Chips
FF
Figure 5.2. Organization of bit cells in a memory chip.
circuit
Sense / Write
Address
decoder
FF
CS
cells
Memory
circuit
Sense / Write Sense / Write
circuit
Data input
/output lines:
A0
A1
A2
A3
W0
W1
W15
b7 b1 b0
W
R /
b7 b1 b0
b7 b1 b0
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
16 words of 8 bits each: 16x8 memory
org.. It has 16 external connections:
addr. 4, data 8, control: 2,
power/ground: 2
1K memory cells(1K X1): 128x8
memory, external connections: ?
19(7+8+2+2)
1Kx1:? 15 (10+1+2+2)
A Memory Chip
Figure 5.3. Organization of a 1K  1 memory chip.
CS
Sense
/ Write
circuitry
array
memory cell
address
5-bit row
input/output
Data
5-bit
decoder
address
5-bit column
address
10-bit
output multiplexer
32-to-1
input demultiplexer
32 32

W
R/
W0
W1
W31
and
Static Memories
 The circuits are capable of retaining their state as long as power
is applied. Two inverters are cross-connected to form a latch.
The latch is connected to two bit lines by transistors T1 and T2.
Y
X
Word line
Bit lines
Figure 5.4. A static RAM cell.
b
T2
T1
b
Read: word
line is
activated to
close switch
T1 and T2
Write : put
appropriate
value on b
and activate
word line
Static Memories
 CMOS cell: low power consumption. Current only
flow when the cell is being accessed. X is 1 when T3
and T6 on while T4 and T5 off
CMOS:
complementary
metal-oxide
semiconducor
Advantage of
CMOS RAM is low
power
consumption.
Because current
flow in the cell only
when cell is being
accessed.
Otherwise there is
no continuous
electrical path bet
Vsupply and
ground.
Word line
b
Bit lines
Figure 5.5. An example of a CMOS memory cell.
T1 T2
T6
T5
T4
T3
Y
X
Vsupply
b
Structures of Larger Memories
uisng static memory chips
Figure 5.10. Organization of a 2M  32 memory module using 512K  8 static memory chips.
19-bit internal chip address
decoder
2-bit
addresses
21-bit
A0
A1
A19
memory chip
A20
D31-24
D7-0
D23-16
D15-8
512K 8
´
Chip select
memory chip
19-bit
address
512K 8
´
8-bit data
input/output
2M X 32
memory using
512K X8
Show organization of 64KX8 using 16K X 1
static memory chips
Asynchronous DRAMs
 Static RAMs are fast(10ns), but they cost more because their cells
required several transistors so are more expensive.
 Dynamic RAMs (DRAMs) are cheap and area efficient, but they can not
retain their state indefinitely (only a few milliseocnds)– need to be
periodically refreshed. they stored information in form of charge on
capacitor
Figure 5.6. A single-transistor dynamic memory cell
T
C
Word line
Bit line
DRAM
 After transistor is turned off capacitor begins to
discharge because of capacitor’s own leakage
resistance and transistor continues to conduct a tiny
amount of current even it is off
 In order to retain the information stored in the cell,
DRAM includes special circuitry that writes back the
values that has been read. Each row of the cells must be
accessed periodically, Once every 2 to 16 milliseconds.
Refresh circuity usually performs this function
automatically.
 Because of their high density and low cost DRAM are
widely used as main memory in computer
1M X 1
DRAM
A Dynamic Memory Chip
Column
CS
Sense / Write
circuits
cell array
latch
address
Row
Column
latch
decoder
Row
decoder
address
4096 512 8

( )

R/ W
A20 9
- A8 0
-

D0
D7
RAS
CAS
Figure 5.7. Internal organization of a 2M  8 dynamic memory chip.
Row Addr. Strobe
Column Addr. Strobe
DRAM
 RAS and CAS signals are generated by a
memory controller circuit external to the chip
when the processor issue a read or write
command
 During read operation the output data are
transferred to processor after a delay
equivalent to memory access time. Such
memories are called asynchronous DRAM.
 The memory controller is also responsible for
refreshing the data stored in the memory
Problem
 Memory cycle time of 64nsec. It has to
refreshed 100times per msec and refresh
takes 100nsec. What % of memory cycle time
is used for refreshing
Answer
 No of refresh=64 X100ns/1X10-3=64 X10-4
 Time=64X10-4X100ns=64X10-11
 So 64X10-11/64X10-9 =1%
Fast Page Mode
 When the DRAM in last slide is accessed, the contents of all 4096
cells in the selected row are sensed, but only 8 bits are placed on
the data lines D7-0, as selected by A8-0.
 A simple addition to the circuit makes it possible to access the other
bytes in the same row without having to reselect the row. Each
sense amplifier act as latch. When a row address is applied the
contents of all the cells are loaded into corresponding latch
 Thus a block of data can be transferred at a much faster rate. The
block transfer capability is referred to as the Fast page mode (a
large block of data is called a page)–Good for bulk transfer.
Synchronous DRAM
 DRAM whose operation is synchronized with a clock signal called
SDRAM
 SDRAM have several different modes of operation
 E.g Burst mode :
 . First the row address is latched under the control of RAS line.
Memory cell takes 5 or 6 clock cycles to activate the selected row.
 Then column address is latched under the control of CAS. After
delay of 1 cycle, the first set of data bits is placed on the data lines.
The SDRAM automatically increments the column address
 It is not necessary to provide externally generated pulses on the
CAS line to select successive columns. The necessary control
signals are generated internally using a column counter and the
clock signal. New data are placed on the data lines at the rising
edge of each clock pulse
 SDRAM can deliver data at a very high rate because all the control
signals are generated inside the chip. Today SDRAM available that
can work at 1GHZ
Synchronous DRAMs
 The operations of SDRAM are controlled by a clock signal.
R/ W
RAS
CAS
CS
Clock
Cell array
latch
address
Row
decoder
Row
Figure 5.8. Synchronous DRAM.
decoder
Column Read/Write
circuits & latches
counter
address
Column
Row/Column
address
Data input
register
Data output
register
Data
Refresh
counter
Mode register
and
timing control
Synchronous DRAMs
R/W
RAS
CAS
Clock
Figure 5.9. Burst read of length 4 in an SDRAM.
Row Col
D0 D1 D2 D3
Address
Data
Synchronous DRAMs
 No CAS pulses is needed in burst operation.
 Refresh circuits are included (every 64ms).
 Clock frequency > 100 MHz
Latency and Bandwidth
 Data transfers to and from the main memory often involve
blocks of data. The speed of these transfers has a large
impact on the performance of a computer system.
 The memory access time defined earlier is not sufficient for
describing the memory’s performance when transferring
blocks of data
 Memory access time is not sufficient for describing the
memory performance when transferring block of data.
 During block transfers, Memory latency – the amount of
time it takes to transfer the first word of data to or from the
memory.
 The time required to transfer a complete block depends
also on the rate at which successive words can be
transferred and on the size of the block.
 The latency of previous SDRAM is 5 cycles
 So if clock is 500MHZ then latency is 10ns while remaining
word at a rate of 2ns
Bandwidth
 The example above illustrates that we need a parameter
other than memory latency to describe the memory’s
performance during block transfers.
 Memory bandwidth – the number of bits or bytes that can
be transferred in one second. It is used to measure how
much time is needed to transfer an entire block of data.
 Bandwidth depends on the speed of access to the stored
data and on the number of bits that can be accessed in
parallel. It is the product of the rate at which data are
transferred (and accessed) and the width of the data bus.
Consider a main memory constructed with SDRAM chips that have timing
requirements depicted in Figure 5.9, except that the burst length is 8.
Assume that 32 bits of data are transferred in parallel. If a 133-MHz clock is
used, how much time does it take to transfer: (a) 32 bytes of data (b) 64
bytes of data What is the latency in each case?
(a) It takes 5 +2+ 8 = 15 clock cycles. Total time = 15/ (133 ×
106) =112.78ns
(b) Latency = 7/ (133 × 106) = 0.038 × 10−6 s = 52.63 ns
(c) It takes 5+2+8+2+8=25 clock cycles. The latency is the
same, i.e. 52.63 ns.
DDR SDRAM
 Double-Data-Rate SDRAM
 The key idea is to take advantage of the fact that a large
number of bits are accessed at the same time inside the
chip when a row address is applied.
 Standard SDRAM performs all actions on the rising edge of
the clock signal.
 DDR SDRAM accesses the cell array in the same way, but
transfers the data on both edges of the clock. DDR2, DDR3,
DDR4
 The cell array is organized in two banks. Each can be
accessed separately.
 DDR SDRAMs and standard SDRAMs are most efficiently
used in applications where block transfers are prevalent.
 For example, DDR2 and DDR3 can operate at clock
frequencies of 400 and 800 MHz, respectively.
Therefore, they transfer data using the effective clock
speeds of 800 and 1600 MHz, respectively
 DDR3 stands for Double Data Rate version 3.
 Whereas DDR4 stands for Double Data Rate version 4.
While it’s speed is faster than DDR3.
 The clock speed of DDR3 vary from 800 MHz to 2133
MHz. While the minimum clock speed of DDR4 is 2133
MHz and it has no defined maximum clock speed.
Problems
 Describe the structure for an 8M X32 memory
using 512KX8 memory chips. How many
chips required?
 Describe the structure for a 16M X32 memory
using 1MX4 memory chips. How many chips
required?
Answer
 223X25/219X23 =64 chips required
 224X25/220X22 =128 chips required
problem
 Build a memory with 4 byte words with a
capacity of 221 bits. What type of decoder
required if memory is built using 2K X 8 RAM
chips?
 How many 32KX1 RAM chips are needed to
provide 256Kbyte?
 Build 16K X16 RAM using 1K X8 chips. How
many 2 to 4decoder used?
Answer
 Capacity of memory=221/8=218byte
 No of words=218 /4 =216
 Required RAM chips = (216 x 32) / (2K x 8) =
32 x 4
 So 5 to 32 decoder required
 So, the arrangement of these RAM chips
should contain 32 rows each with 4 columns.
 RAM chip size = 1k ×8[1024 words of 8 bits each]
 RAM to construct =16k ×16
 Number of chips required = (16k x 16)/ ( 1k x 8)
 = (16 x 2)
 So to select one chip out of 16 vertical chips,
 we need 4 x 16 decoder.
 Available decoder is 2 x 4 decoder
 To be constructed is 4 x 16 decoder
 Hence 4 + 1 = 5 decoders are required.
Problem
 If each address space represents one byte of
storage space, how many address lines are
needed to access RAM chips arranged in a
4X6 array where each chip is 8K X4
 Total=24X8K *4/8=96K=17 lines
DRAM
 They use less power and have considerably low cost per bit
 Memory module house many memory chips in the range of
16 to 32
 Memory modules are called SIMM (single in line) and DIMM
(Dual In-Line Memory Module) depending on the
configuration of pins
 In SIMM, Pins present in either facet are connected. There
are two type of SIMM presents, one with 30 pins and another
one is with 72 pins.
 There are three type of DIMM presents which are used by
modern motherboard, one with 168 pins and second one is
with 184 pins and third one is 240 pins.
 SIMM supports 32 bit channel for data transferring. DIMM
supports 64 bit channel for data transferring.
Memory System
Considerations
 The choice of a RAM chip for a given application depends on
several factors:
Cost, speed, power, size…
 SRAMs are faster, more expensive, smaller.
 DRAMs are slower, cheaper, larger.
 Which one for cache and main memory, respectively?
 Refresh overhead
Memory Controller
Processor
RAS
CAS
R/ W
Clock
Address
Row/Column
address
Memory
controller
R/ W
Clock
Request
CS
Data
Memory
Figure 5.11. Use of a memory controller.
Read-Only Memories
Read-Only-Memory
 Volatile / non-volatile memory
 ROM, PROM: programmable ROM
 EPROM: erasable, reprogrammable ROM
 EEPROM: can be programmed and erased
electrically
Not connected to store a 1
Connected to store a 0
Figure 5.12. A ROM cell.
Word line
P
Bit line
T
ROM
 A logic 0 is stored in the cell if the transistor is connected
to ground at point P otherwise a 1 is stored.
 Bit line is connected through register to power supply
 ROM: data are written into ROM when it is manufactured
 PROM : put fuse at P. provide flexibility and convenience
compared to ROM
 EPROM : ultraviolet light to erase information. Special
transistor is used at P, which has ability to function either
as a normal transistor or as a disable transistor which is
turned off by injecting a charge into it. Erase required
dissipating the charges trapped in the transistor. They
have transparent window. stored information cannot be
erased selectively
 E2PROM (EEPROM) : to erase the cell
contents selectively. Byte level erasing
required. Disadvantage is the different
voltages are needed for erasing, writing
and reading the stored data which
increases circuit complexity
Flash Memory
 Similar to EEPROM
 Difference: only possible to write an entire block of cells instead of a
single cell. Have greater density which leads to higher capacity and
lower cost per bit.
 Require a single power supply voltage and consume less power in
their operation
 Use in portable equipment. Hand-held computers, cell phone, digital
camera, Mp3 player etc. in hand held computers and cell phones a
flash memory holds the software needed to operate equipment, so no
need for disk drive
 Implementation of such modules
 Flash cards : USB interface flash cards known as memory keys.
32Gbytes. Hence, a 32-Gbyte flash card can store approximately
500 hours of music
 Flash drives : 64 to 128Gbytes. Sloid state electronics so no
moving parts. So shorter access times low power consumption so
popular for batter driven applications
Speed, Size, and Cost
Processor
Primary
cache
Secondary
cache
Main
Magnetic disk
memory
Increasing
size
Increasing
speed
Figure 5.13. Memory hierarchy.
secondary
memory
Increasing
cost per bit
Registers
L1
L2
Cache Memories
Cache
 What is cache?
 It is a small and very fast memory, interposed between the
processor and main memory
 Why we need it?
 Its purpose is to make the main memory appear to the
processor to be much faster than it actually is.
 Locality of reference (very important)
- temporal :during execution of the programmed small group
of instructions are executed repeatedly during some time
period. It means a recently executed instruction is likely to be
executed again very soon known as temporal locality
- spatial : generally found in data, i.e. data close to recently
used also required very soon so load entire block
 Cache block – cache line
 A set of contiguous address locations of some size
Cache
 When processor issues a read request, the contents of a block of
memory words containing the location specified are transferred into the
cache. The correspondence between the main memory blocks and those
in the cache is specified by a mapping function.
 Replacement algorithm : when cache is full and memory word refereed
not in cache then cache control hardware must decide which block
should be removed to create space for the new block
Figure 5.14. Use of a cache memory.
Cache
Main
memory
Processor
cache
 Hit / miss : found not found
 Write-through : both the cache location and the memory
location is updated
 Write-back : only cache copy is updated and to set dirty bit
of the block. Memory block updated later on when
corresponding cache block is removed form the cache to
make room for new block
 Cache miss : Read miss: Two approach : first block is
loaded and then requested word is transferred. Second
approach is as son as requested word read it is send to
processor known as load through or early restart. Reduce
waiting time of processor
 When write miss occur, if write through protocol is used
then information is written directly into the main memory.
For the write back is use then first block is loaded into
cache and desired word is modified
55 / 19
Memory Hierarchy
CPU
Cache
Main Memory I/O Processor
Magnetic
Disks Magnetic Tapes
56 / 19
Cache Memory
 High speed (towards CPU speed)
 Small size (power & cost)
CPU
Cache
(Fast)
Cache
Main
Memory
(Slow)
Mem
Hit
Miss
95% hit ratio
Access = 0.95 Cache + 0.05 Mem
57 / 19
Cache Memory
CPU
Cache
1 Mword
Main
Memory
1 Gword
30-bit Address
Only 20 bits !!!
58 / 19
Cache Memory
Cache
Main
Memory
00000000
00000001
•
•
•
•
•
•
•
•
•
•
3FFFFFFF
00000
00001
•
•
•
•
FFFFF
Address Mapping !!!
 Where memory blocks are placed in the
cache?
 Three mapping techniques
Direct Mapping
tag
tag
tag
Cache
Main
memory
Block 0
Block 1
Block 127
Block 128
Block 129
Block 255
Block 256
Block 257
Block 4095
Block 0
Block 1
Block 127
7 4 Main memory address
Tag Block Word
Figure 5.15. Direct-mapped cache.
5
4: one of 16 words. (each
block has 16=24 words)
7: points to a particular block
in the cache (128=27)
5: 5 tag bits are compared
with the tag bits associated
with its location in the cache.
Identify which of the 32
blocks that are resident in
the cache (4096/128).
Cache size is 2K words and block size is 16words
main memory size is 64Kwords
Block j of main memory maps onto block
j modulo no of cache blocks
Cache block=j%128
Placement of a block in
the cache is determined
from the memory
address
Direct mapping
Limitation
 Since more than memory block is mapped
onto a given cache block position, contention
may arise for that position even when cache
is not full
Direct Mapping
7 4 Main memory address
Tag Block Word
5
 Tag: 11101
 Block: 1111111=127, in the 127th block of the
cache
 Word:1100=12, the 12th word of the 127th
block in the cache
11101,1111111,1100
 Consider a machine with a byte addressable
main memory of 220 bytes, block size of 16
bytes and a direct mapped cache having
212 cache lines. Find no of bit for TAG, block
and word?
 What are the tag and cache line address (in
hex) for main memory address (E201F)16?
Word=4
Block=12
Tag=4
An 8KB direct-mapped write-back cache is organized as multiple
blocks, each of size 32-bytes. The processor generates 32-bit
addresses. The cache controller maintains the tag information for
each cache block comprising of the following.
1 Valid bit
1 Modified bit
As many bits as the minimum needed to identify the memory block
mapped in the cache. What is the total size of memory needed at the
cache controller to store meta-data (tags) for the cache?
(A) 4864 bits
(B) 6144 bits
(C) 6656 bits
(D) 5376 bits
cache size = 8 KB
Block size = 32 bytes
Number of cache lines = Cache size /
Block size = (8 × 1024 bytes)/32 = 256
total bits required to store meta-data of 1
line = 1 + 1 + 19 = 21 bits
total memory required = 21 × 256 = 5376
bits
Associative Mapping
4
tag
tag
tag
Cache
Main
memory
Block 0
Block 1
Block i
Block 4095
Block 0
Block 1
Block 127
12 Main memory address
Figure 5.16. Associative-mapped cache.
Tag Word
4: one of 16 words. (each
block has 16=24 words)
12: 12 tag bits Identify which
of the 4096 blocks that are
resident in the cache
4096=212.
Associative Mapping
 Tag: 111011111111
 Word:1100=12, the 12th word of a block in the
cache
111011111111,1100
4
12 Main memory address
Tag Word
Associative mapping
 Give freedom in choosing the cache location
in which to place the memory block, resulting
in a more use of space in the cache
 When a new block is brought into the cache,
it replaces an existing block only if cache is
full. In this case we need an algorithm to
select the block to be replaced
 Associative search required to minimize
delay
Set-Associative Mapping
tag
tag
tag
Cache
Main
memory
Block 0
Block 1
Block 63
Block 64
Block 65
Block 127
Block 128
Block 129
Block 4095
Block 0
Block 1
Block 126
tag
tag
Block 2
Block 3
tag
Block 127
Main memory address
6 6 4
Tag Set Word
Set 0
Set 1
Set 63
Figure 5.17. Set-associative-mapped cache with two blocks per set.
4: one of 16 words. (each
block has 16=24 words)
6: points to a particular set in
the cache (128/2=64=26)
6: 6 tag bits is used to check
if the desired block is
present (4096/64=26).
Set-Associative Mapping
 Tag: 111011
 Set: 111111=63, in the 63th set of the cache
 Word:1100=12, the 12th word of the 63th set
in the cache
Main memory address
6 6 4
Tag Set Word
111011,111111,1100
Set-associative
 The number of blocks per set is a parameter
that can be selected to suit the requirements
of a particular computer
 A cache that has k blocks per set is referred
to as a k way set associative cache
Examples
 A computer system uses 32 bit memory
addresses. It has a 4K byte cache organized
in the block-set associative manner with 4
blocks per set and 64 bytes per block
 1. calculate the number of bits in each of the
Tag, Set and word fields of the memory
address
Answer
 Word : 6
Cache block=4k/64=64
Set=64/4=16 so set filed is 4 bits
Tag=32-10=22
problem
 A block set associative cache consists of total
of 64blocks divided into 4 block sets. The
main memory contains 4096 blocks each
consisting of 32 words. How many bits are
there in Tag, Set and word fields
answer
 Word=5
 Set=4
 Tag=8
Example
 Block set associative cache with total 64
block and 4 way set associative. Main
memory has 4096 blocks each consists of
128 words
Answer
(a) 4096 blocks of 128 words each require 19 bits for
the main memory
address.
(b) TAG field is 8 bits. SET field is 4 bits. WORD field
is 7 bits.
The width of the physical address on a
machine is 40 bits. The width of the tag
field in a 512 KB 8-way set associative
cache is ____________ bits
(A) 24
(B) 20
(C) 30
(D) 40
We know cache size = no.of.sets*
lines-per-set*
block-size
Let us assume no of sets = 2^x
And block size= 2^y
So applying it in formula.
2^19 = 2^x * 8 * 2^y;
So x+y = 16
Now we know that to address block size and
set number we need 16 bits so remaining bits
must be for tag
i.e., 40 - 16 = 24
The answer is 24 bits
Replacement Algorithms
 Difficult to determine which blocks to kick out
 Objective is to keep blocks in the cache that
are likely to be referred in near future.
 Least Recently Used (LRU) block
 The cache controller tracks references to all
blocks as computation proceeds.
LRU
 In 4 way set associative cache 2 bit counter can be used for each
block
 When hit occurs the counter of that block is set to 0. counters with
values originally lower than the referenced one are incremented by
one. And all others remain unchanged.
 When miss occur and set is full, the block with the counter 3 is
removed, new block is put in its place and counter set to 0. the other
block counters are incremented by one
 When miss occur, if set is not full then counter associated with new
block is set to zero and the values of other counters are incremented
by one.
83 / 19
Replacement Algorithms
CPU
Reference
A B C A D E A D C F
Miss Miss Miss Hit Miss Miss Miss Hit Hit Miss
Cache
FIFO 
A A
B
A
B
C
A
B
C
A
B
C
D
E
B
C
D
E
A
C
D
E
A
C
D
E
A
C
D
E
A
F
D
Hit Ratio = 3 / 10 = 0.3
84 / 19
Replacement Algorithms
CPU
Reference
A B C A D E A D C F
Miss Miss Miss Hit Miss Miss Hit Hit Hit Miss
Cache
LRU 
A B
A
C
B
A
A
C
B
D
A
C
B
E
D
A
C
A
E
D
C
D
A
E
C
C
D
A
E
F
C
D
A
Hit Ratio = 4 / 10 = 0.4
Consider a 2-way set associative cache memory with 4 sets and total 8
cache blocks (0-7) and a main memory with 128 blocks (0-127). What
memory blocks will be present in the cache after the following sequence
of memory block references if LRU policy is used for cache block
replacement. Assuming that initially the cache did not have any memory
block from the current job?
0 5 3 9 7 0 16 55
(A) 0 3 5 7 16 55
(B) 0 3 5 7 9 16 55
(C) 0 5 7 9 16 55
(D) 3 5 7 9 16 55
0--->0 ( block 0 is placed in set 0, set 0 has 2 empty block locations,
block 0 is placed in any one of them )
5--->1 ( block 5 is placed in set 1, set 1 has 2 empty block locations,
block 5 is placed in any one of them )
3--->3 ( block 3 is placed in set 3, set 3 has 2 empty block locations,
block 3 is placed in any one of them )
9--->1 ( block 9 is placed in set 1, set 1 has currently 1 empty block location,
block 9 is placed in that, now set 1 is full, and block 5 is the
least recently used block )
7--->3 ( block 7 is placed in set 3, set 3 has 1 empty block location,
block 7 is placed in that, set 3 is full now,
and block 3 is the least recently used block)
0--->block 0 is referred again, and it is present in the cache memory in set 0,
so no need to put again this block into the cache memory.
16--->0 ( block 16 is placed in set 0, set 0 has 1 empty block location,
block 0 is placed in that, set 0 is full now, and block 0 is the LRU
one)
55--->3 ( block 55 should be placed in set 3, but set 3 is full with block 3 and
7, hence need to replace one block with block 55, as block 3 is the
least recently used block in the set 3, it is replaced with block 55.
Hence the main memory blocks present in the cache memory are : 0, 5, 7, 9,
16, 55 .
 Consider a fully associative cache with 8
cache blocks (0-7). The memory block
requests are in the order-
4, 3, 25, 8, 19, 6, 25, 8, 16, 35, 45, 22, 8, 3, 16,
25, 7
If LRU replacement policy is used, which cache
block will have memory block 7? Draw cache
and show the use of LRU algorithm after every
memory reference. Also, calculate the hit ratio
and miss ratio.
Array is 4X10
Block size=1word
No of cache block=8
Address length=16bit
Set=4blocks so no of
set=2
Column major
Array is 4X10
Block size=1word
No of cache block=8
Address length=16bit
Set=4blocks so no of set=2
Column major
Hit
ratio=2/20
Direct
mapping
problem
Hit ratio=8/20
associative
Problem
Hit ratio=4/20
Consider a 2-way set associative cache with 256 blocks and
uses LRU replacement, Initially the cache is empty. Conflict
misses are those misses which occur due the contention of
multiple blocks for the same cache set. Compulsory misses
occur due to first time access to the block. The following
sequence of accesses to memory blocks
(0,128,256,128,0,128,256,128,1,129,257,129,1,129,257,129
)
is repeated 10 times. The number of conflict misses
experienced by the cache is
(A) 78
(B) 76
(C) 74
(D) 80
Performance
Considerations
Overview
 Two key factors: performance and cost
 Price/performance ratio
 Performance depends on how fast machine
instructions can be brought into the processor for
execution and how fast they can be executed.
 memory hierarchy . The main purpose of this
hierarchy to create a memory that the CPU sees as
having short access time and a large capacity.
 it is beneficial if transfers to and from the faster units
can be done at a rate equal to that of the faster unit.
Hit Rate and Miss Penalty
 An excellent indication of success of the memory
hierarchy is the success rate in accessing information at
various levels of the hierarchy
 The successful access to data in a cache is called –
hit .the number of hits stated as a fraction of all attempted
access is called hit rate
 Ideally, the entire memory hierarchy would appear to the
processor as a single memory unit that has the access
time of a cache on the processor chip and the size of a
magnetic disk – depends on the hit rate (>>0.9).
 A miss causes extra time needed to bring the desired
information into the cache. Total access time seen by the
processor when a miss occurs is called miss penalty
Hit Rate and Miss Penalty (cont.)
 Tave=hC+(1-h)M
 Tave: average access time experienced by the processor
 h: hit rate
 M: miss penalty, the time to access information in the main
memory i.e. total access time seen by processor
 C: the time to access information in the cache
= 4.7
Assume that a read request takes 50 nsec on a cache miss and
5 nsec on a cache hit. While running a program, it is observed
that 80% of the processor’s read requests result in a cache hit.
The average read access time is ………….. nsec. Correct
answer is 14.
Average read time = 0.80 x 5 + (1 – 0.80) x 50 = 14 nsec
A cache memory needs an access time of 30 ns and main
memory 150 ns, what is the average access time of CPU
(assume hit ratio = 80%)?
A 60
B 70
C 150
D 30
How to Improve Hit Rate?
 Use larger cache – increased cost
 Increase the block size while keeping the total cache
size constant. Achieve higher data rate
 However, if the block size is too large, some items may
not be referenced before the block is replaced – it take
more time to load so increase miss penalty so block size
should be neither too small nor too large. Generally
block size of 16 to 128 bytes are most popular choice
 Hit rate depends on the size of the cache, its design, and
the instruction and data access patterns of the program
being executed
 Miss penalty can be reduced if Load-through approach
Caches on the Processor
Chip
 On chip vs. off chip
 Two separate caches for instructions and data, respectively
 Single cache for both
 Which one has better hit rate? -- Single cache
 What’s the advantage of separating caches? – parallelism, better
performance
 Level 1 and Level 2 caches
 L1 cache – faster and smaller. Access more than one word
simultaneously and let the processor use them one at a time. Tens of
kilobytes
 L2 cache – slower and larger. Hundreds of kilobytes or several
megabytes
 How about the average access time?
 Average access time: tave = h1C1 + (1-h1)(h2C2 + (1-h2)M)
 =h1c1 +(1-h1)h2c2+(1-h1)(1-h2)M
where h is the hit rate, C is the time to access information in cache, M is
the time to access information in main memory.
No of misses in secondary cache is given by (1-h1)(1-h2)
In a two-level cache system, the access times of L1 and L2
caches are 1 and 8 clock cycles respectively. The miss penalty
from L2 cache to main memory is 18 clock cycles. The miss rate of
L1 cache is twice that of L2. The average memory access time of
the cache system is 2 cycles. The miss rates of L1 and L2 caches
respectively are:
a. 0.130 and 0.065
b. b. 0.056 and 0.111
c. c. 0.0892 and 0.1784
d. d. 0.1784 and 0.0892
Let the miss rate of L2 cache be x. So, miss rate of L1 cache = 2x.
Thus, average memory access time
AMAT = (1-2x).1 + 2x. [(1-x).8 + x.18] = 2
(givenSolving, we get x = 0.065
Performance
 This is not possible if both the slow and
the fast units are accessed in the same
manner.
 However, it can be achieved when
parallelism is used in the organizations of
the slower unit.
Interleaving
 If the main memory is structured as a collection of physically
separated modules, each with its own ABR (Address buffer
register) and DBR( Data buffer register), memory access
operations may proceed in more than one module at the same
time.
mbits
Address in module MM address
(a) Consecutive words in a module
i
k bits
Module Module Module
Module
DBR
ABR DBR
ABR ABR DBR
0 n 1
-
Figure 5.25. Addressing multiple-module memory systems.
(b) Consecutive words in consecutive modules
i
k bits
0
Module
Module
Module
Module MM address
DBR
ABR
ABR DBR
ABR DBR
Address in module
2
k
1
-
mbits
 One clock cycle to send an address
 8 clock cycles to get first word and
subsequent word required 4 clock cycles
 One clock cycle to send word to the cache
 Without interleaving
 Time to load block=1+8 +7*4+1=38
 1+8+4+4=17cycles
Assume a computer has L1 and L2 caches, as discussed .The cache
blocks consist of 8 words. Assume that the hit rate is the same for both
caches and that it is equal to 0.95 for instructions and 0.90 for data.
Assume also that the times needed to access an 8-word block in these
caches are C1 = 1 cycle and C2 = 10 cycles.
(a) What is the average access time experienced by the processor if the
main memory uses interleaving?
(b) What is the average access time if the main memory is not
interleaved?
(c) What is the improvement obtained with interleaving?
memeoryorganization PPT for organization of memories

More Related Content

Similar to memeoryorganization PPT for organization of memories

COMPUTER ORGANIZATION NOTES Unit 5
COMPUTER ORGANIZATION NOTES Unit 5COMPUTER ORGANIZATION NOTES Unit 5
COMPUTER ORGANIZATION NOTES Unit 5Dr.MAYA NAYAK
 
Interfacing memory with 8086 microprocessor
Interfacing memory with 8086 microprocessorInterfacing memory with 8086 microprocessor
Interfacing memory with 8086 microprocessorVikas Gupta
 
Memory Hierarchy PPT of Computer Organization
Memory Hierarchy PPT of Computer OrganizationMemory Hierarchy PPT of Computer Organization
Memory Hierarchy PPT of Computer Organization2022002857mbit
 
Kiến trúc máy tính - COE 301 - Memory.ppt
Kiến trúc máy tính - COE 301 - Memory.pptKiến trúc máy tính - COE 301 - Memory.ppt
Kiến trúc máy tính - COE 301 - Memory.pptTriTrang4
 
Computer structurepowerpoint
Computer structurepowerpointComputer structurepowerpoint
Computer structurepowerpointhamid ali
 
Chp3 designing bus system, memory & io copy
Chp3 designing bus system, memory & io   copyChp3 designing bus system, memory & io   copy
Chp3 designing bus system, memory & io copymkazree
 
Memory Organisation in Computer Architecture.pdf
Memory Organisation in Computer Architecture.pdfMemory Organisation in Computer Architecture.pdf
Memory Organisation in Computer Architecture.pdfSangitaBose2
 
Microprocessor Part 1
Microprocessor    Part 1Microprocessor    Part 1
Microprocessor Part 1Sajan Agrawal
 
Random Access Memory
Random Access Memory Random Access Memory
Random Access Memory rohitladdu
 
Electro -Mechanical components/devices
Electro -Mechanical components/devices Electro -Mechanical components/devices
Electro -Mechanical components/devices MuhammadTanveer121
 
VJITSk 6713 user manual
VJITSk 6713 user manualVJITSk 6713 user manual
VJITSk 6713 user manualkot seelam
 
301378156 design-of-sram-in-verilog
301378156 design-of-sram-in-verilog301378156 design-of-sram-in-verilog
301378156 design-of-sram-in-verilogSrinivas Naidu
 
Cache performance-x86-2009
Cache performance-x86-2009Cache performance-x86-2009
Cache performance-x86-2009Léia de Sousa
 
Hcs Topic 2 Computer Structure V2
Hcs Topic 2  Computer Structure V2Hcs Topic 2  Computer Structure V2
Hcs Topic 2 Computer Structure V2ekul
 

Similar to memeoryorganization PPT for organization of memories (20)

COMPUTER ORGANIZATION NOTES Unit 5
COMPUTER ORGANIZATION NOTES Unit 5COMPUTER ORGANIZATION NOTES Unit 5
COMPUTER ORGANIZATION NOTES Unit 5
 
Interfacing memory with 8086 microprocessor
Interfacing memory with 8086 microprocessorInterfacing memory with 8086 microprocessor
Interfacing memory with 8086 microprocessor
 
Memory Hierarchy PPT of Computer Organization
Memory Hierarchy PPT of Computer OrganizationMemory Hierarchy PPT of Computer Organization
Memory Hierarchy PPT of Computer Organization
 
Kiến trúc máy tính - COE 301 - Memory.ppt
Kiến trúc máy tính - COE 301 - Memory.pptKiến trúc máy tính - COE 301 - Memory.ppt
Kiến trúc máy tính - COE 301 - Memory.ppt
 
ram.pdf
ram.pdfram.pdf
ram.pdf
 
Computer structurepowerpoint
Computer structurepowerpointComputer structurepowerpoint
Computer structurepowerpoint
 
Memory Organization
Memory OrganizationMemory Organization
Memory Organization
 
Chp3 designing bus system, memory & io copy
Chp3 designing bus system, memory & io   copyChp3 designing bus system, memory & io   copy
Chp3 designing bus system, memory & io copy
 
Memory management
Memory managementMemory management
Memory management
 
Memory Organisation in Computer Architecture.pdf
Memory Organisation in Computer Architecture.pdfMemory Organisation in Computer Architecture.pdf
Memory Organisation in Computer Architecture.pdf
 
Memoryhierarchy
MemoryhierarchyMemoryhierarchy
Memoryhierarchy
 
Coa presentation3
Coa presentation3Coa presentation3
Coa presentation3
 
Microprocessor Part 1
Microprocessor    Part 1Microprocessor    Part 1
Microprocessor Part 1
 
Random Access Memory
Random Access Memory Random Access Memory
Random Access Memory
 
Electro -Mechanical components/devices
Electro -Mechanical components/devices Electro -Mechanical components/devices
Electro -Mechanical components/devices
 
VJITSk 6713 user manual
VJITSk 6713 user manualVJITSk 6713 user manual
VJITSk 6713 user manual
 
Unit IV Memory.pptx
Unit IV  Memory.pptxUnit IV  Memory.pptx
Unit IV Memory.pptx
 
301378156 design-of-sram-in-verilog
301378156 design-of-sram-in-verilog301378156 design-of-sram-in-verilog
301378156 design-of-sram-in-verilog
 
Cache performance-x86-2009
Cache performance-x86-2009Cache performance-x86-2009
Cache performance-x86-2009
 
Hcs Topic 2 Computer Structure V2
Hcs Topic 2  Computer Structure V2Hcs Topic 2  Computer Structure V2
Hcs Topic 2 Computer Structure V2
 

Recently uploaded

A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991RKavithamani
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsKarinaGenton
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
PSYCHIATRIC History collection FORMAT.pptx
PSYCHIATRIC   History collection FORMAT.pptxPSYCHIATRIC   History collection FORMAT.pptx
PSYCHIATRIC History collection FORMAT.pptxPoojaSen20
 

Recently uploaded (20)

A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its Characteristics
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
PSYCHIATRIC History collection FORMAT.pptx
PSYCHIATRIC   History collection FORMAT.pptxPSYCHIATRIC   History collection FORMAT.pptx
PSYCHIATRIC History collection FORMAT.pptx
 

memeoryorganization PPT for organization of memories

  • 1. Memory Programs and the data they operate on are held in the main memory of the computer during execution The execution speed of programs is highly dependent on the speed at which instructions and data transferred between the CPU and the main memory Ideally the memory would be very fast, large and inexpensive But technology do not permit such single memory, so hirechay of memory is used First we discuss main memory (RAM and ROM)
  • 2. Basic Concepts  The maximum size of the memory that can be used in any computer is determined by the addressing scheme. 16-bit addresses = 216 = 64K memory locations  Most modern computers are byte addressable.: successive addresses refer to successive byte locations 2 k 4 - 2 k 3 - 2 k 2 - 2 k 1 - 2 k 4 - 2 k 4 - 0 1 2 3 4 5 6 7 0 0 4 2 k 1 - 2 k 2 - 2 k 3 - 2 k 4 - 3 2 1 0 7 6 5 4 Byte address Byte address (a) Big-endian assignment (b) Little-endian assignment 4 Word address • • • • • •
  • 3. memeory  Let address is 32 bit. When 32 bit address is sent from the CPU to memory unit, the higher order 30 bit determine which word will be accessed.  If a byte quantity is specified the lower order 2 bit of the address specify which byte location is specified.  In byte addressable computers another control line may be added to indicate when only a byte rather than a full word of n bits is transferred.
  • 4. Traditional Architecture Up to 2k addressable MDR MAR Figure 5.1. Connection of the memory to the processor. k-bit address bus n-bit data bus Control lines ( , MFC, etc.) Processor Memory locations Word length =n bits W R /
  • 5. Basic Concepts  Memory access time : time that elapses between the initiation of an operation and the completion of that operation e.g. the time between the read and MFC signals  Memory cycle time: minimum time delay required between the initiation of two successive memory operations. E.g. time between two successive read operation. This cycle time is longer than access time  RAM – any location can be accessed for a Read or Write operation in some fixed amount of time that is independent of the location’s address.  The basic technology for implementing main memories uses a semiconductor integrated circuits.  The CPU process instructions and data faster than they can be fetched form a main memory. Memory cycle time is the bottleneck in the system
  • 6. RAM  The memory cycle time is bottleneck in the sytem  One way to reduce the memory access time is Cache memory and interleaving  Cache is small and faster memory inserted between CPU and larger, slower main memory  Virtual memory increase apparent size of the main memory, memory management unit
  • 7. Semiconductor RAM Memories Semiconductor memories are available in a wide range of speeds Their cycle time range from few hundred ns to less than 10nsec
  • 8. RAM  Memory cells are usually organized in the form of an array in which each cell is capable of soring one bit of information  Each row of cells constitutes a memory word and all cell of a row are connected to a common line referred to as the word line, which is driven by the address decoder on the chip.  The cells in each column are connected to a Sense/Write circuit by two bit lines  The Sense/Write circuits are connected to the data input/output lines of the chip  During a read operation these circuits sense to read the information stored in the cells selected by a word line
  • 9. Internal Organization of Memory Chips FF Figure 5.2. Organization of bit cells in a memory chip. circuit Sense / Write Address decoder FF CS cells Memory circuit Sense / Write Sense / Write circuit Data input /output lines: A0 A1 A2 A3 W0 W1 W15 b7 b1 b0 W R / b7 b1 b0 b7 b1 b0 • • • • • • • • • • • • • • • • • • • • • • • • • • • 16 words of 8 bits each: 16x8 memory org.. It has 16 external connections: addr. 4, data 8, control: 2, power/ground: 2 1K memory cells(1K X1): 128x8 memory, external connections: ? 19(7+8+2+2) 1Kx1:? 15 (10+1+2+2)
  • 10. A Memory Chip Figure 5.3. Organization of a 1K  1 memory chip. CS Sense / Write circuitry array memory cell address 5-bit row input/output Data 5-bit decoder address 5-bit column address 10-bit output multiplexer 32-to-1 input demultiplexer 32 32  W R/ W0 W1 W31 and
  • 11. Static Memories  The circuits are capable of retaining their state as long as power is applied. Two inverters are cross-connected to form a latch. The latch is connected to two bit lines by transistors T1 and T2. Y X Word line Bit lines Figure 5.4. A static RAM cell. b T2 T1 b Read: word line is activated to close switch T1 and T2 Write : put appropriate value on b and activate word line
  • 12.
  • 13. Static Memories  CMOS cell: low power consumption. Current only flow when the cell is being accessed. X is 1 when T3 and T6 on while T4 and T5 off CMOS: complementary metal-oxide semiconducor Advantage of CMOS RAM is low power consumption. Because current flow in the cell only when cell is being accessed. Otherwise there is no continuous electrical path bet Vsupply and ground. Word line b Bit lines Figure 5.5. An example of a CMOS memory cell. T1 T2 T6 T5 T4 T3 Y X Vsupply b
  • 14. Structures of Larger Memories uisng static memory chips Figure 5.10. Organization of a 2M  32 memory module using 512K  8 static memory chips. 19-bit internal chip address decoder 2-bit addresses 21-bit A0 A1 A19 memory chip A20 D31-24 D7-0 D23-16 D15-8 512K 8 ´ Chip select memory chip 19-bit address 512K 8 ´ 8-bit data input/output 2M X 32 memory using 512K X8
  • 15. Show organization of 64KX8 using 16K X 1 static memory chips
  • 16.
  • 17. Asynchronous DRAMs  Static RAMs are fast(10ns), but they cost more because their cells required several transistors so are more expensive.  Dynamic RAMs (DRAMs) are cheap and area efficient, but they can not retain their state indefinitely (only a few milliseocnds)– need to be periodically refreshed. they stored information in form of charge on capacitor Figure 5.6. A single-transistor dynamic memory cell T C Word line Bit line
  • 18. DRAM  After transistor is turned off capacitor begins to discharge because of capacitor’s own leakage resistance and transistor continues to conduct a tiny amount of current even it is off  In order to retain the information stored in the cell, DRAM includes special circuitry that writes back the values that has been read. Each row of the cells must be accessed periodically, Once every 2 to 16 milliseconds. Refresh circuity usually performs this function automatically.  Because of their high density and low cost DRAM are widely used as main memory in computer
  • 20. A Dynamic Memory Chip Column CS Sense / Write circuits cell array latch address Row Column latch decoder Row decoder address 4096 512 8  ( )  R/ W A20 9 - A8 0 -  D0 D7 RAS CAS Figure 5.7. Internal organization of a 2M  8 dynamic memory chip. Row Addr. Strobe Column Addr. Strobe
  • 21.
  • 22. DRAM  RAS and CAS signals are generated by a memory controller circuit external to the chip when the processor issue a read or write command  During read operation the output data are transferred to processor after a delay equivalent to memory access time. Such memories are called asynchronous DRAM.  The memory controller is also responsible for refreshing the data stored in the memory
  • 23. Problem  Memory cycle time of 64nsec. It has to refreshed 100times per msec and refresh takes 100nsec. What % of memory cycle time is used for refreshing
  • 24. Answer  No of refresh=64 X100ns/1X10-3=64 X10-4  Time=64X10-4X100ns=64X10-11  So 64X10-11/64X10-9 =1%
  • 25. Fast Page Mode  When the DRAM in last slide is accessed, the contents of all 4096 cells in the selected row are sensed, but only 8 bits are placed on the data lines D7-0, as selected by A8-0.  A simple addition to the circuit makes it possible to access the other bytes in the same row without having to reselect the row. Each sense amplifier act as latch. When a row address is applied the contents of all the cells are loaded into corresponding latch  Thus a block of data can be transferred at a much faster rate. The block transfer capability is referred to as the Fast page mode (a large block of data is called a page)–Good for bulk transfer.
  • 26. Synchronous DRAM  DRAM whose operation is synchronized with a clock signal called SDRAM  SDRAM have several different modes of operation  E.g Burst mode :  . First the row address is latched under the control of RAS line. Memory cell takes 5 or 6 clock cycles to activate the selected row.  Then column address is latched under the control of CAS. After delay of 1 cycle, the first set of data bits is placed on the data lines. The SDRAM automatically increments the column address  It is not necessary to provide externally generated pulses on the CAS line to select successive columns. The necessary control signals are generated internally using a column counter and the clock signal. New data are placed on the data lines at the rising edge of each clock pulse  SDRAM can deliver data at a very high rate because all the control signals are generated inside the chip. Today SDRAM available that can work at 1GHZ
  • 27. Synchronous DRAMs  The operations of SDRAM are controlled by a clock signal. R/ W RAS CAS CS Clock Cell array latch address Row decoder Row Figure 5.8. Synchronous DRAM. decoder Column Read/Write circuits & latches counter address Column Row/Column address Data input register Data output register Data Refresh counter Mode register and timing control
  • 28. Synchronous DRAMs R/W RAS CAS Clock Figure 5.9. Burst read of length 4 in an SDRAM. Row Col D0 D1 D2 D3 Address Data
  • 29. Synchronous DRAMs  No CAS pulses is needed in burst operation.  Refresh circuits are included (every 64ms).  Clock frequency > 100 MHz
  • 30. Latency and Bandwidth  Data transfers to and from the main memory often involve blocks of data. The speed of these transfers has a large impact on the performance of a computer system.  The memory access time defined earlier is not sufficient for describing the memory’s performance when transferring blocks of data  Memory access time is not sufficient for describing the memory performance when transferring block of data.  During block transfers, Memory latency – the amount of time it takes to transfer the first word of data to or from the memory.  The time required to transfer a complete block depends also on the rate at which successive words can be transferred and on the size of the block.  The latency of previous SDRAM is 5 cycles  So if clock is 500MHZ then latency is 10ns while remaining word at a rate of 2ns
  • 31. Bandwidth  The example above illustrates that we need a parameter other than memory latency to describe the memory’s performance during block transfers.  Memory bandwidth – the number of bits or bytes that can be transferred in one second. It is used to measure how much time is needed to transfer an entire block of data.  Bandwidth depends on the speed of access to the stored data and on the number of bits that can be accessed in parallel. It is the product of the rate at which data are transferred (and accessed) and the width of the data bus.
  • 32. Consider a main memory constructed with SDRAM chips that have timing requirements depicted in Figure 5.9, except that the burst length is 8. Assume that 32 bits of data are transferred in parallel. If a 133-MHz clock is used, how much time does it take to transfer: (a) 32 bytes of data (b) 64 bytes of data What is the latency in each case?
  • 33. (a) It takes 5 +2+ 8 = 15 clock cycles. Total time = 15/ (133 × 106) =112.78ns (b) Latency = 7/ (133 × 106) = 0.038 × 10−6 s = 52.63 ns (c) It takes 5+2+8+2+8=25 clock cycles. The latency is the same, i.e. 52.63 ns.
  • 34. DDR SDRAM  Double-Data-Rate SDRAM  The key idea is to take advantage of the fact that a large number of bits are accessed at the same time inside the chip when a row address is applied.  Standard SDRAM performs all actions on the rising edge of the clock signal.  DDR SDRAM accesses the cell array in the same way, but transfers the data on both edges of the clock. DDR2, DDR3, DDR4  The cell array is organized in two banks. Each can be accessed separately.  DDR SDRAMs and standard SDRAMs are most efficiently used in applications where block transfers are prevalent.
  • 35.  For example, DDR2 and DDR3 can operate at clock frequencies of 400 and 800 MHz, respectively. Therefore, they transfer data using the effective clock speeds of 800 and 1600 MHz, respectively  DDR3 stands for Double Data Rate version 3.  Whereas DDR4 stands for Double Data Rate version 4. While it’s speed is faster than DDR3.  The clock speed of DDR3 vary from 800 MHz to 2133 MHz. While the minimum clock speed of DDR4 is 2133 MHz and it has no defined maximum clock speed.
  • 36. Problems  Describe the structure for an 8M X32 memory using 512KX8 memory chips. How many chips required?  Describe the structure for a 16M X32 memory using 1MX4 memory chips. How many chips required?
  • 37. Answer  223X25/219X23 =64 chips required  224X25/220X22 =128 chips required
  • 38. problem  Build a memory with 4 byte words with a capacity of 221 bits. What type of decoder required if memory is built using 2K X 8 RAM chips?  How many 32KX1 RAM chips are needed to provide 256Kbyte?  Build 16K X16 RAM using 1K X8 chips. How many 2 to 4decoder used?
  • 39. Answer  Capacity of memory=221/8=218byte  No of words=218 /4 =216  Required RAM chips = (216 x 32) / (2K x 8) = 32 x 4  So 5 to 32 decoder required  So, the arrangement of these RAM chips should contain 32 rows each with 4 columns.
  • 40.  RAM chip size = 1k ×8[1024 words of 8 bits each]  RAM to construct =16k ×16  Number of chips required = (16k x 16)/ ( 1k x 8)  = (16 x 2)  So to select one chip out of 16 vertical chips,  we need 4 x 16 decoder.  Available decoder is 2 x 4 decoder  To be constructed is 4 x 16 decoder  Hence 4 + 1 = 5 decoders are required.
  • 41. Problem  If each address space represents one byte of storage space, how many address lines are needed to access RAM chips arranged in a 4X6 array where each chip is 8K X4  Total=24X8K *4/8=96K=17 lines
  • 42. DRAM  They use less power and have considerably low cost per bit  Memory module house many memory chips in the range of 16 to 32  Memory modules are called SIMM (single in line) and DIMM (Dual In-Line Memory Module) depending on the configuration of pins  In SIMM, Pins present in either facet are connected. There are two type of SIMM presents, one with 30 pins and another one is with 72 pins.  There are three type of DIMM presents which are used by modern motherboard, one with 168 pins and second one is with 184 pins and third one is 240 pins.  SIMM supports 32 bit channel for data transferring. DIMM supports 64 bit channel for data transferring.
  • 43. Memory System Considerations  The choice of a RAM chip for a given application depends on several factors: Cost, speed, power, size…  SRAMs are faster, more expensive, smaller.  DRAMs are slower, cheaper, larger.  Which one for cache and main memory, respectively?  Refresh overhead
  • 44. Memory Controller Processor RAS CAS R/ W Clock Address Row/Column address Memory controller R/ W Clock Request CS Data Memory Figure 5.11. Use of a memory controller.
  • 46. Read-Only-Memory  Volatile / non-volatile memory  ROM, PROM: programmable ROM  EPROM: erasable, reprogrammable ROM  EEPROM: can be programmed and erased electrically Not connected to store a 1 Connected to store a 0 Figure 5.12. A ROM cell. Word line P Bit line T
  • 47. ROM  A logic 0 is stored in the cell if the transistor is connected to ground at point P otherwise a 1 is stored.  Bit line is connected through register to power supply  ROM: data are written into ROM when it is manufactured  PROM : put fuse at P. provide flexibility and convenience compared to ROM  EPROM : ultraviolet light to erase information. Special transistor is used at P, which has ability to function either as a normal transistor or as a disable transistor which is turned off by injecting a charge into it. Erase required dissipating the charges trapped in the transistor. They have transparent window. stored information cannot be erased selectively
  • 48.  E2PROM (EEPROM) : to erase the cell contents selectively. Byte level erasing required. Disadvantage is the different voltages are needed for erasing, writing and reading the stored data which increases circuit complexity
  • 49. Flash Memory  Similar to EEPROM  Difference: only possible to write an entire block of cells instead of a single cell. Have greater density which leads to higher capacity and lower cost per bit.  Require a single power supply voltage and consume less power in their operation  Use in portable equipment. Hand-held computers, cell phone, digital camera, Mp3 player etc. in hand held computers and cell phones a flash memory holds the software needed to operate equipment, so no need for disk drive  Implementation of such modules  Flash cards : USB interface flash cards known as memory keys. 32Gbytes. Hence, a 32-Gbyte flash card can store approximately 500 hours of music  Flash drives : 64 to 128Gbytes. Sloid state electronics so no moving parts. So shorter access times low power consumption so popular for batter driven applications
  • 50. Speed, Size, and Cost Processor Primary cache Secondary cache Main Magnetic disk memory Increasing size Increasing speed Figure 5.13. Memory hierarchy. secondary memory Increasing cost per bit Registers L1 L2
  • 52. Cache  What is cache?  It is a small and very fast memory, interposed between the processor and main memory  Why we need it?  Its purpose is to make the main memory appear to the processor to be much faster than it actually is.  Locality of reference (very important) - temporal :during execution of the programmed small group of instructions are executed repeatedly during some time period. It means a recently executed instruction is likely to be executed again very soon known as temporal locality - spatial : generally found in data, i.e. data close to recently used also required very soon so load entire block  Cache block – cache line  A set of contiguous address locations of some size
  • 53. Cache  When processor issues a read request, the contents of a block of memory words containing the location specified are transferred into the cache. The correspondence between the main memory blocks and those in the cache is specified by a mapping function.  Replacement algorithm : when cache is full and memory word refereed not in cache then cache control hardware must decide which block should be removed to create space for the new block Figure 5.14. Use of a cache memory. Cache Main memory Processor
  • 54. cache  Hit / miss : found not found  Write-through : both the cache location and the memory location is updated  Write-back : only cache copy is updated and to set dirty bit of the block. Memory block updated later on when corresponding cache block is removed form the cache to make room for new block  Cache miss : Read miss: Two approach : first block is loaded and then requested word is transferred. Second approach is as son as requested word read it is send to processor known as load through or early restart. Reduce waiting time of processor  When write miss occur, if write through protocol is used then information is written directly into the main memory. For the write back is use then first block is loaded into cache and desired word is modified
  • 55. 55 / 19 Memory Hierarchy CPU Cache Main Memory I/O Processor Magnetic Disks Magnetic Tapes
  • 56. 56 / 19 Cache Memory  High speed (towards CPU speed)  Small size (power & cost) CPU Cache (Fast) Cache Main Memory (Slow) Mem Hit Miss 95% hit ratio Access = 0.95 Cache + 0.05 Mem
  • 57. 57 / 19 Cache Memory CPU Cache 1 Mword Main Memory 1 Gword 30-bit Address Only 20 bits !!!
  • 58. 58 / 19 Cache Memory Cache Main Memory 00000000 00000001 • • • • • • • • • • 3FFFFFFF 00000 00001 • • • • FFFFF Address Mapping !!!
  • 59.  Where memory blocks are placed in the cache?  Three mapping techniques
  • 60. Direct Mapping tag tag tag Cache Main memory Block 0 Block 1 Block 127 Block 128 Block 129 Block 255 Block 256 Block 257 Block 4095 Block 0 Block 1 Block 127 7 4 Main memory address Tag Block Word Figure 5.15. Direct-mapped cache. 5 4: one of 16 words. (each block has 16=24 words) 7: points to a particular block in the cache (128=27) 5: 5 tag bits are compared with the tag bits associated with its location in the cache. Identify which of the 32 blocks that are resident in the cache (4096/128). Cache size is 2K words and block size is 16words main memory size is 64Kwords Block j of main memory maps onto block j modulo no of cache blocks Cache block=j%128 Placement of a block in the cache is determined from the memory address
  • 61. Direct mapping Limitation  Since more than memory block is mapped onto a given cache block position, contention may arise for that position even when cache is not full
  • 62. Direct Mapping 7 4 Main memory address Tag Block Word 5  Tag: 11101  Block: 1111111=127, in the 127th block of the cache  Word:1100=12, the 12th word of the 127th block in the cache 11101,1111111,1100
  • 63.  Consider a machine with a byte addressable main memory of 220 bytes, block size of 16 bytes and a direct mapped cache having 212 cache lines. Find no of bit for TAG, block and word?  What are the tag and cache line address (in hex) for main memory address (E201F)16?
  • 65. An 8KB direct-mapped write-back cache is organized as multiple blocks, each of size 32-bytes. The processor generates 32-bit addresses. The cache controller maintains the tag information for each cache block comprising of the following. 1 Valid bit 1 Modified bit As many bits as the minimum needed to identify the memory block mapped in the cache. What is the total size of memory needed at the cache controller to store meta-data (tags) for the cache? (A) 4864 bits (B) 6144 bits (C) 6656 bits (D) 5376 bits
  • 66. cache size = 8 KB Block size = 32 bytes Number of cache lines = Cache size / Block size = (8 × 1024 bytes)/32 = 256 total bits required to store meta-data of 1 line = 1 + 1 + 19 = 21 bits total memory required = 21 × 256 = 5376 bits
  • 67. Associative Mapping 4 tag tag tag Cache Main memory Block 0 Block 1 Block i Block 4095 Block 0 Block 1 Block 127 12 Main memory address Figure 5.16. Associative-mapped cache. Tag Word 4: one of 16 words. (each block has 16=24 words) 12: 12 tag bits Identify which of the 4096 blocks that are resident in the cache 4096=212.
  • 68. Associative Mapping  Tag: 111011111111  Word:1100=12, the 12th word of a block in the cache 111011111111,1100 4 12 Main memory address Tag Word
  • 69. Associative mapping  Give freedom in choosing the cache location in which to place the memory block, resulting in a more use of space in the cache  When a new block is brought into the cache, it replaces an existing block only if cache is full. In this case we need an algorithm to select the block to be replaced  Associative search required to minimize delay
  • 70. Set-Associative Mapping tag tag tag Cache Main memory Block 0 Block 1 Block 63 Block 64 Block 65 Block 127 Block 128 Block 129 Block 4095 Block 0 Block 1 Block 126 tag tag Block 2 Block 3 tag Block 127 Main memory address 6 6 4 Tag Set Word Set 0 Set 1 Set 63 Figure 5.17. Set-associative-mapped cache with two blocks per set. 4: one of 16 words. (each block has 16=24 words) 6: points to a particular set in the cache (128/2=64=26) 6: 6 tag bits is used to check if the desired block is present (4096/64=26).
  • 71. Set-Associative Mapping  Tag: 111011  Set: 111111=63, in the 63th set of the cache  Word:1100=12, the 12th word of the 63th set in the cache Main memory address 6 6 4 Tag Set Word 111011,111111,1100
  • 72. Set-associative  The number of blocks per set is a parameter that can be selected to suit the requirements of a particular computer  A cache that has k blocks per set is referred to as a k way set associative cache
  • 73. Examples  A computer system uses 32 bit memory addresses. It has a 4K byte cache organized in the block-set associative manner with 4 blocks per set and 64 bytes per block  1. calculate the number of bits in each of the Tag, Set and word fields of the memory address
  • 74. Answer  Word : 6 Cache block=4k/64=64 Set=64/4=16 so set filed is 4 bits Tag=32-10=22
  • 75. problem  A block set associative cache consists of total of 64blocks divided into 4 block sets. The main memory contains 4096 blocks each consisting of 32 words. How many bits are there in Tag, Set and word fields
  • 77. Example  Block set associative cache with total 64 block and 4 way set associative. Main memory has 4096 blocks each consists of 128 words
  • 78. Answer (a) 4096 blocks of 128 words each require 19 bits for the main memory address. (b) TAG field is 8 bits. SET field is 4 bits. WORD field is 7 bits.
  • 79. The width of the physical address on a machine is 40 bits. The width of the tag field in a 512 KB 8-way set associative cache is ____________ bits (A) 24 (B) 20 (C) 30 (D) 40
  • 80. We know cache size = no.of.sets* lines-per-set* block-size Let us assume no of sets = 2^x And block size= 2^y So applying it in formula. 2^19 = 2^x * 8 * 2^y; So x+y = 16 Now we know that to address block size and set number we need 16 bits so remaining bits must be for tag i.e., 40 - 16 = 24 The answer is 24 bits
  • 81. Replacement Algorithms  Difficult to determine which blocks to kick out  Objective is to keep blocks in the cache that are likely to be referred in near future.  Least Recently Used (LRU) block  The cache controller tracks references to all blocks as computation proceeds.
  • 82. LRU  In 4 way set associative cache 2 bit counter can be used for each block  When hit occurs the counter of that block is set to 0. counters with values originally lower than the referenced one are incremented by one. And all others remain unchanged.  When miss occur and set is full, the block with the counter 3 is removed, new block is put in its place and counter set to 0. the other block counters are incremented by one  When miss occur, if set is not full then counter associated with new block is set to zero and the values of other counters are incremented by one.
  • 83. 83 / 19 Replacement Algorithms CPU Reference A B C A D E A D C F Miss Miss Miss Hit Miss Miss Miss Hit Hit Miss Cache FIFO  A A B A B C A B C A B C D E B C D E A C D E A C D E A C D E A F D Hit Ratio = 3 / 10 = 0.3
  • 84. 84 / 19 Replacement Algorithms CPU Reference A B C A D E A D C F Miss Miss Miss Hit Miss Miss Hit Hit Hit Miss Cache LRU  A B A C B A A C B D A C B E D A C A E D C D A E C C D A E F C D A Hit Ratio = 4 / 10 = 0.4
  • 85. Consider a 2-way set associative cache memory with 4 sets and total 8 cache blocks (0-7) and a main memory with 128 blocks (0-127). What memory blocks will be present in the cache after the following sequence of memory block references if LRU policy is used for cache block replacement. Assuming that initially the cache did not have any memory block from the current job? 0 5 3 9 7 0 16 55 (A) 0 3 5 7 16 55 (B) 0 3 5 7 9 16 55 (C) 0 5 7 9 16 55 (D) 3 5 7 9 16 55
  • 86. 0--->0 ( block 0 is placed in set 0, set 0 has 2 empty block locations, block 0 is placed in any one of them ) 5--->1 ( block 5 is placed in set 1, set 1 has 2 empty block locations, block 5 is placed in any one of them ) 3--->3 ( block 3 is placed in set 3, set 3 has 2 empty block locations, block 3 is placed in any one of them ) 9--->1 ( block 9 is placed in set 1, set 1 has currently 1 empty block location, block 9 is placed in that, now set 1 is full, and block 5 is the least recently used block ) 7--->3 ( block 7 is placed in set 3, set 3 has 1 empty block location, block 7 is placed in that, set 3 is full now, and block 3 is the least recently used block) 0--->block 0 is referred again, and it is present in the cache memory in set 0, so no need to put again this block into the cache memory. 16--->0 ( block 16 is placed in set 0, set 0 has 1 empty block location, block 0 is placed in that, set 0 is full now, and block 0 is the LRU one) 55--->3 ( block 55 should be placed in set 3, but set 3 is full with block 3 and 7, hence need to replace one block with block 55, as block 3 is the least recently used block in the set 3, it is replaced with block 55. Hence the main memory blocks present in the cache memory are : 0, 5, 7, 9, 16, 55 .
  • 87.  Consider a fully associative cache with 8 cache blocks (0-7). The memory block requests are in the order- 4, 3, 25, 8, 19, 6, 25, 8, 16, 35, 45, 22, 8, 3, 16, 25, 7 If LRU replacement policy is used, which cache block will have memory block 7? Draw cache and show the use of LRU algorithm after every memory reference. Also, calculate the hit ratio and miss ratio.
  • 88. Array is 4X10 Block size=1word No of cache block=8 Address length=16bit Set=4blocks so no of set=2 Column major
  • 89. Array is 4X10 Block size=1word No of cache block=8 Address length=16bit Set=4blocks so no of set=2 Column major Hit ratio=2/20 Direct mapping
  • 92. Consider a 2-way set associative cache with 256 blocks and uses LRU replacement, Initially the cache is empty. Conflict misses are those misses which occur due the contention of multiple blocks for the same cache set. Compulsory misses occur due to first time access to the block. The following sequence of accesses to memory blocks (0,128,256,128,0,128,256,128,1,129,257,129,1,129,257,129 ) is repeated 10 times. The number of conflict misses experienced by the cache is (A) 78 (B) 76 (C) 74 (D) 80
  • 94. Overview  Two key factors: performance and cost  Price/performance ratio  Performance depends on how fast machine instructions can be brought into the processor for execution and how fast they can be executed.  memory hierarchy . The main purpose of this hierarchy to create a memory that the CPU sees as having short access time and a large capacity.  it is beneficial if transfers to and from the faster units can be done at a rate equal to that of the faster unit.
  • 95. Hit Rate and Miss Penalty  An excellent indication of success of the memory hierarchy is the success rate in accessing information at various levels of the hierarchy  The successful access to data in a cache is called – hit .the number of hits stated as a fraction of all attempted access is called hit rate  Ideally, the entire memory hierarchy would appear to the processor as a single memory unit that has the access time of a cache on the processor chip and the size of a magnetic disk – depends on the hit rate (>>0.9).  A miss causes extra time needed to bring the desired information into the cache. Total access time seen by the processor when a miss occurs is called miss penalty
  • 96. Hit Rate and Miss Penalty (cont.)  Tave=hC+(1-h)M  Tave: average access time experienced by the processor  h: hit rate  M: miss penalty, the time to access information in the main memory i.e. total access time seen by processor  C: the time to access information in the cache = 4.7
  • 97. Assume that a read request takes 50 nsec on a cache miss and 5 nsec on a cache hit. While running a program, it is observed that 80% of the processor’s read requests result in a cache hit. The average read access time is ………….. nsec. Correct answer is 14. Average read time = 0.80 x 5 + (1 – 0.80) x 50 = 14 nsec A cache memory needs an access time of 30 ns and main memory 150 ns, what is the average access time of CPU (assume hit ratio = 80%)? A 60 B 70 C 150 D 30
  • 98. How to Improve Hit Rate?  Use larger cache – increased cost  Increase the block size while keeping the total cache size constant. Achieve higher data rate  However, if the block size is too large, some items may not be referenced before the block is replaced – it take more time to load so increase miss penalty so block size should be neither too small nor too large. Generally block size of 16 to 128 bytes are most popular choice  Hit rate depends on the size of the cache, its design, and the instruction and data access patterns of the program being executed  Miss penalty can be reduced if Load-through approach
  • 99. Caches on the Processor Chip  On chip vs. off chip  Two separate caches for instructions and data, respectively  Single cache for both  Which one has better hit rate? -- Single cache  What’s the advantage of separating caches? – parallelism, better performance  Level 1 and Level 2 caches  L1 cache – faster and smaller. Access more than one word simultaneously and let the processor use them one at a time. Tens of kilobytes  L2 cache – slower and larger. Hundreds of kilobytes or several megabytes  How about the average access time?  Average access time: tave = h1C1 + (1-h1)(h2C2 + (1-h2)M)  =h1c1 +(1-h1)h2c2+(1-h1)(1-h2)M where h is the hit rate, C is the time to access information in cache, M is the time to access information in main memory. No of misses in secondary cache is given by (1-h1)(1-h2)
  • 100. In a two-level cache system, the access times of L1 and L2 caches are 1 and 8 clock cycles respectively. The miss penalty from L2 cache to main memory is 18 clock cycles. The miss rate of L1 cache is twice that of L2. The average memory access time of the cache system is 2 cycles. The miss rates of L1 and L2 caches respectively are: a. 0.130 and 0.065 b. b. 0.056 and 0.111 c. c. 0.0892 and 0.1784 d. d. 0.1784 and 0.0892
  • 101. Let the miss rate of L2 cache be x. So, miss rate of L1 cache = 2x. Thus, average memory access time AMAT = (1-2x).1 + 2x. [(1-x).8 + x.18] = 2 (givenSolving, we get x = 0.065
  • 102. Performance  This is not possible if both the slow and the fast units are accessed in the same manner.  However, it can be achieved when parallelism is used in the organizations of the slower unit.
  • 103. Interleaving  If the main memory is structured as a collection of physically separated modules, each with its own ABR (Address buffer register) and DBR( Data buffer register), memory access operations may proceed in more than one module at the same time. mbits Address in module MM address (a) Consecutive words in a module i k bits Module Module Module Module DBR ABR DBR ABR ABR DBR 0 n 1 - Figure 5.25. Addressing multiple-module memory systems. (b) Consecutive words in consecutive modules i k bits 0 Module Module Module Module MM address DBR ABR ABR DBR ABR DBR Address in module 2 k 1 - mbits
  • 104.  One clock cycle to send an address  8 clock cycles to get first word and subsequent word required 4 clock cycles  One clock cycle to send word to the cache  Without interleaving  Time to load block=1+8 +7*4+1=38  1+8+4+4=17cycles
  • 105. Assume a computer has L1 and L2 caches, as discussed .The cache blocks consist of 8 words. Assume that the hit rate is the same for both caches and that it is equal to 0.95 for instructions and 0.90 for data. Assume also that the times needed to access an 8-word block in these caches are C1 = 1 cycle and C2 = 10 cycles. (a) What is the average access time experienced by the processor if the main memory uses interleaving? (b) What is the average access time if the main memory is not interleaved? (c) What is the improvement obtained with interleaving?