This lecture explains cache memory with a diagram and demonstrates hit ratio and miss penalty with an example. It discusses the three types of cache mapping (direct, fully-associative, and set-associative), temporal and spatial locality of reference in cache memory, the write-through and write-back cache write policies, and the differences between unified and split caches.
2. Cache Memory
• A small, fast memory unit made of SRAM that sits between the CPU and main memory is called
cache memory
• It may be present on the CPU chip (e.g., L1 and L2 caches) or as a separate module (e.g., L3 cache)
• The most frequently used items of main memory are stored in cache memory for faster access by the CPU
• The CPU first checks the cache memory for the required data
• If present (cache hit), it collects the data from the cache memory (faster access time)
• If not available in the cache (cache miss), it collects the data from main memory (slower access
time) and copies the data into the cache for future references
Dr. Prasenjit Dey
[Diagram: CPU ↔ Cache (SRAM) ↔ Main Memory (DRAM); words move between the CPU and the cache (fast), blocks move between the cache and main memory (slow); cache hierarchy L1, L2, L3.]
3. Hit ratio & Miss penalty
• When the CPU finds the data in cache memory, it is called a cache hit
• When the CPU fails to find the data in cache memory, it is called a cache miss
• The percentage of accesses in which data is found in the cache memory is called the hit rate
• The percentage of accesses in which data is not found in the cache memory is called the miss rate
• Miss rate = 1 − hit rate
• Hit ratio is the ratio of cache hits to total accesses: hits / (hits + misses)
• Miss ratio = 1 − hit ratio
• On a cache miss, the overall time required to get the data from main memory into the cache
plus the time required to send the data to the CPU is called the miss penalty
4. Average Memory Access time
• It is the average time to get data from memory
• The equation for the average memory access time is
• T = (hit ratio) × CA + (miss ratio) × (CA + MA)
• T = h × CA + (1 − h) × (CA + MA)
• Example: let the data access time from cache memory, cache access (CA), be 0.1 ms and the
data access time from main memory, memory access (MA), be 10 ms, with hit ratio
h = 0.95
• Hit ratio = h = 0.95
• Miss ratio = (1 − h) = (1 − 0.95) = 0.05
• CA = 0.1 ms
• MA = 10 ms
• T = (0.95)(0.1) + (0.05)(0.1 + 10) = 0.095 + 0.505 = 0.6 ms
CA: cache memory access time
MA: main memory access time
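The worked example above can be checked with a short Python sketch; the function name `amat` is illustrative, not from the slides:

```python
def amat(hit_ratio, cache_time, memory_time):
    """Average memory access time: a hit costs cache_time; a miss costs
    cache_time (the failed lookup) plus memory_time (the miss penalty)."""
    miss_ratio = 1 - hit_ratio
    return hit_ratio * cache_time + miss_ratio * (cache_time + memory_time)

# Values from the worked example: CA = 0.1 ms, MA = 10 ms, h = 0.95
print(amat(0.95, 0.1, 10))  # ≈ 0.6 ms
```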
5. Cache design
• Let the main memory consist of 2^n words
• Break the main memory into blocks of k words each; let k = 4
• Then the number of memory blocks = 2^n / k
• Let the number of cache lines/slots, each capable of holding one memory block, be C
• We need a mapping function that maps these 2^n / k memory blocks onto the C cache lines/slots
[Diagram: cache memory with C lines (line numbers 0 to C−1), each line holding a tag plus one memory block; main memory with 2^n words (addresses 0 to 2^n−1), grouped into blocks of k = 4 words each.]
6. Cache Design
• Mapping Function
• Locality of References
• Write Policy
• Number of Caches: Unified cache and Split cache
7. Cache Mapping
Cache mapping is of 3 types
1. Direct Mapping
2. Associative mapping
3. Set-Associative mapping
8. Direct Mapping
• Each block of main memory can be mapped into only one cache line/slot
• i.e. a block from main memory can only be transferred to one specific cache line/slot
• The CPU-generated physical memory address is divided into two parts
• The least significant w bits uniquely identify a word within a block
• The most significant s bits uniquely identify a memory block
• The s and w bits jointly identify a specific memory word within a specific memory
block
• The block field is further divided into two parts
• The least significant r bits form the cache line field
• The most significant s−r bits form the tag field

CPU-generated address: | Tag (s−r bits) | Line (r bits) | Word (w bits) | — the tag and line fields together form the s-bit block field
9. Tag field & cache line field
• The s-bit block field uniquely identifies one of the 2^s
memory blocks
• The r-bit cache line field uniquely identifies one of the 2^r cache lines/slots
• Each cache line can accommodate any one of 2^(s−r) memory
blocks
• The (s−r)-bit tag field identifies which of those 2^(s−r) memory blocks is
currently mapped into the cache line
• i.e., the tag field is used to identify which block of main memory is in
each cache line/slot
[Table: for each cache line (0 to 2^r−1), the tag values 0 to 2^(s−r)−1 distinguish the 2^(s−r) different memory blocks that can occupy that line. Address fields: | Tag (s−r bits) | Line (r bits) | Word (w bits) |]
10. Example
• Let us consider a main memory of size = 16 MBytes = 2^24 Bytes
• The main memory word size is 1 Byte = 8 bits
• The CPU-generated address will be 24 bits, to uniquely identify a word in main memory
• Let the main memory block size be 4 words = 4 Bytes = 2^2 Bytes
• Number of blocks in main memory (2^s) = (main memory size / block size) = (2^24 / 4) = 2^22
• Number of bits in the block field (s) to uniquely identify a memory block = 22
• Let us consider a cache memory of 64 kBytes = 2^16 Bytes
• Cache memory block size = 4 Bytes = 2^2 Bytes
• Number of bits in the word field (w) to uniquely identify a word within a block = 2
• Number of lines in cache memory (2^r) = (cache memory size / block size) = (2^16 / 4) = 2^14
• Number of bits in the cache line field (r) to uniquely identify a cache line = 14
• Thus, the number of bits in the tag field = (s − r) = (22 − 14) = 8
• The 8-bit tag field selects one memory block out of the 2^8 memory blocks that map to the same cache line

24-bit address: | Tag (s−r = 8 bits) | Line (r = 14 bits) | Word (w = 2 bits) |
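As a sketch, the three fields of a 24-bit address in this example can be extracted with shifts and masks (field widths from above; the helper name `split_address` is hypothetical):

```python
# Field widths for the 16 MB memory / 64 kB cache example:
# w = 2 word bits, r = 14 line bits, s - r = 8 tag bits (24 bits total).
W_BITS, R_BITS = 2, 14

def split_address(addr):
    word = addr & ((1 << W_BITS) - 1)               # lowest 2 bits
    line = (addr >> W_BITS) & ((1 << R_BITS) - 1)   # next 14 bits
    tag = addr >> (W_BITS + R_BITS)                 # top 8 bits
    return tag, line, word

tag, line, word = split_address(0xABCDEF)
# Reassembling the fields gives back the original address.
assert (tag << 16) | (line << 2) | word == 0xABCDEF
```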
11. Example
Cache line | Main memory blocks held
0 | 0, C, 2C, 3C, …, 2^s − C
1 | 1, C+1, 2C+1, …, 2^s − C + 1
… | …
C−1 | C−1, 2C−1, 3C−1, …, 2^s − 1
Let us consider the following mapping function to map a memory block into a cache line:
Let cache line index = i,
main memory block index = j,
number of lines in cache = C,
then i = j mod C
So if we divide the index of a memory block by the number of cache lines, the
remainder of the division gives us the index of the cache line where we can transfer the
j-th memory block
i = j % C
With C = 4 (i = j % 4):
Cache line | Main memory blocks held
0 | 0, 4, 8, 12, …
1 | 1, 5, 9, 13, …
2 | 2, 6, 10, 14, …
3 | 3, 7, 11, 15, …
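The mapping rule above is a single modulo operation; a minimal Python sketch (the function name is illustrative):

```python
# Direct-mapping rule i = j mod C, where j is the memory block index
# and C is the number of cache lines.
def cache_line(block_index, num_lines=4):
    return block_index % num_lines

# With C = 4, blocks 0 and 4 both land on line 0, blocks 1 and 5 on line 1, ...
mapping = {j: cache_line(j) for j in range(8)}
print(mapping)  # {0: 0, 1: 1, 2: 2, 3: 3, 4: 0, 5: 1, 6: 2, 7: 3}
```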
12. Example
[Diagram: direct mapping with C = 4. Main memory blocks 0–7; cache lines 0–3. Blocks 0 and 4 map to line 0, blocks 1 and 5 to line 1, blocks 2 and 6 to line 2, blocks 3 and 7 to line 3.]
13. Direct Mapping Example
Example from the book Computer System Architecture

Main memory size = 32K × 12 = 2^15 × 12
Address bus = 15 bits => 5 digits in the octal system
Data bus = 12 bits => 4 digits in the octal system
Cache memory size = 512 × 12 = 2^9 × 12
Address bus = 9 bits => 3 digits in the octal system
Data bus = 12 bits => 4 digits in the octal system

(a) Main memory (octal addresses and data):
Address | Data
00000 | 1220
00777 | 2340
01000 | 3450
01777 | 4560
02000 | 5670
02777 | 6710

(b) Cache memory:
Index | Tag | Data
000 | 00 | 1220
777 | 02 | 6710

If the CPU-generated address = (02000)₈, then:
Cache line index (r) = index address = (000)₈
Tag (s−r) = (02)₈
The cache holds tag 00 at index 000, so this is a cache miss; the word 5670 is fetched from main memory and the line becomes:
Index | Tag | Data
000 | 02 | 5670
14. Direct Mapping Example (contd.)
Example from the book Computer System Architecture

[Diagram: the 15-bit CPU-generated address is split into a 6-bit tag field and a 9-bit index field. Main memory: 32K × 12 (address = 15 bits, data = 12 bits), hex addresses from tag 00 / index 000 to tag 3F / index 1FF. Cache memory: 512 × 12 (address = 9 bits, data = 12 bits), addresses 000 to 1FF.]
15. Valid bit & Dirty bit
Each cache line/slot has one valid bit and one dirty bit
• The valid bit (V) indicates whether the memory block in the cache line belongs to the currently
executing program or not
• If V = 0, then that particular cache line/slot holds no usable data
• The dirty bit (D) indicates whether the block has been modified while in the cache
• If D = 1, then the memory block must first be written back to main memory
before the cache line can be reused

[Diagram: cache line with V and D bits alongside the Tag, Line, and Word fields.]
17. Direct Mapping: Advantages & Disadvantages
Advantages
• Simple
• Inexpensive
• Only a limited comparison between tag fields is needed (one tag per access)
Disadvantages
• Fixed cache line location for a given memory block
• If the number of cache lines is 4, then we cannot accommodate memory block 0 and memory
block 4 simultaneously
• If we want to access both memory blocks repeatedly, there will be a cache miss each
time, which eventually creates thrashing
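The thrashing scenario above can be sketched in a few lines of Python: two blocks that share a line evict each other on every access (the function name and access pattern are illustrative):

```python
# Alternating accesses to blocks 0 and 4 in a 4-line direct-mapped cache:
# both map to line 0, so each access evicts the other and every access misses.
def count_misses(block_accesses, num_lines=4):
    lines = [None] * num_lines          # block number currently held per line
    misses = 0
    for block in block_accesses:
        i = block % num_lines           # direct-mapped line index
        if lines[i] != block:
            misses += 1
            lines[i] = block            # evict whatever was there
    return misses

print(count_misses([0, 4, 0, 4, 0, 4, 0, 4]))  # 8 accesses -> 8 misses
```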
18. Associative Mapping
In associative mapping
• Any block of main memory can be mapped into any cache line; there is no restriction
• i.e., a block from main memory can be transferred to any cache line
• This removes the drawback of direct mapping
• The memory address consists of 2 fields
• tag field
• word field
• The tag field uniquely identifies which memory block is in which cache line
• Each cache line also needs a dirty bit and a valid bit
19. Example
[Diagram: associative mapping with 4 cache lines. Main memory blocks 0–7; any block can be placed in any of cache lines 0–3.]
20. Associative Mapping: Drawbacks
• Cache searching is more expensive than in direct mapping
• The tag fields of all the cache lines are searched in parallel to find the desired memory
block
• Here the tag field is s bits wide (the full block number)
• Instead of distinguishing among just 2^(s−r) memory blocks, each tag must distinguish
among all 2^s memory blocks
• A lot of circuitry is needed to compare all tags simultaneously for a match, which is very
costly

For the earlier example, the 24-bit address: | Tag (s = 22 bits) | Word (w = 2 bits) |
21. Associative memory
Associative memory is a type of Content Addressable Memory (CAM): a memory unit accessed by content rather than by address.
• Here, we compare the whole or a portion of the argument register (A)
with the words in each cache line
• If it matches, then we transfer the corresponding word from the
cache line/block
• The key (mask) register (K) selects which bits of A participate in the comparison
• Let us explain with an example, comparing only a portion
of the argument register (the first 3 bits):
A register: 101 101100
K register: 111 000000
Temp = (A & K); this bit operation turns off all bits except the first 3
Temp register: 101 000000
Now we compare the Temp register with the masked bits of each word in each cache line:
Word 1: 100 101100 → M = 0 (no match)
Word 2: 101 001011 → M = 1 (match)
• If a word matches, we transfer that word from the cache line

[Diagram: associative memory array and logic, m words × n bits per word, with argument register (A), key/mask register (K), match register (M), and input/read/write/output lines.]
22. Hardware Realization
• m words × n cells per word

[Diagram: array of cells C_ij (word i, bit j) with argument register bits A_1 … A_n, key register bits K_1 … K_n, and match register outputs M_1 … M_m; each cell contains a flip-flop F_ij plus match logic feeding M_i.]

• M_i = 1 when A_j = C_ij for every bit position j with K_j = 1
• If M_i = 1, transfer word i from cache line i
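The match rule above can be modelled in a few lines of Python (a behavioural sketch of the match logic, not the hardware; the function name is illustrative):

```python
# CAM match rule: word i matches when A_j == C_ij for every bit position j
# where the key/mask bit K_j is 1 -- i.e. the masked bits agree.
def cam_match(words, argument, key):
    """Return the match bit M_i for each stored word."""
    return [1 if (w & key) == (argument & key) else 0 for w in words]

# Values from the slide: A = 101 101100, K = 111 000000 (binary)
words = [0b100101100, 0b101001011]   # Word 1, Word 2
print(cam_match(words, 0b101101100, 0b111000000))  # [0, 1]
```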
24. Set Associative Mapping
• It is a trade-off between fully-associative mapping and direct mapping
• The cache is divided into a number of sets
• Each set contains a number of cache lines
• A memory block maps into any cache line (like associative mapping) within one specific set
(like direct mapping)
• Direct mapping determines which set of the cache a memory block belongs to
• Within that set, the memory block can occupy any cache line, as in fully-associative mapping
• e.g., with K cache lines per set:
• K-way associative mapping
• A given memory block can be in any one of the K cache lines of a specific set
• It is much easier to simultaneously search one set than all cache lines
25. Example
• To compute cache set number:
• SetIndex = i mod S
• i = main memory block index
• S = number of sets in cache
[Diagram: 2-way set-associative mapping. Main memory blocks 0–5; cache lines 0–3 grouped into Set 0 (lines 0–1) and Set 1 (lines 2–3).]
Each set has 2 cache lines
2-way associative mapping
A memory block can be in either of the 2 lines of a
specific set
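The set-index rule and lookup can be sketched in Python (the function names and the example cache contents are illustrative):

```python
# 2-way set-associative placement: set index = j mod S, where j is the
# memory block index and S the number of sets; the block may occupy
# either line of that set.
def set_index(block_index, num_sets=2):
    return block_index % num_sets

def find_in_set(cache_sets, block_index, num_sets=2):
    """cache_sets: one list per set, holding the block numbers (tags) stored."""
    s = set_index(block_index, num_sets)
    return block_index in cache_sets[s]   # hit if the block is in either line

cache = [[0, 2], [1, 5]]   # Set 0 holds blocks 0 and 2, Set 1 holds 1 and 5
print(find_in_set(cache, 2))   # True  (block 2 -> set 0, present)
print(find_in_set(cache, 3))   # False (block 3 -> set 1, not present)
```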
26. Set-Associative Mapping Example
Example from the book Computer System Architecture

Main memory size = 32K × 12 = 2^15 × 12
Address bus = 15 bits => 5 digits in the octal system
Data bus = 12 bits => 4 digits in the octal system
Cache memory size = 512 × 12 = 2^9 × 12
Address bus = 9 bits => 3 digits in the octal system
Data bus = 12 bits => 4 digits in the octal system

(a) Main memory (octal addresses and data):
Address | Data
00000 | 1220
00777 | 2340
01000 | 3450
01777 | 4560
02000 | 5670
02777 | 6710

(b) 2-way set-associative cache memory:
Index | Tag | Data | Tag | Data
000 | 01 | 3450 | 02 | 5670
777 | 02 | 6710 | 00 | 2340

If the CPU-generated address = (02000)₈, then:
Cache set index (r) = index address = (000)₈
Tag (s−r) = (02)₈
Set 000 already contains tag 02 with data 5670, so this is a cache hit.
28. Locality of Reference
• The principle of locality guides which blocks the CPU transfers from memory to cache
• There are 2 types of locality of reference
• 1) Temporal locality
• 2) Spatial locality
• Temporal Locality
• The CPU tends to reference the same memory locations again at a future point in time
• Due to loops and iteration, programs spend a lot of time in one section of code
• Spatial Locality
• Programs tend to reference memory locations that are near other recently referenced memory
locations
• Due to the way contiguous memory is referenced, e.g., an array or the instructions that make up a program
• Sequential Locality
• Instructions tend to be accessed sequentially
29. Write Policy
• A cache write operation updates the cache data
• If we do not also update the corresponding main memory data, the cache data becomes
inconsistent with the main memory data
• The write policy says when to copy the data content of cache memory into main memory
• There are 2 types of write policy
1) write-through
2) write-back
30. Write Policies
Write Through
• Update both the cache memory and main memory simultaneously
• In a multiprocessor system, each CPU must monitor main memory traffic (snooping) to keep its local cache up to date in case
another CPU also has a copy of a shared memory location in its cache
• Every write generates main memory traffic, which slows down write operations
Write Back
• Update only the cache data
• Set the dirty bit in the corresponding cache line
• This indicates that the cache data is now inconsistent with the main memory data
• When the cache line is to be replaced (the cache data is to be evicted), update the corresponding
main memory data with the latest cache data only if the dirty bit is 1 (since only then is the cache data
inconsistent with the main memory data)
31. Unified vs Split Caches
• Unified Cache: One cache for data and instructions
• Split Cache: Two caches, one for data and one for instructions
• Advantages of unified cache
• Higher hit rate
• Balances load of instruction and data fetch
• Only one cache to design & implement
• Advantages of split cache
• Eliminates cache contention between instruction fetch/decode unit and execution unit
• Important in pipelining