CACHE MEMORY
By
Anand Goyal
2010C6PS648
Memory Hierarchy
 Computer memory is organized in a hierarchy. This is done to bridge the gap between processor speed and memory speed and hence improve performance.
 Closest to the processor are the processor registers. Then comes the cache memory, followed by main memory.
SRAM and DRAM
 Both are random access memories and are
volatile, i.e. constant power supply is required
to avoid data loss.
 DRAM :- made up of a capacitor and a transistor. The transistor acts as a switch and the data, in the form of charge, is stored on the capacitor. Requires periodic refreshing to maintain the stored data. Lower cost per bit, so less expensive; used for large main memory.
 SRAM :- made up of 4 transistors, which are cross-connected in an arrangement that produces a stable logic state. Greater cost per bit, so more expensive; used for small memory such as caches.
Principles of Locality
 Programs access only a small portion of their address space at any given instant. To exploit this and increase performance, two forms of locality are relied upon :-
 A) Temporal Locality :- locality in time, i.e. if an item is referenced, it will tend to be referenced again soon.
 B) Spatial Locality :- locality in space, i.e. if an item is referenced, its neighboring items will tend to be referenced soon.
Mapping Functions
 There are three main types of memory
mapping functions :-
 1) Direct Mapped
 2) Fully Associative
 3) Set Associative
 For the coming explanations, let us
assume 1GB main memory, 128KB
Cache memory and Cache line size
32B.
Direct Mapping
TAG (s - r) | LINE or SLOT (r) | OFFSET (w)
•Each memory block is mapped to a single cache line. For the purpose of cache access, each main memory address can be viewed as consisting of three fields.
•No two blocks that map to the same line have the same Tag field.
•The cache contents are checked by using the Line field to index the cache and comparing the stored Tag with the Tag field of the address.
 For the given example, we have –
 1GB main memory = 2^30 bytes
 Cache size = 128KB = 2^17 bytes
 Block size = 32B = 2^5 bytes
 No. of cache lines = 2^17/2^5 = 2^12, thus 12 bits are required to locate one of the 2^12 lines.
 Also, the block size is 2^5 bytes and thus 5 bits are required to locate an individual byte within a block.
 Thus Tag bits = 30 – 12 – 5 = 13 bits
13 | 12 | 5 (tag | line | offset)
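To make the field split concrete, here is a minimal C sketch (not from the original slides) that extracts the tag, line and offset fields from a 30-bit byte address using the 13/12/5 split derived above; the sample address value is arbitrary.

#include <stdint.h>
#include <stdio.h>

#define OFFSET_BITS 5          /* 32 B block  -> 5 offset bits */
#define LINE_BITS   12         /* 4096 lines  -> 12 line bits  */

int main(void) {
    uint32_t addr = 0x12345678u & 0x3FFFFFFFu;  /* any 30-bit byte address */

    uint32_t offset = addr & ((1u << OFFSET_BITS) - 1);
    uint32_t line   = (addr >> OFFSET_BITS) & ((1u << LINE_BITS) - 1);
    uint32_t tag    = addr >> (OFFSET_BITS + LINE_BITS);  /* remaining 13 bits */

    printf("tag=%u  line=%u  offset=%u\n", tag, line, offset);
    return 0;
}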
Summary
 Address length = (s + w) bits
 Number of addressable units = 2^(s+w) words or bytes
 Block size = line size = 2^w words or bytes
 No. of blocks in main memory = 2^(s+w)/2^w = 2^s
 Number of lines in cache = m = 2^r
 Size of tag = (s – r) bits
 Mapping Function
 The jth block of main memory maps to the ith cache line, where
 i = j modulo m (m = no. of cache lines)
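As a quick worked illustration with the example parameters above (m = 2^12 = 4096 lines): block 5000 of main memory maps to line 5000 mod 4096 = 904, as do blocks 9096, 13192 and any other block number differing by a multiple of 4096, which is why such blocks compete for the same line.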
Pro’s and Con’s
 Simple
 Inexpensive
 Fixed location for given block
 If a program accesses 2 blocks that
map to the same line repeatedly,
cache misses (conflict misses) are
very high
Fully Associative Mapping
 A main memory block can load into any line of the cache
 The memory address is interpreted as a tag and a word offset
 The Tag uniquely identifies a block of memory
 Every line's tag is examined for a match
 Cache searching gets expensive, with more power consumption due to the parallel comparators
TAG (s) | OFFSET (w)
Fully Associative Cache
Organization
 For the given example, we have –
 1GB main memory = 2^30 bytes
 Cache size = 128KB = 2^17 bytes
 Block size = 32B = 2^5 bytes
 Here, the block size is 2^5 bytes and thus 5 bits are required to locate an individual byte within a block.
 Thus Tag bits = 30 – 5 = 25 bits
25 | 5 (tag | offset)
Fully Associative Mapping
Summary
 Address length = (s + w) bits
 Number of addressable units = 2^(s+w) words or bytes
 Block size = line size = 2^w words or bytes
 No. of blocks in main memory = 2^(s+w)/2^w = 2^s
 Number of lines in cache = total number of cache blocks
 Size of tag = s bits
Pro’s and Con’s
 There is flexibility as to which block to
replace when a new block is read into
the cache
 The complex circuitry required for
parallel Tag comparison is however a
major disadvantage.
Set Associative Mapping
 Cache is divided into a number of sets
 Each set contains a number of lines
 A given block maps to any line in a
given set. e.g. Block B can be in any
line of set i
 If there are 2 lines per set,
 it is 2-way associative mapping
 A given block can be in one of the 2 lines of only one set
TAG (s - d) | SET (d) | OFFSET (w)
K-Way Set Associative
Organization
 For the given example, we have –
 1GB main memory = 2^30 bytes
 Cache size = 128KB = 2^17 bytes
 Block size = 32B = 2^5 bytes
 Let it be a 2-way set associative cache,
 No. of sets = 2^17/(2 * 2^5) = 2^11, thus 11 bits are required to locate one of the 2^11 sets, each set containing 2 lines.
 Also, the block size is 2^5 bytes and thus 5 bits are required to locate an individual byte within a block.
 Thus Tag bits = 30 – 11 – 5 = 14 bits
14 | 11 | 5 (tag | set | offset)
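A corresponding C sketch for the 2-way set associative split, again assuming the 14/11/5 field widths derived above; the address value is arbitrary and the block may then occupy either of the 2 lines of the selected set.

#include <stdint.h>
#include <stdio.h>

#define OFFSET_BITS 5          /* 32 B block         */
#define SET_BITS    11         /* 2048 sets, 2-way   */

int main(void) {
    uint32_t addr = 0x0ABCDE40u & 0x3FFFFFFFu;  /* any 30-bit byte address */

    uint32_t offset = addr & ((1u << OFFSET_BITS) - 1);
    uint32_t set    = (addr >> OFFSET_BITS) & ((1u << SET_BITS) - 1);
    uint32_t tag    = addr >> (OFFSET_BITS + SET_BITS);   /* remaining 14 bits */

    printf("tag=%u  set=%u  offset=%u\n", tag, set, offset);
    return 0;
}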
Set Associative Mapping
Summary
 Address length = (s + w) bits
 Number of addressable units = 2^(s+w) words or bytes
 Block size = line size = 2^w words or bytes
 Number of blocks in main memory = 2^s
 Number of lines in a set = k
 Number of sets = v = 2^d
 Number of lines in cache = kv = k * 2^d
 Size of tag = (s – d) bits
 Mapping Function
 The jth block of main memory maps to the ith set, where
 i = j modulo v (v = no. of sets)
 Within the set, the block can be mapped to any cache line.
Pro’s and Con’s
 After simulating the hit ratio for direct
mapped and (2,4,8 way) set associative
mapped cache, we observe that there
is significant difference in performance
at least up to cache size of 64KB, set
associative being the better one.
 However, beyond that, the complexity
of cache increases in proportion to the
associativity, hence both mapping give
approximately similar hit ratio.
N-way Set Associative Cache
Vs. Direct Mapped Cache:
 N comparators Vs 1
 Extra mux delay for the data
 Data comes after hit/miss
 In a direct-mapped cache, the cache block is available before the hit/miss decision
 Number of misses
 DM > SA > FA
 Access latency : the time to perform a read or write operation, i.e. the time from the instant an address is presented to the memory to the instant that the data have been stored or made available
 DM < SA < FA
Types of Misses
Compulsory Misses :-
 When a program is started, the cache is completely empty, hence the first access to a block will always be a miss, as the block has to be brought into the cache from memory at least once.
 Also called first reference misses. They can't be avoided easily.
Capacity Misses
 The cache cannot hold all the blocks needed during the execution of a program.
 Thus these misses occur due to blocks being discarded and later retrieved.
 They occur because the cache is limited in size.
 For a Fully Associative cache, this is the major cause of misses.
Conflict Misses
 They occur because multiple distinct memory locations map to the same cache location.
 Thus, in the case of DM or SA mapping, they occur because blocks are discarded and later retrieved.
 In DM this is a repeated phenomenon, as two blocks which map to the same cache line can be accessed alternately, thereby decreasing the hit ratio.
 This phenomenon is called thrashing.
Solutions to reduce misses
 Capacity Misses :-
◦ Increase cache size
◦ Re-structure the program
 Conflict Misses :-
◦ Increase cache size
◦ Increase associativity
Coherence Misses
 Occur when other processors update
memory which in turn invalidates the
data block present in other
processor’s cache.
Replacement Algorithms
 For a Direct Mapped cache, since each block maps to only one line, we have no choice but to replace that line itself
 Hence there isn't any replacement policy for DM.
 For SA and FA, few replacement policies
:-
◦ Optimal
◦ Random
◦ Arrival
◦ Frequency
◦ Recently Used
Optimal
This is the ideal benchmarking
replacement strategy.
 All other policies are compared to it.
 This is not implemented, but used just
for comparison purposes.
Random
 Block to be replaced is randomly
picked
 Minimum hardware complexity – just a
pseudo random number generator
required.
 Access time is not affected by the
replacement circuit.
 Not suitable for high performance
systems
Arrival - FIFO
 For an N-way set associative cache
 Implementation 1
 Use an N-bit register per cache line to store arrival time information
 On a cache miss – the registers of all cache lines in the set are compared to choose the victim cache line
 Implementation 2
 Maintain a FIFO queue
 A register with (log2 N) bits per cache line
 On a cache miss – the cache line whose register value is 0 will be the victim
 Decrement all other registers in the set by 1 and set the victim's register to the value N-1
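A minimal C sketch of Implementation 2 for one N-way set, assuming the set is already full so the per-line registers hold a permutation of 0..N-1; fifo_reg and the function name are illustrative, not part of the slides.

/* FIFO victim selection for one N-way set.
 * fifo_reg[i] holds the arrival order of line i: 0 = oldest, N-1 = newest. */
static int fifo_pick_victim(int fifo_reg[], int n) {
    int victim = 0;
    for (int i = 0; i < n; i++)
        if (fifo_reg[i] == 0) victim = i;   /* line with value 0 is the oldest */

    for (int i = 0; i < n; i++)
        if (i != victim) fifo_reg[i]--;     /* everyone else ages by one       */
    fifo_reg[victim] = n - 1;               /* newly filled line becomes newest */
    return victim;
}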
FIFO : Advantages &
Disadvantages
 Advantages
 Low hardware complexity
 Better cache hit performance than Random replacement
 The cache access time is not affected by the replacement strategy (not in the critical path)
 Disadvantages
 Cache hit performance is poor compared to LRU and frequency based replacement schemes
 Not suitable for high performance systems
 Replacement circuit complexity increases with increasing associativity
Frequency – Least Frequently
Used
 Requires a register per cache line to save the number of references (frequency count)
 If a cache access is a hit, the frequency count of the corresponding register is increased by 1
 On a cache miss, the victim cache line is the one with the minimum frequency count in the set
 The register corresponding to the victim cache line is reset to 0
 LFU cannot differentiate between blocks that were referenced frequently in the past and blocks that are being referenced now
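A small C sketch of the LFU bookkeeping described above; freq[] and the function names are hypothetical helpers for one N-way set.

/* LFU within one N-way set: freq[i] counts references to line i. */
static void lfu_on_hit(unsigned freq[], int line) {
    freq[line]++;                            /* bump the count on every hit */
}

static int lfu_pick_victim(unsigned freq[], int n) {
    int victim = 0;
    for (int i = 1; i < n; i++)
        if (freq[i] < freq[victim]) victim = i;  /* minimum frequency count   */
    freq[victim] = 0;                            /* reset counter for new block */
    return victim;
}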
Least Frequently Used –
Dynamic Aging (LFU-DA)
 When any frequency count register in the set reaches its maximum value, all the frequency count registers in that set are shifted one position to the right (divided by 2)
 The rest is the same as LFU
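The aging step might look like this in C (a sketch; max_count stands for the counter's saturation value and is an assumed parameter).

/* LFU-DA aging: when any counter in the set saturates, halve every counter. */
static void lfu_da_age(unsigned freq[], int n, unsigned max_count) {
    for (int i = 0; i < n; i++)
        if (freq[i] >= max_count) {          /* a counter hit its ceiling */
            for (int j = 0; j < n; j++)
                freq[j] >>= 1;               /* shift right = divide by 2 */
            break;
        }
}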
LFU : Advantages &
Disadvantages
 Advantages
 For small and medium caches, LFU works better than FIFO and Random replacement
 Suitable for high performance systems whose memory access pattern follows frequency order
 Disadvantages
 The register must be updated on every cache access
 This affects the critical path
 The replacement circuit becomes more complicated as associativity increases
Least Recently Used Policy
 Most widely used replacement
strategy
 Replaces the least recently used
cache line
 Implemented by two techniques :-
◦ Square Matrix Implementation
◦ Counter Implementation
Square Matrix Implementation
 N^2 bits per set (D flip-flops) to store the LRU information
 The cache line corresponding to the row with all zeros is the victim cache line for replacement
 On a cache hit, all the bits in the corresponding row are set to 1 and all the bits in the corresponding column are set to 0.
 On a cache miss, a priority encoder selects the cache line corresponding to the row with all zeros for replacement
 Used when the associativity is low
Matrix Implementation – 4 way
set Associative Cache
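A C sketch of the matrix scheme for a 4-way set, using one byte per matrix bit for readability; in hardware each entry would be a single D flip-flop, and the names are illustrative.

#define N 4                                      /* 4-way set in this sketch */

/* m[i][j] is one bit of the per-set LRU matrix. */
static void matrix_on_access(unsigned char m[N][N], int line) {
    for (int j = 0; j < N; j++) m[line][j] = 1;  /* set the accessed line's row   */
    for (int i = 0; i < N; i++) m[i][line] = 0;  /* clear the accessed line's column */
}

static int matrix_pick_victim(unsigned char m[N][N]) {
    for (int i = 0; i < N; i++) {                /* row of all zeros = LRU line */
        int all_zero = 1;
        for (int j = 0; j < N; j++)
            if (m[i][j]) { all_zero = 0; break; }
        if (all_zero) return i;
    }
    return 0;                                    /* fallback, e.g. before the set fills */
}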
Counter Implementation
 N registers of log2 N bits each for an N-way set associative cache; thus N*log2 N bits are used per set.
 One register per cache line
 The cache line whose counter is 0 is the victim cache line for replacement
 On a hit, all cache lines with a counter greater than that of the hit cache line are decremented by 1, and the hit cache line's counter is set to N-1
 On a miss, the cache line whose count value is 0 is replaced; all other counters in the set are decremented by 1 and the new line's counter is set to N-1
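A C sketch of the counter scheme for one N-way set, assuming the counters already hold a permutation of 0..N-1 (i.e. the set is full); the names are illustrative.

/* Counter-based LRU: cnt[i] == n-1 means MRU, cnt[i] == 0 means LRU. */
static void lru_on_hit(unsigned cnt[], int n, int hit) {
    for (int i = 0; i < n; i++)
        if (cnt[i] > cnt[hit]) cnt[i]--;     /* lines "younger" than the hit age by one */
    cnt[hit] = n - 1;                        /* hit line becomes MRU                    */
}

static int lru_pick_victim(unsigned cnt[], int n) {
    int victim = 0;
    for (int i = 0; i < n; i++)
        if (cnt[i] == 0) victim = i;         /* counter 0 = LRU line */
    for (int i = 0; i < n; i++)
        if (i != victim) cnt[i]--;           /* age the remaining lines */
    cnt[victim] = n - 1;                     /* refilled line becomes MRU */
    return victim;
}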
Look Policy
Look Through : access the cache; if the data is not found, access the lower level
Look Aside : send the request to the cache and its lower level at the same time
Write Policy
Need of a Write Policy :-
 A block in the cache might have been updated, but the corresponding update in main memory might not have been done
 Multiple CPUs have individual caches, so a write by one processor can invalidate the data in another processor's cache
 I/O may be able to read and write directly into main memory
Write Through
 In this technique, all write operations are made to main memory as well as to the cache, ensuring MM is always valid.
 Any other processor–cache module may monitor traffic to MM to maintain consistency.
DISADVANTAGE
 It generates substantial memory traffic and may create a bottleneck.
 Bottleneck : a delay in the transmission of data due to insufficient bandwidth, so information is not relayed at the speed at which it is processed.
Pseudo Write Through
 Also called Write Buffer
 Processor writes data into the cache
and the write buffer
 Memory controller writes contents of
the buffer to memory
 FIFO (typical number of entries 4)
 After write is complete, buffer is
flushed
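A rough C sketch of such a 4-entry FIFO write buffer; the structure and function names are assumptions for illustration, not a real controller interface.

#include <stdbool.h>
#include <stdint.h>

#define WBUF_ENTRIES 4                        /* typical depth mentioned above */

struct wbuf_entry { uint32_t addr; uint32_t data; };

struct wbuf {
    struct wbuf_entry e[WBUF_ENTRIES];
    int head, count;
};

/* Processor side: enqueue the write and keep executing. */
static bool wbuf_push(struct wbuf *b, uint32_t addr, uint32_t data) {
    if (b->count == WBUF_ENTRIES) return false;   /* buffer full: processor must stall */
    int tail = (b->head + b->count) % WBUF_ENTRIES;
    b->e[tail].addr = addr;
    b->e[tail].data = data;
    b->count++;
    return true;
}

/* Memory-controller side: drain the oldest entry to main memory. */
static bool wbuf_drain_one(struct wbuf *b, void (*mem_write)(uint32_t, uint32_t)) {
    if (b->count == 0) return false;
    mem_write(b->e[b->head].addr, b->e[b->head].data);
    b->head = (b->head + 1) % WBUF_ENTRIES;
    b->count--;
    return true;
}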
Write Back
 In this technique, updates are made only in the cache.
 When an update is made, a dirty bit (or use bit) associated with the line is set
 When a block is replaced, it is written back into main memory iff its dirty bit is set
 Thus it minimizes memory writes
DISADVANTAGE
 Portions of MM are still invalid, hence I/O should be allowed access only through the cache
 This requires complex circuitry and creates a potential bottleneck
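A minimal C sketch of the dirty-bit bookkeeping described above; the struct layout and the writeback callback are illustrative assumptions.

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cache-line bookkeeping for a write-back cache. */
struct line {
    uint32_t tag;
    bool     valid;
    bool     dirty;              /* set on every write hit            */
    uint8_t  data[32];
};

/* On a write hit, only the cache copy is updated. */
static void write_hit(struct line *l, int offset, uint8_t value) {
    l->data[offset] = value;
    l->dirty = true;             /* main memory is now stale          */
}

/* On eviction, the block goes back to memory only if it is dirty. */
static void evict(struct line *l, void (*writeback)(const struct line *)) {
    if (l->valid && l->dirty)
        writeback(l);            /* single memory write per dirty block */
    l->valid = false;
    l->dirty = false;
}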
Cache Coherency
This is required only in the case of multiprocessors, where each CPU has its own cache
Why is it needed ?
 Whatever the write policy, if data is modified in one cache, the copies of the same data held in other caches become invalid
 Hence we need to maintain cache coherency to obtain correct results
Approaches towards Cache
Coherency
1) Bus watching write through :
 Cache controller monitors writes into
shared memory that also resides in
the cache memory
 If any writes are made, the controller
invalidates the cache entry
 This approach depends on the use of a write-through policy
2) Hardware Transparency :-
 Additional hardware ensures that all updates to main memory via a cache are reflected in all caches
3) Non Cacheable memory :-
 Only a portion of main memory is shared by more than 1 processor, and this portion is designated as non-cacheable.
 Here, all accesses to the shared memory are cache misses, as it is never copied into a cache
Cache Optimization
 Reducing the miss penalty
1. Multi level caches
2. Critical word first
3. Priority to Read miss over writes
4. Merging write buffers
5. Victim caches
Multilevel Cache
 The inclusion of an on-chip cache left open the question of whether an additional external cache is still desirable.
 The answer is yes! The reasons are :
◦ If there is no L2 cache and the processor makes a request for a memory location not in the L1 cache, then it accesses DRAM or ROM. Due to the relatively slower bus speed, performance degrades.
◦ Whereas, if an L2 SRAM cache is included, the frequently missed information can be quickly retrieved. Also, SRAM is fast enough to match the bus speed, giving zero-wait-state transactions.
 The L2 cache does not use the system bus as the path for transfer between L2 and the processor, but a separate data path, to reduce the burden on the system bus
 A series of simulations has shown that the L2 cache is most efficient when it is at least double the size of the L1 cache, as otherwise its contents will be similar to those of L1
 Due to the continued shrinkage of processor components, many processors can accommodate the L2 cache on chip, giving rise to the opportunity to include an L3 cache
 The only disadvantage of a multilevel cache is that it complicates the design of the memory hierarchy
Cache Performance
 Average memory access time = Hit time L1 + Miss rate L1 × (Hit time L2 + Miss rate L2 × Miss penalty L2)
 Average memory stalls per instruction = Misses per instruction L1 × Hit time L2 + Misses per instruction L2 × Miss penalty L2
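For instance, with illustrative (assumed) values – an L1 hit time of 1 cycle, L1 miss rate of 5%, L2 hit time of 10 cycles, L2 local miss rate of 20% and L2 miss penalty of 100 cycles – the average memory access time = 1 + 0.05 × (10 + 0.2 × 100) = 1 + 0.05 × 30 = 2.5 cycles.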
Unified Vs Split Cache
 Earlier, the same cache was used for data as well as instructions, i.e. a Unified Cache
 Now we have separate caches for data and instructions, i.e. a Split Cache
 Thus, if the processor attempts to fetch an instruction from main memory, it first consults the instruction L1 cache, and similarly for data.
Advantages of Unified Cache
 It balances the load between data and instructions automatically.
 That is, if execution involves more instruction fetches, the cache will tend to fill up with instructions, and if execution involves more data fetches, the cache tends to fill up with data.
 Only one cache needs to be designed.
Advantages of Split Cache
 Useful for parallel instruction execution and pre-fetching of predicted future instructions
 Eliminates contention between the instruction fetch/decode unit and the execution unit, thereby supporting pipelining
 The processor can fetch instructions ahead of time and fill the buffer, or pipeline.
 E.g. superscalar machines such as the Pentium and PowerPC
Critical Word First
 This policy involves sending the requested word first and then transferring the rest of the block, thus getting the needed data to the processor in the first cycle.
 Assume that 1 block = 16 bytes and 1 cycle transfers 4 bytes. Thus at least 4 cycles are required to transfer the block.
 If the processor demands the 2nd word, why should we wait for the entire block to be transferred? We can first send that word and then the block with the remaining words.
Priority to read miss over
writes
Write Buffer:
 Using write buffers: RAW conflicts with reads on cache misses
 Simply waiting for the write buffer to empty increases the read miss penalty by 50%
 Check the contents of the write buffer on a read miss; if there are no conflicts and the memory system is available, allow the read miss to continue. If there is a conflict, flush the buffer before the read
Write Back?
 Read miss replacing a dirty block
 Normal: write the dirty block to memory, and then do the read
 Instead, copy the dirty block to a write buffer, then do the read, and write the block back to memory afterwards
Victim Cache
 How can we combine the fast hit time of DM with reduced conflict misses?
 Add a small fully associative buffer (cache) – the Victim Cache – to hold data discarded from the cache
 A small fully associative cache is used for collecting spill-out data
 Blocks that are discarded because of a miss (victims) are stored in the victim cache, which is checked on a cache miss.
 If found, the data block is swapped between the victim cache and the main cache
 Replacement always happens with the LRU block of the victim cache. The block that we want to transfer is made MRU.
 Then, from the cache, the evicted block comes to the victim cache and is made MRU.
 The block which was transferred to the cache is now made LRU
 If there is a miss in the victim cache as well, then MM is referred to.
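A simplified C sketch of the swap on a main-cache miss; the LRU/MRU ordering inside the victim cache is omitted for brevity, and all names are illustrative.

#include <stdbool.h>
#include <stdint.h>

struct block { uint32_t tag; uint8_t data[32]; };

/* Probe the small fully associative victim cache on a main-cache miss.
 * If the requested block is there, swap it with the block being evicted
 * from the main cache instead of going to main memory. */
static bool victim_lookup(struct block victim[], int n, uint32_t tag,
                          struct block *evicted /* in/out: swapped block */) {
    for (int i = 0; i < n; i++) {
        if (victim[i].tag == tag) {          /* hit in the victim cache     */
            struct block tmp = victim[i];
            victim[i] = *evicted;            /* evicted block takes its place */
            *evicted  = tmp;                 /* requested block goes to cache */
            return true;                     /* no main-memory access needed  */
        }
    }
    return false;                            /* miss: fetch from main memory  */
}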
Cache Optimization
 Reducing hit time
1. Small and simple caches
2. Way prediction cache
3. Trace cache
4. Avoid Address translation during
indexing of the cache
Cache Optimization
 Reducing miss rate
1) Changing cache configurations
2) Compiler optimization
Cache Optimization
 Reducing miss penalty per miss rate
via parallelism
1) Hardware prefetching
2) Compiler prefetching
Cache Optimization
 Increasing cache bandwidth
1) Pipelined cache
2) Multi-banked cache
3) Non-blocking cache