Cache Optimizations
Vani Kandhasamy
AP, PSG Tech
Cache Performance Measure
AMAT = Hit Time + Miss Rate * Miss Penalty
Cache Performance Measure
• Hit rate: fraction of cache accesses that result in a hit
• Miss rate: 1 − Hit rate
• Hit time: time to access the cache
• Miss penalty: time to fetch/replace a block from the lower
level plus the time to transfer it to the cache
–access time: time to access lower level
–transfer time: time to transfer block
Improving Cache Performance
AMAT = Hit Time + Miss Rate * Miss Penalty
• Reducing miss rate
• Reducing cache miss Penalty
• Reducing hit time
Three Cs Model
❖ Compulsory:
• The very first access to a block cannot be in the cache
(unless pre-fetched)
• cold-start or first reference misses
❖ Capacity:
• Cache cannot contain all blocks accessed by the
program
❖ Conflict (collision):
• Multiple memory locations mapped to the same cache
location
6 Basic Cache Optimizations
• Reducing Miss Rate
1. Larger Block size (compulsory misses)
2. Larger Cache size (capacity misses)
3. Higher Associativity (conflict misses)
• Reducing Miss Penalty
4. Multilevel Caches
5. Giving Reads Priority over Writes
• Reducing hit time
6. Avoiding Address Translation during Indexing of the
Cache
#1. Larger Block size
❖Advantage
+ Reduce compulsory misses
(by better utilizing spatial locality)
❖Disadvantage
- Increase miss penalty
‐ Could increase conflict and capacity misses
Larger block size → no. of blocks ↓
#1. Larger Block size
• Optimal block size depends on the latency and
bandwidth of lower‐level memory
- High bandwidth and high latency encourage large
blocks
- Low bandwidth and low latency encourage smaller
block sizes
Problem
Assume the memory subsystem takes 80 clock cycles of
overhead and then delivers 16 bytes every 2 clock cycles.
Which block size has the smallest AMAT for each cache
size given below? Assume the hit time is 1 clock cycle.
Solution
#2. Larger Cache size
Cache size↑ → miss rate↓; hit time↑
❖Advantage
+ A larger cache reduces capacity misses
❖Disadvantage
‐ Longer hit time
‐ Higher power and cost
#3. Higher Associativity
Associativity ↑ → miss rate↓; hit time↑
2:1 rule of thumb:
A direct-mapped cache of size N has about the same miss
rate as a two-way set-associative cache of size N/2
❖Advantage
+ Reduce conflict misses
❖Disadvantage
‐ Increase hit time
‐ Increased power consumption
AMAT vs Associativity
#4. Multilevel caches
✓CPU gets faster
✓Caches need to be BIGGER and FASTER
Solution:
• Fast 1st level cache (matching the clock cycle
time of fast processor)
• Large 2nd level cache (to catch misses in the
1st level cache)
#4. Multilevel caches
Q1. How do we analyze performance?
AMAT = Hit TimeL1 + Miss RateL1 * Miss PenaltyL1
Miss PenaltyL1 = Hit TimeL2 + Miss RateL2 * Miss PenaltyL2
#4. Multilevel caches
Local miss rate: misses at this level / accesses that reach this level
Global miss rate: misses at this level / total CPU memory accesses
#4. Multilevel caches
Alternative measure: misses per instruction
= miss rate * memory accesses per instruction
#4. Multilevel caches
Q2. How to place blocks in multilevel caches?
• Multilevel Inclusion
• Multilevel Exclusion
❖Advantage
+ Reduce miss penalty
❖Disadvantage
‐ High complexity
Problem
Given a two-level cache with L1 hit time 1 clock cycle, L1
miss rate 4%, L2 hit time 10 clock cycles, L2 local miss
rate 50%, and L2 miss penalty 200 clock cycles; per 1000
instructions there are 60 L1 misses and 30 L2 misses. Find
the AMAT and the average memory stalls per instruction.
Solution
1. AMAT = Hit TimeL1 + Miss RateL1 * (Hit TimeL2 +
Miss RateL2 * Miss PenaltyL2)
= 1 + 4% * (10 + 50% * 200) = 5.4 clock cycles
2. Average memory stalls per instruction = Misses per
instructionL1 * Hit timeL2 + Misses per
instructionL2 * Miss penaltyL2
= (60/1000) * 10 + (30/1000) * 200 = 6.6 clock cycles
#5. Giving Reads Priority over Writes
SW R3, 512(R0) ;cache index 0
LW R1, 1024(R0) ;cache index 0
LW R2, 512(R0) ;cache index 0
• Assume a direct-mapped, write-through cache with a write
buffer, where 512 and 1024 map to the same block.
• Correct execution requires R2 == R3; if the read miss on
512 is serviced before the buffered write of R3 reaches
memory, R2 receives the stale value.
#5. Giving Reads Priority over Writes
• Write Through
✓ Wait until the write buffer is
empty
✓ Check the contents of the write
buffer on a read miss; if there are
no conflicts, let the read miss continue.
• Write Back
✓ Read miss replacing dirty block
– Normal: Write dirty block to
memory, and then do the read
– Instead: copy the dirty block to a
write buffer, do the read first,
then write the buffered block to memory
#6. Avoiding Address Translation during Indexing of the
Cache
• CPU → Virtual address
• Virtual address → TLB
• TLB → Physical address
#6. Avoiding Address Translation during Indexing of the
Cache
• Page Table Entry (PTE)
is checked in TLB
#6. Avoiding Address Translation during Indexing of the
Cache
• PTE is checked in TLB
• On a TLB miss, load the PTE from the page table (PT)
#6. Avoiding Address Translation during Indexing of the
Cache
• Cache is indexed and
tagged by Physical
address
• TLB is indexed by
Virtual address
• Very slow
#6. Avoiding Address Translation during Indexing of the
Cache
• Cache is indexed and
tagged by Virtual
address
• TLB is indexed by
Virtual address
• Synonym (aliasing) problem: two virtual addresses for the
same physical location can occupy two cache entries
#6. Avoiding Address Translation during Indexing of the
Cache
#6. Avoiding Address Translation during Indexing of the
Cache
• Cache is indexed by Virtual address but tagged by
Physical address (virtually indexed, physically tagged)
• TLB is indexed by
Virtual address
Summary
Advanced Optimization Techniques
• Reducing hit time
1. Small and simple
caches
2. Way prediction
• Increasing cache
bandwidth
3. Pipelined caches
4. Multi banked caches
5. Non blocking caches
• Reducing Miss Penalty
6. Critical word first
7. Merging write buffers
• Reducing Miss Rate
8. Compiler optimizations
• Reducing miss penalty
or miss rate via
parallelism
9. Hardware prefetching
10. Compiler prefetching
#1 Small and simple first level caches
Cache size ↑ → hit time ↑
L1 cache size and associativity
#2 Way Prediction
Q1. How to combine fast hit time of Direct Mapped and have
the lower conflict misses of 2‐way SA cache?
• Way prediction: predict the
way to pre-set multiplexer
- keep extra bits in cache to
predict the “way,” or block
within the set, of next cache
access.
#3 Pipelined caches
• Pipeline cache access to maintain bandwidth
(but higher latency)
• Simple 3-stage pipeline
✓access the cache
✓tag check & hit/miss logic
✓data transfer back to CPU
#4. Multi banked caches
• Organize cache as independent banks to
support simultaneous access
#5. Non blocking caches
• A blocking cache stalls the pipeline on a cache
miss
• A non-blocking cache permits subsequent cache
accesses on a miss
• Allows the CPU to continue executing
instructions while a miss is handled
“Hit under miss” → the cache services hits while only 1
miss is outstanding
“Miss under miss” → the cache allows multiple
outstanding misses
#5. Non blocking caches
Challenges
–Maintaining order when multiple misses that
might return out of order
–Load or Store to an already pending miss
address (need merge)
#6. Critical Word First
• Critical word first
– Request missed word from memory first
– Send it to the processor as soon as it arrives
– Rest of cache line comes after “critical word”
#6. Early Restart
• Early restart
– Data returns from memory in order
– Send the missed word to the processor as soon as it arrives
– Processor Restarts when needed word is returned
#7. Merging write buffers
• Write buffer lets processor continue while
waiting for write to complete
• If buffer contains modified blocks, addresses
can be checked to see if new data matches that
of some write buffer entry
• If so, the new data is combined with that entry
#7. Merging write buffers
#8. Compiler Optimizations
• Group data accesses together to improve temporal locality
• Re-order data accesses to improve spatial locality
– Loop Interchange: change nesting of loops to access
data in order stored in memory
– Loop fusion: Group the data access
– Blocking: Improve temporal and spatial locality by
accessing “blocks” of data repeatedly vs. going down
whole columns or rows
Loop Interchange Example
/* Before */
for (j = 0; j < 3; j = j+1)
  for (i = 0; i < 3; i = i+1)
    x[i][j] = 2 * x[i][j];
/* After */
for (i = 0; i < 3; i = i+1)
  for (j = 0; j < 3; j = j+1)
    x[i][j] = 2 * x[i][j];
Sequential accesses instead of striding through memory every
word
What type of locality does this improve?
Loop Fusion
Blocking
for (i = 0; i < N; i = i+1)
  for (j = 0; j < N; j = j+1) {
    r = 0;
    for (k = 0; k < N; k = k+1)
      r = r + y[i][k]*z[k][j];
    x[i][j] = r;
  }
• Two inner loops:
– Read all NxN elements of z[]
– Read N elements of 1 row of y[] repeatedly
– Write N elements of 1 row of x[]
How Prefetching Works
#9. Hardware Prefetching
• Prefetch on miss
- Prefetch block b+1 upon a miss on block b
• One Block Lookahead (OBL) scheme
- Initiate a prefetch for block b+1 when block b is
accessed
#10. Compiler-Controlled Prefetching
• No prefetching
for (i = 0; i < N; i++) {
  sum += a[i]*b[i];
}
• Simple prefetching
for (i = 0; i < N; i++) {
  fetch(&a[i+1]);
  fetch(&b[i+1]);
  sum += a[i]*b[i];
}
#10. Compiler-Controlled Prefetching
Summary
More Related Content

What's hot

Mapping
MappingMapping
Go back-n protocol
Go back-n protocolGo back-n protocol
Go back-n protocol
STEFFY D
 
Memory Management in OS
Memory Management in OSMemory Management in OS
Memory Management in OS
vampugani
 
Advanced computer architecture
Advanced computer architectureAdvanced computer architecture
Advanced computer architecture
AjithaSomasundaram
 
Interconnection Network
Interconnection NetworkInterconnection Network
Interconnection Network
Ali A Jalil
 
Chapter 13 - I/O Systems
Chapter 13 - I/O SystemsChapter 13 - I/O Systems
Chapter 13 - I/O Systems
Wayne Jones Jnr
 
Ram and-rom-chips
Ram and-rom-chipsRam and-rom-chips
Ram and-rom-chipsAnuj Modi
 
Pipelining and vector processing
Pipelining and vector processingPipelining and vector processing
Pipelining and vector processing
Kamal Acharya
 
Cache memory ppt
Cache memory ppt  Cache memory ppt
Cache memory ppt
Arpita Naik
 
Cache memory principles
Cache memory principlesCache memory principles
Cache memory principles
bit allahabad
 
Design issues of dos
Design issues of dosDesign issues of dos
Design issues of dos
vanamali_vanu
 
Elements of cache design
Elements of cache designElements of cache design
Elements of cache design
Rohail Butt
 
Interconnection Network
Interconnection NetworkInterconnection Network
Interconnection Network
Heman Pathak
 
Design and analysis of algorithms
Design and analysis of algorithmsDesign and analysis of algorithms
Design and analysis of algorithms
Dr Geetha Mohan
 
Centralized shared memory architectures
Centralized shared memory architecturesCentralized shared memory architectures
Centralized shared memory architectures
Gokuldhev mony
 
Dynamic interconnection networks
Dynamic interconnection networksDynamic interconnection networks
Dynamic interconnection networks
Prasenjit Dey
 
Distributed Shared Memory
Distributed Shared MemoryDistributed Shared Memory
Distributed Shared Memory
Prakhar Rastogi
 
Unit 4 memory system
Unit 4   memory systemUnit 4   memory system
Unit 4 memory systemchidabdu
 

What's hot (20)

Mapping
MappingMapping
Mapping
 
Go back-n protocol
Go back-n protocolGo back-n protocol
Go back-n protocol
 
Memory Management in OS
Memory Management in OSMemory Management in OS
Memory Management in OS
 
1.prallelism
1.prallelism1.prallelism
1.prallelism
 
Advanced computer architecture
Advanced computer architectureAdvanced computer architecture
Advanced computer architecture
 
Interconnection Network
Interconnection NetworkInterconnection Network
Interconnection Network
 
Chapter 13 - I/O Systems
Chapter 13 - I/O SystemsChapter 13 - I/O Systems
Chapter 13 - I/O Systems
 
Ram and-rom-chips
Ram and-rom-chipsRam and-rom-chips
Ram and-rom-chips
 
Pipelining and vector processing
Pipelining and vector processingPipelining and vector processing
Pipelining and vector processing
 
Cache memory ppt
Cache memory ppt  Cache memory ppt
Cache memory ppt
 
Cache memory principles
Cache memory principlesCache memory principles
Cache memory principles
 
Design issues of dos
Design issues of dosDesign issues of dos
Design issues of dos
 
Memory hierarchy
Memory hierarchyMemory hierarchy
Memory hierarchy
 
Elements of cache design
Elements of cache designElements of cache design
Elements of cache design
 
Interconnection Network
Interconnection NetworkInterconnection Network
Interconnection Network
 
Design and analysis of algorithms
Design and analysis of algorithmsDesign and analysis of algorithms
Design and analysis of algorithms
 
Centralized shared memory architectures
Centralized shared memory architecturesCentralized shared memory architectures
Centralized shared memory architectures
 
Dynamic interconnection networks
Dynamic interconnection networksDynamic interconnection networks
Dynamic interconnection networks
 
Distributed Shared Memory
Distributed Shared MemoryDistributed Shared Memory
Distributed Shared Memory
 
Unit 4 memory system
Unit 4   memory systemUnit 4   memory system
Unit 4 memory system
 

Similar to Cache optimization

cache memory.ppt
cache memory.pptcache memory.ppt
cache memory.ppt
MUNAZARAZZAQELEA
 
cache memory.ppt
cache memory.pptcache memory.ppt
cache memory.ppt
MUNAZARAZZAQELEA
 
8 memory management strategies
8 memory management strategies8 memory management strategies
8 memory management strategies
Dr. Loganathan R
 
Computer Memory Hierarchy Computer Architecture
Computer Memory Hierarchy Computer ArchitectureComputer Memory Hierarchy Computer Architecture
Computer Memory Hierarchy Computer Architecture
Haris456
 
coa-Unit5-ppt1 (1).pptx
coa-Unit5-ppt1 (1).pptxcoa-Unit5-ppt1 (1).pptx
coa-Unit5-ppt1 (1).pptx
Ruhul Amin
 
Multiprocessor
MultiprocessorMultiprocessor
Multiprocessor
narendra kumar
 
Lec10 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part2
Lec10 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part2Lec10 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part2
Lec10 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part2
Hsien-Hsin Sean Lee, Ph.D.
 
Cache Memory.pptx
Cache Memory.pptxCache Memory.pptx
Cache Memory.pptx
ssusere16bd9
 
Ways to reduce misses
Ways to reduce missesWays to reduce misses
Ways to reduce misses
nellins
 
cache memory
 cache memory cache memory
Cache optimization
Cache optimizationCache optimization
Cache optimizationKavi Kathir
 
Unit I Memory technology and optimization
Unit I Memory technology and optimizationUnit I Memory technology and optimization
Unit I Memory technology and optimization
K Gowsic Gowsic
 
Memory technology and optimization in Advance Computer Architechture
Memory technology and optimization in Advance Computer ArchitechtureMemory technology and optimization in Advance Computer Architechture
Memory technology and optimization in Advance Computer ArchitechtureShweta Ghate
 
memorytechnologyandoptimization-140416131506-phpapp02.pptx
memorytechnologyandoptimization-140416131506-phpapp02.pptxmemorytechnologyandoptimization-140416131506-phpapp02.pptx
memorytechnologyandoptimization-140416131506-phpapp02.pptx
shahdivyanshu1002
 
Memory Hierarchy Design, Basics, Cache Optimization, Address Translation
Memory Hierarchy Design, Basics, Cache Optimization, Address TranslationMemory Hierarchy Design, Basics, Cache Optimization, Address Translation
Memory Hierarchy Design, Basics, Cache Optimization, Address Translation
Farwa Ansari
 
Snooping protocols 3
Snooping protocols 3Snooping protocols 3
Snooping protocols 3
Yasir Khan
 
computer-memory
computer-memorycomputer-memory
computer-memory
Bablu Shofi
 
Memory Hierarchy PPT of Computer Organization
Memory Hierarchy PPT of Computer OrganizationMemory Hierarchy PPT of Computer Organization
Memory Hierarchy PPT of Computer Organization
2022002857mbit
 
Cache.pptx
Cache.pptxCache.pptx
Cache.pptx
VCETCSE
 

Similar to Cache optimization (20)

12-6810-12.ppt
12-6810-12.ppt12-6810-12.ppt
12-6810-12.ppt
 
cache memory.ppt
cache memory.pptcache memory.ppt
cache memory.ppt
 
cache memory.ppt
cache memory.pptcache memory.ppt
cache memory.ppt
 
8 memory management strategies
8 memory management strategies8 memory management strategies
8 memory management strategies
 
Computer Memory Hierarchy Computer Architecture
Computer Memory Hierarchy Computer ArchitectureComputer Memory Hierarchy Computer Architecture
Computer Memory Hierarchy Computer Architecture
 
coa-Unit5-ppt1 (1).pptx
coa-Unit5-ppt1 (1).pptxcoa-Unit5-ppt1 (1).pptx
coa-Unit5-ppt1 (1).pptx
 
Multiprocessor
MultiprocessorMultiprocessor
Multiprocessor
 
Lec10 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part2
Lec10 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part2Lec10 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part2
Lec10 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part2
 
Cache Memory.pptx
Cache Memory.pptxCache Memory.pptx
Cache Memory.pptx
 
Ways to reduce misses
Ways to reduce missesWays to reduce misses
Ways to reduce misses
 
cache memory
 cache memory cache memory
cache memory
 
Cache optimization
Cache optimizationCache optimization
Cache optimization
 
Unit I Memory technology and optimization
Unit I Memory technology and optimizationUnit I Memory technology and optimization
Unit I Memory technology and optimization
 
Memory technology and optimization in Advance Computer Architechture
Memory technology and optimization in Advance Computer ArchitechtureMemory technology and optimization in Advance Computer Architechture
Memory technology and optimization in Advance Computer Architechture
 
memorytechnologyandoptimization-140416131506-phpapp02.pptx
memorytechnologyandoptimization-140416131506-phpapp02.pptxmemorytechnologyandoptimization-140416131506-phpapp02.pptx
memorytechnologyandoptimization-140416131506-phpapp02.pptx
 
Memory Hierarchy Design, Basics, Cache Optimization, Address Translation
Memory Hierarchy Design, Basics, Cache Optimization, Address TranslationMemory Hierarchy Design, Basics, Cache Optimization, Address Translation
Memory Hierarchy Design, Basics, Cache Optimization, Address Translation
 
Snooping protocols 3
Snooping protocols 3Snooping protocols 3
Snooping protocols 3
 
computer-memory
computer-memorycomputer-memory
computer-memory
 
Memory Hierarchy PPT of Computer Organization
Memory Hierarchy PPT of Computer OrganizationMemory Hierarchy PPT of Computer Organization
Memory Hierarchy PPT of Computer Organization
 
Cache.pptx
Cache.pptxCache.pptx
Cache.pptx
 

More from Vani Kandhasamy

Java Basics - Part2
Java Basics - Part2Java Basics - Part2
Java Basics - Part2
Vani Kandhasamy
 
Java Basics - Part1
Java Basics - Part1Java Basics - Part1
Java Basics - Part1
Vani Kandhasamy
 
Introduction to OOP
Introduction to OOPIntroduction to OOP
Introduction to OOP
Vani Kandhasamy
 
Economic network analysis - Part 2
Economic network analysis - Part 2Economic network analysis - Part 2
Economic network analysis - Part 2
Vani Kandhasamy
 
Economic network analysis - Part 1
Economic network analysis - Part 1Economic network analysis - Part 1
Economic network analysis - Part 1
Vani Kandhasamy
 
Cascading behavior in the networks
Cascading behavior in the networksCascading behavior in the networks
Cascading behavior in the networks
Vani Kandhasamy
 
Community detection-Part2
Community detection-Part2Community detection-Part2
Community detection-Part2
Vani Kandhasamy
 
Community detection-Part1
Community detection-Part1Community detection-Part1
Community detection-Part1
Vani Kandhasamy
 
Link Analysis
Link AnalysisLink Analysis
Link Analysis
Vani Kandhasamy
 
Network Models
Network ModelsNetwork Models
Network Models
Vani Kandhasamy
 
Representing & Measuring networks
Representing & Measuring networksRepresenting & Measuring networks
Representing & Measuring networks
Vani Kandhasamy
 

More from Vani Kandhasamy (11)

Java Basics - Part2
Java Basics - Part2Java Basics - Part2
Java Basics - Part2
 
Java Basics - Part1
Java Basics - Part1Java Basics - Part1
Java Basics - Part1
 
Introduction to OOP
Introduction to OOPIntroduction to OOP
Introduction to OOP
 
Economic network analysis - Part 2
Economic network analysis - Part 2Economic network analysis - Part 2
Economic network analysis - Part 2
 
Economic network analysis - Part 1
Economic network analysis - Part 1Economic network analysis - Part 1
Economic network analysis - Part 1
 
Cascading behavior in the networks
Cascading behavior in the networksCascading behavior in the networks
Cascading behavior in the networks
 
Community detection-Part2
Community detection-Part2Community detection-Part2
Community detection-Part2
 
Community detection-Part1
Community detection-Part1Community detection-Part1
Community detection-Part1
 
Link Analysis
Link AnalysisLink Analysis
Link Analysis
 
Network Models
Network ModelsNetwork Models
Network Models
 
Representing & Measuring networks
Representing & Measuring networksRepresenting & Measuring networks
Representing & Measuring networks
 

Recently uploaded

一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
zwunae
 
Online aptitude test management system project report.pdf
Online aptitude test management system project report.pdfOnline aptitude test management system project report.pdf
Online aptitude test management system project report.pdf
Kamal Acharya
 
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsKuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
Victor Morales
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
fxintegritypublishin
 
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...
ssuser7dcef0
 
Swimming pool mechanical components design.pptx
Swimming pool  mechanical components design.pptxSwimming pool  mechanical components design.pptx
Swimming pool mechanical components design.pptx
yokeleetan1
 
Hierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power SystemHierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power System
Kerry Sado
 
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
ydteq
 
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdf
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdfTutorial for 16S rRNA Gene Analysis with QIIME2.pdf
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdf
aqil azizi
 
Understanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine LearningUnderstanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine Learning
SUTEJAS
 
Forklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella PartsForklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella Parts
Intella Parts
 
digital fundamental by Thomas L.floydl.pdf
digital fundamental by Thomas L.floydl.pdfdigital fundamental by Thomas L.floydl.pdf
digital fundamental by Thomas L.floydl.pdf
drwaing
 
Cosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdfCosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdf
Kamal Acharya
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Teleport Manpower Consultant
 
Fundamentals of Induction Motor Drives.pptx
Fundamentals of Induction Motor Drives.pptxFundamentals of Induction Motor Drives.pptx
Fundamentals of Induction Motor Drives.pptx
manasideore6
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation & Control
 
Literature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptxLiterature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptx
Dr Ramhari Poudyal
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
bakpo1
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
Kamal Acharya
 
DfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributionsDfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributions
gestioneergodomus
 

Recently uploaded (20)

一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
 
Online aptitude test management system project report.pdf
Online aptitude test management system project report.pdfOnline aptitude test management system project report.pdf
Online aptitude test management system project report.pdf
 
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsKuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
 
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...
 
Swimming pool mechanical components design.pptx
Swimming pool  mechanical components design.pptxSwimming pool  mechanical components design.pptx
Swimming pool mechanical components design.pptx
 
Hierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power SystemHierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power System
 
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
 
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdf
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdfTutorial for 16S rRNA Gene Analysis with QIIME2.pdf
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdf
 
Understanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine LearningUnderstanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine Learning
 
Forklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella PartsForklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella Parts
 
digital fundamental by Thomas L.floydl.pdf
digital fundamental by Thomas L.floydl.pdfdigital fundamental by Thomas L.floydl.pdf
digital fundamental by Thomas L.floydl.pdf
 
Cosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdfCosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdf
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
 
Fundamentals of Induction Motor Drives.pptx
Fundamentals of Induction Motor Drives.pptxFundamentals of Induction Motor Drives.pptx
Fundamentals of Induction Motor Drives.pptx
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
 
Literature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptxLiterature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptx
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
 
DfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributionsDfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributions
 

Cache optimization

  • 2. Cache Performance Measure AMAT = Hit Time + Miss Rate * Miss Penalty
  • 3. Cache Performance Measure • Hit rate: fraction of cache access that result in a hit • Miss rate: 1- Hit rate • Hit time: time to access the cache • Miss penalty: time to place/replace a block from lower level plus time taken to transfer to Cache –access time: time to access lower level –transfer time: time to transfer block
  • 4. Improving Cache Performance AMAT = Hit Time + Miss Rate * Miss Penalty • Reducing miss rate • Reducing cache miss Penalty • Reducing hit time
  • 5. Three Cs Model ❖ Compulsory: • The very first access to a block cant be in the cache (unless pre-fetched) • cold-start or first reference misses ❖ Capacity: • Cache cannot contain all blocks accessed by the program ❖ Conflict (collision): • Multiple memory locations mapped to the same cache location
  • 6. 6 Basic Cache Optimizations • Reducing Miss Rate 1. Larger Block size (compulsory misses) 2. Larger Cache size (capacity misses) 3. Higher Associativity (conflict misses) • Reducing Miss Penalty 4. Multilevel Caches 5. Giving Reads Priority over Writes • Reducing hit time 6. Avoiding Address Translation during Indexing of the Cache
  • 7. #1. Larger Block size ❖Advantage + Reduce compulsory misses (by better utilizing spatial locality) ❖Disadvantage - Increase miss penalty ‐ Could increase conflict and capacity misses Larger block size → no. of blocks ↓
  • 8. #1. Larger Block size • Optimal block size depends on the latency and bandwidth of lower‐level memory - High bandwidth and high latency encourage large blocks - Low bandwidth and low latency encourage smaller block sizes
  • 9. Problem Assume the memory subsystem takes 80 Clock cycles of overhead and then delivers 16 bytes every 2 clock cycles. Which block size has the smallest AMAT for each cache size given below? Consider hit time is 1 clock cycle.
  • 11. #2. Larger Cache size Cache size↑ → miss rate↓; hit time↑ ❖Advantage + Large cache reduce capacity misses ❖Disadvantage ‐ Longer hit time ‐ Higher power and cost
  • 12. #3. Higher Associativity Associativity ↑ → miss rate↓; hit time↑ 2:1 rule of thumb: Direct mapped cache of size N has the same miss rate as two‐way associative cache of size N/2 ❖Advantage + Reduce conflict misses ❖Disadvantage ‐ Increase hit time ‐ Increased power consumption
  • 14. #4. Multilevel caches ✓CPU gets faster ✓Caches need to be BIGGER and FASTER Solution: • Fast 1st level cache (matching the clock cycle time of fast processor) • Large 2nd level cache (to catch misses in the 1st level cache)
#4. Multilevel Caches
Q1. How do we analyze performance?
AMAT = Hit TimeL1 + Miss RateL1 * Miss PenaltyL1
Miss PenaltyL1 = Hit TimeL2 + Miss RateL2 * Miss PenaltyL2
#4. Multilevel Caches
• Local miss rate: misses in this cache / accesses that reach this cache
• Global miss rate: misses in this cache / total CPU memory accesses
(for L2, global miss rate = Miss RateL1 × local Miss RateL2)
#4. Multilevel Caches
Q2. How do we place blocks in multilevel caches?
• Multilevel Inclusion
• Multilevel Exclusion
❖ Advantage
+ Reduces miss penalty
❖ Disadvantage
- High complexity
Solution
1. AMAT = Hit TimeL1 + Miss RateL1 * (Hit TimeL2 + Miss RateL2 * Miss PenaltyL2)
   = 1 + 4% * (10 + 50% * 200)
   = 5.4 clock cycles
2. Average memory stalls per instruction
   = Misses per instructionL1 * Hit TimeL2 + Misses per instructionL2 * Miss PenaltyL2
   = (60/1000) * 10 + (30/1000) * 200
   = 6.6 clock cycles
#5. Giving Reads Priority over Writes
SW R3, 512(R0)  ; cache index 0
LW R1, 1024(R0) ; cache index 0
LW R2, 512(R0)  ; cache index 0
• Assume a direct-mapped, write-through cache where 512 and 1024
map to the same block, and a write buffer holding the store to 512.
• The program expects R2 == R3; if the read miss on 512 goes to
memory before the buffered write completes, R2 gets stale data.
#5. Giving Reads Priority over Writes
• Write Through
✓ Simplest: wait until the write buffer is empty before servicing a read miss
✓ Better: check the contents of the write buffer on a read miss;
if there are no conflicts, let the read miss continue immediately
• Write Back
✓ Read miss replacing a dirty block
- Normal: write the dirty block to memory, then do the read
- Instead: copy the dirty block to a write buffer, do the read first,
then do the write
#6. Avoiding Address Translation during Indexing of the Cache
• The CPU issues a virtual address
• The virtual address goes to the TLB
• The TLB returns the physical address
#6. Avoiding Address Translation during Indexing of the Cache
• The Page Table Entry (PTE) is checked in the TLB
#6. Avoiding Address Translation during Indexing of the Cache
• The PTE is checked in the TLB
• On a TLB miss, the PTE is loaded from the page table
#6. Avoiding Address Translation during Indexing of the Cache
• Physically indexed, physically tagged: the cache is indexed and
tagged by the physical address
• The TLB is indexed by the virtual address
• Very slow: translation must finish before the cache access can start
#6. Avoiding Address Translation during Indexing of the Cache
• Virtually indexed, virtually tagged: the cache is indexed and
tagged by the virtual address
• The TLB is indexed by the virtual address
• Synonyms problem: two different virtual addresses mapped to the
same physical address can occupy two cache blocks
#6. Avoiding Address Translation during Indexing of the Cache
• Virtually indexed, physically tagged (VIPT): the cache is indexed
by the virtual address but tagged by the physical address
• The TLB is indexed by the virtual address
• The TLB lookup and cache indexing proceed in parallel; the
physical tag is compared once the TLB answers
Advanced Optimization Techniques
• Reducing hit time
1. Small and simple caches
2. Way prediction
• Increasing cache bandwidth
3. Pipelined caches
4. Multibanked caches
5. Nonblocking caches
• Reducing miss penalty
6. Critical word first
7. Merging write buffers
• Reducing miss rate
8. Compiler optimizations
• Reducing miss penalty or miss rate via parallelism
9. Hardware prefetching
10. Compiler prefetching
#1. Small and Simple First-Level Caches
• Cache size ↑ → hit time ↑
L1 cache size and associativity (figure)
#2. Way Prediction
Q. How to combine the fast hit time of a direct-mapped cache with
the lower conflict misses of a 2-way set-associative cache?
• Way prediction: predict the way to pre-set the multiplexer
- Keep extra bits in the cache to predict the "way" (block within
the set) of the next cache access
#3. Pipelined Caches
• Pipeline cache access to maintain bandwidth (at the cost of higher latency)
• Simple 3-stage pipeline:
✓ access the cache
✓ tag check & hit/miss logic
✓ data transfer back to the CPU
#4. Multibanked Caches
• Organize the cache as independent banks to support simultaneous accesses
#5. Nonblocking Caches
• A blocking cache stalls the pipeline on a cache miss
• A nonblocking cache permits subsequent cache accesses during a miss,
allowing the CPU to continue executing instructions while the miss is handled
• "Hit under miss" → the processor allows only 1 outstanding miss
• "Miss under miss" → the processor allows multiple outstanding misses
#5. Nonblocking Caches
Challenges:
- Maintaining order when multiple misses might return out of order
- A load or store to an already-pending miss address (needs to be merged)
#6. Critical Word First
• Critical word first
- Request the missed word from memory first
- Send it to the processor as soon as it arrives
- The rest of the cache line comes after the "critical word"
#6. Early Restart
• Early restart
- Data returns from memory in order
- Send the missed word to the processor as soon as it arrives
- The processor restarts as soon as the needed word is returned
#7. Merging Write Buffers
• The write buffer lets the processor continue while waiting for a
write to complete
• If the buffer already contains modified blocks, addresses can be
checked to see whether the new write matches an existing entry
• If so, the new data is merged into that entry instead of taking a
new slot
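The merging check above can be sketched as follows. This is a simplified model under assumed parameters (4 entries, 16-byte blocks, byte-granularity writes); real buffers track word-valid bits and drain entries to memory in the background.

```c
#include <stdbool.h>
#include <stdint.h>

#define ENTRIES 4   /* assumed buffer depth */
#define BLOCK   16  /* assumed bytes covered by one entry */

struct entry { bool valid; uint32_t base; uint8_t data[BLOCK]; uint16_t mask; };
static struct entry buf[ENTRIES];

/* Buffer one byte write; returns false only when no entry is free
   (the processor would stall). */
static bool write_merge(uint32_t addr, uint8_t byte) {
    uint32_t base = addr & ~(uint32_t)(BLOCK - 1);
    for (int i = 0; i < ENTRIES; i++)
        if (buf[i].valid && buf[i].base == base) {   /* merge into entry */
            buf[i].data[addr - base] = byte;
            buf[i].mask |= (uint16_t)(1u << (addr - base));
            return true;
        }
    for (int i = 0; i < ENTRIES; i++)
        if (!buf[i].valid) {                          /* allocate new entry */
            buf[i].valid = true;
            buf[i].base  = base;
            buf[i].data[addr - base] = byte;
            buf[i].mask  = (uint16_t)(1u << (addr - base));
            return true;
        }
    return false;
}

static int entries_used(void) {
    int n = 0;
    for (int i = 0; i < ENTRIES; i++) n += buf[i].valid;
    return n;
}
```

With merging, four sequential byte writes to one block consume a single entry and can be written back as one transfer; without it they would fill the whole buffer.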
#8. Compiler Optimizations
• Group data accesses together to improve temporal locality
• Reorder data accesses to improve spatial locality
- Loop interchange: change the nesting of loops to access data in
the order it is stored in memory
- Loop fusion: combine loops that touch the same data so the
accesses are grouped
- Blocking: improve temporal and spatial locality by accessing
"blocks" of data repeatedly instead of walking whole rows or columns
Loop Interchange Example
/* Before */
for (j = 0; j < 3; j = j+1)
  for (i = 0; i < 3; i = i+1)
    x[i][j] = 2 * x[i][j];
/* After */
for (i = 0; i < 3; i = i+1)
  for (j = 0; j < 3; j = j+1)
    x[i][j] = 2 * x[i][j];
Sequential accesses instead of striding through memory every word.
What type of locality does this improve?
Blocking
for (i = 0; i < N; i = i+1)
  for (j = 0; j < N; j = j+1) {
    r = 0;
    for (k = 0; k < N; k = k+1)
      r = r + y[i][k]*z[k][j];
    x[i][j] = r;
  }
• The two inner loops:
- Read all N×N elements of z[]
- Read N elements of one row of y[] repeatedly
- Write N elements of one row of x[]
#9. Hardware Prefetching
• Prefetch on miss
- Prefetch block b+1 upon a miss on block b
• One Block Lookahead (OBL) scheme
- Initiate a prefetch for block b+1 when block b is accessed
#10. Compiler-Controlled Prefetch
• No prefetching:
for (i = 0; i < N; i++) {
  sum += a[i]*b[i];
}
• Simple prefetching:
for (i = 0; i < N; i++) {
  fetch(&a[i+1]);
  fetch(&b[i+1]);
  sum += a[i]*b[i];
}