SlideShare a Scribd company logo
1 of 22
DAP Spr.‘98 ©UCB 1
Lecture 12:
Memory Hierarchy—Ways to Reduce
Misses
DAP Spr.‘98 ©UCB 2
Review: Four Questions for
Memory Hierarchy Designers
• Q1: Where can a block be placed in the upper level?
(Block placement)
– Fully Associative, Set Associative, Direct Mapped
• Q2: How is a block found if it is in the upper level?
(Block identification)
– Tag/Block
• Q3: Which block should be replaced on a miss?
(Block replacement)
– Random, LRU
• Q4: What happens on a write?
(Write strategy)
– Write Back or Write Through (with Write Buffer)
DAP Spr.‘98 ©UCB 3
Review: Cache Performance
CPUtime = Instruction Count x (CPIexecution +
Mem accesses per instruction x Miss rate x
Miss penalty) x Clock cycle time
Misses per instruction = Memory accesses per
instruction x Miss rate
CPUtime = IC x (CPIexecution + Misses per
instruction x Miss penalty) x Clock cycle time
To Improve Cache Performance:
1. Reduce the miss rate
2. Reduce the miss penalty
3. Reduce the time to hit in the cache.
DAP Spr.‘98 ©UCB 4
Reducing Misses
• Classifying Misses: 3 Cs
– Compulsory—The first access to a block is not in the
cache, so the block must be brought into the cache. Also
called cold start misses or first reference misses.
(Misses in even an Infinite Cache)
– Capacity—If the cache cannot contain all the blocks needed
during execution of a program, capacity misses will occur
due to blocks being discarded and later retrieved.
(Misses in Fully Associative Size X Cache)
– Conflict—If block-placement strategy is set associative or
direct mapped, conflict misses (in addition to compulsory &
capacity misses) will occur because a block can be
discarded and later retrieved if too many blocks map to its
set. Also called collision misses or interference misses.
(Misses in N-way Associative, Size X Cache)
DAP Spr.‘98 ©UCB 5
Cache Size (KB)
MissRateperType
0
0.02
0.04
0.06
0.08
0.1
0.12
0.14
1
2
4
8
16
32
64
128
1-way
2-way
4-way
8-way
Capacity
Compulsory
3Cs Absolute Miss Rate
(SPEC92)
Conflict
Note: Compulsory
Miss small
DAP Spr.‘98 ©UCB 6
Cache Size (KB)
MissRateperType
0
0.02
0.04
0.06
0.08
0.1
0.12
0.14
1
2
4
8
16
32
64
128
1-way
2-way
4-way
8-way
Capacity
Compulsory
2:1 Cache Rule
Conflict
miss rate 1-way associative cache size X
= miss rate 2-way associative cache size X/2
DAP Spr.‘98 ©UCB 7
How Can Reduce Misses?
• 3 Cs: Compulsory, Capacity, Conflict
• In all cases, assume total cache size not changed:
• What happens if:
1) Change Block Size:
Which of 3Cs is obviously affected?
2) Change Associativity:
Which of 3Cs is obviously affected?
3) Change Compiler:
Which of 3Cs is obviously affected?
DAP Spr.‘98 ©UCB 8
Block Size (bytes)
Miss
Rate
0%
5%
10%
15%
20%
25%
16
32
64
128
256
1K
4K
16K
64K
256K
1. Reduce Misses via Larger
Block Size
DAP Spr.‘98 ©UCB 9
2. Reduce Misses via Higher
Associativity
• 2:1 Cache Rule:
– Miss Rate DM cache size N Miss Rate 2-way cache
size N/2
• Beware: Execution time is only final measure!
– Will Clock Cycle time increase?
– Hill [1988] suggested hit time for 2-way vs. 1-way
external cache +10%,
internal + 2%
DAP Spr.‘98 ©UCB 10
Example: Avg. Memory Access
Time vs. Miss Rate
• Example: assume CCT = 1.10 for 2-way, 1.12 for 4-way,
1.14 for 8-way vs. CCT direct mapped
Cache Size Associativity
(KB) 1-way 2-way 4-way 8-way
1 2.33 2.15 2.07 2.01
2 1.98 1.86 1.76 1.68
4 1.72 1.67 1.61 1.53
8 1.46 1.48 1.47 1.43
16 1.29 1.32 1.32 1.32
32 1.20 1.24 1.25 1.27
64 1.14 1.20 1.21 1.23
128 1.10 1.17 1.18 1.20
(Red means A.M.A.T. not improved by more associativity)
DAP Spr.‘98 ©UCB 11
3. Reducing Misses via a
“Victim Cache”
• How to combine fast hit time of direct mapped
yet still avoid conflict misses?
• Add buffer to place data discarded from cache
• Jouppi [1990]: 4-entry victim cache removed 20% to
95% of conflicts for a 4 KB direct mapped data cache
• Used in Alpha, HP machines
DAP Spr.‘98 ©UCB 12
5. Reducing Misses by
Prefetching of Instructions & Data
• Instruction prefetching – Sequentially prefetch
instructions from IM to the instruction Queue (IQ)
together with branch prediction – All computers
employ this.
• Data prefetching – Difficult to predict data that will
be used in future. Following questions must be
answered.
1. What to prefetch? – How to know which data will
be used? Unnecessary prefetches will waste
memory/bus bandwidth and will replace useful
data in the cache (cache pollution problem) giving
rise to negative impact on the execution time.
2. When to prefetch? – Must be early enough for
the data to be useful, but too early will cause
cache pollution problem.
DAP Spr.‘98 ©UCB 13
Data Prefetching
• Software Prefetching – Explicit instructions to
prefetch data are inserted in the program.
Difficult to decide where to put in the program.
Needs good compiler analysis. Some computers
already have prefetch intructions. Examples are:
-- Load data into register (HP PA-RISC loads)
– Cache Prefetch: load into cache
(MIPS IV, PowerPC, SPARC v. 9)
• Hardware Prefetching – Difficult to predict and
design. Different results for different applications
DAP Spr.‘98 ©UCB 14
5. Reducing Cache Pollution
• E.g., Instruction Prefetching
– Alpha 21064 fetches 2 blocks on a miss
– Extra block placed in “stream buffer”
– On miss check stream buffer
• Works with data blocks too:
– Jouppi [1990] 1 data stream buffer got 25% misses from
4KB cache; 4 streams got 43%
– Palacharla & Kessler [1994] for scientific programs for 8
streams got 50% to 70% of misses from
2 64KB, 4-way set associative caches
• Prefetching relies on having extra memory
bandwidth that can be used without penalty
DAP Spr.‘98 ©UCB 15
Summary
• 3 Cs: Compulsory, Capacity, Conflict Misses
• Reducing Miss Rate
1. Reduce Misses via Larger Block Size
2. Reduce Misses via Higher Associativity
3. Reducing Misses via Victim Cache
4. 5. Reducing Misses by HW Prefetching Instr, Data
6. Reducing Misses by SW Prefetching Data
7. Reducing Misses by Compiler Optimizations
• Remember danger of concentrating on just one
parameter when evaluating performance
CPUtime  IC  CPIExecuti on

Memory accesses
Instruction
 Miss rate Miss penalty




 Clock cycle time
DAP Spr.‘98 ©UCB 16
Review: Improving Cache
Performance
1. Reduce the miss rate,
2. Reduce the miss penalty, or
3. Reduce the time to hit in the cache.
DAP Spr.‘98 ©UCB 17
1. Reducing Miss Penalty:
Read Priority over Write on Miss
• Write through with write buffers offer RAW conflicts
with main memory reads on cache misses
• If simply wait for write buffer to empty, might
increase read miss penalty (old MIPS 1000 by 50% )
• Check write buffer contents before read;
if no conflicts, let the memory access continue
• Write Back?
– Read miss replacing dirty block
– Normal: Write dirty block to memory, and then do the read
– Instead copy the dirty block to a write buffer, then do the
read, and then do the write
– CPU stall less since restarts as soon as do read
DAP Spr.‘98 ©UCB 18
4. Reduce Miss Penalty: Non-blocking
Caches to reduce stalls on misses
• Non-blocking cache or lockup-free cache allow data
cache to continue to supply cache hits during a miss
– requires out-of-order execution CPU
• “hit under multiple miss” or “miss under miss” may
further lower the effective miss penalty by overlapping
multiple misses
– Significantly increases the complexity of the cache controller
as there can be multiple outstanding memory accesses
– Requires multiple memory banks (otherwise cannot support)
– Pentium Pro allows 4 outstanding memory misses
 The technique requires use of a few miss status
holding registers (MSHRs) to hold the outstanding
memory requests.
DAP Spr.‘98 ©UCB 19
5th Miss Penalty Reduction:
Second Level Cache
• L2 Equations
AMAT = Hit TimeL1 + Miss RateL1 x Miss PenaltyL1
Miss PenaltyL1 = Hit TimeL2 + Miss RateL2 x Miss PenaltyL2
AMAT = Hit TimeL1 + Miss RateL1 x (Hit TimeL2 + Miss RateL2 +
Miss PenaltyL2)
• Definitions:
– Local miss rate— misses in this cache divided by the total
number of memory accesses to this cache (Miss rateL2)
– Global miss rate—misses in this cache divided by the total
number of memory accesses generated by the CPU
(Miss RateL1 x Miss RateL2)
– Global Miss Rate is what matters
DAP Spr.‘98 ©UCB 20
An Example (pp. 576)
Q: Suppose we have a processor with a base CPI of 1.0 assuming
all references hit in the primary cache and a clock rate of 500
MHz. The main memory access time is 200 ns. Suppose the
miss rate per instn is 5%. What is the revised CPI? How much
faster will the machine run if we put a secondary cache (with 20-
ns access time) that reduces the miss rate to memory to 2%?
Assume same access time for hit or miss.
A: Miss penalty to main memory = 200 ns = 100 cycles. Total CPI =
Base CPI + Memory-stall cycles per instn. Hence, revised CPI =
1.0 + 5% x 100 = 6.0
When an L2 with 20-ns (10 cycles) access time is put, the miss
rate to memory is reduced to 2%. So, out of 5% L1 miss, L2 hit is
3% and miss is 2%.
The CPI is reduced to 1.0 + 5% ( 10 + 40% x 100) = 3.5. Thus, the
m/c with secondary cache is faster by 6.0/3.5 = 1.7
DAP Spr.‘98 ©UCB 21
Reducing Miss Penalty Summary
• Five techniques
– Read priority over write on miss
– Subblock placement
– Early Restart and Critical Word First on miss
– Non-blocking Caches (Hit under Miss, Miss under Miss)
– Second Level Cache
• Can be applied recursively to Multilevel Caches
– Danger is that time to DRAM will grow with multiple
levels in between
– First attempts at L2 caches can make things worse, since
increased worst case is worse
CPUtime  IC  CPIExecuti on

Memory accesses
Instruction
 Miss rate Miss penalty




 Clock cycle time
DAP Spr.‘98 ©UCB 22
Cache Optimization Summary
Technique MR MP HT Complexity
Larger Block Size + – 0
Higher Associativity + – 1
Victim Caches + 2
Pseudo-Associative Caches + 2
HW Prefetching of Instr/Data + 2
Compiler Controlled Prefetching + 3
Compiler Reduce Misses + 0
Priority to Read Misses + 1
Subblock Placement + + 1
Early Restart & Critical Word 1st + 2
Non-Blocking Caches + 3
Second Level Caches + 2
missratemisspenalty

More Related Content

What's hot

cpu sechduling
cpu sechduling cpu sechduling
cpu sechduling
gopi7
 
Summit2014 riel chegu_w_0340_automatic_numa_balancing_0
Summit2014 riel chegu_w_0340_automatic_numa_balancing_0Summit2014 riel chegu_w_0340_automatic_numa_balancing_0
Summit2014 riel chegu_w_0340_automatic_numa_balancing_0
sprdd
 
Scheduling algorithm (chammu)
Scheduling algorithm (chammu)Scheduling algorithm (chammu)
Scheduling algorithm (chammu)
Nagarajan
 
005 cluster monitoring
005 cluster monitoring005 cluster monitoring
005 cluster monitoring
Scott Miao
 

What's hot (17)

Commit messages - Good practices
Commit messages - Good practicesCommit messages - Good practices
Commit messages - Good practices
 
Os..
Os..Os..
Os..
 
GCMA: Guaranteed Contiguous Memory Allocator
GCMA: Guaranteed Contiguous Memory AllocatorGCMA: Guaranteed Contiguous Memory Allocator
GCMA: Guaranteed Contiguous Memory Allocator
 
Scheduling Algorithms-R.D.Sivakumar
Scheduling Algorithms-R.D.SivakumarScheduling Algorithms-R.D.Sivakumar
Scheduling Algorithms-R.D.Sivakumar
 
Operating Systems Process Scheduling Algorithms
Operating Systems   Process Scheduling AlgorithmsOperating Systems   Process Scheduling Algorithms
Operating Systems Process Scheduling Algorithms
 
cpu scheduling.pdf
cpu scheduling.pdfcpu scheduling.pdf
cpu scheduling.pdf
 
Cpu scheduling
Cpu schedulingCpu scheduling
Cpu scheduling
 
Linux Kernel Memory Model
Linux Kernel Memory ModelLinux Kernel Memory Model
Linux Kernel Memory Model
 
MemGuard: Memory Bandwidth Reservation System for Efficient Performance Isola...
MemGuard: Memory Bandwidth Reservation System for Efficient Performance Isola...MemGuard: Memory Bandwidth Reservation System for Efficient Performance Isola...
MemGuard: Memory Bandwidth Reservation System for Efficient Performance Isola...
 
Ch6 cpu scheduling
Ch6   cpu schedulingCh6   cpu scheduling
Ch6 cpu scheduling
 
Operating Systems 1 (10/12) - Scheduling
Operating Systems 1 (10/12) - SchedulingOperating Systems 1 (10/12) - Scheduling
Operating Systems 1 (10/12) - Scheduling
 
cpu sechduling
cpu sechduling cpu sechduling
cpu sechduling
 
Summit2014 riel chegu_w_0340_automatic_numa_balancing_0
Summit2014 riel chegu_w_0340_automatic_numa_balancing_0Summit2014 riel chegu_w_0340_automatic_numa_balancing_0
Summit2014 riel chegu_w_0340_automatic_numa_balancing_0
 
Scheduling algorithm (chammu)
Scheduling algorithm (chammu)Scheduling algorithm (chammu)
Scheduling algorithm (chammu)
 
005 cluster monitoring
005 cluster monitoring005 cluster monitoring
005 cluster monitoring
 
PerfUG 3 - perfs système
PerfUG 3 - perfs systèmePerfUG 3 - perfs système
PerfUG 3 - perfs système
 
Checkpoint/restart in the userspace
Checkpoint/restart in the userspaceCheckpoint/restart in the userspace
Checkpoint/restart in the userspace
 

Similar to Ways to reduce misses

Memory technology and optimization in Advance Computer Architechture
Memory technology and optimization in Advance Computer ArchitechtureMemory technology and optimization in Advance Computer Architechture
Memory technology and optimization in Advance Computer Architechture
Shweta Ghate
 
memorytechnologyandoptimization-140416131506-phpapp02.pptx
memorytechnologyandoptimization-140416131506-phpapp02.pptxmemorytechnologyandoptimization-140416131506-phpapp02.pptx
memorytechnologyandoptimization-140416131506-phpapp02.pptx
shahdivyanshu1002
 
PPT_on_Cache_Partitioning_Techniques.pdf
PPT_on_Cache_Partitioning_Techniques.pdfPPT_on_Cache_Partitioning_Techniques.pdf
PPT_on_Cache_Partitioning_Techniques.pdf
Gnanavi2
 
Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems
Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time SystemsTaming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems
Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems
Heechul Yun
 

Similar to Ways to reduce misses (20)

CPU Memory Hierarchy and Caching Techniques
CPU Memory Hierarchy and Caching TechniquesCPU Memory Hierarchy and Caching Techniques
CPU Memory Hierarchy and Caching Techniques
 
Cache optimization
Cache optimizationCache optimization
Cache optimization
 
Unit I Memory technology and optimization
Unit I Memory technology and optimizationUnit I Memory technology and optimization
Unit I Memory technology and optimization
 
Memory technology and optimization in Advance Computer Architechture
Memory technology and optimization in Advance Computer ArchitechtureMemory technology and optimization in Advance Computer Architechture
Memory technology and optimization in Advance Computer Architechture
 
memorytechnologyandoptimization-140416131506-phpapp02.pptx
memorytechnologyandoptimization-140416131506-phpapp02.pptxmemorytechnologyandoptimization-140416131506-phpapp02.pptx
memorytechnologyandoptimization-140416131506-phpapp02.pptx
 
Cache memory
Cache memoryCache memory
Cache memory
 
hierarchical memory technology.pptx
hierarchical memory technology.pptxhierarchical memory technology.pptx
hierarchical memory technology.pptx
 
Lec10 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part2
Lec10 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part2Lec10 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part2
Lec10 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Memory part2
 
cache memory
 cache memory cache memory
cache memory
 
12-6810-12.ppt
12-6810-12.ppt12-6810-12.ppt
12-6810-12.ppt
 
Computer System Architecture Lecture Note 8.1 primary Memory
Computer System Architecture Lecture Note 8.1 primary MemoryComputer System Architecture Lecture Note 8.1 primary Memory
Computer System Architecture Lecture Note 8.1 primary Memory
 
cache memory.ppt
cache memory.pptcache memory.ppt
cache memory.ppt
 
cache memory.ppt
cache memory.pptcache memory.ppt
cache memory.ppt
 
Chap2 slides
Chap2 slidesChap2 slides
Chap2 slides
 
PPT_on_Cache_Partitioning_Techniques.pdf
PPT_on_Cache_Partitioning_Techniques.pdfPPT_on_Cache_Partitioning_Techniques.pdf
PPT_on_Cache_Partitioning_Techniques.pdf
 
Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems
Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time SystemsTaming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems
Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems
 
Limitations of memory system performance
Limitations of memory system performanceLimitations of memory system performance
Limitations of memory system performance
 
Chapter 5 a
Chapter 5 aChapter 5 a
Chapter 5 a
 
Memory Hierarchy PPT of Computer Organization
Memory Hierarchy PPT of Computer OrganizationMemory Hierarchy PPT of Computer Organization
Memory Hierarchy PPT of Computer Organization
 
Preparing Codes for Intel Knights Landing (KNL)
Preparing Codes for Intel Knights Landing (KNL)Preparing Codes for Intel Knights Landing (KNL)
Preparing Codes for Intel Knights Landing (KNL)
 

Recently uploaded

result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
Tonystark477637
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Christo Ananth
 

Recently uploaded (20)

High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTINGMANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 

Ways to reduce misses

  • 1. DAP Spr.‘98 ©UCB 1 Lecture 12: Memory Hierarchy—Ways to Reduce Misses
  • 2. DAP Spr.‘98 ©UCB 2 Review: Four Questions for Memory Hierarchy Designers • Q1: Where can a block be placed in the upper level? (Block placement) – Fully Associative, Set Associative, Direct Mapped • Q2: How is a block found if it is in the upper level? (Block identification) – Tag/Block • Q3: Which block should be replaced on a miss? (Block replacement) – Random, LRU • Q4: What happens on a write? (Write strategy) – Write Back or Write Through (with Write Buffer)
  • 3. DAP Spr.‘98 ©UCB 3 Review: Cache Performance CPUtime = Instruction Count x (CPIexecution + Mem accesses per instruction x Miss rate x Miss penalty) x Clock cycle time Misses per instruction = Memory accesses per instruction x Miss rate CPUtime = IC x (CPIexecution + Misses per instruction x Miss penalty) x Clock cycle time To Improve Cache Performance: 1. Reduce the miss rate 2. Reduce the miss penalty 3. Reduce the time to hit in the cache.
  • 4. DAP Spr.‘98 ©UCB 4 Reducing Misses • Classifying Misses: 3 Cs – Compulsory—The first access to a block is not in the cache, so the block must be brought into the cache. Also called cold start misses or first reference misses. (Misses in even an Infinite Cache) – Capacity—If the cache cannot contain all the blocks needed during execution of a program, capacity misses will occur due to blocks being discarded and later retrieved. (Misses in Fully Associative Size X Cache) – Conflict—If block-placement strategy is set associative or direct mapped, conflict misses (in addition to compulsory & capacity misses) will occur because a block can be discarded and later retrieved if too many blocks map to its set. Also called collision misses or interference misses. (Misses in N-way Associative, Size X Cache)
  • 5. DAP Spr.‘98 ©UCB 5 Cache Size (KB) MissRateperType 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 1 2 4 8 16 32 64 128 1-way 2-way 4-way 8-way Capacity Compulsory 3Cs Absolute Miss Rate (SPEC92) Conflict Note: Compulsory Miss small
  • 6. DAP Spr.‘98 ©UCB 6 Cache Size (KB) MissRateperType 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 1 2 4 8 16 32 64 128 1-way 2-way 4-way 8-way Capacity Compulsory 2:1 Cache Rule Conflict miss rate 1-way associative cache size X = miss rate 2-way associative cache size X/2
  • 7. DAP Spr.‘98 ©UCB 7 How Can Reduce Misses? • 3 Cs: Compulsory, Capacity, Conflict • In all cases, assume total cache size not changed: • What happens if: 1) Change Block Size: Which of 3Cs is obviously affected? 2) Change Associativity: Which of 3Cs is obviously affected? 3) Change Compiler: Which of 3Cs is obviously affected?
  • 8. DAP Spr.‘98 ©UCB 8 Block Size (bytes) Miss Rate 0% 5% 10% 15% 20% 25% 16 32 64 128 256 1K 4K 16K 64K 256K 1. Reduce Misses via Larger Block Size
  • 9. DAP Spr.‘98 ©UCB 9 2. Reduce Misses via Higher Associativity • 2:1 Cache Rule: – Miss Rate DM cache size N Miss Rate 2-way cache size N/2 • Beware: Execution time is only final measure! – Will Clock Cycle time increase? – Hill [1988] suggested hit time for 2-way vs. 1-way external cache +10%, internal + 2%
  • 10. DAP Spr.‘98 ©UCB 10 Example: Avg. Memory Access Time vs. Miss Rate • Example: assume CCT = 1.10 for 2-way, 1.12 for 4-way, 1.14 for 8-way vs. CCT direct mapped Cache Size Associativity (KB) 1-way 2-way 4-way 8-way 1 2.33 2.15 2.07 2.01 2 1.98 1.86 1.76 1.68 4 1.72 1.67 1.61 1.53 8 1.46 1.48 1.47 1.43 16 1.29 1.32 1.32 1.32 32 1.20 1.24 1.25 1.27 64 1.14 1.20 1.21 1.23 128 1.10 1.17 1.18 1.20 (Red means A.M.A.T. not improved by more associativity)
  • 11. DAP Spr.‘98 ©UCB 11 3. Reducing Misses via a “Victim Cache” • How to combine fast hit time of direct mapped yet still avoid conflict misses? • Add buffer to place data discarded from cache • Jouppi [1990]: 4-entry victim cache removed 20% to 95% of conflicts for a 4 KB direct mapped data cache • Used in Alpha, HP machines
  • 12. DAP Spr.‘98 ©UCB 12 5. Reducing Misses by Prefetching of Instructions & Data • Instruction prefetching – Sequentially prefetch instructions from IM to the instruction Queue (IQ) together with branch prediction – All computers employ this. • Data prefetching – Difficult to predict data that will be used in future. Following questions must be answered. 1. What to prefetch? – How to know which data will be used? Unnecessary prefetches will waste memory/bus bandwidth and will replace useful data in the cache (cache pollution problem) giving rise to negative impact on the execution time. 2. When to prefetch? – Must be early enough for the data to be useful, but too early will cause cache pollution problem.
  • 13. DAP Spr.‘98 ©UCB 13 Data Prefetching • Software Prefetching – Explicit instructions to prefetch data are inserted in the program. Difficult to decide where to put in the program. Needs good compiler analysis. Some computers already have prefetch intructions. Examples are: -- Load data into register (HP PA-RISC loads) – Cache Prefetch: load into cache (MIPS IV, PowerPC, SPARC v. 9) • Hardware Prefetching – Difficult to predict and design. Different results for different applications
  • 14. DAP Spr.‘98 ©UCB 14 5. Reducing Cache Pollution • E.g., Instruction Prefetching – Alpha 21064 fetches 2 blocks on a miss – Extra block placed in “stream buffer” – On miss check stream buffer • Works with data blocks too: – Jouppi [1990] 1 data stream buffer got 25% misses from 4KB cache; 4 streams got 43% – Palacharla & Kessler [1994] for scientific programs for 8 streams got 50% to 70% of misses from 2 64KB, 4-way set associative caches • Prefetching relies on having extra memory bandwidth that can be used without penalty
  • 15. DAP Spr.‘98 ©UCB 15 Summary • 3 Cs: Compulsory, Capacity, Conflict Misses • Reducing Miss Rate 1. Reduce Misses via Larger Block Size 2. Reduce Misses via Higher Associativity 3. Reducing Misses via Victim Cache 4. 5. Reducing Misses by HW Prefetching Instr, Data 6. Reducing Misses by SW Prefetching Data 7. Reducing Misses by Compiler Optimizations • Remember danger of concentrating on just one parameter when evaluating performance CPUtime  IC  CPIExecuti on  Memory accesses Instruction  Miss rate Miss penalty      Clock cycle time
  • 16. DAP Spr.‘98 ©UCB 16 Review: Improving Cache Performance 1. Reduce the miss rate, 2. Reduce the miss penalty, or 3. Reduce the time to hit in the cache.
  • 17. DAP Spr.‘98 ©UCB 17 1. Reducing Miss Penalty: Read Priority over Write on Miss • Write through with write buffers offer RAW conflicts with main memory reads on cache misses • If simply wait for write buffer to empty, might increase read miss penalty (old MIPS 1000 by 50% ) • Check write buffer contents before read; if no conflicts, let the memory access continue • Write Back? – Read miss replacing dirty block – Normal: Write dirty block to memory, and then do the read – Instead copy the dirty block to a write buffer, then do the read, and then do the write – CPU stall less since restarts as soon as do read
  • 18. DAP Spr.‘98 ©UCB 18 4. Reduce Miss Penalty: Non-blocking Caches to reduce stalls on misses • Non-blocking cache or lockup-free cache allow data cache to continue to supply cache hits during a miss – requires out-of-order execution CPU • “hit under multiple miss” or “miss under miss” may further lower the effective miss penalty by overlapping multiple misses – Significantly increases the complexity of the cache controller as there can be multiple outstanding memory accesses – Requires multiple memory banks (otherwise cannot support) – Pentium Pro allows 4 outstanding memory misses  The technique requires use of a few miss status holding registers (MSHRs) to hold the outstanding memory requests.
  • 19. DAP Spr.‘98 ©UCB 19 5th Miss Penalty Reduction: Second Level Cache • L2 Equations AMAT = Hit TimeL1 + Miss RateL1 x Miss PenaltyL1 Miss PenaltyL1 = Hit TimeL2 + Miss RateL2 x Miss PenaltyL2 AMAT = Hit TimeL1 + Miss RateL1 x (Hit TimeL2 + Miss RateL2 + Miss PenaltyL2) • Definitions: – Local miss rate— misses in this cache divided by the total number of memory accesses to this cache (Miss rateL2) – Global miss rate—misses in this cache divided by the total number of memory accesses generated by the CPU (Miss RateL1 x Miss RateL2) – Global Miss Rate is what matters
  • 20. DAP Spr.‘98 ©UCB 20 An Example (pp. 576) Q: Suppose we have a processor with a base CPI of 1.0 assuming all references hit in the primary cache and a clock rate of 500 MHz. The main memory access time is 200 ns. Suppose the miss rate per instn is 5%. What is the revised CPI? How much faster will the machine run if we put a secondary cache (with 20- ns access time) that reduces the miss rate to memory to 2%? Assume same access time for hit or miss. A: Miss penalty to main memory = 200 ns = 100 cycles. Total CPI = Base CPI + Memory-stall cycles per instn. Hence, revised CPI = 1.0 + 5% x 100 = 6.0 When an L2 with 20-ns (10 cycles) access time is put, the miss rate to memory is reduced to 2%. So, out of 5% L1 miss, L2 hit is 3% and miss is 2%. The CPI is reduced to 1.0 + 5% ( 10 + 40% x 100) = 3.5. Thus, the m/c with secondary cache is faster by 6.0/3.5 = 1.7
  • 21. DAP Spr.‘98 ©UCB 21 Reducing Miss Penalty Summary • Five techniques – Read priority over write on miss – Subblock placement – Early Restart and Critical Word First on miss – Non-blocking Caches (Hit under Miss, Miss under Miss) – Second Level Cache • Can be applied recursively to Multilevel Caches – Danger is that time to DRAM will grow with multiple levels in between – First attempts at L2 caches can make things worse, since increased worst case is worse CPUtime  IC  CPIExecuti on  Memory accesses Instruction  Miss rate Miss penalty      Clock cycle time
  • 22. DAP Spr.‘98 ©UCB 22 Cache Optimization Summary Technique MR MP HT Complexity Larger Block Size + – 0 Higher Associativity + – 1 Victim Caches + 2 Pseudo-Associative Caches + 2 HW Prefetching of Instr/Data + 2 Compiler Controlled Prefetching + 3 Compiler Reduce Misses + 0 Priority to Read Misses + 1 Subblock Placement + + 1 Early Restart & Critical Word 1st + 2 Non-Blocking Caches + 3 Second Level Caches + 2 missratemisspenalty