SlideShare a Scribd company logo
1 of 22
Download to read offline
1
Multicolumn Cache: aiming for both
high hit rate and low latency
Xiaodong Zhang
CSE 3421 additional notes for Memory Systems
CSE3421 Slide 2
 Cache design is memory-address centric
 The default length is 32 bits, each memory address is unique and
points to a byte (byte addressable)
 Each memory address connects to the first byte of the content, and
its default length is a word (4 Bytes or 32 bits)
 Cache is a small storage inclusively built in the CPU chip
 It contains copies of the data from memory (and disks)
 A cache-block storage can be multiple-words long
 Besides data storage, each block needs tags V bit, and others
 For a given memory address, CPU knows where it is
 By direct, set-associative or fully associative mapping
 How do handle write hit, write miss, read hit, and read miss?
 How to address the conflicts between high hit rates and low latency?
Basic concepts of hardware cache
CSE3421 Slide 3
Three steps for a cache miss:
 Sending a memory address (bus)
 Loading data from memory (no bus)
 Transferring data to CPU (bus)
Connections between CPU-Caches and DRAM memory
CSE3421 Slide 6
Trade-offs between High Hit Ratios and Low Latency
 Set- associative cache achieves high hit-ratios: 30% higher than
that of direct-mapped cache.
 But it suffers high access times due to
 Multiplexing logic delay during the selection.
 Tag checking, selection, and data dispatching are sequential.
 Direct-mapped cache loads data and checks tag in parallel:
minimizing the access time.
 High power consumption is another concern for set associative
cache
 Can we get both high hit ratios and low latency in low power?
Best Case of Way Prediction: First Hit
Cost: way prediction only
tag set offset
tag
way0
data
way1 way2 way3
Way-prediction
=?
Mux 4:1 To CPU
Way Prediction: Non-first Hit (in Set)
Cost: way prediction + selection in set
tag set offset
tag
way0
data
way1 way2 way3
Way-prediction
=?
Mux 4:1
≠ To CPU
Worst Case of Way-prediction: Miss
Cost: way prediction + selection in set + miss
tag set offset
tag
way0
data
way1 way2 way3
Way-prediction
=?
Mux 4:1
≠
≠
Lower level Memory
To CPU
Make a low-latency and high hit-rate cache to be a reality
 A write to a set-associative cache
1. Map to the cache set
2. Find an empty way (block) to write
3. Or write to the least recent used (LRU) way (block)
 A read in a set-associative cache
1. Map to the cache set
2. Compare with the tags, then select the way if matches
3. Otherwise, serve the cache miss by going to memory
 Why a way-prediction is hard?
 We do not know which way the cache block is located
 Can we enforce which way to go in both write and read?
Basic Ideas
 Conventional Set Associative Cache
 Mapping to a set of multiple ways
 find a way (LRU) to write
 search all ways for a read
 Latency comes from “finding” or “searching” among ways
 Can we decisively or directly find a way?
 If so, the search would be direct to the way
 How do we make this happen?
 Multi-column cache
Multi-column Caches: Fast and High Hit Ratio
 Zhang, et. al., IEEE Micro, 1997.
 Objectives:
 Each first hit is equivalent to a direct-mapped access.
 Maximize the number of first hits.
 Minimize the latency of non-first-hits.
 Additional hardware should be simple and low cost.
CSE3421 Slide 17
Range of Set Associative Caches
 For a fixed size cache, each time the associativity increases it
doubles the number of blocks per set (or the number of ways)
 halves the number of sets – decreases the set index by 1 bit
 increases the size of the tag by 1 bit
 Note: word offset is for multiple words in a block. word offset + byte offset = long cache block
Byte offset
Set Index
Tag
Decreasing associativity
Fully associative
(only one set)
Tag is all the bits except
block and byte offset
Direct mapped
(only one way)
Smaller tags, only a
single comparator
Increasing associativity
Selects the set
Used for tag to compare Cache Block Size
Multi-Column Caches: Major Location
 The bits given to tag and ``set bits” generate a
direct-mapped location: Major Location.
 A major location mapping = direct-mapping.
Set
Tag Offset
Address
Between direct-map and set-
associative, there is a set of
unused bits for tags.
Let’s use the bits direct-mapping
to the block (way) in the set
CSE3421 Slide 19
Comparisons of Three Caches
Implicit direct mapping in set associative cache
 Standard set associative cache only maps to a set
 Multicolumn cache uses the full direct map bits to
write and read in a set associative cache
 Set index remains the same
 The bits given to the tag from direct map are used for
which way in the set
 These two sets of bits are available and free
 Major location contains a most recent used (MRU)
block either loaded from memory or just accessed.
Multi-column Caches: Selected Locations
 Multiple blocks can be direct-mapped to the same
major location, but only MRU is the major.
 The non-MRU blocks are stored in other empty
locations in the set: Selected Locations.
 If ``other locations” are used for their own major
locations, there will be no space for selected ones.
 Swap:
 A block in selected location is swapped to major location
as it becomes MRU.
 A block in major location is swapped to a selected
location after a new block is loaded in it from memory.
Multi-Column: Indexing Selected Locations
 The selected locations associated with its
major location are indexed for a fast search.
Location 0
Location 1
Location 2
Location 3
0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0
3 2 1 0 3 2 1 0 3 2 1 0 3 2 1 0
Bit Vector 3
(way3)
Bit Vector 2
(way 2)
Bit Vector 1
(way 1)
Bit Vector 0
(way 0)
Major Location 1 has two
selected locations at 0 and 2
Summary of Multi-column Caches
 A major location in a set is the direct-mapped
location of MRU.
 A selected location is the direct-mapped
location but non-MRU.
 A selected location index is maintained for
each major location.
 A ``swap” is used to ensure the block in the
major location is always MRU.
Multi-column Cache Operations
10
0001
Reference 1
Way 0 Way 1 Way 2 Way 3
0000 1101 0111 1011
0000 0000 0000 0010
Selected location
Not at major location !
1101
10
1011
Reference 2
0000
No selected location!
0001
Place 0001 at the
major location !
0000 0001 0111 1011
1011
First Hit!
Performance Summary of Multi-column Caches
 Hit ratio to the major locations is about 90%.
 The hit ratio is higher than that of direct-mapped cache due
to high associativity while keeps low access time of direct-
mapped cache in average.
 First-hit is equivalent to direct-mapped.
 Non-first hits are faster than set-associative caches.
 Outperforming set associative caches
Comparing First-hit Ratios between
Multicolumn and MRU Caches
64KB Data Cache Hit Rate (Program mgrid)
0%
20%
40%
60%
80%
100%
120%
4-way 8-way 16-way
Hit
Rate
overall
MRU first-hit
Multicolumn first-
hit
Note: What is an MRU cache?
An MRU bit map is used for a set associative cache, (1 bit for way, 2 bits for 4-way,
…), which remembers the most recent used way as the prediction of the next access.
This bit map is used for the way prediction.
Multi-column Technique is Critical for
Low Power Caches by targeting specific ways
0.1
1
10
100
1000
10000
8
5
8
7
8
9
9
1
9
3
9
5
9
7
9
9
0
1
0
3
Frequency
(MHz)
Transistor
(M)
Power
(W)
Frequency
Transistor
Power
Source: Intel.com
80386
80486 Pentium
Pentium Pro
Pentium II
Pentium III
Pentium 4
The impact of multicolumn caches
 The concept of multicolumn caches has influenced
the R&D in the set-associative cache products:
 A tag-less cache
 K-way direct mapped cache
 The direct-to-data cache
 Highly associative caches for low power processors
 A way memorization cache
 Way selection based on tag bits
 In ARM Cortex R chip, cache mapping “directly access
one of the ways in a set”

More Related Content

Similar to 9.1-CSE3421-multicolumn-cache.pdf (20)

cache memory
 cache memory cache memory
cache memory
 
04 Cache Memory
04  Cache  Memory04  Cache  Memory
04 Cache Memory
 
Cache memory
Cache memoryCache memory
Cache memory
 
hierarchical memory technology.pptx
hierarchical memory technology.pptxhierarchical memory technology.pptx
hierarchical memory technology.pptx
 
cache memory
cache memorycache memory
cache memory
 
waserdtfgfiuerhiuerwehfiuerghzsdfghyguhijdrtyunit5.pptx
waserdtfgfiuerhiuerwehfiuerghzsdfghyguhijdrtyunit5.pptxwaserdtfgfiuerhiuerwehfiuerghzsdfghyguhijdrtyunit5.pptx
waserdtfgfiuerhiuerwehfiuerghzsdfghyguhijdrtyunit5.pptx
 
memory.ppt
memory.pptmemory.ppt
memory.ppt
 
memory.ppt
memory.pptmemory.ppt
memory.ppt
 
computer-memory
computer-memorycomputer-memory
computer-memory
 
Cache recap
Cache recapCache recap
Cache recap
 
Cache recap
Cache recapCache recap
Cache recap
 
Cache recap
Cache recapCache recap
Cache recap
 
Cache recap
Cache recapCache recap
Cache recap
 
Cache recap
Cache recapCache recap
Cache recap
 
Cache recap
Cache recapCache recap
Cache recap
 
Cache recap
Cache recapCache recap
Cache recap
 
Elements of cache design
Elements of cache designElements of cache design
Elements of cache design
 
Cache memory
Cache memoryCache memory
Cache memory
 
Cache memory
Cache memoryCache memory
Cache memory
 
Architecture Assignment Help
Architecture Assignment HelpArchitecture Assignment Help
Architecture Assignment Help
 

Recently uploaded

(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfKamal Acharya
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)simmis5
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsRussian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college projectTonystark477637
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTINGMANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTINGSIVASHANKAR N
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 

Recently uploaded (20)

(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsRussian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTINGMANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 

9.1-CSE3421-multicolumn-cache.pdf

  • 1. 1 Multicolumn Cache: aiming for both high hit rate and low latency Xiaodong Zhang CSE 3421 additional notes for Memory Systems
  • 2. CSE3421 Slide 2  Cache design is memory-address centric  The default length is 32 bits, each memory address is unique and points to a byte (byte addressable)  Each memory address connects to the first byte of the content, and its default length is a word (4 Bytes or 32 bits)  Cache is a small storage inclusively built in the CPU chip  It contains copies of the data from memory (and disks)  A cache-block storage can be multiple-words long  Besides data storage, each block needs tags V bit, and others  For a given memory address, CPU knows where it is  By direct, set-associative or fully associative mapping  How do handle write hit, write miss, read hit, and read miss?  How to address the conflicts between high hit rates and low latency? Basic concepts of hardware cache
  • 3. CSE3421 Slide 3 Three steps for a cache miss:  Sending a memory address (bus)  Loading data from memory (no bus)  Transferring data to CPU (bus) Connections between CPU-Caches and DRAM memory
  • 4. CSE3421 Slide 6 Trade-offs between High Hit Ratios and Low Latency  Set- associative cache achieves high hit-ratios: 30% higher than that of direct-mapped cache.  But it suffers high access times due to  Multiplexing logic delay during the selection.  Tag checking, selection, and data dispatching are sequential.  Direct-mapped cache loads data and checks tag in parallel: minimizing the access time.  High power consumption is another concern for set associative cache  Can we get both high hit ratios and low latency in low power?
  • 5. Best Case of Way Prediction: First Hit Cost: way prediction only tag set offset tag way0 data way1 way2 way3 Way-prediction =? Mux 4:1 To CPU
  • 6. Way Prediction: Non-first Hit (in Set) Cost: way prediction + selection in set tag set offset tag way0 data way1 way2 way3 Way-prediction =? Mux 4:1 ≠ To CPU
  • 7. Worst Case of Way-prediction: Miss Cost: way prediction + selection in set + miss tag set offset tag way0 data way1 way2 way3 Way-prediction =? Mux 4:1 ≠ ≠ Lower level Memory To CPU
  • 8. Make a low-latency and high hit-rate cache to be a reality  A write to a set-associative cache 1. Map to the cache set 2. Find an empty way (block) to write 3. Or write to the least recent used (LRU) way (block)  A read in a set-associative cache 1. Map to the cache set 2. Compare with the tags, then select the way if matches 3. Otherwise, serve the cache miss by going to memory  Why a way-prediction is hard?  We do not know which way the cache block is located  Can we enforce which way to go in both write and read?
  • 9. Basic Ideas  Conventional Set Associative Cache  Mapping to a set of multiple ways  find a way (LRU) to write  search all ways for a read  Latency comes from “finding” or “searching” among ways  Can we decisively or directly find a way?  If so, the search would be direct to the way  How do we make this happen?  Multi-column cache
  • 10. Multi-column Caches: Fast and High Hit Ratio  Zhang, et. al., IEEE Micro, 1997.  Objectives:  Each first hit is equivalent to a direct-mapped access.  Maximize the number of first hits.  Minimize the latency of non-first-hits.  Additional hardware should be simple and low cost.
  • 11. CSE3421 Slide 17 Range of Set Associative Caches  For a fixed size cache, each time the associativity increases it doubles the number of blocks per set (or the number of ways)  halves the number of sets – decreases the set index by 1 bit  increases the size of the tag by 1 bit  Note: word offset is for multiple words in a block. word offset + byte offset = long cache block Byte offset Set Index Tag Decreasing associativity Fully associative (only one set) Tag is all the bits except block and byte offset Direct mapped (only one way) Smaller tags, only a single comparator Increasing associativity Selects the set Used for tag to compare Cache Block Size
  • 12. Multi-Column Caches: Major Location  The bits given to tag and ``set bits” generate a direct-mapped location: Major Location.  A major location mapping = direct-mapping. Set Tag Offset Address Between direct-map and set- associative, there is a set of unused bits for tags. Let’s use the bits direct-mapping to the block (way) in the set
  • 13. CSE3421 Slide 19 Comparisons of Three Caches
  • 14. Implicit direct mapping in set associative cache  Standard set associative cache only maps to a set  Multicolumn cache uses the full direct map bits to write and read in a set associative cache  Set index remains the same  The bits given to the tag from direct map are used for which way in the set  These two sets of bits are available and free  Major location contains a most recent used (MRU) block either loaded from memory or just accessed.
  • 15. Multi-column Caches: Selected Locations  Multiple blocks can be direct-mapped to the same major location, but only MRU is the major.  The non-MRU blocks are stored in other empty locations in the set: Selected Locations.  If ``other locations” are used for their own major locations, there will be no space for selected ones.  Swap:  A block in selected location is swapped to major location as it becomes MRU.  A block in major location is swapped to a selected location after a new block is loaded in it from memory.
  • 16. Multi-Column: Indexing Selected Locations  The selected locations associated with its major location are indexed for a fast search. Location 0 Location 1 Location 2 Location 3 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 3 2 1 0 3 2 1 0 3 2 1 0 3 2 1 0 Bit Vector 3 (way3) Bit Vector 2 (way 2) Bit Vector 1 (way 1) Bit Vector 0 (way 0) Major Location 1 has two selected locations at 0 and 2
  • 17. Summary of Multi-column Caches  A major location in a set is the direct-mapped location of MRU.  A selected location is the direct-mapped location but non-MRU.  A selected location index is maintained for each major location.  A ``swap” is used to ensure the block in the major location is always MRU.
  • 18. Multi-column Cache Operations 10 0001 Reference 1 Way 0 Way 1 Way 2 Way 3 0000 1101 0111 1011 0000 0000 0000 0010 Selected location Not at major location ! 1101 10 1011 Reference 2 0000 No selected location! 0001 Place 0001 at the major location ! 0000 0001 0111 1011 1011 First Hit!
  • 19. Performance Summary of Multi-column Caches  Hit ratio to the major locations is about 90%.  The hit ratio is higher than that of direct-mapped cache due to high associativity while keeps low access time of direct- mapped cache in average.  First-hit is equivalent to direct-mapped.  Non-first hits are faster than set-associative caches.  Outperforming set associative caches
  • 20. Comparing First-hit Ratios between Multicolumn and MRU Caches 64KB Data Cache Hit Rate (Program mgrid) 0% 20% 40% 60% 80% 100% 120% 4-way 8-way 16-way Hit Rate overall MRU first-hit Multicolumn first- hit Note: What is an MRU cache? An MRU bit map is used for a set associative cache, (1 bit for way, 2 bits for 4-way, …), which remembers the most recent used way as the prediction of the next access. This bit map is used for the way prediction.
  • 21. Multi-column Technique is Critical for Low Power Caches by targeting specific ways 0.1 1 10 100 1000 10000 8 5 8 7 8 9 9 1 9 3 9 5 9 7 9 9 0 1 0 3 Frequency (MHz) Transistor (M) Power (W) Frequency Transistor Power Source: Intel.com 80386 80486 Pentium Pentium Pro Pentium II Pentium III Pentium 4
  • 22. The impact of multicolumn caches  The concept of multicolumn caches has influenced the R&D in the set-associative cache products:  A tag-less cache  K-way direct mapped cache  The direct-to-data cache  Highly associative caches for low power processors  A way memorization cache  Way selection based on tag bits  In ARM Cortex R chip, cache mapping “directly access one of the ways in a set”