Viacheslav Fedorov, Sheng Qiu,
Narasimha Reddy, Paul Gratz
Texas A&M University
ARI:
Adaptive Replacement and Insertion
HiPEAC 2013, Vienna, Austria
Conventional Main Memory
● Usually we only care about speeding up the cache miss path
[Diagram: Core 0 to Core 3, L2$, L3$, Main Memory]
Main Memory: Trends
● New Memories emerging
● DRAM not dense enough
● Replace or augment DRAM
[Diagram: Core 0 to Core 3, L2$, L3$; memory labels: DRAM, PCM, DRAM cache]
PCM Technology
● Based on Chalcogenide glass
● Exploits two phases
● Amorphous
● Crystalline
● Higher density than DRAM
● Non-volatile
Image: Stanford NanoHeat Lab
DRAM vs PCM
● DRAM is writeback-agnostic
● Write Buffers cushion the impact of writebacks
● State-of-the-art policies target cache misses
● PCM
● High write latency – Write Buffers insufficient
● High write energy – Mobile, embedded devices?
● Low cell endurance – Limited write cycles?
Parameter          DRAM     PCM      PCM vs DRAM
Row Read           210 mW   78 mW    0.3x
Row Write          195 mW   773 mW   4x
Activate           75 mW    25 mW    0.3x
Standby            90 mW    45 mW    0.5x
Refresh            4 mW     0 mW     0x
Initial Row Read   15 ns    28 ns    2x
Row Write          22 ns    150 ns   7x
Same Row R/W       15 ns    15 ns
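The multipliers annotated on this slide can be reproduced from the table values. The sketch below is only a worked check of that arithmetic; the dictionary and function names are illustrative and not part of the original deck.

```python
# Values copied from the DRAM vs PCM table: (DRAM, PCM).
POWER_MW = {
    "Row Read": (210, 78),
    "Row Write": (195, 773),
    "Activate": (75, 25),
    "Standby": (90, 45),
    "Refresh": (4, 0),
}
LATENCY_NS = {
    "Initial Row Read": (15, 28),
    "Row Write": (22, 150),
    "Same Row R/W": (15, 15),
}


def pcm_over_dram(table):
    """Return {parameter: PCM value divided by DRAM value}."""
    return {name: pcm / dram for name, (dram, pcm) in table.items()}


power_ratio = pcm_over_dram(POWER_MW)
latency_ratio = pcm_over_dram(LATENCY_NS)

for label, ratios in (("power", power_ratio), ("latency", latency_ratio)):
    for name, ratio in ratios.items():
        print(f"{label:8s} {name:18s} PCM/DRAM = {ratio:.2f}x")
```

The write rows stand out: roughly 4x the power and 7x the latency of DRAM, which is why the rest of the talk focuses on writebacks.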
Outline
● Introduction
● Motivation
● ARI: Adaptive Replacement and Insertion
● Evaluation
● Summary
● Conclusion
Motivation
● PCM is attractive as a Main Memory, but...
● PCM does not favor writes
● High energy
● High latency
● Low write cycle tolerance
● Solution: reduce writes into Main Memory
● Modify LLC policies to reduce Writebacks
● Mind the Miss rate!
Application behavior in High-Associativity Caches
● Bipolar block distribution under the LRU policy
● 'Hot' blocks tend to group towards MRU side
● 'Cold' blocks towards LRU side in a set
● Hot blocks have higher Hit-ratio
● Cold blocks tend to have similar Hit-ratios
[Figure: Hit distribution in a high-associativity cache (16-way). x-axis: position in LRU stack, MRU to LRU; y-axis: % hit rate; regions marked 'Hot' and 'Cold']
Static LLC policies
● Based on the observed hot-cold distribution
● 16-way cache: 16 static policies, named xH16
● Policy xH16 replaces any clean block among the (16-x) low-hit blocks
● Drawbacks:
● No single static policy good for all applications
● Fewer writebacks => more cache misses
– When replacing hot blocks
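A minimal sketch of how a static xH16 policy might pick a victim in one 16-way set. The slide says only that a clean block in the (16-x) low-hit positions is replaced; picking the clean block nearest the LRU end, and falling back to the plain LRU victim when every low-hit block is dirty, are assumptions made here. `Block` and `choose_victim` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Block:
    tag: int
    dirty: bool = False


def choose_victim(lru_stack, x):
    """Return the index of the victim under a static xH16 policy.

    lru_stack is ordered MRU first (index 0) to LRU last.
    Positions x .. len-1 form the low-hit region.
    """
    ways = len(lru_stack)
    # Assumption: scan the low-hit region from the LRU end upwards
    # and take the first clean block found.
    for pos in range(ways - 1, x - 1, -1):
        if not lru_stack[pos].dirty:
            return pos
    # Assumption: no clean block in the region, so evict the LRU block
    # (this eviction costs a writeback).
    return ways - 1


# Example: 16-way set where the four blocks nearest LRU are dirty.
example_set = [Block(tag=t, dirty=(t >= 12)) for t in range(16)]
victim_10h16 = choose_victim(example_set, x=10)  # clean block found in region
victim_14h16 = choose_victim(example_set, x=14)  # region is all dirty
```

A small x gives the policy more clean candidates, so fewer writebacks, but it also puts hotter blocks at risk, which is the miss-rate drawback listed above.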
Enter ARI: Adaptive Replacement and Insertion
● Goal: Reduce LLC writebacks!
● Keep miss rate lower than conventional policies
● How?
● Do not replace dirty cache blocks (as long as possible)
● Place fresh incoming blocks into LLC smartly
● Dynamically choose the best policy
ARI: Operation
● Evict clean blocks from Low-Hit region
● Insert new blocks into top of Low-Hit region
[Figure: % hit rate vs. position in LRU stack, MRU to LRU; regions marked High-Hit and Low-Hit]
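The two rules on this slide can be sketched as a single-set simulation. This is an illustration under stated assumptions, not the authors' implementation: hits are promoted to MRU as in plain LRU, the victim search runs from the LRU end, and a dirty LRU block is evicted when the low-hit region has no clean block. The function name `access` and the `(tag, dirty)` tuple layout are made up for the example.

```python
def access(lru_stack, tag, is_write, x, ways=16):
    """Simulate one access to a set; return (hit, writeback).

    lru_stack: list of (tag, dirty) tuples, MRU first.
    x: boundary position; indices x .. ways-1 are the low-hit region.
    """
    for pos, (t, dirty) in enumerate(lru_stack):
        if t == tag:
            # Assumption: a hit promotes the block to MRU, as in LRU.
            lru_stack.pop(pos)
            lru_stack.insert(0, (tag, dirty or is_write))
            return True, False

    writeback = False
    if len(lru_stack) == ways:
        victim = ways - 1  # assumed fallback: the LRU block
        for pos in range(ways - 1, x - 1, -1):
            if not lru_stack[pos][1]:  # clean block in low-hit region
                victim = pos
                break
        _, was_dirty = lru_stack.pop(victim)
        writeback = was_dirty

    # Insert the incoming block at the top of the low-hit region.
    lru_stack.insert(min(x, len(lru_stack)), (tag, is_write))
    return False, writeback


# 4-way toy set with boundary x=2; the LRU block is dirty.
toy = [(1, False), (2, False), (3, False), (4, True)]
result = access(toy, tag=9, is_write=False, x=2, ways=4)
```

In the toy run the clean block with tag 3 is evicted instead of the dirty LRU block, so no writeback occurs, and the new block lands at position 2 rather than at MRU.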
ARI: Operation
● Application hit-distributions are not static
● Dynamic policy adaptation based on epochs
● Emulate various static thresholds in LLC tags
● Pick the best one for next epoch (25k LLC accesses)
● Misses + Writebacks metric used
[Figure: % hit rate vs. position in LRU stack, MRU to LRU]
ARI: Implementation
● Emulate static thresholds in shadow tags
● Adapt to the hit-distribution dynamically
[Diagram: Tag Array, Shadow Tag Array, Data Array; thresholds shown: 4H16, 10H16, 14H16]
Methodology
● gem5 + DRAMSim2 simulators
● NVIDIA Tegra-like out-of-order, dual-issue CPU
● SPEC2006 and PARSEC suites
● Compared against state-of-the-art policies
● ARI beats them in writeback reduction
● Nearly identical in total performance
System        Single core                                  Multicore
L1 cache      32KB I + 64KB D, 2-way, LRU, 64B block       32KB I + 64KB D, 2-way, LRU, 64B block
L2 cache      256KB, 8-way, LRU, 64B block                 256KB, 8-way, LRU, 64B block (private)
L3 cache      2MB, 16-way, LRU, 64B block                  16MB, 16-way, LRU, 64B block (shared)
Main memory   4GB, DDR3-1333 DRAM, 32-entry write buffer   4GB, DDR3-1333 DRAM, 32-entry write buffer
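For orientation, the set counts implied by the configuration table follow from capacity / (ways x block size). This is a worked example only; it assumes KB and MB are binary units (1024-based), and the helper name is illustrative.

```python
KB = 1024
MB = 1024 * KB


def num_sets(capacity_bytes, ways, block_bytes=64):
    """Number of sets in a set-associative cache."""
    return capacity_bytes // (ways * block_bytes)


l2_sets = num_sets(256 * KB, ways=8)        # private L2
l3_single_sets = num_sets(2 * MB, ways=16)  # single-core L3
l3_multi_sets = num_sets(16 * MB, ways=16)  # shared multicore L3

print(l2_sets, l3_single_sets, l3_multi_sets)
```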
ARI: Writeback reduction
● ARI beats the competition: 33% WB reduction
Writeback improvement, normalized to LRU policy
DIP: M. Qureshi et al, ISCA '07
DBLK: S. Khan et al, MICRO '10
RRIP: A. Jaleel et al, ISCA '10
ARI: Miss reduction
● ARI achieves a 4.7% miss reduction
Miss rate improvement, normalized to LRU policy
DIP: M. Qureshi et al, ISCA '07
DBLK: S. Khan et al, MICRO '10
RRIP: A. Jaleel et al, ISCA '10
ARI: Performance improvement
● ARI yields a 5% IPC improvement on average
IPC improvement, normalized to LRU policy
ARI: Dynamic behavior
● ARI adapts to program phases
● Achieves lower WBs than the best static policy
[Figure panels: soplex application, SPEC 2006; mcf application, SPEC 2006. Axis: writebacks]
ARI: Multicore applications
ARI: PCM lifetime improvement
● ARI facilitates the use of PCM as Main Memory
[Figure: % PCM lifetime improvement (axis 0% to 60%) for DIP, DBLK, RRIP, and ARI. Chart annotation: "Decrease lifetime for several apps"]
ARI: Hardware overhead
● 8 sets shadowed per LLC bank (x8)
● p*2 shadow tags (we use p=9)
● 14kB storage overhead in a 16MB LLC
● Epoch counter – 15 bits
● Performance counters, adders
● Not on critical path
● Can be designed for low power
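Two of the figures above can be sanity-checked with simple arithmetic: a 15-bit counter is wide enough to count one 25k-access epoch, and 14kB is a very small fraction of a 16MB LLC. The sketch assumes binary units (1 kB = 1024 bytes) and compares against data capacity only; the variable names are illustrative.

```python
EPOCH_LEN = 25_000           # LLC accesses per epoch (from the ARI: Operation slide)
SHADOW_BYTES = 14 * 1024     # 14kB storage overhead
LLC_BYTES = 16 * 1024 ** 2   # 16MB LLC data capacity

# Smallest counter width that can hold the value 25,000.
counter_bits = EPOCH_LEN.bit_length()

# Storage overhead relative to LLC data capacity.
overhead_fraction = SHADOW_BYTES / LLC_BYTES

print(f"epoch counter needs {counter_bits} bits")
print(f"shadow storage is {overhead_fraction:.4%} of LLC capacity")
```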
ARI: Summary
● 33% writeback reduction
● 4.7% cache miss rate reduction
● 9% less Main Memory traffic
● System IPC boost of 5%
● Enabling PCM as Main Memory
● 50% lifetime improvement
Win – Win
Conclusion
● DRAM is hitting a scalability wall
● New memories/architectures proposed
● We target PCM as main memory
● Propose ARI: Adaptive Replacement and Insertion
● Simple scheme
● Reduce writebacks to main memory
● Boost the PCM performance and lifetime
Thank you!
Questions?
Backup Slides
Related Work: PCM
G. Dhiman et al.
PDRAM: A hybrid PRAM and DRAM main memory system. DAC ’09
M. K. Qureshi et al.
Enhancing Lifetime and Security of PCM-based Main Memory with
Start-Gap Wear Leveling. MICRO ’09
B. C. Lee et al.
Architecting Phase Change Memory as a Scalable
DRAM Alternative. ISCA ’09
M. K. Qureshi et al.
Scalable high performance main memory system using
phase-change memory technology. ISCA ’09
A. P. Ferreira et al.
Increasing PCM main memory lifetime. DATE ’10
Related Work: PCM
N. H. Seong et al.
Security refresh: prevent malicious wear-out and increase durability
for phase-change memory with dynamically randomized address mapping.
ISCA ’10
H. Yoon et al.
Row buffer locality aware caching policies for hybrid memories. ICCD ’12
Stuecheli et al.
The Virtual Write Queue: Coordinating DRAM and
Last-Level Cache Policies. ISCA ’10
M. K. Qureshi & G. H. Loh
Fundamental latency trade-off in architecting DRAM caches:
Outperforming impractical SRAM-tags with a simple and practical design.
MICRO ’12
ARI: Insertion impact
ARI: Total Memory Traffic
[Figure: Total memory traffic, Misses + Writebacks. Normalized to LRU. Series: 4H16, ARI. y-axis: total traffic normalized to LRU, 0.6 to 1.4. Benchmarks: gcc, bzip, bwaves, mcf, milc, zeus, gromacs, cactusADM, leslie3d, namd, gobmk, soplex, hmmer, sjeng, GemsFDTD, h264ref, astar, sphinx3, avg]

ARI. HiPEAK 2014

  • 1. Viacheslav Fedorov, Sheng Qiu, Narasimha Reddy, Paul Gratz Texas A&M University ARI: Adaptive Replacement and Insertion HiPEAC 2013, Vienna, Austria
  • 2. Conventional Main Memory ● Usually we only care about speeding up the cache miss path Main Memory Core 0 Core 1 Core 2 Core 3 L3$ L2$ L2$
  • 3. Main Memory: Trends ● New Memories emerging ● DRAM not dense enough ● Replace or augment DRAM DRAM Core 0 Core 1 Core 2 Core 3 L3$ L2$ L2$ DRAM PCM DRAM cache
  • 4. PCM Technology ● Based on Chalcogenide glass ● Exploits two phases ● Amorphous ● Chrystalline ● Higher density than DRAM ● Non-volatile Image: Stanford NanoHeat Lab
  • 5. DRAM vs PCM ● DRAM is writeback-agnostic ● Write Buffers cushion the impact of writebacks ● State-of-the-art policies target cache misses ● PCM ● High write latency – Write Buffers insufficient ● High write energy – Mobile, embedded devices ? ● Low cell endurance – Limited write cycles ? Parameter DRAM PCM Row Read 210 mW 78 mW Row Write 195 mW 773 mW Activate 75 mW 25 mW Standby 90 mW 45 mW Refresh 4 mW 0 mW Initial Row Read 15 ns 28 ns Row Write 22 ns 150 ns Same Row R/W 15 ns 15 ns 0.3x 4x 0.3x 0.5x 7x 2x 0x
  • 6. Outline ● Introduction ● Motivation ● ARI: Adaptive Replacement and Insertion ● Evaluation ● Summary ● Conclusion
  • 7. Motivation ● PCM is attractive as a Main Memory, but... ● PCM does not favor writes ● High energy ● High latency ● Low write cycle tolerance ● Solution: reduce writes into Main Memory ● Modify LLC policies to reduce Writebacks ● Mind the Miss rate!
  • 8. Application behavior in High-Associativity Caches ● Bi-Polar block distribution due to LRU policy ● 'Hot' blocks tend to group towards MRU side ● 'Cold' blocks towards LRU side in a set ● Hot blocks have higher Hit-ratio ● Cold blocks tend to have similar Hit-ratios %hitrate Position in LRU stackMRU LRU 'Hot' region 'Cold' region Hit distribution in a high-associativity cache (16-way)
  • 9. Static LLC policies ● Based on the observed hot-cold distribution ● 16-way cache: 16 static policies, xH16 ● Replace any clean block in (16-x) Low-hit blocks ● Drawbacks: ● No single static policy good for all applications ● Less writebacks => more cache misses – When replacing hot blocks
  • 10. Enter ARI: Adaptive Replacement and Insertion ● Goal: Reduce LLC writebacks! ● Keep miss rate lower than conventional policies ● How? ● Do not replace dirty cache blocks (as long as possible) ● Place fresh incoming blocks into LLC smartly ● Dynamically choose the best policy
  • 11. ARI: Operation ● Evict clean blocks from Low-Hit region ● Insert new blocks into top of Low-Hit region [Figure: % hit rate vs. position in LRU stack, MRU to LRU; High-Hit region and Low-Hit region]
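The insertion step can be illustrated with a small LRU-stack model. This is a sketch under stated assumptions: the set is modeled as a Python list ordered MRU to LRU, the victim position has already been chosen, and `insert_at_threshold` is a hypothetical helper name rather than anything from the slides.

```python
def insert_at_threshold(stack, victim_pos, new_block, x):
    """Evict stack[victim_pos] and insert new_block at the top of the low-hit region.

    stack:      block identifiers in LRU-stack order (index 0 = MRU, last = LRU).
    victim_pos: position of the block being replaced.
    x:          size of the high-hit region; position x is the top of the
                low-hit region.
    Returns a new list; the input list is left unchanged.
    """
    updated = stack[:victim_pos] + stack[victim_pos + 1:]
    # Clamp so the call still works when x covers the whole set.
    insert_pos = min(x, len(updated))
    updated.insert(insert_pos, new_block)
    return updated


if __name__ == "__main__":
    before = list("ABCDEFGH")  # 8-way set for brevity, A = MRU, H = LRU
    print(insert_at_threshold(before, victim_pos=6, new_block="N", x=4))
```

Inserting below the high-hit region means a block that is never reused cannot push hot blocks towards eviction; with x = 0 the sketch degenerates to ordinary MRU insertion.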
  • 12. ARI: Operation ● Application hit-distributions are not static ● Dynamic policy adaptation based on epochs ● Emulate various static thresholds in LLC tags ● Pick the best one for next epoch (25k LLC accesses) ● Misses + Writebacks metric used [Figure: % hit rate vs. LRU stack position, MRU to LRU]
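The epoch-based selection can be sketched as a small bookkeeping class. This is a hedged illustration: the class and method names are hypothetical, the candidate thresholds 4, 10, and 14 are taken from the 4H16/10H16/14H16 labels on the implementation slide, and ties are broken in favor of the first candidate as an arbitrary assumption.

```python
class EpochSelector:
    """Pick the threshold with the lowest misses + writebacks each epoch."""

    def __init__(self, candidates, epoch_length=25_000):
        self.candidates = list(candidates)
        self.epoch_length = epoch_length      # LLC accesses per epoch
        self.active = self.candidates[0]      # assumption: start with the first
        self.accesses = 0
        self.cost = {x: 0 for x in self.candidates}

    def record(self, x, misses=0, writebacks=0):
        """Add events observed by the emulated policy with threshold x."""
        self.cost[x] += misses + writebacks

    def tick(self):
        """Count one LLC access; at an epoch boundary, switch and reset."""
        self.accesses += 1
        if self.accesses >= self.epoch_length:
            # Equal weighting of misses and writebacks, as on the slide.
            self.active = min(self.candidates, key=lambda x: self.cost[x])
            self.cost = {x: 0 for x in self.candidates}
            self.accesses = 0
        return self.active


if __name__ == "__main__":
    sel = EpochSelector([4, 10, 14], epoch_length=3)
    sel.record(4, misses=5, writebacks=5)
    sel.record(10, misses=6, writebacks=1)
    sel.record(14, misses=9)
    print([sel.tick() for _ in range(3)])
```

In hardware the per-candidate counts would come from the emulated thresholds rather than from explicit `record` calls; the sketch only shows the selection rule.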
  • 13. ARI: Implementation ● Emulate static thresholds in shadow tags ● Adapt to the hit-distribution dynamically [Diagram: Cores 0-3, L2$, L3$ with Tag Array, Data Array, and Shadow Tag Array; shadow tags labeled 4H16, 10H16, 14H16]
  • 15. Methodology ● gem5 + DRAMSim2 simulators ● nVidia Tegra-like out-of-order, dual-issue CPU ● SPEC2006 and PARSEC suites ● Compared against state-of-the-art policies ● ARI beats them in writeback reduction ● Nearly identical in total performance
      System         Single core                                        Multicore
      L1 cache       32KB I + 64KB D, 2-way, LRU, 64B block             32KB I + 64KB D, 2-way, LRU, 64B block
      L2 cache       256KB, 8-way, LRU, 64B block                       256KB, 8-way, LRU, 64B block (private)
      L3 cache       2MB, 16-way, LRU, 64B block                        16MB, 16-way, LRU, 64B block (shared)
      Main memory    4GB, DDR3-1333 DRAM, 32-entry write buffer         4GB, DDR3-1333 DRAM, 32-entry write buffer
  • 16. ARI: Writeback reduction ● ARI beats the competition: 33% WB reduction [Figure: Writeback improvement, normalized to LRU policy] DIP: M. Qureshi et al., ISCA '09; DBLK: S. Khan et al., MICRO '10; RRIP: A. Jaleel et al., ISCA '10
  • 17. ARI: Miss reduction ● ARI achieves a 4.7% miss reduction [Figure: Miss rate improvement, normalized to LRU policy] DIP: M. Qureshi et al., ISCA '09; DBLK: S. Khan et al., MICRO '10; RRIP: A. Jaleel et al., ISCA '10
  • 18. ARI: Performance improvement ● ARI yields a 5% IPC improvement on average [Figure: IPC improvement, normalized to LRU policy]
  • 19. ARI: Dynamic behavior ● ARI adapts to program phases ● Achieves lower WBs than the best static policy [Figure: Writebacks; panels: soplex application, SPEC 2006; mcf application, SPEC 2006]
  • 21. ARI: PCM lifetime improvement ● ARI facilitates the use of PCM as Main Memory [Figure: % PCM lifetime improvement for DIP, DBLK, RRIP, and ARI; annotation: Decrease lifetime for several apps]
  • 23. ARI: Hardware overhead ● 8 sets shadowed per LLC bank (x8) ● p*2 shadow tags (we use p=9) ● 14kB storage overhead in a 16MB LLC ● Epoch counter – 15 bits ● Performance counters, adders ● Not on critical path ● Can be designed for low power
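Some of the overhead figures can be sanity-checked with simple arithmetic. The sketch below assumes that "(x8)" means 8 LLC banks and uses the 16MB, 16-way, 64B-block multicore LLC from the methodology slide; the constant and function names are illustrative. It does not attempt to reproduce the 14kB storage figure, which depends on shadow-tag details the slides do not give.

```python
EPOCH_LENGTH = 25_000            # LLC accesses per epoch (from the slides)
LLC_BYTES = 16 * 1024 * 1024     # 16MB shared LLC (methodology slide)
WAYS = 16
BLOCK_BYTES = 64
SHADOWED_SETS_PER_BANK = 8
BANKS = 8                        # assumption: "(x8)" read as 8 LLC banks


def counter_bits(max_count):
    """Smallest number of bits that can hold every value in 0..max_count."""
    return max_count.bit_length()


total_sets = LLC_BYTES // (WAYS * BLOCK_BYTES)
shadowed_sets = SHADOWED_SETS_PER_BANK * BANKS

if __name__ == "__main__":
    print("epoch counter bits:", counter_bits(EPOCH_LENGTH))
    print("total LLC sets:", total_sets)
    print("shadowed sets:", shadowed_sets)
    print(f"shadowed fraction: {shadowed_sets / total_sets:.4%}")
```

Under these assumptions a 15-bit counter is just enough to count a 25k-access epoch, and the shadowed sets are well under 1% of all LLC sets.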
  • 25. ARI: Summary ● 33% writeback reduction ● 4.7% cache miss rate reduction ● 9% less Main Memory traffic ● System IPC boost of 5% ● Enabling PCM as Main Memory ● 50% lifetime improvement Win – Win
  • 26. Conclusion ● DRAM is hitting a scalability wall ● New memories/architectures proposed ● We target PCM as main memory ● Propose ARI: Adaptive Replacement and Insertion ● Simple scheme ● Reduce writebacks to main memory ● Boost the PCM performance and lifetime
  • 32. ARI: Total Memory Traffic [Figure: Total memory traffic, Misses + Writebacks, normalized to LRU, for 4H16 and ARI; benchmarks: gcc, bzip, bwaves, mcf, milc, zeus, gromacs, cactusADM, leslie3d, namd, gobmk, soplex, hmmer, sjeng, GemsFDTD, h264ref, astar, sphinx3, avg]