ChargeCache
Reducing DRAM Latency by
Exploiting Row Access Locality
Hasan Hassan,
Gennady Pekhimenko,
Nandita Vijaykumar,
Vivek Seshadri, Donghyuk Lee,
Oguz Ergin, Onur Mutlu
2
Executive Summary
• Goal: Reduce average DRAM access latency with no
modification to the existing DRAM chips
• Observations:
1) A highly-charged DRAM row can be accessed with low latency
2) A row’s charge is restored when the row is accessed
3) A recently-accessed row is likely to be accessed again:
Row Level Temporal Locality (RLTL)
• Key Idea: Track recently-accessed DRAM rows and use lower
timing parameters if such rows are accessed again
• ChargeCache:
– Low cost & no modifications to the DRAM
– Higher performance (8.6-10.6% on average for 8-core)
– Lower DRAM energy (7.9% on average)
3
1. DRAM Operation Basics
2. Accessing Highly-charged Rows
4. ChargeCache
5. Evaluation
6. Conclusion
3. Row Level Temporal Locality (RLTL)
Outline
4
DRAM Stores Data as Charge
DRAM
Cell
Sense-Amplifier
MemCtrl
CPU
5
DRAM Stores Data as Charge
1. Sensing
2. Restore
3. Precharge
DRAM
Cell
Sense-Amplifier
Three steps of charge
movement
6
Data 0
Data 1
Cell
time
charge
Sense-Amplifier
DRAM Charge over Time
Sensing Restore
Cell
Sense
Amplifier
Precharge
R/W
ACT PRE
Ready to Access
Charge Level
tRCD
tRAS
Ready to Access
Ready to
Precharge
7
1. DRAM Operation Basics
2. Accessing Highly-charged Rows
4. ChargeCache
5. Evaluation
6. Conclusion
3. Row Level Temporal Locality (RLTL)
Outline
8
Accessing Highly-charged Rows
Data 0
Data 1
Cell
time
charge
Sense-Amplifier
Sensing Restore Precharge
R/W
ACT PRE
tRCD
tRAS
R/W PRE
Ready to Access Ready to Precharge
9
Observation 1
A highly-charged DRAM row can be
accessed with low latency
• tRCD: 44%
• tRAS: 37%
How does a row become
highly-charged?
10
How Does a Row Become Highly-Charged?
DRAM cells lose charge over time
Two ways of restoring a row’s charge:
• Refresh Operation
• Access
time
Refresh
charge
Refresh
Access
11
Observation 2
A row’s charge is restored when the row
is accessed
How likely is a recently-accessed
row to be accessed again?
12
1. DRAM Operation Basics
2. Accessing Highly-charged Rows
4. ChargeCache
5. Evaluation
6. Conclusion
3. Row Level Temporal Locality (RLTL)
Outline
13
0%
20%
40%
60%
80%
100%
Fraction
of
Accesses
Row Level Temporal Locality (RLTL)
86%
0%
20%
40%
60%
80%
100%
Fraction
of
Accesses
97%
A recently-accessed DRAM row is likely to be
accessed again.
• t-RLTL: Fraction of rows that are accessed
within time t after their previous access
8ms – RLTL for single-core workloads
8ms – RLTL for eight-core workloads
14
1. DRAM Operation Basics
2. Accessing Highly-charged Rows
4. ChargeCache
5. Evaluation
6. Conclusion
3. Row Level Temporal Locality (RLTL)
Outline
15
Summary of the Observations
1. A highly-charged DRAM row can be
accessed with low latency
2. A row’s charge is restored when the
row is accessed
3. A recently-accessed DRAM row is
likely to be accessed again:
Row Level Temporal Locality (RLTL)
16
Key Idea
Track recently-accessed DRAM rows
and use lower timing parameters if
such rows are accessed again
17
ChargeCache Overview
Memory Controller
ChargeCache
A
:B
:D
:C
:E
:F
Requests:
:A
D A
DRAM
A
D
ChargeCache Miss: Use Default Timings
ChargeCache Hit: Use Lower Timings
18
Area and Power Overhead
• Modeled with CACTI
• Area
– ~5KB for 128-entry ChargeCache
– 0.24% of a 4MB Last Level Cache (LLC)
area
• Power Consumption
– 0.15 mW on average (static + dynamic)
– 0.23% of the 4MB LLC power consumption
19
1. DRAM Operation Basics
2. Accessing Highly-charged Rows
4. ChargeCache
5. Evaluation
6. Conclusion
3. Row Level Temporal Locality (RLTL)
Outline
20
Methodology
• Simulator
– Ramulator [Kim+, CAL’15]
https://github.com/CMU-SAFARI/ramulator
• Workloads
– 22 single-core workloads
• SPEC CPU2006, TPC, STREAM
– 20 multi-programmed 8-core workloads
• By randomly choosing from single-core workloads
– Execute at least 1 billion representative instructions per
core (Pinpoints)
• System Parameters
– 1/8 core system with 4MB LLC
– Default tRCD/tRAS of 11/28 cycles
21
Mechanisms Evaluated
Non-Uniform Access Time Memory Controller (NUAT)
[Shin+, HPCA’14]
– Key idea: Access only recently-refreshed rows with
lower timing parameters
 Recently-refreshed rows can be accessed faster
 Only a small fraction (10-12%) of accesses go to
recently-refreshed rows
ChargeCache
 Recently-accessed rows can be accessed faster
 A large fraction (86-97%) of accesses go to recently-
accessed rows (RLTL)
– 128 entries per core, On hit: tRCD-7, tRAS-20 cycles
Upper Bound: Low Latency DRAM
– Works as ChargeCache with 100% Hit Ratio
– On all DRAM accesses: tRCD-7, tRAS-20 cycles
22
0%
4%
8%
12%
16%
Speedup
Single-core Performance
NUAT ChargeCache
ChargeCache + NUAT LL-DRAM (Upper bound)
ChargeCache improves
single-core performance
23
Eight-core Performance
NUAT ChargeCache
ChargeCache + NUAT LL-DRAM (Upperbound)
2.5% 9%
0%
4%
8%
12%
16%
Speedup
13%
ChargeCache significantly improves
multi-core performance
24
DRAM Energy Savings
0%
5%
10%
15%
Single-core Eight-core
DRAM
Energy
Reduction Average Maximum
ChargeCache reduces DRAM energy
25
Other Results In The Paper
• Detailed analysis of the Row Level
Temporal Locality phenomenon
• ChargeCache hit-rate analysis
• Sensitivity studies
oSensitivity to t in t-RLTL
oChargeCache capacity
26
1. DRAM Operation Basics
2. Accessing Highly-charged Rows
4. ChargeCache
5. Evaluation
6. Conclusion
3. Row Level Temporal Locality (RLTL)
Outline
27
Conclusion
• ChargeCache reduces average DRAM access latency at low cost
• Observations:
1) A highly-charged DRAM row can be accessed with low latency
2) A row’s charge is restored when the row is accessed
3) A recently-accessed row is likely to be accessed again: Row
Level Temporal Locality (RLTL)
• Key Idea: Track recently-accessed DRAM rows and use lower
timing parameters if such rows are accessed again
• ChargeCache:
– Low cost & no modifications to the DRAM
– Higher performance (8.6-10.6% on average for 8-core)
– Lower DRAM energy (7.9% on average)

Charge_cache_Slides_HPCA_2016.pdf

  • 1.
    ChargeCache Reducing DRAM Latencyby Exploiting Row Access Locality Hasan Hassan, Gennady Pekhimenko, Nandita Vijaykumar, Vivek Seshadri, Donghyuk Lee, Oguz Ergin, Onur Mutlu
  • 2.
    2 Executive Summary • Goal:Reduce average DRAM access latency with no modification to the existing DRAM chips • Observations: 1) A highly-charged DRAM row can be accessed with low latency 2) A row’s charge is restored when the row is accessed 3) A recently-accessed row is likely to be accessed again: Row Level Temporal Locality (RLTL) • Key Idea: Track recently-accessed DRAM rows and use lower timing parameters if such rows are accessed again • ChargeCache: – Low cost & no modifications to the DRAM – Higher performance (8.6-10.6% on average for 8-core) – Lower DRAM energy (7.9% on average)
  • 3.
    3 1. DRAM OperationBasics 2. Accessing Highly-charged Rows 4. ChargeCache 5. Evaluation 6. Conclusion 3. Row Level Temporal Locality (RLTL) Outline
  • 4.
    4 DRAM Stores Dataas Charge DRAM Cell Sense-Amplifier MemCtrl CPU
  • 5.
    5 DRAM Stores Dataas Charge 1. Sensing 2. Restore 3. Precharge DRAM Cell Sense-Amplifier Three steps of charge movement
  • 6.
    6 Data 0 Data 1 Cell time charge Sense-Amplifier DRAMCharge over Time Sensing Restore Cell Sense Amplifier Precharge R/W ACT PRE Ready to Access Charge Level tRCD tRAS Ready to Access Ready to Precharge
  • 7.
    7 1. DRAM OperationBasics 2. Accessing Highly-charged Rows 4. ChargeCache 5. Evaluation 6. Conclusion 3. Row Level Temporal Locality (RLTL) Outline
  • 8.
    8 Accessing Highly-charged Rows Data0 Data 1 Cell time charge Sense-Amplifier Sensing Restore Precharge R/W ACT PRE tRCD tRAS R/W PRE Ready to Access Ready to Precharge
  • 9.
    9 Observation 1 A highly-chargedDRAM row can be accessed with low latency • tRCD: 44% • tRAS: 37% How does a row become highly-charged?
  • 10.
    10 How Does aRow Become Highly-Charged? DRAM cells lose charge over time Two ways of restoring a row’s charge: • Refresh Operation • Access time Refresh charge Refresh Access
  • 11.
    11 Observation 2 A row’scharge is restored when the row is accessed How likely is a recently-accessed row to be accessed again?
  • 12.
    12 1. DRAM OperationBasics 2. Accessing Highly-charged Rows 4. ChargeCache 5. Evaluation 6. Conclusion 3. Row Level Temporal Locality (RLTL) Outline
  • 13.
    13 0% 20% 40% 60% 80% 100% Fraction of Accesses Row Level TemporalLocality (RLTL) 86% 0% 20% 40% 60% 80% 100% Fraction of Accesses 97% A recently-accessed DRAM row is likely to be accessed again. • t-RLTL: Fraction of rows that are accessed within time t after their previous access 8ms – RLTL for single-core workloads 8ms – RLTL for eight-core workloads
  • 14.
    14 1. DRAM OperationBasics 2. Accessing Highly-charged Rows 4. ChargeCache 5. Evaluation 6. Conclusion 3. Row Level Temporal Locality (RLTL) Outline
  • 15.
    15 Summary of theObservations 1. A highly-charged DRAM row can be accessed with low latency 2. A row’s charge is restored when the row is accessed 3. A recently-accessed DRAM row is likely to be accessed again: Row Level Temporal Locality (RLTL)
  • 16.
    16 Key Idea Track recently-accessedDRAM rows and use lower timing parameters if such rows are accessed again
  • 17.
    17 ChargeCache Overview Memory Controller ChargeCache A :B :D :C :E :F Requests: :A DA DRAM A D ChargeCache Miss: Use Default Timings ChargeCache Hit: Use Lower Timings
  • 18.
    18 Area and PowerOverhead • Modeled with CACTI • Area – ~5KB for 128-entry ChargeCache – 0.24% of a 4MB Last Level Cache (LLC) area • Power Consumption – 0.15 mW on average (static + dynamic) – 0.23% of the 4MB LLC power consumption
  • 19.
    19 1. DRAM OperationBasics 2. Accessing Highly-charged Rows 4. ChargeCache 5. Evaluation 6. Conclusion 3. Row Level Temporal Locality (RLTL) Outline
  • 20.
    20 Methodology • Simulator – Ramulator[Kim+, CAL’15] https://github.com/CMU-SAFARI/ramulator • Workloads – 22 single-core workloads • SPEC CPU2006, TPC, STREAM – 20 multi-programmed 8-core workloads • By randomly choosing from single-core workloads – Execute at least 1 billion representative instructions per core (Pinpoints) • System Parameters – 1/8 core system with 4MB LLC – Default tRCD/tRAS of 11/28 cycles
  • 21.
    21 Mechanisms Evaluated Non-Uniform AccessTime Memory Controller (NUAT) [Shin+, HPCA’14] – Key idea: Access only recently-refreshed rows with lower timing parameters  Recently-refreshed rows can be accessed faster  Only a small fraction (10-12%) of accesses go to recently-refreshed rows ChargeCache  Recently-accessed rows can be accessed faster  A large fraction (86-97%) of accesses go to recently- accessed rows (RLTL) – 128 entries per core, On hit: tRCD-7, tRAS-20 cycles Upper Bound: Low Latency DRAM – Works as ChargeCache with 100% Hit Ratio – On all DRAM accesses: tRCD-7, tRAS-20 cycles
  • 22.
    22 0% 4% 8% 12% 16% Speedup Single-core Performance NUAT ChargeCache ChargeCache+ NUAT LL-DRAM (Upper bound) ChargeCache improves single-core performance
  • 23.
    23 Eight-core Performance NUAT ChargeCache ChargeCache+ NUAT LL-DRAM (Upperbound) 2.5% 9% 0% 4% 8% 12% 16% Speedup 13% ChargeCache significantly improves multi-core performance
  • 24.
    24 DRAM Energy Savings 0% 5% 10% 15% Single-coreEight-core DRAM Energy Reduction Average Maximum ChargeCache reduces DRAM energy
  • 25.
    25 Other Results InThe Paper • Detailed analysis of the Row Level Temporal Locality phenomenon • ChargeCache hit-rate analysis • Sensitivity studies oSensitivity to t in t-RLTL oChargeCache capacity
  • 26.
    26 1. DRAM OperationBasics 2. Accessing Highly-charged Rows 4. ChargeCache 5. Evaluation 6. Conclusion 3. Row Level Temporal Locality (RLTL) Outline
  • 27.
    27 Conclusion • ChargeCache reducesaverage DRAM access latency at low cost • Observations: 1) A highly-charged DRAM row can be accessed with low latency 2) A row’s charge is restored when the row is accessed 3) A recently-accessed row is likely to be accessed again: Row Level Temporal Locality (RLTL) • Key Idea: Track recently-accessed DRAM rows and use lower timing parameters if such rows are accessed again • ChargeCache: – Low cost & no modifications to the DRAM – Higher performance (8.6-10.6% on average for 8-core) – Lower DRAM energy (7.9% on average)