Prediction System For L1 Cache

1. Prediction System for L1 Data Cache
   Ankur A Kath, Neha Paranjape, Sachin M Kulkarni, Varun Pandit
2. INTRODUCTION
   - Advances in the semiconductor industry
   - Faster processors
   - Increase in memory sizes and program complexity
   - Increase in the operational speed of individual units
3. PROBLEM
   - The increase in performance is not directly proportional to the increase in operational speed.
   - The operating speeds of the individual units do not match.
   - Interfacing latencies prove to be a bottleneck.
   - The processor-memory interface is a major contributor to these latencies.
   - Caches have reduced this latency.
   - But cache misses eat into these improvements.
4. POSSIBLE SOLUTIONS
   - Obvious solution: reduce cache misses.
   - Many different approaches:
      - Increase cache size
      - Blocking
      - Adding levels
   - Our solution, a bit different, aims at predicting the address of the next memory access.
5. OVERVIEW OF OUR SOLUTION
   - Our solution aims at predicting the address of the next memory access.
   - Memory accesses can be sequential, which makes prediction easy.
   - Simply increasing the cache size or the block size does not really help.
   - Preload the data at the predicted address so that it is already available when the next memory access uses it.
6. ENTIRE PREDICTION SYSTEM
   - Monitor loads and stores from the fetch and commit stages respectively.
   - Only loads/stores that actually go to the L1 data cache are monitored (this prevents data forwarding from interfering).
   - Since execution is out of order, loads and stores are predicted separately.
   - Two new units are added to the system:
      - Buffer Cache
      - Prediction System
7. IDEA BEHIND OUR SOLUTION
   - Memory accesses can be separated in time.
   - The processor is busy with its internal operations, leaving the cache and memory idle.
   - This idle time can be used to load data into the cache from a predicted address.
   - When that address is accessed next, the data is readily available.
   - Example (see the illustration below):
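(The example figure on the original slide is not reproduced in this text export; the following is a hypothetical illustration of the idea, with made-up addresses.) Suppose a load to address 0x1000 completes and the predictor expects the next access at 0x1040. While the processor spends the following cycles on computation that does not touch memory, the otherwise idle cache/memory path fetches the block containing 0x1040 into the Buffer Cache, so the later access to 0x1040 hits immediately instead of missing.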
8. BUFFER
   - To avoid interfering with the existing L1 data cache operation, a new Buffer Cache is required.
   - It sits at the same level as the L1 data cache and has the same hit latency.
   - No writes go to the Buffer.
   - On a Buffer hit, the data is moved to the L1.
   - Two operations on the Buffer (normal access sketched below):
      - Prediction load
      - Normal access (only on L1 misses)
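A minimal sketch of the normal-access path, assuming a small fully associative buffer; the names (buffer_access_on_l1_miss, move_block_to_l1) are illustrative, not the authors' implementation:

```c
/* Sketch of the Buffer Cache's normal access path. Illustrative only. */
#include <stdbool.h>
#include <stdint.h>

#define BUFFER_ENTRIES 16              /* matches the hardware details slide */

typedef struct {
    bool     valid;
    uint64_t block_addr;
} buf_entry_t;

static buf_entry_t buffer[BUFFER_ENTRIES];

void move_block_to_l1(uint64_t block_addr);   /* assumed hook into the L1 model */

/* Normal access: the Buffer is consulted only after an L1 miss.
 * On a Buffer hit the block is moved into the L1; the Buffer itself
 * is never written by stores. */
bool buffer_access_on_l1_miss(uint64_t block_addr)
{
    for (int i = 0; i < BUFFER_ENTRIES; i++) {
        if (buffer[i].valid && buffer[i].block_addr == block_addr) {
            move_block_to_l1(block_addr);
            buffer[i].valid = false;          /* the block now lives in L1 */
            return true;
        }
    }
    return false;                             /* miss: fall through to L2/memory */
}
```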
9. PREDICTION SYSTEM
   - How do we predict addresses?
   - The simplest method is to calculate the difference between two consecutive memory operations.
   - The predicted address is the current address plus/minus this difference.
   - Again, loads and stores have to be predicted separately.
   - The prediction logic issues a memory access for the predicted address right after an actual load/store.
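A minimal sketch of this stride-style prediction, assuming separate state for loads and stores; the names (observe_and_predict, stride_state_t) are illustrative:

```c
/* Stride-based next-address prediction, as described on the slide. */
#include <stdint.h>

typedef struct {
    uint64_t last_addr;   /* address of the previous access of this type */
    int64_t  stride;      /* difference between the last two addresses   */
} stride_state_t;

/* Loads and stores are predicted separately (execution is out of order). */
static stride_state_t load_state, store_state;

/* Called on every access that actually goes to the L1 data cache.
 * Returns the predicted address of the next access of the same type;
 * the prediction logic then issues a Buffer (prediction) load for it. */
uint64_t observe_and_predict(stride_state_t *s, uint64_t addr)
{
    s->stride    = (int64_t)(addr - s->last_addr);   /* may be negative */
    s->last_addr = addr;
    return addr + s->stride;    /* current address plus/minus the difference */
}
```

The load path (from the fetch side) would call observe_and_predict(&load_state, addr) and the store path (from the commit side) observe_and_predict(&store_state, addr), each triggering a prediction load for the returned address.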
10. HARDWARE DETAILS
   - A 16-entry Buffer Cache with the same block size as the L1 data cache.
   - L1 accesses and Buffer (prediction) loads occur on the same bus.
   - The Prediction Unit consists of:
      - An adder unit (offset calculation)
      - Control circuitry
   - Possible improvement: a "two-port" L2, making the L1 and the Buffer work in parallel.
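Since demand L1 traffic and prediction loads share one bus, the control circuitry would presumably issue a prediction load only when the bus is otherwise idle (consistent with the idle-time idea on slide 7). A sketch under that assumption; both hook names are hypothetical:

```c
/* Sketch of the control circuitry arbitrating the shared bus. Illustrative only. */
#include <stdbool.h>
#include <stdint.h>

bool bus_busy_with_l1_access(void);          /* assumed hooks into the        */
void issue_bus_request(uint64_t block_addr); /* memory-hierarchy model        */

/* Called each cycle with the most recent predicted block address (if any). */
void prediction_control(bool have_prediction, uint64_t predicted_block)
{
    /* Demand traffic always wins; the prediction load fills idle bus cycles. */
    if (have_prediction && !bus_busy_with_l1_access())
        issue_bus_request(predicted_block);   /* fill the Buffer Cache */
}
```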
11. RESULTS - IPC
12. RESULTS - MISSES & PREDICTIONS
13. IMPROVEMENTS
   - Possible improvements to the prediction system:
      - Correct/Incorrect: do a Buffer (prediction) load only when the previous prediction was correct.
      - Correct/Incorrect with Range: do a Buffer (prediction) load when either the previous prediction was correct or the predicted address lies within a particular range of the current address.
      - Saturation Counter: use a 2-bit saturation counter for correct/incorrect predictions, similar to branch prediction (see the sketch after this list).
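A minimal sketch of the 2-bit saturation-counter gating, analogous to a bimodal branch predictor; the function names and the initial counter value are assumptions, not taken from the slides:

```c
/* 2-bit saturation counter gating prediction loads. Illustrative only. */
#include <stdbool.h>

static unsigned sat_counter = 1;   /* 0..3, start in a weakly "do not prefetch" state */

/* Update once we know whether the previous prediction was correct
 * (i.e. the subsequent access hit in the Buffer Cache). */
void update_counter(bool prediction_was_correct)
{
    if (prediction_was_correct) {
        if (sat_counter < 3) sat_counter++;
    } else {
        if (sat_counter > 0) sat_counter--;
    }
}

/* Issue the next Buffer (prediction) load only in the "confident" states. */
bool should_issue_prediction_load(void)
{
    return sat_counter >= 2;
}
```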
14. RESULTS (IPC)
15. RESULTS (CORRECT PREDICTIONS)
16. SHORTCOMINGS
   - Our system will not work in a multiprocessor environment:
      - Data in the Shared state
      - Invalidation of modified data in all Buffer Caches
      - Multiple sources
   - Designing a perfect prediction system is non-trivial because of the very random nature of memory accesses.
   - Good only for application-specific systems; general-purpose systems will not benefit.
17. CONCLUSION
   - The prediction system shows a slight increase in IPC in certain cases, especially with the improvements.
   - The design of the prediction logic is critical and highly dependent on the memory access pattern.
   - Hence, application-specific systems can benefit.
18. FUTURE WORK
   - Squeeze more juice out?
      - The prediction system can be developed for different types of algorithms (Divide & Conquer, Branch & Bound, ...).
   - Can this be extended to a multiprocessor environment?
      - That would require a very efficient cache coherency protocol that reduces inter-cache messages.
19. Thank You
   - Any questions?
