Prediction System for L1 Data Cache

Transcript

  • 1. Prediction System for L1 Data Cache
      Ankur A Kath, Neha Paranjape, Sachin M Kulkarni, Varun Pandit
  • 2. INTRODUCTION
      - Advances in the semiconductor industry
      - Faster processors
      - Larger memories and more complex programs
      - Higher operational speeds of the individual units
  • 3. PROBLEM
      - The increase in performance is not directly proportional to the increase in operational speeds.
      - The operating speeds of the individual units do not match.
      - Interfacing latencies are proving to be a bottleneck.
      - The processor-memory interface is a major contributor to these latencies.
      - Caches have reduced this latency.
      - But cache misses eat into these improvements.
  • 4. POSSIBLE SOLUTIONS
      - Obvious solution: reduce cache misses.
      - Many different approaches:
          - Increasing the cache size
          - Blocking
          - Adding cache levels
      - Our solution is a bit different: it aims to predict the address of the next memory access.
  • 5. OVERVIEW OF OUR SOLUTION
      - Our solution aims to predict the address of the next memory access.
      - Memory accesses are often sequential, which makes prediction easy.
      - Simply increasing the size of the cache or its blocks does not really help.
      - The predicted address is preloaded so that the data is already available when the next memory access uses it.
  • 6. ENTIRE PREDICTION SYSTEM
      - Loads and stores are monitored from the fetch and commit stages respectively.
      - Only loads/stores going to the L1 data cache are monitored (this prevents data forwarding from interfering).
      - Since execution is out-of-order, loads and stores are predicted separately (see the sketch after this slide).
      - Two new units are added to the system:
          - Buffer Cache
          - Prediction System
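A minimal sketch in C of the per-stream state slide 6 implies. The type and field names (stride_pred_t, last_addr, stride) are illustrative assumptions, not taken from the deck:

    #include <stdint.h>

    /* One stride-predictor entry per access stream: loads are
     * monitored at the fetch stage, stores at the commit stage,
     * so out-of-order interleaving of the two streams does not
     * corrupt either stride calculation. */
    typedef struct {
        uint64_t last_addr; /* address of the previous access in this stream */
        int64_t  stride;    /* last observed difference between accesses     */
        int      valid;     /* set once at least one access has been seen    */
    } stride_pred_t;

    static stride_pred_t load_pred;   /* fed from the fetch stage  */
    static stride_pred_t store_pred;  /* fed from the commit stage */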
  • 7. IDEA BEHIND OUR SOLUTION
      - Memory accesses can be separated in time.
      - The processor is often busy with internal operations, leaving the cache and memory idle.
      - This idle time can be used to load data into the cache from a predicted address.
      - When this address is accessed in the next cycle, the data is readily available.
      - Example: see the sketch after this slide.
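The slide's example did not survive the transcript; the following hypothetical loop shows the kind of access pattern the idea targets: each load is a constant stride past the previous one, and the arithmetic between loads leaves the memory system idle long enough to preload the next address.

    /* Hypothetical workload: successive loads of a[i] are
     * sizeof(double) bytes apart, so while the processor computes
     * sum += a[i] * k, the idle cache/memory can preload &a[i + 1]. */
    double scaled_sum(const double *a, int n, double k) {
        double sum = 0.0;
        for (int i = 0; i < n; i++)
            sum += a[i] * k;
        return sum;
    }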
  • 8. BUFFER
      - To avoid interfering with the existing L1 data cache operation, a new Buffer Cache is required.
      - It sits at the same level as the L1 data cache and has the same hit latency.
      - There are no writes to the buffer.
      - On a buffer hit, the data is moved to the L1.
      - Two operations on the buffer (sketched after this slide):
          - Prediction Load
          - Normal Access (only on L1 misses)
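A minimal sketch of the two buffer operations, assuming illustrative helper functions (l1_lookup, buffer_lookup, and friends) that stand in for the real cache arrays; none of these names come from the deck:

    #include <stdint.h>

    int  l1_lookup(uint64_t addr);          /* assumed helpers */
    void l1_install(uint64_t addr);
    int  buffer_lookup(uint64_t addr);
    void buffer_install(uint64_t addr);
    void buffer_invalidate(uint64_t addr);

    /* Normal access: the buffer is consulted only on an L1 miss.
     * On a buffer hit the block moves into the L1; the buffer
     * itself is never written by stores. */
    int data_cache_access(uint64_t addr) {
        if (l1_lookup(addr))
            return 1;                       /* ordinary L1 hit */
        if (buffer_lookup(addr)) {
            l1_install(addr);               /* move block from buffer to L1 */
            buffer_invalidate(addr);
            return 1;                       /* same hit latency as the L1 */
        }
        return 0;                           /* real miss: go to L2 */
    }

    /* Prediction load: fills the buffer from L2 during idle cycles. */
    void prediction_load(uint64_t pred_addr) {
        if (!l1_lookup(pred_addr) && !buffer_lookup(pred_addr))
            buffer_install(pred_addr);
    }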
  • 9. PREDICTION SYSTEM
      - How do we predict addresses?
      - The simplest method is to calculate the difference between two consecutive memory operations.
      - The predicted address is the current address plus/minus this difference (see the sketch after this slide).
      - Again, loads and stores have to be predicted separately.
      - The prediction logic issues a memory access for the predicted address right after an actual load/store.
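Continuing the sketches above, a hedged rendering of the stride calculation slide 9 describes: the prediction unit records the difference between two consecutive accesses in a stream and issues a buffer load for that address plus the difference immediately after the real access.

    /* Called once per load (on load_pred) or store (on store_pred).
     * Uses stride_pred_t and prediction_load() from the earlier
     * sketches; both are illustrative assumptions. */
    void on_memory_access(stride_pred_t *p, uint64_t addr) {
        if (p->valid) {
            p->stride = (int64_t)(addr - p->last_addr); /* may be negative  */
            prediction_load(addr + p->stride);          /* current +/- diff */
        }
        p->last_addr = addr;
        p->valid = 1;
    }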
  • 10. HARDWARE DETAILS
      - A 16-entry Buffer Cache with the same block size as the L1 data cache.
      - L1 accesses and buffer (prediction) loads occur on the same bus.
      - The prediction unit consists of:
          - An adder unit (offset calculation)
          - Control circuitry
      - Possible improvement: a "two-port" L2 would let the L1 and the buffer work in parallel.
  • 11. RESULTS – IPC
  • 12. RESULTS – MISSES & PREDICTIONS
  • 13. IMPROVEMENTS
      - Possible improvements to the prediction system:
          - Correct/Incorrect: do a buffer (prediction) load only when the previous prediction was correct.
          - Correct/Incorrect with Range: do a buffer (prediction) load when either the previous prediction was correct or the predicted address lies within a particular range of the current address.
          - Saturation Counter: use a 2-bit saturating counter for correct/incorrect predictions, similar to branch prediction (sketched after this slide).
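A minimal sketch of the saturation-counter improvement, assuming one 2-bit counter per stream (the deck does not specify the exact arrangement); as in branch prediction, a buffer load is issued only from the two confident states:

    /* 2-bit saturating counter: states 0-1 mean "don't preload",
     * states 2-3 mean "preload". Correct predictions count up,
     * incorrect ones count down, saturating at both ends. */
    typedef struct { int count; } sat_counter_t;   /* 0..3 */

    void record_prediction(sat_counter_t *c, int was_correct) {
        if (was_correct  && c->count < 3) c->count++;
        if (!was_correct && c->count > 0) c->count--;
    }

    int should_issue_prediction_load(const sat_counter_t *c) {
        return c->count >= 2;
    }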
  • 14. RESULTS (IPC)
  • 15. RESULTS (CORRECT PREDICTIONS)
  • 16. SHORTCOMINGS
      - The system will not work in a multiprocessor environment:
          - Data in the Shared state
          - Invalidation of modified data in all buffer caches
          - Multiple sources for a block
      - Designing a perfect prediction system is non-trivial because memory accesses can be highly irregular.
      - Good only for application-specific systems; general-purpose systems will not benefit.
  • 17. CONCLUSION
      - The prediction system shows a slight increase in IPC in certain cases, especially with the improvements.
      - The design of the prediction logic is critical and highly dependent on the memory access pattern.
      - Hence, application-specific systems can benefit.
  • 18. FUTURE WORK
      - Squeeze more juice out?
          - The prediction system can be tailored to different classes of algorithms (divide & conquer, branch & bound, ...).
      - Can this be extended to a multiprocessor environment?
          - It would require a very efficient cache coherency protocol that reduces inter-cache messages.
  • 19. Thank You. Any Questions?
