Lecture 14: Caching and TLBs


Slide 1: Caching and TLBs
Andy Wang, Operating Systems, COP 4610 / CGS 5765
Slide 2: Caching
- Store copies of data at places that can be accessed more quickly than the original
  - Speeds up access to frequently used data
  - At a cost: slows down access to infrequently used data
Slide 3: Caching in Memory Hierarchy
- Provides the illusion of GB storage
  - With register access time

                            Access time        Size        Cost
  Registers (primary)       1 clock cycle      ~500 bytes
  Cache (on chip)           1-2 clock cycles   <10 MB
  Main memory               1-4 clock cycles   <4 GB       $0.1/MB
  Disk (secondary memory)   5-50 msec          <200 GB     $0.001/MB
Slide 4: Caching in Memory Hierarchy
- Exploits two hardware characteristics
  - Smaller memory provides faster access times
  - Larger memory provides cheaper storage per byte
- Puts frequently accessed data in small, fast, expensive memory
- Assumption: non-random program access behavior
Slide 5: Locality in Access Patterns
- Temporal locality: recently referenced locations are more likely to be referenced in the near future
  - e.g., files
- Spatial locality: referenced locations tend to be clustered
  - e.g., listing all files under a directory
Slide 6: Caching
- Storing a small set of data in cache
  - Provides the following illusions
    - Large storage
    - Speed of a small cache
- Does not work well for programs with little locality
  - e.g., scanning the entire disk
    - Leaves behind cache content with no locality (cache pollution)
Slide 7: Generic Issues in Caching
- Effectiveness metrics
  - Cache hit: a lookup resolved by the content stored in the cache
  - Cache miss: a lookup that cannot be resolved by the content stored in the cache
- Effective access time:
  - P(hit)*(hit_cost) + P(miss)*(miss_cost)
Slide 8: Effective Access Time
- Cache hit rate: 99%
  - Cost: 2 clock cycles
- Cache miss rate: 1%
  - Cost: 4 additional clock cycles
- Effective access time:
  - 99%*2 + 1%*(2 + 4) = 1.98 + 0.06 = 2.04 clock cycles
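As a sanity check on this arithmetic, the slide's formula can be written as a small helper (the function name is ours, not the lecture's):

```c
/* Effective access time = P(hit)*hit_cost + P(miss)*(hit_cost + miss_penalty).
   A miss still pays for the failed lookup (hit_cost) before the slower path. */
double effective_access_time(double hit_rate, double hit_cost,
                             double miss_penalty)
{
    return hit_rate * hit_cost + (1.0 - hit_rate) * (hit_cost + miss_penalty);
}
```

With the slide's numbers, effective_access_time(0.99, 2.0, 4.0) gives 0.99*2 + 0.01*(2 + 4) = 2.04 clock cycles.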
Slide 9: Implications
- 10 MB of cache
  - Illusion of 4 GB of memory
  - Running at the speed of the hardware cache
Slide 10: Reasons for Cache Misses
- Compulsory misses: data brought into the cache for the first time
  - e.g., booting
- Capacity misses: caused by the limited size of a cache
  - A program may require a hash table that exceeds the cache capacity
    - Random access pattern
    - No caching policy can be effective
Slide 11: Reasons for Cache Misses
- Misses due to competing cache entries: one cache entry assigned to two pieces of data
  - When both are active
  - Each evicts the other
- Policy misses: caused by the cache replacement policy, which chooses which cache entry to replace when the cache is full
Slide 12: Design Issues of Caching
- How is a cache entry lookup performed?
- Which cache entry should be replaced when the cache is full?
- How is consistency maintained between the cache copy and the real data?
Slide 13: Caching Applied to Address Translation
- A process references the same pages repeatedly
  - Translating each virtual address to a physical address is wasteful
- Translation lookaside buffer (TLB)
  - Tracks frequently used translations
  - Avoids translation in the common case
Slide 14: Caching Applied to Address Translation
[Diagram: virtual addresses are first checked against the TLB; if present ("in TLB"), the physical address comes straight from the TLB; otherwise the translation table is consulted. Data reads and writes proceed untranslated.]
Slide 15: Example of the TLB Content

  Virtual page number (VPN)   Physical page number (PPN)   Control bits
  2                           1                            Valid, rw
  -                           -                            Invalid
  0                           4                            Valid, rw
Slide 16: TLB Lookups
- Sequential search of the TLB table
- Direct mapping: assigns each virtual page to a specific slot in the TLB
  - e.g., use the upper bits of the VPN to index the TLB
Slide 17: Direct Mapping

  if (TLB[UpperBits(vpn)].vpn == vpn) {
    return TLB[UpperBits(vpn)].ppn;
  } else {
    ppn = PageTable[vpn];
    TLB[UpperBits(vpn)].control = INVALID;
    TLB[UpperBits(vpn)].vpn = vpn;
    TLB[UpperBits(vpn)].ppn = ppn;
    TLB[UpperBits(vpn)].control = VALID | RW;
    return ppn;
  }
Slide 18: Direct Mapping
- Using only high-order bits
  - Two pages may compete for the same TLB entry
    - May toss out TLB entries that are still needed
- Using only low-order bits
  - TLB references will cluster
    - Failing to use the full range of TLB entries
- Common approach: combine both
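One way to sketch "combine both": fold some high-order VPN bits onto the low-order bits before masking, so pages that agree in either half of the bits don't always collide. The XOR mix, shift amount, and table size here are our illustration, not a scheme from the lecture.

```c
#include <stdint.h>

#define TLB_ENTRIES 64   /* illustrative: a 64-entry direct-mapped TLB */

/* Index = low-order VPN bits XOR high-order VPN bits, masked to the table
   size.  Nearby pages still spread across entries (low bits dominate), and
   pages that share only their low-order bits need not collide. */
unsigned tlb_index(uint32_t vpn)
{
    return (unsigned)((vpn ^ (vpn >> 6)) & (TLB_ENTRIES - 1u));
}
```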
Slide 19: TLB Lookups
- Sequential search of the TLB table
- Direct mapping: assigns each virtual page to a specific slot in the TLB
  - e.g., use the upper bits of the VPN to index the TLB
- Set associativity: use N TLB banks to perform lookups in parallel
Slides 20-22: Two-Way Associative Cache
[Diagram: the VPN is hashed to select one set; the two entries in the set compare their stored VPNs against the lookup VPN in parallel; on a miss, translate and replace one of the two entries.]
Slide 23: TLB Lookups
- Direct mapping: assigns each virtual page to a specific slot in the TLB
  - e.g., use the upper bits of the VPN to index the TLB
- Set associativity: use N TLB banks to perform lookups in parallel
- Fully associative cache: allows looking up all TLB entries in parallel
Slides 24-26: Fully Associative Cache
[Diagram: the lookup VPN is compared against every entry's VPN in parallel; on a miss, translate and replace one of the entries.]
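In software, the parallel comparators of a fully associative TLB reduce to a scan over all entries. A minimal model (the types, sizes, and names are ours):

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_SIZE 16   /* illustrative: TLBs are typically small */

struct tlb_entry {
    uint32_t vpn;
    uint32_t ppn;
    bool     valid;
};

/* Hardware compares every entry's VPN at once; this loop stands in for
   those parallel comparators. */
bool tlb_lookup(const struct tlb_entry tlb[TLB_SIZE], uint32_t vpn,
                uint32_t *ppn_out)
{
    for (int i = 0; i < TLB_SIZE; i++) {
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            *ppn_out = tlb[i].ppn;
            return true;        /* TLB hit */
        }
    }
    return false;               /* TLB miss: walk the page table */
}
```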
Slide 27: TLB Lookups
- Typically
  - TLBs are small and fully associative
  - Hardware caches are direct-mapped or set-associative
Slide 28: Replacement of TLB Entries
- Direct mapping
  - An entry is replaced whenever its VPN mismatches
- Associative caches
  - Random replacement
  - LRU (least recently used)
  - MRU (most recently used)
  - Depending on reference patterns
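LRU victim selection for an associative cache can be sketched as picking the entry with the oldest reference timestamp. Exact LRU like this is what an OS-level cache might do; hardware usually only approximates it. Names and sizes are our illustration.

```c
#include <stdint.h>

#define TLB_SIZE 8   /* illustrative associative TLB */

struct entry {
    uint32_t vpn, ppn;
    uint64_t last_used;   /* timestamp of the most recent reference */
};

/* Return the index of the least recently used entry: the one whose
   last reference is oldest. */
int lru_victim(const struct entry tlb[TLB_SIZE])
{
    int victim = 0;
    for (int i = 1; i < TLB_SIZE; i++)
        if (tlb[i].last_used < tlb[victim].last_used)
            victim = i;
    return victim;
}
```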
Slide 29: Replacement of TLB Entries
- Hardware level
  - TLB replacement is mostly random
    - Simple and fast
- Software level
  - Memory-page replacements are more sophisticated
  - Trade-off: CPU cycles vs. cache hit rate
Slide 30: Consistency Between TLB and Page Tables
- Different processes have different page tables
  - TLB entries need to be invalidated on context switches
  - Alternative: tag TLB entries with process IDs
    - Additional cost of hardware and comparisons per lookup
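The two options can be sketched side by side: a full invalidation on every context switch, versus tagging each entry with an address-space/process ID (ASID) so a lookup pays one extra comparison but entries from other processes can stay resident. Structure and names are our illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_SIZE 16   /* illustrative */

struct entry {
    uint32_t vpn, ppn;
    uint16_t asid;    /* address-space (process) ID tag */
    bool     valid;
};

/* Without ASID tags, a context switch must invalidate the whole TLB. */
void tlb_flush(struct entry tlb[TLB_SIZE])
{
    for (int i = 0; i < TLB_SIZE; i++)
        tlb[i].valid = false;
}

/* With tags, entries belonging to other processes simply fail the
   extra ASID compare instead of being flushed. */
bool tlb_lookup(const struct entry tlb[TLB_SIZE], uint16_t asid,
                uint32_t vpn, uint32_t *ppn_out)
{
    for (int i = 0; i < TLB_SIZE; i++)
        if (tlb[i].valid && tlb[i].asid == asid && tlb[i].vpn == vpn) {
            *ppn_out = tlb[i].ppn;
            return true;
        }
    return false;
}
```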
Slide 31: Relationship Between TLB and HW Memory Caches
- We can extend the principle of the TLB
- Virtually addressed cache: between the CPU and the translation tables
- Physically addressed cache: between the translation tables and main memory
Slide 32: Relationship Between TLB and HW Memory Caches
[Diagram: untranslated data reads and writes go to the virtually addressed cache first; on a miss, the TLB and translation tables produce physical addresses, which pass through the physically addressed cache on the way to main memory.]
Slide 33: Two Ways to Commit Data Changes
- Write-through: immediately propagates the update through the various levels of caching
  - For critical data
- Write-back: delays the propagation until the cached item is replaced
  - Goal: spread the cost of update propagation over multiple updates
  - Less costly
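The contrast can be sketched with a single cache line and a dirty bit; the structures and names are our illustration, with a lone variable standing in for the next level of the hierarchy:

```c
#include <stdbool.h>
#include <stdint.h>

struct cache_line {
    uint32_t data;
    bool     dirty;   /* set by write-back until the line is evicted */
};

uint32_t backing_store;   /* stands in for the next level of the hierarchy */

/* Write-through: the update reaches the cache and the next level at once. */
void write_through(struct cache_line *line, uint32_t value)
{
    line->data = value;
    backing_store = value;
}

/* Write-back: only the cached copy changes now; the dirty bit defers the
   propagation, batching multiple updates into one eventual write. */
void write_back(struct cache_line *line, uint32_t value)
{
    line->data = value;
    line->dirty = true;
}

/* On eviction, a dirty line must be propagated before it is reused. */
void evict(struct cache_line *line)
{
    if (line->dirty) {
        backing_store = line->data;
        line->dirty = false;
    }
}
```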
