Cache-partitioning

Presentation slides from WIOSCA 2007.

  1. Managing Shared L2 Caches on Multicore Systems in Software
     David Tam, Reza Azimi, Livio Soares, Michael Stumm
     University of Toronto, {tamda, azimi, livio, stumm}@eecg.toronto.edu
     WIOSCA 2007
  2. Multicores Today
     [Diagram: cores, each with hardware threads, sharing an L2 cache]
     Issues with uncontrolled sharing:
     • Threads compete for cache space
     • The LRU eviction policy causes interference
  3. Uncontrolled Sharing is Bad
     [Bar chart: normalized IPC (%) of mcf and art on a dual-core Power5.
     Single-programmed: 100%. Multi-programmed, uncontrolled: roughly
     57-72%. Multi-programmed, controlled (our system, in software):
     roughly 87-89%.]
     Hardware partitioning mechanisms:
     • [Qureshi06], [Rafique06], [Chandra05], [Iyer04], [Suh04]
     • Not available today
  4. Software Partitioning
     • Implemented on a real system
     • Applies the classic page-coloring technique [Cho06], [Bershad94], [Lynch92], [Kessler92]
     • Guided physical page allocation yields controlled L2 cache line usage
     [Diagram: virtual pages of Process A and Process B map, under OS-managed
     allocation, to physical pages of color A or color B; the fixed hardware
     mapping then places each color in its own region of the physically
     indexed L2 cache]
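
As a concrete illustration of the page-coloring mapping, here is a minimal C sketch. The 4 KB page size and the use of the low physical-page-number bits as the color are assumptions for illustration, not the Power5's exact cache geometry.

```c
/* Minimal sketch of page coloring: on a physically indexed cache, the
 * low-order bits of the physical page number select which cache region
 * ("color") a page can occupy. Parameters are illustrative assumptions. */
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12            /* assume 4 KB base pages */
#define NUM_COLORS 16            /* 16 partitions, as in the talk */

static unsigned page_color(uint64_t phys_addr)
{
    /* Color = physical page number modulo the number of colors. */
    return (unsigned)((phys_addr >> PAGE_SHIFT) & (NUM_COLORS - 1));
}

int main(void)
{
    printf("%u\n", page_color(0x10000)); /* page 16 -> color 0 */
    printf("%u\n", page_color(0x11000)); /* page 17 -> color 1 */
    printf("%u\n", page_color(0x20000)); /* page 32 -> color 0 again */
    return 0;
}
```

Because the physical-address-to-cache-set mapping is fixed in hardware, the OS controls a process's L2 footprint simply by choosing which colors of physical pages to hand it.
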
  5. Software-Based Properties
     • Deployable right now (in the OS or VMM)
     • Flexible: can allocate for specific users, applications, processes, or threads
     • Unaffected by the number of threads or their location
     [Diagram: L2 cache colors divided between Apache (25%, 4 colors) and
     MySQL (75%, 12 colors)]
  6. Linux Implementation
     • Color-aware physical page allocation with minimal changes to Linux
     • Old: a single list of local free physical pages per logical CPU
     • New: multiple lists per logical CPU, one per color (0 through 15),
       to find a page of the target color quickly (see the sketch below)
     • Looks like a single list to the rest of Linux
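
Below is a hedged C sketch of the per-color free lists this slide describes. It illustrates the data structure only; it is not the authors' actual Linux 2.6.15 patch, and every name in it is hypothetical.

```c
/* Sketch of color-aware page allocation: one free list per color (per
 * logical CPU in the real design) so a page of the requested color can
 * be found quickly, while color-oblivious callers still see what
 * behaves like a single free list. */
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define NUM_COLORS 16

struct page {
    uint64_t     phys_addr;
    struct page *next;
};

static struct page *free_lists[NUM_COLORS];   /* one list per color */

static unsigned page_color(const struct page *p)
{
    return (unsigned)((p->phys_addr >> PAGE_SHIFT) & (NUM_COLORS - 1));
}

void free_page_colored(struct page *p)
{
    unsigned c = page_color(p);
    p->next = free_lists[c];
    free_lists[c] = p;
}

/* Allocate a page of a specific color; NULL if that color is empty. */
struct page *alloc_page_colored(unsigned color)
{
    unsigned c = color & (NUM_COLORS - 1);
    struct page *p = free_lists[c];
    if (p)
        free_lists[c] = p->next;
    return p;
}

/* Color-oblivious allocation: rotate through the colors so the rest of
 * the kernel behaves as if there were a single list. */
struct page *alloc_page_any(void)
{
    static unsigned next;
    for (unsigned i = 0; i < NUM_COLORS; i++) {
        struct page *p = alloc_page_colored(next++ & (NUM_COLORS - 1));
        if (p)
            return p;
    }
    return NULL;
}
```
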
  7. Experimental Setup
     • Power5 memory hierarchy: 1.9 MB L2 (14 cycles); 36 MB L3 victim cache
       (90 cycles); 4 GB local memory (280 cycles); 4 GB remote memory
       (310 cycles)
     • 16 partitions (colors): 120 KB each in L2, 2.25 MB each in L3
     • Linux 2.6.15
     • SPECcpu2000, SPECjbb2000
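
As a quick check on the slide's numbers, the per-partition sizes follow from dividing each cache level evenly among the 16 colors:

\[
\frac{1.9\ \mathrm{MB}}{16} \approx 120\ \mathrm{KB\ per\ color\ (L2)},
\qquad
\frac{36\ \mathrm{MB}}{16} = 2.25\ \mathrm{MB\ per\ color\ (L3)}
\]
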
  8. Overview of Results
     • Show ability to control cache usage (single-programmed)
     • Show benefits of partitioning (multi-programmed)
     • Metric selection for performance analysis
  9. Overview of Results (outline repeated; next: controlling cache usage)
  10. Controlling Cache Usage
      • The mechanism can control cache usage (single-programmed experiments)
  11. Overview of Results (outline repeated; next: benefits of partitioning)
  12. Benefits of Partitioning
      • Multiprogrammed workloads (one application per core)
      • Baseline: multi-programmed, no partitioning
  13. Benefits of Partitioning (continued)
  14. Mechanism Issues
      • Inherent:
        • The L3 victim cache is partitioned correspondingly
        • Super-pages reduce the number of partitions (see the note below)
        • Time granularity of allocating more/fewer partitions
      • Current implementation: static partitioning
      • Future work: dynamic partitioning
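
A back-of-the-envelope note on the super-page issue above, assuming 4 KB base pages (the slides do not state the base page size): the 16 colors repeat every \(16 \times 4\ \mathrm{KB} = 64\ \mathrm{KB}\) of physical address space, so a larger page spans several colors at once, leaving

\[
\text{colors remaining} \;=\; \max\!\left(1,\ \frac{64\ \mathrm{KB}}{\text{page size}}\right)
\]

usable partitions. Under this assumption, 16 KB super-pages would leave only 4 colors, and any super-page of 64 KB or more would cover every color, eliminating partitioning entirely.
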
  15. Overview of Results (outline repeated; next: metric selection for
      performance analysis)
  16. Experiences with Metrics
      Goal: explaining and predicting performance.
      • Obvious metric: L2 Miss Rate Curve (MRC), where rate = events per
        billion cycles
        • Miss rate ignores cost variations [Moreto07], [Qureshi06]: miss
          latencies and memory-level parallelism [Qureshi06]
      • Alternative metric: Stall Rate Curve (SRC)
        • The stall rate is more inclusive [McGregor05]: instruction-retirement
          stalls caused by memory latencies directly measure pipeline
          degradation
        • Measurable with hardware performance counters (see the sketch below)
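
For a sense of how an SRC-style metric is gathered, here is a minimal C sketch using Linux's perf_event interface. Note the assumptions: this API postdates the 2007 work (which used Power5-specific counters), and the generic backend-stall event stands in for the instruction-retirement stall counters named in the talk.

```c
/* Sketch: measure stalls per billion cycles with hardware performance
 * counters via perf_event_open(2). PERF_COUNT_HW_STALLED_CYCLES_BACKEND
 * is a generic stand-in for the Power5 retirement-stall events. */
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

static int open_counter(uint64_t config)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = config;
    attr.disabled = 1;
    /* Count for the calling thread on any CPU. */
    return (int)syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}

int main(void)
{
    int cycles = open_counter(PERF_COUNT_HW_CPU_CYCLES);
    int stalls = open_counter(PERF_COUNT_HW_STALLED_CYCLES_BACKEND);
    if (cycles < 0 || stalls < 0) {
        perror("perf_event_open");
        return 1;
    }

    ioctl(cycles, PERF_EVENT_IOC_RESET, 0);
    ioctl(stalls, PERF_EVENT_IOC_RESET, 0);
    ioctl(cycles, PERF_EVENT_IOC_ENABLE, 0);
    ioctl(stalls, PERF_EVENT_IOC_ENABLE, 0);

    /* The workload under measurement would run here. */
    volatile uint64_t sink = 0;
    for (uint64_t i = 0; i < 100000000ULL; i++)
        sink += i;

    ioctl(cycles, PERF_EVENT_IOC_DISABLE, 0);
    ioctl(stalls, PERF_EVENT_IOC_DISABLE, 0);

    uint64_t ncycles = 0, nstalls = 0;
    read(cycles, &ncycles, sizeof(ncycles));
    read(stalls, &nstalls, sizeof(nstalls));

    /* Normalize per billion cycles, as in the talk's rate curves. */
    if (ncycles)
        printf("stalls per billion cycles: %.0f\n",
               1e9 * (double)nstalls / (double)ncycles);
    return 0;
}
```

Repeating such a measurement at each partition size yields the points of a stall rate curve.
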
  17. L2 MRC Failure in Art
      • Single-programmed art
      • The L2 MRC does not show the trend; the SRC does
      [Figure panels: Exec Time, L2 MRC, and L1 D-cache SRC (rates per
      billion cycles; counts in millions)]
      • The SRC is adequate
  18. L2 MRC Failure in Art (continued)
      • Single-programmed art
      • L3 and main-memory hit rates explain the trend
      [Figure panels: L3 Hit Rate, Main Memory Hit Rate, and D-cache SRC
      (rates per billion cycles; counts in millions)]
  19. Conclusions
      • Software partitioning is effective: up to 17% improvement, and
        deployable today
      • The stall metric is better than the miss metric: more accurate for
        performance analysis, and measurable on a real system
      Work in progress:
      • More workloads
      • Dynamically allocate more/fewer partitions
      • Dynamically determine the optimal number of partitions, using
        hardware performance counters