thread-clustering


    thread-clustering - Presentation Transcript

    1. Thread Clustering: Sharing-Aware Thread Scheduling on SMP-CMP-SMT Multiprocessors David Tam, Reza Azimi, Michael Stumm University of Toronto {tamda, azimi, stumm}@eecg.toronto.edu Thread Clustering
    2. Multiprocessors Today. Example: IBM Power 5 system.
    3. Multiprocessors Today. Example: IBM Power 5 system. [Figure labels: SMP, CMP, SMT, shared cache.]
    4. Multiprocessors Today. Example: IBM Power 5 system. Disparity in L2 latencies. [Figure labels: 120 and 14.]
    5. Operating Systems Today. CPU schedulers: ignore disparity in L2 latencies; ignore data sharing among threads; distribute threads poorly; cross-chip traffic; remote L2 cache accesses; causes performance problems.
    6. Our Goal: Sharing-Aware Scheduling. Detect sharing patterns; cluster threads. Benefits: decrease cross-chip traffic; increase on-chip cache locality; exploit shared L2 caches.
    7. Our Online Technique. Steps (repeated): 1) Monitor remote cache access rate; 2) Detect thread sharing patterns; 3) Determine thread clusters; 4) Migrate thread clusters.
    8. Sharing Detection. To observe remote cache accesses: exploit HPCs (hardware performance counters); sample remote cache miss addresses; local cache misses satisfied by remote cache; IBM Power 5 continuous data sampling.
    11. Sharing Signatures. Construct one for each thread; it counts remote cache accesses. Conceptually: the virtual address space, from 0 to 2^64, divided into blocks, with one 8-bit counter per block.
    12. Sharing Signatures. As slide 11, with the counter update shown: a remote cache access to block i increments its counter (ctr_i++).
    13. Optimizations. CPU: temporal sampling (sample every Nth remote cache access). Memory: spatial sampling (256-entry vector; hash function; block ID filter). Vectors still effective at indicating sharing.
    14. Spatial Sampling. Hash collision and alias removal. [Figure: a filter and a vector, each with entries 0 to 255; legend: empty, reserved.]
    15. Spatial Sampling. A block ID is hashed to a filter entry; the entry is EMPTY.
    16. Spatial Sampling. The entry is reserved for that block ID (First-Come-First-Reserved).
    17. Spatial Sampling. A block ID is hashed to a reserved entry; the stored block ID is a MATCH.
    18. Spatial Sampling. A block ID is hashed to a reserved entry; the stored block ID is a MISMATCH. Aliasing prevented.
    19. Automated Clustering. Clustering heuristic: simple, one-pass algorithm; compare vector against existing clusters; if not similar, create a new cluster. Similarity metric: $\sum_{i=0}^{N} V_1[i] \cdot V_2[i]$; shared blocks amplified; non-shared blocks nullified.
    20. Experimental Platform. 8-way Power 5, 1.5GHz; Linux 2.6; IBM J2SE 5.0 JVM. [Figure labels, each appearing twice: 1.9MB L2, 36MB, 4 GB.]
    21. Workloads. Microbenchmark: expect 4 clusters (4 threads per cluster). SPECjbb2000 (modified): expect 2 clusters (2 warehouses, 8 threads per warehouse). RUBiS + MySQL: expect 2 clusters (2 databases, 16 threads per database). VolanoMark chat server: expect 2 clusters (2 rooms, 8 threads per room).
    22. Visualizing Clusters. An example. [Figure: counter values on a scale marked 0, 64, 128, 255; Cluster A, 4 vectors; Cluster B, 4 vectors.]
    25. Visualizing Clusters. Microbenchmark. [Figure; brace label: 4 vectors.]
    26. Visualizing Clusters. Modified SPECjbb2000 (4 warehouses). [Figure; brace label: 16 vectors.]
    27. Visualizing Clusters. RUBiS + MySQL (2 databases). [Figure; brace label: 24 vectors.]
    28. Visualizing Clusters. VolanoMark (4 rooms). [Figure.]
    29. Remote Cache Impact. Normalized to default Linux. [Chart; values shown: 90, 70, 72, 43, 32, 22, 9, 2, -17.]
    30. Performance Impact. IPC: instructions per cycle. Normalized to default Linux. [Chart; values shown: 7.4, 7.4, 7.1, 6.1, 6.1, 5.1, 5.0, 3.7, -0.8.]
    31. Summary. Before: current operating systems. After: operating system with thread clustering.
    32. Conclusions. HPCs can detect sharing; sharing signatures are effective; automated thread clustering reduces remote cache access up to 70% and improves performance up to 7%, all with low overhead. Future work: more workloads; improve clustering algorithm; integration with load-balancing aspects.
    33. Thread Clustering
    34. Sampling Overhead. Modified SPECjbb2000.
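
The sharing-signature and clustering steps from slides 11 to 19 can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The 256-entry vector, the 8-bit counters, the first-come-first-reserved block ID filter, and the dot-product similarity come from the slides. The 128-byte block size, the modulo hash, saturating counters, a single filter shared by all threads, comparison against each cluster's first vector, the threshold value, and every name used (`BlockFilter`, `Signature`, `similarity`, `cluster`) are hypothetical choices made for the example. Temporal sampling (every Nth access) is left out.

```python
from typing import Dict, List, Optional

VECTOR_SIZE = 256   # 256-entry vector (from the slides)
COUNTER_MAX = 255   # 8-bit counter (from the slides); saturation is an assumption
BLOCK_SHIFT = 7     # assumption: 128-byte blocks, so block ID = address >> 7


def block_id(vaddr: int) -> int:
    """Map a virtual address to a block ID (block size is an assumption)."""
    return vaddr >> BLOCK_SHIFT


class BlockFilter:
    """First-come-first-reserved filter: one reserved block ID per entry."""

    def __init__(self) -> None:
        self.reserved: List[Optional[int]] = [None] * VECTOR_SIZE

    def admit(self, block: int) -> Optional[int]:
        """Return the entry index for this block, or None on a mismatch."""
        slot = block % VECTOR_SIZE          # assumption: simple modulo hash
        if self.reserved[slot] is None:     # EMPTY: reserve the entry
            self.reserved[slot] = block
        if self.reserved[slot] == block:    # MATCH: the sample is counted
            return slot
        return None                         # MISMATCH: aliasing prevented


class Signature:
    """Per-thread vector of counters of sampled remote cache accesses."""

    def __init__(self) -> None:
        self.counters = [0] * VECTOR_SIZE

    def record(self, vaddr: int, flt: BlockFilter) -> None:
        slot = flt.admit(block_id(vaddr))
        if slot is not None:
            self.counters[slot] = min(self.counters[slot] + 1, COUNTER_MAX)


def similarity(v1: List[int], v2: List[int]) -> int:
    """Dot product: shared blocks are amplified, non-shared ones nullified."""
    return sum(a * b for a, b in zip(v1, v2))


def cluster(signatures: Dict[int, Signature], threshold: int) -> List[List[int]]:
    """One-pass clustering: join the first similar cluster, else start one."""
    clusters = []  # list of (representative vector, member thread IDs)
    for tid, sig in signatures.items():
        for rep, members in clusters:
            if similarity(sig.counters, rep) >= threshold:
                members.append(tid)
                break
        else:
            clusters.append((sig.counters, [tid]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    flt = BlockFilter()
    sigs = {tid: Signature() for tid in range(4)}
    for _ in range(10):
        for tid in (0, 1):
            sigs[tid].record(0x1000, flt)   # threads 0 and 1 share one block
        for tid in (2, 3):
            sigs[tid].record(0x2080, flt)   # threads 2 and 3 share another
    print(cluster(sigs, threshold=1))
```

Sharing one filter across threads is what makes entry i refer to the same block in every signature, so that the dot product compares like with like; with per-thread filters, two vectors could count different blocks in the same entry.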

    Presentation slides from EuroSys 2007.
