CSTalks - The Multicore Midlife Crisis - 30 Mar

Transcript

  • 1. The Multicore Midlife Crisis
      Bogdan Marius Tudor, CSTalks, 30 March 2011
  • 2. Outline (5/4/11)
      • The Memory Problem
      • Do We Need All These Cores?
      • Tomorrow’s Multicore
      • Research Perspective
  • 3. Remember Single Core?
      [Image credit: Wikipedia]
  • 4. My Next Processors
      [Chart: cache size in kB (axis 0 to 4000) for seven processors, labelled by clock speed and date: 66 MHz (Apr-94), 200 MHz (Apr-98), 1000 MHz (Nov-01), 2250 MHz (May-04), 1600 MHz (Jul-06), 2400 MHz (Jul-08), 2400 MHz (Mar-11)]
  • 5. My Next Processors
      [Same chart as slide 4]
  • 6. So What?
      Yep, they increased the cache size. Do I care?
      The interesting part is why they did it.
  • 7. The Memory Problem
      • Moore’s Law: the number of transistors doubles every 18 months
          – Single-core: new transistors = faster speed
          – Multicore: new transistors = more cores
      • Memory speed increase does not obey Moore’s Law!
      [Diagram: a processor with four cores and a cache, connected to memory]
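  • 8. The Memory Problem
      • Problem: more cores compete for the same slow memory!
      • Implications: the pipeline stalls while the M stage waits on an access to cache or RAM
          – Cache: 5 cycles :)
          – RAM: > 100 cycles :(
      [Diagram: pipeline stages IF, ID, X, M, W; instructions queue in IF and ID behind the stalled access]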
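The gap between the two latencies on this slide can be put into numbers with a simple expected-cost calculation. The sketch below is illustrative only: the function name and the flat two-level model (every access is either a cache hit or a RAM access) are assumptions, and 100 cycles stands in for the slide's "> 100 cycles".

```python
def average_access_cycles(hit_rate, cache_cycles=5, ram_cycles=100):
    """Expected cycles per memory access in a simplified two-level model.

    hit_rate     -- fraction of accesses served by the cache (0.0 to 1.0)
    cache_cycles -- cost of a cache hit (5 cycles, from the slide)
    ram_cycles   -- cost of going to RAM (assumed 100; the slide says "> 100")
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0.0 and 1.0")
    return hit_rate * cache_cycles + (1.0 - hit_rate) * ram_cycles


if __name__ == "__main__":
    for rate in (1.0, 0.99, 0.9, 0.5):
        print(f"hit rate {rate:.2f}: {average_access_cycles(rate):.1f} cycles")
```

Even under this optimistic lower bound for RAM latency, a small drop in hit rate multiplies the average cost of a memory access, which is why contention for the shared cache matters so much.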
  • 9. The Memory Problem
      • Problem: more cores compete for the same slow memory!
      • Solution: increase cache size :)
          – Maintain cache hit rate
              • Halving the miss rate requires about 4x the cache size
              • Exponential increase in the number of transistors needed
          – Cache coherence overhead
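The "2x for 4x" relationship on this slide is consistent with a power-law model in which the miss rate falls with the square root of the cache size. The sketch below assumes that model; the exponent of 0.5 and the function name are assumptions for illustration, not figures from the talk.

```python
def cache_growth_factor(miss_reduction, alpha=0.5):
    """How much larger the cache must be to cut the miss rate by `miss_reduction`.

    Assumes miss_rate is proportional to cache_size ** (-alpha).
    alpha=0.5 is the assumed square-root rule, not a value from the slides.
    """
    if miss_reduction <= 0 or alpha <= 0:
        raise ValueError("miss_reduction and alpha must be positive")
    return miss_reduction ** (1.0 / alpha)


if __name__ == "__main__":
    for reduction in (2, 4, 8):
        factor = cache_growth_factor(reduction)
        print(f"{reduction}x fewer misses -> {factor:.0f}x larger cache")
```

Under this assumption the required cache area grows with the square of the improvement, so keeping pace with a growing core count quickly consumes the transistor budget.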
  • 10. Increasing Cache Size
      Not practical!
      Source: B. M. Rogers et al. Scaling the bandwidth wall: challenges in and avenues for CMP scaling. ISCA 2009
  • 11. Other Approaches
      • Improve memory speed
          – Slow, power-hungry and error-prone
      • Better caching
      • Improve memory bandwidth
          – Latency tradeoff
      • Prefetch
          – Mixed blessings
      • Allow more in-flight requests
  • 12. Do We Need All These Cores?
      • Average utilization: < 20%
      • We don’t have many parallel apps
      • We already have enough compute power
      • Until you try to encode an HD video
          – Star Trek holodecks: not there yet
      • CPU vendors still have to make a living
  • 13. Tomorrow’s Multicore
      [Image credit: Intel]
  • 14. Tomorrow’s Multicore
      • Intel Core i3, i5, i7
          – Video is integrated into the CPU
          – Must balance sequential and parallel performance
          – Lower energy requirements than previous generations
      • Heterogeneous cores
          – Many slow cores that are good at floating point
          – Some general-purpose cores
          – “Combine” cores into super-cores
      • Must live with the memory problems
  • 15. Tomorrow’s Multicore
      • The number of cores is becoming less important
          – They can’t keep increasing them
          – i3, i5, i7: how many cores each?
  • 16. Tomorrow’s Multicore
      [Image credit: Wikipedia]
  • 17. Tomorrow’s Multicore
      • The number of cores is becoming less important
          – They can’t keep increasing them
          – i3, i5, i7: how many cores each?
      • What matters is what the system provides
          – FLOP intensive: GPU-style cores
          – I/O intensive: FAWN (CMU)
          – Memory intensive: Opteron/Xeon NUMA servers
  • 18. A Research Perspective
      • Coping with heterogeneity is hard
          – Different degrees of parallelism have different sequential execution speeds
          – Many tradeoffs: speed vs. energy vs. memory intensity vs. I/O intensity
      • Need models for heterogeneity
          – Understand the cost of the applications in terms of FLOPS, INTOPS, memory, I/O etc.
      • Silver lining: stick to sequential apps (?)
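One very rough way to express an application's cost "in terms of FLOPS, memory, I/O" is a bottleneck model: divide each demand by the matching machine rate and take the largest. The sketch below is a hypothetical illustration, not a model from the talk; the class names, fields, and the assumption that the three resources overlap perfectly are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    flops: float      # floating-point operations to execute
    mem_bytes: float  # bytes moved to or from main memory
    io_bytes: float   # bytes moved to or from storage or network


@dataclass
class Machine:
    flops_per_s: float
    mem_bytes_per_s: float
    io_bytes_per_s: float


def resource_times(w, m):
    """Seconds each resource would need if it were the only constraint."""
    return {
        "compute": w.flops / m.flops_per_s,
        "memory": w.mem_bytes / m.mem_bytes_per_s,
        "io": w.io_bytes / m.io_bytes_per_s,
    }


def bottleneck(w, m):
    """Return (resource name, lower-bound seconds), assuming perfect overlap."""
    times = resource_times(w, m)
    name = max(times, key=times.get)
    return name, times[name]


if __name__ == "__main__":
    job = Workload(flops=1e9, mem_bytes=4e9, io_bytes=0.0)
    box = Machine(flops_per_s=1e10, mem_bytes_per_s=1e9, io_bytes_per_s=1e8)
    print(bottleneck(job, box))
```

Running the same workload description against two different machine descriptions gives a first-order answer to which kind of core or system suits it, which is the kind of question a heterogeneity model has to address.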
  • 19. A Research Perspective
      • Coping with slow memory
      • Need to improve data locality by orders of magnitude
          • Compiler support, auto-tuners etc.
      • Space-efficient data types:
          • Hot area in algorithms and systems
          • Bloom filters: NSDI’10: 3 papers!
          • Succinct data structures: STOC’08-STOC’10
          • Cache-oblivious algorithms
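As an illustration of the space-efficient data types mentioned on this slide, here is a minimal Bloom filter sketch. The class name, the default sizes, and the use of seeded SHA-256 as the hash family are assumptions chosen for brevity; this is not the design from any of the cited papers.

```python
import hashlib


class BloomFilter:
    """Set membership in a fixed number of bits; may report false positives,
    never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by prefixing the item with a seed byte.
        data = item.encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


if __name__ == "__main__":
    seen = BloomFilter()
    seen.add("cache line 42")
    print(seen.might_contain("cache line 42"))
```

The whole structure here occupies 128 bytes regardless of how long the stored strings are, so it is far more likely to stay in cache than a hash table holding the items themselves. The price is a nonzero false-positive rate that grows as more items are added.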
  • 20. A Research Perspective
      • Software-helped cache coherence
          – Or go without it :)
      • Renounce some programming patterns
          • Java initializes all objects to some value…
          • Rethink those hash tables
      • Go for approximate solutions
          – It’s better if you can provide error bounds
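A small example of an approximate solution that comes with an error bound: estimating the fraction of items that satisfy a condition from a random sample instead of scanning all the data. The function and parameter names are hypothetical; the bound is Hoeffding's inequality for independent samples drawn with replacement.

```python
import math
import random


def estimate_fraction(data, predicate, sample_size, delta=0.05, seed=0):
    """Estimate the fraction of `data` satisfying `predicate` from a sample.

    Returns (estimate, error). By Hoeffding's inequality, the true fraction
    lies within estimate +/- error with probability at least 1 - delta.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    rng = random.Random(seed)
    # Sample with replacement so the draws are independent.
    hits = sum(1 for _ in range(sample_size) if predicate(rng.choice(data)))
    estimate = hits / sample_size
    error = math.sqrt(math.log(2.0 / delta) / (2.0 * sample_size))
    return estimate, error


if __name__ == "__main__":
    values = list(range(1_000_000))
    est, err = estimate_fraction(values, lambda v: v % 2 == 0, sample_size=2000)
    print(f"even fraction is about {est:.3f} +/- {err:.3f}")
```

The sample touches a few thousand memory locations rather than a million, and the error term depends only on the sample size and the chosen confidence level, not on the size of the data.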
  • 21. Discussion
      Thank you for your attention