Multicore Programming and TPL


1. Multicore Programming
2. Agenda: Part 1 – Current state of affairs; Part 2 – Multithreaded algorithms; Part 3 – Task Parallel Library
3. Multicore Programming. Part 1: Current state of affairs
4. Why Moore's law is not working anymore: power consumption, wire delays, DRAM access latency, diminishing returns of more instruction-level parallelism
5. Power consumption. [Chart: power density (W/cm²) of the 8080, 386, 486, and Pentium® processors from the '70s through the '10s, climbing past hot plate (~10) and nuclear reactor (~100) toward rocket nozzle (~1,000) and the Sun's surface (~10,000). Source: Pat Gelsinger, Intel Developer Forum, Spring 2004]
6. Wire delays
7. DRAM access latency
8. Diminishing returns: '80s, 10 CPI → 1 CPI; '90s, 1 CPI → 0.5 CPI; '00s: multicore
9. "The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software" – Herb Sutter
10. Survival: to scale performance, put many processing cores on the microprocessor chip. The new edition of Moore's law is about the doubling of cores.
11. Quotations: "No matter how fast processors get, software consistently finds new ways to eat up the extra speed. If you haven't done so already, now is the time to take a hard look at the design of your application, determine what operations are CPU-sensitive now or are likely to become so soon, and identify how those places could benefit from concurrency." – Herb Sutter, C++ Architect at Microsoft (March 2005). "After decades of single core processors, the high volume processor industry has gone from single to dual to quad-core in just the last two years. Moore's Law scaling should easily let us hit the 80-core mark in mainstream processors within the next ten years and quite possibly even less." – Justin Rattner, CTO, Intel (February
12. What keeps us away from multicore: a sequential way of thinking; the belief that parallel programming is difficult and error-prone; unwillingness to accept that the sequential era is over; neglecting performance.
13. What has been done: many frameworks have been created that bring parallelism to the application level; vendors try hard to teach the programming community how to write parallel programs; MIT and other education centers have done a lot of research in this area.
14. Multicore Programming. Part 2: Multithreaded algorithms
15. Chapter 27: Multithreaded Algorithms
16. Multithreaded algorithms: there is no single architecture of parallel computer, hence no single, widely accepted model of parallel computing. We rely on a parallel shared-memory computer.
17. Dynamic multithreaded model (DMM): allows the programmer to express "logical parallelism" without worrying about the details of static thread programming. Its two main features are nested parallelism (the parent can proceed while a spawned child is computing its result) and parallel loops (iterations of the loop can execute concurrently).
18. DMM advantages: a simple extension of the serial model, with only 3 new keywords: parallel, spawn, and sync. It provides a theoretically clean way to quantify parallelism based on the notions of "work" and "span". Many MT algorithms based on nested parallelism follow naturally from the divide-and-conquer approach.
19. Multithreaded execution model
20. Work
21. Span
22. Speedup
23. Parallelism
24. Performance summary
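Slides 20–24 cover the standard performance measures of the dynamic multithreaded model from CLRS chapter 27, which this deck cites; as a refresher, the usual definitions and laws can be summarized as:

```latex
\begin{aligned}
T_1 &= \text{work: total running time on one processor} \\
T_\infty &= \text{span: running time on an unbounded number of processors} \\
\text{speedup} &= T_1 / T_P \le P \\
\text{parallelism} &= T_1 / T_\infty \\
\text{work law:}\quad T_P &\ge T_1 / P
\qquad
\text{span law:}\quad T_P \ge T_\infty \\
\text{greedy scheduler bound:}\quad T_P &\le T_1 / P + T_\infty
\end{aligned}
```

The greedy scheduler bound explains the "scheduler role" slide: any greedy scheduler achieves within a factor of 2 of optimal, since both $T_1/P$ and $T_\infty$ are lower bounds on $T_P$.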
25. Example: fib(4)
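The fib(4) example from slide 25 is expressed in CLRS as P-Fib with spawn/sync; a hypothetical C# translation is sketched below, where spawn maps onto Task.Run and sync onto reading Task.Result:

```csharp
using System;
using System.Threading.Tasks;

class PFib
{
    // A sketch of CLRS P-Fib using TPL tasks:
    // "spawn" becomes Task.Run, "sync" becomes waiting on the task.
    static long Fib(int n)
    {
        if (n <= 1) return n;
        // Spawn the first recursive call as a child task...
        Task<long> x = Task.Run(() => Fib(n - 1));
        // ...while the parent computes the second call itself.
        long y = Fib(n - 2);
        // sync: block until the spawned child finishes, then combine.
        return x.Result + y;
    }

    static void Main()
    {
        Console.WriteLine(Fib(4)); // fib(4) = 3
    }
}
```

Note that spawning a task per call is far too fine-grained to be fast in practice; it is only meant to mirror the textbook's logical parallelism.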
26. Scheduler role
27. Analyzing MT algorithms: Matrix multiplication
P-Square-Matrix-Multiply(A, B):
1. n = A.rows
2. let C be a new n×n matrix
3. parallel for i = 1 to n
4.   parallel for j = 1 to n
5.     c_ij = 0
6.     for k = 1 to n
7.       c_ij = c_ij + a_ik · b_kj
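A minimal C# sketch of this algorithm using TPL: the outer "parallel for" becomes Parallel.For over rows (parallelizing the outer loop alone already exposes n-way parallelism, so the inner parallel loop is left sequential here):

```csharp
using System;
using System.Threading.Tasks;

class MatMul
{
    // P-Square-Matrix-Multiply with the outer parallel loop mapped
    // onto Parallel.For; each iteration writes only row i of C,
    // so no synchronization between iterations is needed.
    static double[,] Multiply(double[,] a, double[,] b)
    {
        int n = a.GetLength(0);
        var c = new double[n, n];
        Parallel.For(0, n, i =>
        {
            for (int j = 0; j < n; j++)
            {
                double sum = 0;
                for (int k = 0; k < n; k++)
                    sum += a[i, k] * b[k, j];
                c[i, j] = sum;
            }
        });
        return c;
    }

    static void Main()
    {
        var a = new double[,] { { 1, 2 }, { 3, 4 } };
        var c = Multiply(a, a);
        Console.WriteLine(c[0, 0]); // 1*1 + 2*3 = 7
    }
}
```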
28. Analyzing MT algorithms: Matrix multiplication
29. Chess Lesson
30. Multicore Programming. Part 3: Task Parallel Library
31. TPL building blocks consist of: tasks; thread-safe scalable collections; phases and work exchange; partitioning; looping; control; breaking; exceptions; results.
32. Data parallelism: Parallel.ForEach(letters, ch => Capitalize(ch));
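The slide's one-liner uses a hypothetical Capitalize; a self-contained sketch of the same data-parallel pattern, uppercasing words into a result array:

```csharp
using System;
using System.Threading.Tasks;

class DataParallel
{
    static void Main()
    {
        // Iterations run concurrently; each writes a distinct slot of
        // the result array, so no locking is needed.
        string[] words = { "alpha", "beta", "gamma" };
        var upper = new string[words.Length];
        Parallel.ForEach(words, (word, state, index) =>
        {
            upper[index] = word.ToUpperInvariant();
        });
        Console.WriteLine(string.Join(" ", upper)); // ALPHA BETA GAMMA
    }
}
```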
33. Task parallelism: Parallel.Invoke(() => Average(), () => Minimum()…);
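Average and Minimum on the slide are placeholders; a runnable sketch with hypothetical stand-ins, showing that Parallel.Invoke runs the delegates potentially in parallel and returns only when all have completed:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

class TaskParallel
{
    // Stand-ins for the slide's Average() and Minimum().
    static double Average(int[] xs) => xs.Average();
    static int Minimum(int[] xs) => xs.Min();

    static void Main()
    {
        var data = new[] { 4, 1, 7, 2 };
        double avg = 0;
        int min = 0;
        // Each lambda computes an independent result; Parallel.Invoke
        // blocks until both are done.
        Parallel.Invoke(
            () => avg = Average(data),
            () => min = Minimum(data));
        Console.WriteLine($"avg={avg} min={min}"); // avg is 3.5, min is 1
    }
}
```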
34. Thread Pool in .NET 3.5
35. Thread Pool in .NET 4.0
36. Task Scheduler & Thread Pool. In .NET 3.5, ThreadPool.QueueUserWorkItem has disadvantages: zero information about each work item; a fairness-oriented FIFO queue must be maintained. Improvements in .NET 4.0: a more efficient FIFO queue (ConcurrentQueue); an API enhanced to get more information from the user (Task); work stealing; thread injection; waiting for completion, handling exceptions, and getting the computation result.
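The slide contrasts QueueUserWorkItem with Task; a minimal sketch of what the Task API adds over a fire-and-forget work item (waiting, results, and exceptions wrapped in AggregateException):

```csharp
using System;
using System.Threading.Tasks;

class TaskBasics
{
    static void Main()
    {
        // Unlike ThreadPool.QueueUserWorkItem, a Task is a first-class
        // handle on the work item: you can wait on it, read its result,
        // and observe its exception.
        Task<int> sum = Task.Run(() =>
        {
            int total = 0;
            for (int i = 1; i <= 100; i++) total += i;
            return total;
        });

        Task failing = Task.Run(() => throw new InvalidOperationException("boom"));

        Console.WriteLine(sum.Result); // blocks until complete: 5050

        try
        {
            failing.Wait();
        }
        catch (AggregateException ex)
        {
            // Exceptions thrown inside a task surface as AggregateException.
            Console.WriteLine(ex.InnerException.Message); // boom
        }
    }
}
```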
37. New primitives
- Thread-safe, scalable collections: IProducerConsumerCollection<T>, ConcurrentQueue<T>, ConcurrentStack<T>, ConcurrentBag<T>, ConcurrentDictionary<TKey,TValue>
- Phases and work exchange: Barrier, BlockingCollection<T>, CountdownEvent
- Partitioning: {Orderable}Partitioner<T>, Partitioner.Create
- Exception handling: AggregateException
- Initialization: Lazy<T>, LazyInitializer.EnsureInitialized<T>, ThreadLocal<T>
- Locks: ManualResetEventSlim, SemaphoreSlim, SpinLock, SpinWait
- Cancellation: CancellationToken{Source}
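Several of these primitives compose naturally; as one illustrative sketch, BlockingCollection<T> wraps an IProducerConsumerCollection<T> (a ConcurrentQueue<T> by default) to coordinate a producer and a consumer without explicit locks:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class ProducerConsumer
{
    static void Main()
    {
        // Bounded to 4 items: Add blocks when the buffer is full,
        // GetConsumingEnumerable blocks when it is empty.
        using var queue = new BlockingCollection<int>(boundedCapacity: 4);

        var producer = Task.Run(() =>
        {
            for (int i = 1; i <= 10; i++) queue.Add(i);
            queue.CompleteAdding(); // signal: no more items are coming
        });

        int sum = 0;
        // Consumes until the producer calls CompleteAdding and the
        // buffer drains.
        foreach (int item in queue.GetConsumingEnumerable())
            sum += item;

        producer.Wait();
        Console.WriteLine(sum); // 1 + 2 + ... + 10 = 55
    }
}
```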
38. References: The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software; MIT Introduction to Algorithms video lectures; Chapter 27, Multithreaded Algorithms, from Introduction to Algorithms, 3rd edition; CLR 4.0 ThreadPool Improvements: Part 1; Multicore Programming Primer; ThreadPool on Channel 9
