Multicore Programming and TPL (.NET Day)

  1. Multicore Programming. Prepared by: Yan Druglaya (e-mail: ydrugalya@gmail.com, Twitter: @ydrugalya)
  2. Agenda
     - Part 1: Current state of affairs
     - Part 2: Multithreaded algorithms
     - Part 3: Task Parallel Library
  3. Multicore Programming. Part 1: Current state of affairs
  4. Why Moore's law is not working anymore
     - Power consumption
     - Wire delays
     - DRAM access latency
     - Diminishing returns of more instruction-level parallelism
  5. Power consumption. [Chart: power density (W/cm2) of processors from the 8080 through the 386, 486 and Pentium, plotted from the '70s to the '10s and approaching the levels of a hot plate, a nuclear reactor, a rocket nozzle and the Sun's surface. Source: Intel Developer Forum, Spring 2004, Pat Gelsinger]
  6. Wire delays
  7. DRAM access latency
  8. Diminishing returns
     - '80s: 10 CPI → 1 CPI
     - '90s: 1 CPI → 0.5 CPI
     - '00s: multicore
  9. The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software (Herb Sutter)
  10. Survival
     - To scale performance, put many processing cores on the microprocessor chip
     - The new edition of Moore's law is about the doubling of cores
  11. Quotations
     - "No matter how fast processors get, software consistently finds new ways to eat up the extra speed."
     - "If you haven't done so already, now is the time to take a hard look at the design of your application, determine what operations are CPU-sensitive now or are likely to become so soon, and identify how those places could benefit from concurrency." -- Herb Sutter, C++ Architect at Microsoft (March 2005)
     - "After decades of single core processors, the high volume processor industry has gone from single to dual to quad-core in just the last two years. Moore's Law scaling should easily let us hit the 80-core mark in mainstream processors within the next ten years and quite possibly even less." -- Justin Rattner, CTO, Intel (February 2007)
  12. What keeps us away from multicore
     - A sequential way of thinking
     - The belief that parallel programming is difficult and error-prone
     - Unwillingness to accept that the sequential era is over
     - Neglecting performance
  13. What has been done
     - Many frameworks have been created that bring parallelism to the application level
     - Vendors are trying hard to teach the programming community how to write parallel programs
     - MIT and other education centers have done a lot of research in this area
  14. Multicore Programming. Part 2: Multithreaded algorithms
  15. Chapter 27: Multithreaded Algorithms
  16. Multithreaded algorithms
     - There is no single architecture of parallel computer
     - There is no single, widely accepted model of parallel computing
     - We rely on a parallel shared-memory computer
  17. Dynamic multithreaded model (DMM)
     - Allows the programmer to express "logical parallelism" without worrying about the issues of static thread programming
     - Two main features:
       - Nested parallelism (the parent can proceed while a spawned child is computing its result)
       - Parallel loops (iterations of the loop can execute concurrently)
  18. DMM - advantages
     - A simple extension of the serial model: only 3 new keywords: parallel, spawn and sync (see the sketch below)
     - Provides a theoretically clean way to quantify parallelism, based on the notions of "work" and "span"
     - Many multithreaded algorithms based on nested parallelism follow naturally from the divide-and-conquer approach
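     As a rough, hypothetical mapping of these three keywords onto the TPL covered in Part 3 (the slides do not spell this out): spawn corresponds roughly to starting a Task, sync to waiting on it, and parallel for to Parallel.For.

       using System.Threading.Tasks;

       static class DmmKeywords
       {
           static void Example(int[] data)
           {
               // spawn: start a child computation that may run concurrently with the parent
               Task<int> child = Task.Factory.StartNew(() => data.Length);

               // ...the parent can keep doing other work here while the child runs...

               // sync: wait until the spawned child has finished
               child.Wait();

               // parallel for: iterations of the loop may execute concurrently
               Parallel.For(0, data.Length, i => { data[i] = i * i; });
           }
       }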
  19. Multithreaded execution model
  20. Work
  21. Span
  22. Speedup
  23. Parallelism
  24. Performance summary
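     The bodies of slides 20-24 did not survive extraction; the quantities they name are the standard ones from CLRS Chapter 27 (cited on slide 15 and in the references), usually defined as follows:

       \begin{align*}
       T_1      &= \text{work: total running time of all strands on one processor} \\
       T_\infty &= \text{span: length of the longest (critical) path in the computation DAG} \\
       \text{speedup on } P \text{ processors} &= T_1 / T_P, \qquad
       \text{parallelism} = T_1 / T_\infty \\
       \text{work law: } T_P \ge T_1 / P, \qquad
       \text{span law: } T_P &\ge T_\infty, \qquad
       \text{greedy scheduler: } T_P \le T_1 / P + T_\infty
       \end{align*}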
  25. Example: fib(4)
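     This slide presumably shows the computation DAG for the recursive Fibonacci example from CLRS; a hypothetical TPL rendering of that example (not given in the deck) is:

       using System.Threading.Tasks;

       static class FibExample
       {
           // P-Fib expressed with TPL tasks: fib(n-1) is "spawned" as a task,
           // fib(n-2) runs in the calling strand, and reading Result acts as the "sync".
           // ParallelFib(4) unfolds into the computation DAG shown on this slide.
           static long ParallelFib(int n)
           {
               if (n < 2)
                   return n;
               Task<long> x = Task.Factory.StartNew(() => ParallelFib(n - 1)); // spawn
               long y = ParallelFib(n - 2);                                    // runs inline
               return x.Result + y;                                            // sync + combine
           }
       }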
  26. Scheduler role
  27. Analyzing MT algorithms: Matrix multiplication
     P-Square-Matrix-Multiply(A, B)
       n = A.rows
       let C be a new n x n matrix
       parallel for i = 1 to n
         parallel for j = 1 to n
           c[i,j] = 0
           for k = 1 to n
             c[i,j] = c[i,j] + a[i,k] * b[k,j]
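     A hypothetical TPL version of this pseudocode (the deck itself does not give one), with the two parallel loops expressed as nested Parallel.For calls and the innermost k loop kept serial, as on the slide:

       using System.Threading.Tasks;

       static class MatrixMultiply
       {
           static double[,] Multiply(double[,] a, double[,] b)
           {
               int n = a.GetLength(0);
               double[,] c = new double[n, n];
               Parallel.For(0, n, i =>          // parallel for i
               {
                   Parallel.For(0, n, j =>      // parallel for j
                   {
                       double sum = 0;
                       for (int k = 0; k < n; k++)   // serial inner loop
                           sum += a[i, k] * b[k, j];
                       c[i, j] = sum;
                   });
               });
               return c;
           }
       }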
  28. Analyzing MT algorithms: Matrix multiplication
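     The body of this slide is also missing; for reference, the standard CLRS analysis of the algorithm above is:

       \begin{align*}
       T_1(n)      &= \Theta(n^3) \quad \text{(work of the three nested loops)} \\
       T_\infty(n) &= \Theta(\lg n) + \Theta(\lg n) + \Theta(n) = \Theta(n)
                      \quad \text{(two parallel loops plus the serial inner loop)} \\
       \text{parallelism} &= T_1(n) / T_\infty(n) = \Theta(n^2)
       \end{align*}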
  29. Chess Lesson
  30. Multicore Programming. Part 3: Task Parallel Library
  31. TPL building blocks
     - Tasks
     - Thread-safe, scalable collections
     - Phases and work exchange
     - Partitioning
     - Looping
     - Control: breaking, exceptions, results
  32. Data parallelism: Parallel.ForEach(letters, ch => Capitalize(ch));
  33. Task parallelism: Parallel.Invoke(() => Average(), () => Minimum() …);
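     A slightly fuller, hypothetical version of these two one-liners; Capitalize, Average and Minimum are only placeholders on the slides, so the sketch below substitutes trivial implementations of its own to stay compilable:

       using System;
       using System.Linq;
       using System.Threading.Tasks;

       static class TplExamples
       {
           static void Main()
           {
               // Data parallelism: the same operation applied to every element,
               // with elements processed concurrently.
               char[] letters = { 'a', 'b', 'c' };
               Parallel.ForEach(letters, ch => Capitalize(ch));

               // Task parallelism: different operations running concurrently.
               int[] numbers = { 3, 1, 4, 1, 5 };
               Parallel.Invoke(
                   () => Console.WriteLine("Average: " + numbers.Average()),
                   () => Console.WriteLine("Minimum: " + numbers.Min()));
           }

           static void Capitalize(char ch)
           {
               Console.WriteLine(char.ToUpper(ch));
           }
       }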
  34. Thread Pool in .NET 3.5
  35. Thread Pool in .NET 4.0
  36. Task Scheduler & Thread Pool
     - .NET 3.5 ThreadPool.QueueUserWorkItem disadvantages:
       - Zero information about each work item
       - Fairness: a FIFO queue is maintained
     - Improvements in .NET 4.0:
       - More efficient FIFO queue (ConcurrentQueue)
       - Enhanced API to get more information from the user
       - Task
       - Work stealing
       - Thread injection
       - Waiting for completion, handling exceptions, getting the computation result (sketched below)
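     A hypothetical sketch of what those improvements look like from the caller's side: unlike a bare ThreadPool.QueueUserWorkItem callback, a Task can be waited on, exposes its computed result, and surfaces its exceptions wrapped in an AggregateException:

       using System;
       using System.Threading.Tasks;

       static class TaskVsThreadPool
       {
           static void Main()
           {
               // The work item is a first-class object: we can wait on it,
               // read its result, and observe any exception it threw.
               Task<int> work = Task.Factory.StartNew(() => Compute(21));
               try
               {
                   Console.WriteLine("Result: " + work.Result); // blocks until the task completes
               }
               catch (AggregateException ex)
               {
                   // Exceptions thrown inside the task are collected into an AggregateException.
                   foreach (Exception inner in ex.InnerExceptions)
                       Console.WriteLine("Task failed: " + inner.Message);
               }
           }

           static int Compute(int x)
           {
               return x * 2;
           }
       }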
  37. New primitives
     - Thread-safe, scalable collections: IProducerConsumerCollection<T>, ConcurrentQueue<T>, ConcurrentStack<T>, ConcurrentBag<T>, ConcurrentDictionary<TKey,TValue>
     - Phases and work exchange: Barrier, BlockingCollection<T>, CountdownEvent
     - Partitioning: {Orderable}Partitioner<T>, Partitioner.Create
     - Exception handling: AggregateException
     - Initialization: Lazy<T>, LazyInitializer.EnsureInitialized<T>, ThreadLocal<T>
     - Locks: ManualResetEventSlim, SemaphoreSlim, SpinLock, SpinWait
     - Cancellation: CancellationToken{Source}
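     One hypothetical way a few of these primitives combine: a producer/consumer pipeline built on BlockingCollection<T> backed by a ConcurrentQueue<T>, with cooperative cancellation available through a CancellationTokenSource:

       using System;
       using System.Collections.Concurrent;
       using System.Threading;
       using System.Threading.Tasks;

       static class ProducerConsumerSketch
       {
           static void Main()
           {
               var queue = new BlockingCollection<int>(new ConcurrentQueue<int>(), 100);
               var cts = new CancellationTokenSource();

               // Producer: adds items (observing the cancellation token),
               // then marks the collection as complete.
               Task producer = Task.Factory.StartNew(() =>
               {
                   for (int i = 0; i < 1000; i++)
                       queue.Add(i, cts.Token);
                   queue.CompleteAdding();
               });

               // Consumer: GetConsumingEnumerable blocks until items are available
               // and ends once the producer has called CompleteAdding.
               Task consumer = Task.Factory.StartNew(() =>
               {
                   foreach (int item in queue.GetConsumingEnumerable(cts.Token))
                       Console.WriteLine(item);
               });

               Task.WaitAll(producer, consumer);
               queue.Dispose();
               cts.Dispose();
           }
       }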
  38. References
     - The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software
     - MIT Introduction to Algorithms video lectures
     - Chapter 27, Multithreaded Algorithms, from Introduction to Algorithms, 3rd edition
     - CLR 4.0 ThreadPool Improvements: Part 1
     - Multicore Programming Primer
     - ThreadPool on Channel 9
