
Multicore programming and TPL (.NET day)


  1. Multicore Programming. Prepared by: Yan Drugalya; e-mail: ydrugalya@gmail.com; Twitter: @ydrugalya
  2. Agenda: Part 1 - Current state of affairs; Part 2 - Multithreaded algorithms; Part 3 - Task Parallel Library
  3. Multicore Programming, Part 1: Current state of affairs
  4. Why Moore's law is not working anymore: power consumption, wire delays, DRAM access latency, diminishing returns from more instruction-level parallelism
  5. Power consumption. [Chart: power density (W/cm²) rising from the 8080, 386, and 486 through Pentium® processors toward a hot plate, a nuclear reactor, a rocket nozzle, and the Sun's surface, '70 to '10. Source: Intel Developer Forum, Spring 2004 - Pat Gelsinger]
  6. Wire delays
  7. DRAM access latency
  8. Diminishing returns: '80s: 10 CPI → 1 CPI; '90s: 1 CPI → 0.5 CPI; '00s: multicore
  9. The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software - Herb Sutter
  10. Survival: to scale performance, put many processing cores on the microprocessor chip. The new edition of Moore's law is about the doubling of cores.
  11. Quotations: "No matter how fast processors get, software consistently finds new ways to eat up the extra speed. If you haven't done so already, now is the time to take a hard look at the design of your application, determine what operations are CPU-sensitive now or are likely to become so soon, and identify how those places could benefit from concurrency." -- Herb Sutter, C++ Architect at Microsoft (March 2005). "After decades of single core processors, the high volume processor industry has gone from single to dual to quad-core in just the last two years. Moore's Law scaling should easily let us hit the 80-core mark in mainstream processors within the next ten years and quite possibly even less." -- Justin Rattner, CTO, Intel (February 2007)
  12. What keeps us away from multicore: a sequential way of thinking; the belief that parallel programming is difficult and error-prone; unwillingness to accept that the sequential era is over; neglecting performance.
  13. What has been done: many frameworks have been created that bring parallelism to the application level; vendors are trying hard to teach the programming community how to write parallel programs; MIT and other education centers have done a lot of research in this area.
  14. Multicore Programming, Part 2: Multithreaded algorithms
  15. Chapter 27: Multithreaded Algorithms
  16. Multithreaded algorithms: there is no single architecture of parallel computers, hence no single, widely accepted model of parallel computing; we rely on a parallel shared-memory computer.
  17. Dynamic multithreaded model (DMM): allows the programmer to work with "logical parallelism" without worrying about the issues of static programming. Its two main features are nested parallelism (the parent can proceed while a spawned child is computing its result) and parallel loops (iterations of a loop can execute concurrently).
  18. DMM advantages: a simple extension of the serial model, with only 3 new keywords: parallel, spawn, and sync. Provides a theoretically clean way to quantify parallelism based on the notions of "work" and "span". Many multithreaded algorithms based on nested parallelism follow naturally from the divide-and-conquer approach.
  19. Multithreaded execution model
  20. Work
  21. Span
  22. Speedup
  23. Parallelism
  24. Performance summary
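     Slides 20-24 carry only their headings; for reference, the standard definitions from CLRS Chapter 27, which the deck follows, are: work T1 is the total time to execute the computation on one processor; span T∞ is the time along the longest path of the computation DAG (the running time on an unbounded number of processors); the speedup on P processors is T1/TP, bounded by the work law TP >= T1/P and the span law TP >= T∞; parallelism is T1/T∞, the maximum speedup achievable on any number of processors.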
  25. Example: fib(4)
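     Slide 25 carries only the heading "Example: fib(4)". As an illustration (not code from the deck), the spawn/sync pattern of the dynamic multithreaded model can be sketched with TPL tasks; the naive, threshold-free recursion is kept only to mirror the textbook pseudocode:

        using System;
        using System.Threading.Tasks;

        static class FibDemo
        {
            // "spawn" maps onto starting a Task; "sync" maps onto waiting for its Result.
            static long Fib(int n)
            {
                if (n < 2) return n;                                        // serial base case
                Task<long> left = Task.Factory.StartNew(() => Fib(n - 1));  // spawn the child
                long right = Fib(n - 2);                                    // parent keeps working
                return left.Result + right;                                 // sync with the child
            }

            static void Main()
            {
                Console.WriteLine(Fib(4)); // prints 3 for the fib(4) example
            }
        }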
  26. Scheduler role
  27. Analyzing MT algorithms: Matrix multiplication
      P-Square-Matrix-Multiply:
      1. n = A.rows
      2. let C be a new n x n matrix
      3. parallel for i = 1 to n
      4.   parallel for j = 1 to n
      5.     C[i,j] = 0
      6.     for k = 1 to n
      7.       C[i,j] = C[i,j] + A[i,k] * B[k,j]
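     A C# rendering of the pseudocode above, using TPL's Parallel.For for the two parallel loops, could look like the following sketch (the rectangular-array representation and the small Main are choices made here for illustration, not part of the deck):

        using System;
        using System.Threading.Tasks;

        static class MatrixMultiplyDemo
        {
            // The two "parallel for" loops become nested Parallel.For calls;
            // the innermost k loop stays serial, exactly as in the pseudocode.
            static double[,] Multiply(double[,] a, double[,] b)
            {
                int n = a.GetLength(0);
                var c = new double[n, n];
                Parallel.For(0, n, i =>
                {
                    Parallel.For(0, n, j =>
                    {
                        double sum = 0;
                        for (int k = 0; k < n; k++)
                            sum += a[i, k] * b[k, j];
                        c[i, j] = sum;
                    });
                });
                return c;
            }

            static void Main()
            {
                var a = new double[,] { { 1, 2 }, { 3, 4 } };
                var c = Multiply(a, a);
                Console.WriteLine(c[0, 0]); // 1*1 + 2*3 = 7
            }
        }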
  28. Analyzing MT algorithms: Matrix multiplication
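     Slide 28 carries no extracted text; for reference, the analysis of this procedure in CLRS Chapter 27 gives work T1(n) = Θ(n³), span T∞(n) = Θ(n) (the Θ(lg n) overhead of the two parallel loops plus the Θ(n) serial inner loop), and therefore parallelism T1/T∞ = Θ(n²).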
  29. Chess Lesson
  30. Multicore Programming, Part 3: Task Parallel Library
  31. TPL building blocks consist of: Tasks; Thread-Safe Scalable Collections; Phases and Work Exchange; Partitioning; Looping; Control; Breaking; Exceptions; Results.
  32. Data parallelism: Parallel.ForEach(letters, ch => Capitalize(ch)); (see the sketch after slide 33)
  33. Task parallelism: Parallel.Invoke(() => Average(), () => Minimum(), …);
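     The one-liners on slides 32 and 33 are shown without context; a self-contained C# sketch of both patterns could look like this (the letters and data arrays are invented for illustration, and Console output stands in for the deck's Capitalize, Average and Minimum calls):

        using System;
        using System.Linq;
        using System.Threading.Tasks;

        static class ParallelDemo
        {
            static void Main()
            {
                // Data parallelism: the same operation applied to every element.
                string[] letters = { "a", "b", "c", "d" };
                Parallel.ForEach(letters, ch => Console.WriteLine(ch.ToUpper()));

                // Task parallelism: different operations running concurrently.
                int[] data = { 5, 1, 4, 2, 3 };
                Parallel.Invoke(
                    () => Console.WriteLine("Average: " + data.Average()),
                    () => Console.WriteLine("Minimum: " + data.Min()));
            }
        }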
  34. Thread Pool in .NET 3.5
  35. Thread Pool in .NET 4.0
  36. Task Scheduler & Thread Pool. Disadvantages of ThreadPool.QueueUserWorkItem in .NET 3.5: zero information about each work item; fairness (a FIFO queue is maintained). Improvements in .NET 4.0: a more efficient FIFO queue (ConcurrentQueue); an enhanced API to get more information from the user (Task); work stealing; thread injection; waiting for completion, handling exceptions, getting the computation result.
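     The Task improvements listed on slide 36 (waiting for completion, handling exceptions, getting the computation result) can be seen in a few lines of C#; the following is an illustrative sketch, not code from the deck:

        using System;
        using System.Threading.Tasks;

        static class TaskDemo
        {
            static void Main()
            {
                // Unlike ThreadPool.QueueUserWorkItem, a Task carries its own state:
                // it can be waited on, queried for a result, and observed for exceptions.
                Task<int> sum = Task.Factory.StartNew(() =>
                {
                    int total = 0;
                    for (int i = 1; i <= 100; i++) total += i;
                    return total;
                });

                Task failing = Task.Factory.StartNew(() =>
                {
                    throw new InvalidOperationException("something went wrong");
                });

                Console.WriteLine(sum.Result);   // blocks until completion, prints 5050

                try
                {
                    failing.Wait();              // rethrows the stored exception
                }
                catch (AggregateException ex)
                {
                    Console.WriteLine(ex.InnerException.Message);
                }
            }
        }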
  37. New Primitives.
      Thread-safe, scalable collections: IProducerConsumerCollection<T>, ConcurrentQueue<T>, ConcurrentStack<T>, ConcurrentBag<T>, ConcurrentDictionary<TKey,TValue>.
      Phases and work exchange: Barrier, BlockingCollection<T>, CountdownEvent.
      Partitioning: {Orderable}Partitioner<T>, Partitioner.Create.
      Exception handling: AggregateException.
      Initialization: Lazy<T>, LazyInitializer.EnsureInitialized<T>, ThreadLocal<T>.
      Locks: ManualResetEventSlim, SemaphoreSlim, SpinLock, SpinWait.
      Cancellation: CancellationToken{Source}.
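     Several of the primitives on slide 37 compose naturally; as a sketch (not from the deck), a small producer/consumer pipeline can be built from BlockingCollection<T>, ConcurrentQueue<T> and CancellationTokenSource:

        using System;
        using System.Collections.Concurrent;
        using System.Threading;
        using System.Threading.Tasks;

        static class ProducerConsumerDemo
        {
            static void Main()
            {
                // BlockingCollection wraps a ConcurrentQueue and adds blocking and completion semantics.
                var queue = new BlockingCollection<int>(new ConcurrentQueue<int>());
                var cts = new CancellationTokenSource();

                var producer = Task.Factory.StartNew(() =>
                {
                    for (int i = 0; i < 10; i++) queue.Add(i, cts.Token);
                    queue.CompleteAdding();                 // signal: no more items
                });

                var consumer = Task.Factory.StartNew(() =>
                {
                    foreach (int item in queue.GetConsumingEnumerable(cts.Token))
                        Console.WriteLine("consumed " + item);
                });

                Task.WaitAll(producer, consumer);
            }
        }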
  38. References: The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software; MIT Introduction to Algorithms video lectures; Chapter 27, Multithreaded Algorithms, from Introduction to Algorithms, 3rd edition; CLR 4.0 ThreadPool Improvements: Part 1; Multicore Programming Primer; ThreadPool on Channel 9.
