Concurrency scalability

Herb Sutter (GotW.ca) says that concurrency is easier to understand if split into three sub-concepts: scalability, responsiveness, and consistency. This presentation is the first of three covering these concepts, starting with everyone's favorite: scalability, i.e. splitting a CPU-bound problem across several cores to solve it faster. I will show which tools .NET offers, but also the performance pitfalls that arise from an escalating problem that has plagued computer architecture for the last 20 years.

Published in: Technology

Concurrency scalability

  1. Mårten Rånge WCOM AB @marten_range
  2. Concurrency Examples for .NET
  3. Responsive
  4. Performance Scalable algorithms
  5. Three pillars of Concurrency:
     - Scalability (CPU): Parallel.For
     - Responsiveness: Task/Future, async/await
     - Consistency: lock/synchronized, Interlocked.*, Mutex/Event/Semaphore, Monitor
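How the scalability and consistency pillars interact can be shown in a minimal sketch (the counter example is mine, not from the slides): `Parallel.For` handles the scalability side, while `Interlocked.Increment` keeps a shared counter consistent across cores.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class PillarsSketch
{
    // Returns (racy, safe) counts after n parallel increments of each counter.
    public static (long Racy, long Safe) Run(int n)
    {
        long racy = 0, safe = 0;
        // Scalability pillar: Parallel.For spreads the iterations across cores.
        Parallel.For(0, n, _ =>
        {
            racy++;                          // unsynchronized read-modify-write
            Interlocked.Increment(ref safe); // consistency pillar: atomic update
        });
        return (racy, safe);
    }

    static void Main()
    {
        var (racy, safe) = Run(1_000_000);
        // racy typically comes out below 1,000,000 (lost updates); safe is exact.
        Console.WriteLine($"racy: {racy}, safe: {safe}");
    }
}
```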
  6. Scalability
  7. Which is fastest?

         var ints = new int[InnerLoop];
         var random = new Random();
         for (var inner = 0; inner < InnerLoop; ++inner) {
           ints[inner] = random.Next();
         }
         // -----------------------------------------------
         var ints = new int[InnerLoop];
         var random = new Random();
         Parallel.For(
           0, InnerLoop,
           i => ints[i] = random.Next()
         );

  8. SHARED STATE – Race condition (same code as slide 7)
  9. SHARED STATE – Poor performance (same code as slide 7)
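The race exists because Random.Next is not thread-safe: concurrent calls can corrupt the generator's internal state, after which a shared instance may even return 0 forever. A sketch of the per-thread fix (ThreadLocal is my choice here; the slides reach a similar result later via Parallel.For's thread-local overload):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class PerThreadRandom
{
    // One Random per thread: no shared mutable state, no bouncing cache line.
    static readonly ThreadLocal<Random> Rng =
        new ThreadLocal<Random>(() => new Random(Guid.NewGuid().GetHashCode()));

    public static int[] Fill(int count)
    {
        var ints = new int[count];
        // Each worker thread lazily creates and reuses its own Random.
        Parallel.For(0, count, i => ints[i] = Rng.Value.Next());
        return ints;
    }

    static void Main()
    {
        var ints = Fill(1_000_000);
        Console.WriteLine(ints[ints.Length - 1]);
    }
}
```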
  10. Then and now

          Metric        VAX-11/750 ('80)   Today               Improvement
          MHz           6                  3300                550x
          Memory MB     2                  16384               8192x
          Memory MB/s   13                 R ~10000, W ~2500   770x, 190x

  11. Then and now (adds row: Memory nsec | 225 | 70 | 3x)
  12. Then and now (adds row: Memory cycles | 1.4 | 210 | -150x)
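The "Memory cycles" row follows directly from the other rows: cycles = latency × clock frequency. A quick sketch of that arithmetic, using the table's own figures (the slide's ~210 presumably assumes a slightly lower clock than 3300 MHz):

```csharp
using System;

class MemoryCycles
{
    // cycles = latency (ns -> s) x clock (MHz -> Hz)
    public static double Cycles(double latencyNs, double clockMHz) =>
        latencyNs * 1e-9 * clockMHz * 1e6;

    static void Main()
    {
        // VAX-11/750 (~1980): 225 ns latency at 6 MHz -> about 1.35 cycles.
        Console.WriteLine(Cycles(225, 6));
        // Modern CPU: 70 ns latency at 3300 MHz -> about 231 cycles.
        // Latency in nanoseconds improved only ~3x, but measured in clock
        // cycles a memory access got roughly 150x *more* expensive.
        Console.WriteLine(Cycles(70, 3300));
    }
}
```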
  13. 299,792,458 m/s
  14. Speed of light is too slow
  15. 0.09 m per clock cycle
  16. 99% latency mitigation, 1% computation
  17. 2 Core CPU (diagram: CPU1 and CPU2, each with private L1 and L2, shared L3, RAM)
  18. 2 Core CPU – L1 Cache (diagram: new Random() and new int[InnerLoop] are allocated; the shared Random object ends up in both cores' L1 caches)
  19.-24. 2 Core CPU – L1 Cache (animation: the cache line holding the shared Random object bounces back and forth between CPU1's and CPU2's L1 caches on every write)
  25. 4 Core CPU – L1 Cache (diagram: the same sharing of new Random() and new int[InnerLoop], now bouncing across four L1 caches)
  26. 2x4 Core CPU (diagram: eight cores, each with private L1 and L2, two shared L3 caches, RAM)
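The cache-line ping-pong in the diagrams above can be made visible with a small timing sketch of my own (not from the slides): two threads each hammer their own counter, and a stride parameter controls whether the two counters share a cache line or sit on separate ones.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

class FalseSharing
{
    const int Iterations = 10_000_000;

    // Two threads each increment their own slot; `stride` controls whether
    // the two slots share a cache line (stride 1) or not (stride 16:
    // 16 longs = 128 bytes apart, beyond a typical 64-byte line).
    public static long Run(long[] counters, int stride)
    {
        var sw = Stopwatch.StartNew();
        Parallel.For(0, 2, core =>
        {
            int slot = core * stride;
            for (int i = 0; i < Iterations; ++i)
                counters[slot]++;
        });
        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        var counters = new long[64];
        var adjacent = Run(counters, 1);   // line ping-pongs between L1 caches
        var padded   = Run(counters, 16);  // each core keeps its line to itself
        // On a multi-core machine the adjacent case is typically much slower,
        // even though no data is logically shared between the threads.
        Console.WriteLine($"adjacent: {adjacent} ms, padded: {padded} ms");
    }
}
```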
  27. Solution 1 – Locks

          var ints = new int[InnerLoop];
          var random = new Random();
          Parallel.For(
            0, InnerLoop,
            i => { lock (ints) { ints[i] = random.Next(); } }
          );

  28. Solution 2 – No sharing

          var ints = new int[InnerLoop];
          Parallel.For(
            0, InnerLoop,
            () => new Random(),
            (i, pls, random) => { ints[i] = random.Next(); return random; },
            random => {}
          );

  29. Parallel.For adds overhead (diagram: the range is partitioned recursively, Level0 -> Level1 -> Level2, before the leaves touch ints[0]..ints[7])

  30. Solution 3 – Less overhead

          var ints = new int[InnerLoop];
          Parallel.For(
            0, InnerLoop / Modulus,
            () => new Random(),
            (i, pls, random) => {
              var begin = i * Modulus;
              var end   = begin + Modulus;
              for (var iter = begin; iter < end; ++iter) {
                ints[iter] = random.Next();
              }
              return random;
            },
            random => {}
          );

  31. (The sequential baseline, for comparison)

          var ints = new int[InnerLoop];
          var random = new Random();
          for (var inner = 0; inner < InnerLoop; ++inner) {
            ints[inner] = random.Next();
          }

  32. Solution 4 – Independent runs

          var tasks = Enumerable.Range(0, 8).Select(
            i => Task.Factory.StartNew(
              () => {
                var ints = new int[InnerLoop];
                var random = new Random();
                while (counter.CountDown()) {
                  for (var inner = 0; inner < InnerLoop; ++inner) {
                    ints[inner] = random.Next();
                  }
                }
              },
              TaskCreationOptions.LongRunning))
            .ToArray();
          Task.WaitAll(tasks);
  33. Parallel.For: only for CPU-bound problems
  34. Sharing is bad: it kills performance and invites race conditions and deadlocks
  35. Cache locality: RAM is a misnomer; design classes for locality; avoid GC pressure
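The class-design point can be illustrated with a sketch of my own (not the author's): an array of structs stores its elements contiguously, so a linear scan stays in cache, while an array of class references scatters the objects across the GC heap.

```csharp
using System;

struct PointStruct { public double X, Y; }  // elements stored inline, contiguously
class  PointClass  { public double X, Y; }  // array holds references; objects live on the GC heap

class Locality
{
    // Linear scan over a struct array: one allocation, contiguous memory,
    // cache-friendly, nothing extra for the GC to trace.
    public static double SumStructs(int n)
    {
        var points = new PointStruct[n];
        for (int i = 0; i < n; ++i) { points[i].X = i; points[i].Y = i; }
        double sum = 0;
        for (int i = 0; i < n; ++i) sum += points[i].X;
        return sum;
    }

    // Same scan over class instances: n + 1 allocations, a pointer chase
    // per element, and n objects for the GC to track.
    public static double SumClasses(int n)
    {
        var points = new PointClass[n];
        for (int i = 0; i < n; ++i) points[i] = new PointClass { X = i, Y = i };
        double sum = 0;
        for (int i = 0; i < n; ++i) sum += points[i].X;
        return sum;
    }

    static void Main()
    {
        const int N = 1_000_000;
        Console.WriteLine(SumStructs(N)); // same result, very different memory traffic
        Console.WriteLine(SumClasses(N));
    }
}
```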
  36. Natural concurrency: prefer it; avoid Parallel.For where the problem decomposes naturally
  37. Act like an engineer: measure before and after
  38. One more thing…
  39. http://tinyurl.com/wcom-cpuscalability
  40. Mårten Rånge WCOM AB @marten_range
