Overview of Parallel Development
Visual Studio 2010 + a little on Axum and Concurrent Basic

Eric Nelson
Overview Of Parallel Development - Ericnel

Overview of the new parallel capabilities in .NET 4.0 plus a swift look at Axum (a new language) and Concurrent Basic (a research project)

Published in: Technology

  1. Overview of Parallel Development
     Visual Studio 2010 + a little on Axum and Concurrent Basic
     Eric Nelson
     Eric.nelson@microsoft.com
     http://geekswithblogs.net/iupdateable
     http://blogs.msdn.com/goto100
     http://twitter.com/ericnel
  2. Microsoft UK MSDN Flash Newsletter
     Every two weeks, pure joy enters your Inbox
     http://msdn.microsoft.com/uk/flash
     - MSDN Flash Podcast Pilot: for feedback, http://bit.ly/flashpod1
     - MSDN Flash eBook: 13 of the “Best Technical Articles of 2008”, http://bit.ly/flashebook1
     - Technical authors wanted for the Flash: 400 to 500 words. Fancy it?
  3. Agenda
     - Overview of what we are up to
     - Drill down into parallel programming for managed developers
     - If we have time, “heads up” on Axum and CB
  4. Things I learnt...
     - We have a very large investment in parallel computing
       - We have “something for everyone”
       - It is not all synced, it is sometimes overlapping
     - It is a big topic
       - Managed vs native vs client vs server vs task vs data...
     - Even with the investment, design/code/test for parallel is far harder
       - Locking, deadlocks, livelocks
     - It is about getting ready for the future
       - Code today, run better tomorrow?
     - VS2010 CTP: not a great place for parallel
       - Single core in guest
       - Unsupported route to use Hyper-V
     - Easiest route to dabble: Microsoft Parallel Extensions June CTP for VS2008
  5. Buying a new Processor
     £100 - £300: 2 cores or 4, 64-bit, 2-3GHz
  6. Buying a new Processor
     £200 - £500: 4 cores with HT, 64-bit, 2-3GHz, Memory Controller, QuickPath Interconnect
  7. Where will it all end?
  8. Was it a wise purchase?
     [Diagram: App 1, App 2, ... running on the Windows OS; App 1 is shown as a stack of My Code, .NET Framework, .NET CLR]
  9. Was it a wise purchase?
     - Some environments scale to take advantage of additional CPU cores (mostly server-side)
       [Diagram: ASP.NET Web Forms/Services, WCF Services, WF Engine, ... above .NET ThreadPool or Custom Threading Strategy]
     - A lot of code does not (mostly client-side)
       - This code will see little benefit from future hardware advances
  10. What happened to “The Free Lunch”?
      - Bad sequential code will run faster on a faster processor
      - Bad parallel code WILL NOT run faster on more cores
      - Just using parallel code is not enough
      [Chart: Speedup (y-axis, 0 to 3) against x-axis values 1, 2, 4, 8, 16, 32]
  11. Applications Can Scale Well
      [Chart: Parallel Speedup (0 to 64) against Cores (0 to 64). Series: Production Fluid, Production Face, Production Cloth, Game Fluid, Game Rigid Body, Game Cloth, Marching Cubes, Sports Video Analysis, Video Cast Indexing, Home Video Editing, Text Indexing, Ray Tracing, Foreground Estimation, Human Body Tracker, Portfolio Management, Geometric Mean]
      Graphics Rendering, Physical Simulation, Vision, Data Mining, Analytics
  12. What's The Problem?
      - Multithreaded programming is “hard” today
        - Doable by only a subgroup of senior specialists
        - Parallel patterns are not prevalent, well known, nor easy to implement
        - So many potential problems: races, deadlocks, livelocks, lock convoys, cache coherency overheads, lost event notifications, broken serializability, priority inversion, and so on…
      - Businesses have little desire to “go deep”
        - Best developers should focus on business value, not concurrency
        - Need simple ways to allow all developers to write concurrent code
  13. Sequential matrix multiplication:

          void MatrixMult(
              int size, double** m1, double** m2, double** result)
          {
              for (int i = 0; i < size; i++) {
                  for (int j = 0; j < size; j++) {
                      result[i][j] = 0;
                      for (int k = 0; k < size; k++) {
                          result[i][j] += m1[i][k] * m2[k][j];
                      }
                  }
              }
          }
  14. Static partitioning
      Slide callouts on the code: synchronization knowledge, error prone, lots of boilerplate, tricks, lack of thread reuse, heavy synchronization.

          // Needs <windows.h> and <thread>; NUMPROCS is assumed to be defined elsewhere.
          void MatrixMult(
              int size, double** m1, double** m2, double** result) {
            int N = size;
            int P = 2 * NUMPROCS;          // number of chunks
            int Chunk = N / P;             // rows per chunk
            HANDLE hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
            long counter = P;              // chunks still running
            for (int c = 0; c < P; c++) {
              std::thread t ([&,c] {
                // The last chunk also takes any leftover rows.
                for (int i = c * Chunk;
                     i < (c + 1 == P ? N : (c + 1) * Chunk);
                     i++) {
                  for (int j = 0; j < size; j++) {
                    result[i][j] = 0;
                    for (int k = 0; k < size; k++) {
                      result[i][j] += m1[i][k] * m2[k][j];
                    }
                  }
                }
                // InterlockedDecrement takes the address of the counter.
                if (InterlockedDecrement(&counter) == 0)
                  SetEvent(hEvent);
              });
              t.detach();  // a joinable std::thread must not be destroyed
            }
            WaitForSingleObject(hEvent,INFINITE);
            CloseHandle(hEvent);
          }
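The event and countdown on slide 14 exist only to find out when the last chunk has finished. As a rough sketch of the same static partitioning in portable C++11, the version below keeps one `std::thread` per chunk and joins them instead. The `Matrix` alias, the name `matrix_mult_partitioned`, and the `workers` parameter are assumptions made for illustration and do not appear in the slides.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical alias

// Multiplies two n x n matrices. The rows of the result are split into
// one contiguous chunk per worker thread (static partitioning).
Matrix matrix_mult_partitioned(const Matrix& m1, const Matrix& m2,
                               unsigned workers) {
    const std::size_t n = m1.size();
    assert(m2.size() == n);  // sketch handles square matrices only
    Matrix result(n, std::vector<double>(n, 0.0));
    if (n == 0) return result;

    // Never use more workers than rows, and always at least one.
    const std::size_t w = std::min<std::size_t>(std::max(1u, workers), n);
    const std::size_t chunk = n / w;

    std::vector<std::thread> threads;
    threads.reserve(w);
    for (std::size_t c = 0; c < w; ++c) {
        const std::size_t begin = c * chunk;
        // The last chunk also takes any leftover rows.
        const std::size_t end = (c + 1 == w) ? n : begin + chunk;
        threads.emplace_back([&, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                for (std::size_t j = 0; j < n; ++j) {
                    double sum = 0.0;
                    for (std::size_t k = 0; k < n; ++k) {
                        sum += m1[i][k] * m2[k][j];
                    }
                    result[i][j] = sum;  // each thread owns its own rows
                }
            }
        });
    }
    // Joining replaces the event, counter and interlocked decrement.
    for (auto& t : threads) t.join();
    return result;
}
```

Joining removes the hand-written synchronization, but the sketch still creates fresh threads on every call and still splits the work statically, which are two of the problems the slide calls out.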
  15. Microsoft Parallel Computing Technologies
      [Diagram: technologies placed on two axes, Task Concurrency to Data Parallelism and Local Computing to Distributed/Cloud Computing]
      - Technologies shown: WCF, CCR, WF, TPL / PPL, Maestro aka Axum, Cluster SOA, PLINQ, Cluster PLINQ, OpenMP, Cluster TPL, MPI / MPI.Net, Compute Shader, CDS
      - Example workloads shown: robotics-based manufacturing assembly line, automotive control system, Internet-based photo services, Silverlight Olympics viewer, ultrasound imaging equipment, enterprise search, OLTP, collab, animation / CGI rendering, media encode/decode, weather forecasting, image processing/enhancement, seismic monitoring, oil exploration, data visualization
  16. Visual Studio 2010
      Tools / Programming Models / Runtimes
      [Diagram, keyed as Managed Library, Native Library, Tools]
      - Programming Models: PLINQ, Task Parallel Library, Parallel Pattern Library, Agents Library, Data Structures
      - Concurrency Runtime: ThreadPool, Task Scheduler, Resource Manager
      - Integrated Tooling: Parallel Debugger Tool, Profiler Concurrency Analysis
      - Operating System Threads
  17. Explicit Tasking Support

      .NET 4.0                     Visual Studio 2010 C++
      Task Parallel Library        Parallel Pattern Library
      -------------------------    -------------------------
      Task, TaskFactory            task, task_group
      Parallel.For                 parallel_for
      Parallel.ForEach             parallel_for_each
      Parallel.Invoke              parallel_invoke
      Concurrent data structures   Concurrent data structures

      Also listed: primitives for message passing, user-mode locks
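Slide 17 pairs `Parallel.Invoke` with `parallel_invoke`: both run a handful of independent actions concurrently and return when all of them are done. The sketch below shows that shape in standard C++11 with `std::async`. It is only an analogue, not the TPL or PPL implementation, and `invoke_all` is a hypothetical name.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <vector>

// Runs every action concurrently and returns once all have finished.
// Assumption: one new thread per action (std::launch::async), whereas
// TPL and PPL hand tasks to a scheduler backed by a thread pool.
void invoke_all(const std::vector<std::function<void()>>& actions) {
    std::vector<std::future<void>> pending;
    pending.reserve(actions.size());
    for (const auto& action : actions) {
        assert(static_cast<bool>(action));  // empty std::function is a bug
        pending.push_back(std::async(std::launch::async, action));
    }
    // get() waits for completion and rethrows any exception the action threw.
    for (auto& f : pending) f.get();
}
```

A call such as `invoke_all({loadImages, parseConfig})` (both names hypothetical) would block until both callables have returned.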
  18. Task Parallel Library (TPL)
  19. Task
      No Threading to Threading to Tasks
  20. User Mode Scheduler
      CLR Thread Pool
      [Diagram: Program Thread, Global Queue, Worker Thread 1 … Worker Thread p]
  21. User Mode Scheduler For Tasks
      CLR Thread Pool: Work-Stealing
      [Diagram: Program Thread, Global Queue, one Local Queue per worker, Worker Thread 1 … Worker Thread p, Tasks 1 to 6 spread across the queues]
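The local queues on slide 21 are the core of work stealing: a worker pushes and pops its own tasks at one end of its queue, and an idle worker takes tasks from the other end of somebody else's queue. The class below is a minimal sketch of one such queue in C++11. The name `WorkStealingQueue` and the mutex-based locking are assumptions chosen for brevity; this is not how the CLR thread pool is implemented.

```cpp
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

using Task = std::function<void()>;

// One queue per worker thread.
class WorkStealingQueue {
public:
    // Owner adds new work at the back.
    void push(Task task) {
        std::lock_guard<std::mutex> lock(mutex_);
        tasks_.push_back(std::move(task));
    }

    // Owner takes the newest task (back). Returns false if the queue is empty.
    bool pop(Task& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (tasks_.empty()) return false;
        out = std::move(tasks_.back());
        tasks_.pop_back();
        return true;
    }

    // Another worker takes the oldest task (front). Returns false if empty.
    bool steal(Task& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (tasks_.empty()) return false;
        out = std::move(tasks_.front());
        tasks_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::deque<Task> tasks_;
};
```

Because the owner and the thieves work at opposite ends, they rarely want the same task, which is why real schedulers can replace the mutex with a mostly lock-free deque.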
  22. Tasks revisited
      More on Tasks
  23. Debugger Support
      Support both managed and native
      1. Parallel Tasks
      2. Parallel Stacks
  24. Higher Level Constructs
      - Even with Task there are common patterns that build into higher level abstractions
      - The Parallel class
        - Invoke, For, For<T>, ForEach
      - Care needs to be taken with state, ordering
      - “This is not your Father’s for loop”
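The warning about state on slide 24 is easiest to see with an accumulator: if every iteration of a parallel loop adds to one shared variable, the iterations race. A common remedy is to give each worker its own local total and combine the totals once all workers are done. The C++11 sketch below shows that pattern; `parallel_sum` and its `workers` parameter are hypothetical names, and the code illustrates the idea rather than the `Parallel.For` implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Sums `values` on `workers` threads. Each thread accumulates into a
// local variable and publishes it to its own slot of `partial`, so no
// two loop bodies ever write to the same location.
long long parallel_sum(const std::vector<int>& values, unsigned workers) {
    if (workers == 0) workers = 1;
    std::vector<long long> partial(workers, 0);
    const std::size_t chunk = (values.size() + workers - 1) / workers;

    std::vector<std::thread> threads;
    threads.reserve(workers);
    for (unsigned w = 0; w < workers; ++w) {
        threads.emplace_back([&, w] {
            const std::size_t begin = std::min(values.size(), w * chunk);
            const std::size_t end = std::min(values.size(), begin + chunk);
            long long local = 0;  // per-thread state, no sharing
            for (std::size_t i = begin; i < end; ++i) local += values[i];
            partial[w] = local;
        });
    }
    for (auto& t : threads) t.join();

    // Combine the per-thread totals after every worker has finished.
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Addition does not care about order, so the result matches the sequential loop. An operation that depends on iteration order would need more than this.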
  25. Parallel
      Parallel.ForEach
      Parallel.Invoke
  26. Declarative Data Parallelism
      Parallel LINQ-to-Objects (PLINQ)
      - Enables LINQ devs to leverage multiple cores
      - Fully supports all .NET standard query operators
      - Minimal impact to existing LINQ model

          var q = from p in people.AsParallel()
                  where p.Name == queryInfo.Name &&
                        p.State == queryInfo.State &&
                        p.Year >= yearStart &&
                        p.Year <= yearEnd
                  orderby p.Year ascending
                  select p;
  27. Parallel LINQ
  28. Sequential query:

          IEnumerable<BabyInfo> babies = ...;
          var results = new List<BabyInfo>();
          foreach (var baby in babies)
          {
              if (baby.Name == queryName &&
                  baby.State == queryState &&
                  baby.Year >= yearStart &&
                  baby.Year <= yearEnd)
              {
                  results.Add(baby);
              }
          }
          results.Sort((b1, b2) => b1.Year.CompareTo(b2.Year));
  29. The same query parallelized by hand:

          IEnumerable<BabyInfo> babies = …;
          var results = new List<BabyInfo>();
          int partitionsCount = Environment.ProcessorCount;
          int remainingCount = partitionsCount;
          var enumerator = babies.GetEnumerator();
          try
          {
              using (var done = new ManualResetEvent(false))
              {
                  for (int i = 0; i < partitionsCount; i++)
                  {
                      ThreadPool.QueueUserWorkItem(delegate
                      {
                          while (true)
                          {
                              BabyInfo baby;
                              lock (enumerator)
                              {
                                  if (!enumerator.MoveNext()) break;
                                  baby = enumerator.Current;
                              }
                              if (baby.Name == queryName &&
                                  baby.State == queryState &&
                                  baby.Year >= yearStart &&
                                  baby.Year <= yearEnd)
                              {
                                  lock (results) results.Add(baby);
                              }
                          }
                          if (Interlocked.Decrement(ref remainingCount) == 0)
                              done.Set();
                      });
                  }
                  done.WaitOne();
                  results.Sort((b1, b2) => b1.Year.CompareTo(b2.Year));
              }
          }
          finally
          {
              if (enumerator is IDisposable) ((IDisposable)enumerator).Dispose();
          }
  30. The same query with PLINQ:

          var results = from baby in babies.AsParallel()
                        where baby.Name == queryName &&
                              baby.State == queryState &&
                              baby.Year >= yearStart &&
                              baby.Year <= yearEnd
                        orderby baby.Year ascending
                        select baby;
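Slides 28 to 30 run the same filter-then-sort query three ways. To make the work hidden behind `AsParallel()` concrete in the deck's native language, here is a hedged C++11 sketch of one plausible strategy: filter fixed partitions concurrently, concatenate the matches, then sort by year. The struct layout, the function `query_babies`, and the `partitions` parameter are assumptions for illustration; PLINQ's actual partitioning and merging are not described in the slides.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <string>
#include <vector>

struct BabyInfo {  // hypothetical layout mirroring Name, State, Year
    std::string name;
    std::string state;
    int year;
};

std::vector<BabyInfo> query_babies(const std::vector<BabyInfo>& babies,
                                   const std::string& name,
                                   const std::string& state,
                                   int year_start, int year_end,
                                   std::size_t partitions) {
    if (partitions == 0) partitions = 1;
    const std::size_t chunk = (babies.size() + partitions - 1) / partitions;

    // Filter each partition on its own thread into a private vector.
    std::vector<std::future<std::vector<BabyInfo>>> parts;
    for (std::size_t p = 0; p < partitions; ++p) {
        const std::size_t begin = std::min(babies.size(), p * chunk);
        const std::size_t end = std::min(babies.size(), begin + chunk);
        parts.push_back(std::async(std::launch::async, [&, begin, end] {
            std::vector<BabyInfo> matches;
            for (std::size_t i = begin; i < end; ++i) {
                const BabyInfo& b = babies[i];
                if (b.name == name && b.state == state &&
                    b.year >= year_start && b.year <= year_end) {
                    matches.push_back(b);
                }
            }
            return matches;
        }));
    }

    // Merge on the calling thread, then order by year as the query does.
    std::vector<BabyInfo> results;
    for (auto& part : parts) {
        std::vector<BabyInfo> m = part.get();
        results.insert(results.end(), m.begin(), m.end());
    }
    std::sort(results.begin(), results.end(),
              [](const BabyInfo& a, const BabyInfo& b) { return a.year < b.year; });
    return results;
}
```

Giving each partition a private result vector avoids the two locks used on slide 29, at the cost of one merge step at the end.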
  31. Coordination Data Structures
      - Thread-safe collections: ConcurrentStack<T> ...
      - Locks: SpinLock, SpinWait, SemaphoreSlim ...
      - Work Exchange: BlockingCollection<T> ...
      - Phased Operation: CountdownEvent ...
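The work-exchange entry on slide 31, `BlockingCollection<T>`, lets a consumer block until a producer has added something. The C++11 sketch below shows the idea with a mutex and a condition variable. `BlockingQueue` is a hypothetical, unbounded analogue written for illustration and does not reproduce the .NET type's API (no bounding, no completion signal).

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <utility>

template <typename T>
class BlockingQueue {
public:
    // Producer side: enqueue an item and wake one waiting consumer.
    void add(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(item));
        }
        ready_.notify_one();
    }

    // Consumer side: block until an item is available, then remove it.
    T take() {
        std::unique_lock<std::mutex> lock(mutex_);
        // The predicate guards against spurious wake-ups.
        ready_.wait(lock, [this] { return !items_.empty(); });
        T item = std::move(items_.front());
        items_.pop();
        return item;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
};
```

In use, one thread would call `add` as work arrives while another loops on `take`; the consumer sleeps instead of polling while the queue is empty.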
  32. Coordination Data Structures
  33. What Next?
      - http://geekswithblogs.net/iupdateable (slides and links)
      - http://blogs.msdn.com/pfxteam/
      - http://msdn.com/concurrency
      - Wait for the Beta of Visual Studio 2010
      - Or, for the most impatient:
        - Download VS 2010 CTP (remember to set the clock back)
        - Or download Parallel Extensions June 2008 CTP for VS2008
  34. Appendix
  35. Heads up: Axum
      - Previously called Maestro
      - Incubation project!
      - New programming language
      - Lets you take advantage of parallelism without “thinking about it”
      - Agent based programming vs Object based programming
      - Model agents and their interactions via messages
      - No public methods, fields
  36. Axum “Hello World”

          using System;

          agent Program :
             Microsoft.Axum.ConsoleApplication
          {
            override int Run(String[] args)
            {
              Console.WriteLine("Hello, World!");
            }
          }
  37. Channels and Agents

          using System;
          using System.Concurrency;
          using Microsoft.Axum;

          channel Adder
          {
            input int Num1;
            input int Num2;
            output int Sum;
          }

          agent AdderAgent : channel Adder
          {
            public AdderAgent()
            {
              int result = receive(PrimaryChannel::Num1) +
                           receive(PrimaryChannel::Num2);
              PrimaryChannel::Sum <-- result;
            }
          }

          agent MainAgent : channel Microsoft.Axum.Application
          {
            public MainAgent()
            {
              var adder = AdderAgent.CreateInNewDomain();
              adder::Num1 <-- 10;
              adder::Num2 <-- 20;
              // do something useful ...
              var sum = receive(adder::Sum);
              Console.WriteLine(sum);
              PrimaryChannel::ExitCode <-- 0;
            }
          }
  38. Heads up: Concurrent Basic
      - Research Project
        http://channel9.msdn.com/shows/Going+Deep/Claudio-Russo-and-Lucian-Wischik-Inside-Concurrent-Basic/
      - Added message passing primitives: channels

          Module Buffer
            Public Asynchronous Put(ByVal s As String)
            Public Synchronous Take() As String
            Private Function CaseTakeAndPut(ByVal s As String) As String When Take, Put
              Return s
            End Function
          End Module

          Thread1: Put("Hello")
          Thread2: result = Take()
