.NET Multithreading and Parallelization



  • 1. Multithreading and Parallelization Dmitri Nesteruk dmitrinesteruk@gmail.com | http://nesteruk.org/seminars
  • 2. Agenda. Overview. Multithreading: PowerThreading (AsyncEnumerator). Multi-core parallelization: Parallel Extensions to .NET Framework. Multi-computer parallelization: PureMPI.NET.
  • 3. Why now? Manycore paradigm shift. CPU speed growth reaches production challenges (not at the limit yet). Processor features: Hyper-threading, SIMD.
  • 4. CPU Scope. Past: more transistors per chip (yesterday: 1x-core). Present: more cores per chip (today: 2x-core norm, 4x-core). Future: even more cores per chip; NUMA & other specialties (tomorrow: 32x-core?).
  • 5. Machine Scope. Machine: most clients are concerned with one-machine use. Cluster: clustering helps leverage performance. Cloud: clouds.
  • 6. Multithreading vs. Parallelization Multithreading Using threads/thread pool to perform async operations Explicit (# of threads known) Parallelization Implicit parallelization No explicit thread operation
  • 7. Ways to Parallelize/Multithread. Managed: System.Threading, Parallel Extensions, libraries. Unmanaged: OpenMP, libraries. Specialized: GPGPU, FPGA.
  • 8. Managed. System.Threading. Libraries: Parallel Extensions (TPL + PLINQ), PowerThreading. Languages/frameworks: Sing#, CCR. Use over many machines: Remoting, WCF, MPI.NET, PureMPI.NET, etc.
  • 9. Unmanaged. OpenMP – #pragma directives in C++ code. Intel multi-core libraries: Threading Building Blocks (low-level), Integrated Performance Primitives, Math Kernel Library (also has MPI support). Use over many machines: MPI, PVM, etc.
  • 10. Specialized Ex. (Intrinsic Parallelization) GPU Computation (GPGPU) Calculations on graphic card Uses programmable pixel shaders See, e.g., NVidia CUDA, GPGPU.org FPGA Hardware-specific solutions E.g., in-socket accelerators Requires HDL programming & custom hardware
  • 11. Part I Multithreading: a look at AsyncEnumerator
  • 12. Multithreading Goals Do stuff concurrently Preserve safety/consistency Tools Threads ThreadPool Synchronization objects Framework async APIs
  • 13. A Look at Delegates Making delegate for function is easy Given void a() { … } – ThreadStart del = a; Given void a(int n) { … } – Action<int> del = a; Given float a(int n, double m) {…} – Func<int, double, float> del = a; Otherwise, make your own!
  • 14. Delegate Methods. Invoke(): synchronous, blocks your thread. BeginInvoke: executes in ThreadPool; returns IAsyncResult. EndInvoke: waits for completion; takes the IAsyncResult from BeginInvoke.
  • 15. Usage Fire and forget – del.BeginInvoke(null, null); Fire, and wait until done – IAsyncResult ar = del.BeginInvoke(null,null); … del.EndInvoke(ar); Fire, and call a function when done – del.BeginInvoke(firedWhenDone, null); Callback parameter
  • 16. WaitAny and WaitAll. To wait until either delegate completes – WaitHandle.WaitAny(new WaitHandle[] { ar1.AsyncWaitHandle, ar2.AsyncWaitHandle }); // wait until either completes. To wait until all delegates complete, use WaitAll instead of WaitAny – waiting on multiple handles with WaitAll is [MTAThread]-specific; on an STA thread use Pulse & Wait instead.
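The two wait styles on this slide can be sketched with plain wait handles. In the snippet below, two ManualResetEvent objects stand in for the AsyncWaitHandle of each IAsyncResult; the events, the 200 ms delay, and the FirstSignaled helper are illustrative assumptions, not part of the original deck.

```csharp
using System;
using System.Threading;

// Hypothetical helper: returns the array index of a handle that was signaled first.
static int FirstSignaled()
{
    using var slow = new ManualResetEvent(false);
    using var fast = new ManualResetEvent(false);
    var handles = new WaitHandle[] { slow, fast };

    // Signal the handles from thread-pool work items.
    ThreadPool.QueueUserWorkItem(_ => { Thread.Sleep(200); slow.Set(); });
    ThreadPool.QueueUserWorkItem(_ => fast.Set());

    int first = WaitHandle.WaitAny(handles); // returns as soon as either handle is signaled
    WaitHandle.WaitAll(handles);             // returns only once both handles are signaled
    return first;
}

Console.WriteLine(FirstSignaled());
```

WaitAny returns an index into the array, so the caller can tell which operation finished; WaitAll is called before the events are disposed so that no work item touches a disposed handle.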
  • 17. Example Execute a() and b() in parallel; wait on both ThreadStart delA = a; ThreadStart delB = b; IAsyncResult arA = delA.BeginInvoke(null, null); IAsyncResult arB = delB.BeginInvoke(null, null); WaitHandle.WaitAll(new [] { arA.AsyncWaitHandle, arB.AsyncWaitHandle });
  • 18. LINQ Example. Execute a() and b() in parallel; wait on both – WaitHandle.WaitAll(new [] { a, b }.Select(f => f.BeginInvoke(null, null).AsyncWaitHandle).ToArray()); Steps: new [] { a, b } implicitly makes an array of delegates; f.BeginInvoke(null, null) calls each delegate; .AsyncWaitHandle gets a wait handle of each; .ToArray() converts from IEnumerable to array.
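Delegate.BeginInvoke is only available on .NET Framework; on .NET Core and later it throws PlatformNotSupportedException. As a rough modern equivalent of "run a() and b() in parallel and wait on both", here is a sketch that swaps in Task.Run and Task.WaitAll. The RunBoth helper and the counter are hypothetical names used only for illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: starts both delegates on the thread pool and blocks until both finish.
static void RunBoth(Action a, Action b)
{
    Task ta = Task.Run(a);
    Task tb = Task.Run(b);
    Task.WaitAll(ta, tb);
}

int counter = 0;
RunBoth(
    () => Interlocked.Increment(ref counter),
    () => Interlocked.Increment(ref counter));
Console.WriteLine(counter); // prints 2
```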
  • 19. Asynchronous Programming Model (APM) Basic goal – IAsyncResult ar = del.BeginXXX(null,null); … del.EndXXX(ar); Supported by Framework classes, e.g., – FileStream – WebRequest
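A minimal sketch of the BeginXXX/EndXXX pairing, using Stream.BeginRead and Stream.EndRead on an in-memory stream. The ReadFirstChunk helper, the 64-byte buffer, and the MemoryStream source are assumptions chosen to keep the example self-contained; a FileStream would be used the same way.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: starts one asynchronous read and then collects its result.
static string ReadFirstChunk(Stream stream)
{
    var buffer = new byte[64];
    // Begin the operation; no callback and no state object are supplied here.
    IAsyncResult ar = stream.BeginRead(buffer, 0, buffer.Length, null, null);
    // ... the caller is free to do other work at this point ...
    int bytesRead = stream.EndRead(ar); // blocks until the read has completed
    return Encoding.UTF8.GetString(buffer, 0, bytesRead);
}

using var source = new MemoryStream(Encoding.UTF8.GetBytes("hello"));
Console.WriteLine(ReadFirstChunk(source)); // prints hello
```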
  • 20. Difficulties Async calls do not always succeed Timeout Exceptions Cancelation Results in too many functions/anonymous delegates Async workflow code becomes difficult to read
  • 21. PowerThreading. A free library from Wintellect (Jeffrey Richter). Get it at wintellect.com. Also check out PowerCollections. Features: resource locks (ReaderWriterGate); async. prog. model (AsyncEnumerator, SyncGate); other features (IO, state manager, NumaInformation :)).
  • 22. AsyncEnumerator Simplifies APM programming No need to manually manage IAsyncResult cookies Fewer functions, cleaner code
  • 23. Usage patterns 1 async op → process X async ops → process all X async ops → process each one as it completes X async ops → process some, discard the rest X async ops → process some until cancellation/timeout occurs, discard the rest
  • 24. AsyncEnumerator Basics Has three methods Execute(IEnumerator<Int32>) BeginExecute EndExecute Also exists as AsyncEnumerator<T> when a return value is required
  • 25. Inside the Function internal IEnumerator<Int32> GetFile( AsyncEnumerator ae, string uri) { WebRequest wr = WebRequest.Create(uri); wr.BeginGetResponse(ae.End(), null); yield return 1; WebResponse resp = wr.EndGetResponse( ae.DequeueAsyncResult()); // use response }
  • 26. Signature (same GetFile code as slide 25). internal IEnumerator<Int32> GetFile(AsyncEnumerator ae, string uri): the function must return IEnumerator<Int32> and must accept AsyncEnumerator as one of the parameters (order unimportant).
  • 27. Callback (same GetFile code as slide 25). wr.BeginGetResponse(ae.End(), null); Call the async BeginXXX() methods. Pass ae.End() as callback parameter.
  • 28. Yield (same GetFile code as slide 25). yield return 1; Now yield return the number of pending asynchronous operations.
  • 29. Wait & Process (same GetFile code as slide 25). WebResponse resp = wr.EndGetResponse(ae.DequeueAsyncResult()); Call the async EndXXX() methods. Pass ae.DequeueAsyncResult() as parameter.
  • 30. Usage. Init the enumerator – var ae = new AsyncEnumerator(); Use it, passing itself as a parameter – ae.Execute(GetFile(ae, "http://nesteruk.org"));
  • 31. Exception Handling Break out of function – try { resp = wr.EndGetResponse( ae.DequeueAsyncResult()); } catch (WebException e) { // process e yield break; } Propagate a parameter
  • 32. Discard Groups Sometimes, you want to ignore the result of some calls E.g., you already got the data elsewhere To discard a group of calls Use overloaded End(…) methods to specify Group number Cleanup delegate Call DiscardGroup(…) with group number
  • 33. Cancellation External code can cancel the iterator – ae.Cancel(…) Or specify a timeout – ae.SetCancelTimeout(…) Check whether iterator is cancelled with – ae.IsCanceled(…) just call yield break if it is
  • 34. Part II Parallel Extensions to .NET Framework TPL and PLINQ
  • 35. Parallelization. Algorithms vary: some are easy to parallelize (e.g., matrix multiplication); some not so (e.g., matrix inversion); some not at all.
  • 36. Parallel Extensions to .NET Framework (PFX) A library for parallelization Consists of Task Parallel Library Parallel LINQ (PLINQ) Currently in CTP stage Maybe in .NET 4.0?
  • 37. Task Parallel Library Features. System.Linq: Parallel LINQ. System.Threading: implicit parallelism (Parallel.Xxx). System.Threading.Collections: thread-safe stack and queue. System.Threading.Tasks: task manager, tasks, futures.
  • 38. System.Threading. Implicit parallelization (Parallel.For and ForEach): Parallel.For | ForEach. Aggregate exceptions: AggregateException. Other useful classes: LazyInit<T>, WriteOnce<T>, other goodies.
  • 39. Parallel.For Parallelizes a for loop Instead of for (int i = 0; i < 10; ++i) { … } We write Parallel.For(0, 10, i => { … });
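As a sketch of the rewrite on this slide, the snippet below uses Parallel.For from System.Threading.Tasks, as shipped in .NET 4 and later. The SumOfSquares helper is a hypothetical example; note that the loop body runs on several threads at once, so the shared total is updated with Interlocked.Add rather than a plain +=.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: sums i*i for i in [0, n) using a parallel loop.
static long SumOfSquares(int n)
{
    long total = 0;
    Parallel.For(0, n, i =>
    {
        // Iterations may run concurrently, so the shared total needs an atomic update.
        Interlocked.Add(ref total, (long)i * i);
    });
    return total;
}

Console.WriteLine(SumOfSquares(10)); // prints 285
```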
  • 40. Parallel.For Overloads Step size ParallelState for cancelation Thread-local initialization Thread-local finalization References to a TaskManager Task creation options
  • 41. Parallel.ForEach. Same features as Parallel.For, except: no counters or steps; takes an IEnumerable<T>.
  • 42. Cancelation Parallel.For takes an Action<Int32> delegate Can also take an Action<Int32, ParallelState> ParallelState keeps track of the state of parallel execution ParallelState.Stop() stops execution in all threads
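The slide describes the CTP-era ParallelState type. In the released framework the corresponding type is ParallelLoopState, and that is what this sketch assumes. The FindIndex helper and the test data are hypothetical.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: returns the index of target in data, or -1 if it is absent.
static int FindIndex(int[] data, int target)
{
    int found = -1;
    Parallel.For(0, data.Length, (i, state) =>
    {
        if (data[i] == target)
        {
            Interlocked.Exchange(ref found, i);
            state.Stop(); // ask the loop to stop scheduling further iterations
        }
    });
    return found;
}

int[] data = Enumerable.Range(0, 1000).ToArray();
Console.WriteLine(FindIndex(data, 637)); // prints 637
```

Stop() is cooperative: iterations already running are allowed to finish, so the body should not assume it is the last one executing.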
  • 43. Parallel.For Exceptions The AggregateException class holds all exceptions thrown Created even if only one thread throws Used by both Parallel.Xxx and PLINQ Original exceptions stored in InnerExceptions property.
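A short sketch of catching the aggregate and inspecting InnerExceptions. The CountFailures helper and the choice to fail on odd indices are illustrative assumptions; how many inner exceptions are collected can vary from run to run, because the loop stops scheduling new iterations once one has failed.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: runs a loop in which odd iterations throw, and reports how many exceptions were collected.
static int CountFailures()
{
    try
    {
        Parallel.For(0, 4, i =>
        {
            if (i % 2 == 1) throw new InvalidOperationException("item " + i);
        });
    }
    catch (AggregateException ex)
    {
        // The original exceptions are available through InnerExceptions.
        foreach (Exception inner in ex.InnerExceptions)
            Console.WriteLine(inner.Message);
        return ex.InnerExceptions.Count;
    }
    return 0;
}

Console.WriteLine(CountFailures());
```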
  • 44. LazyInit<T> Lazy initialization of a single variable Options – AllowMultipleExecution Init function can be called by many threads, only one value published – EnsureSingleExecution Init function executed only once – ThreadLocal One init call & value per thread
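LazyInit<T> is a CTP-era name. In the released framework the closest type appears to be System.Lazy<T>, where LazyThreadSafetyMode.PublicationOnly and LazyThreadSafetyMode.ExecutionAndPublication roughly correspond to the first two options above, and ThreadLocal<T> covers the per-thread case. The sketch below assumes that mapping and shows the "init function executed only once" behaviour; the variable names are illustrative.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

int initCalls = 0;

// ExecutionAndPublication: the factory runs at most once, even under contention.
var lazy = new Lazy<int>(
    () => { Interlocked.Increment(ref initCalls); return 42; },
    LazyThreadSafetyMode.ExecutionAndPublication);

int sum = 0;
Parallel.For(0, 8, i => { Interlocked.Add(ref sum, lazy.Value); });

Console.WriteLine(initCalls); // prints 1
Console.WriteLine(sum);       // prints 336
```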
  • 45. WriteOnce<T>. Single-assignment structure. Just like Nullable: HasValue, Value. Also try methods: TryGetValue, TrySetValue.
  • 46. Futures A future is the name of a value that will eventually be produced by a computation Thus, we can decide what to do with the value before we know it
  • 47. Futures of T • Future is a factory • Future<T> is the actual future (and also has factory methods) To make a future – var f = Future.Create(() => g()); To use a future Get f.Value The accessor does an async computation
  • 48. Tasks & TaskManager. A better Thread+ThreadPool combination. TaskManager: a very clever thread pool :) Adjusts worker threads to # of CPUs/cores; keeps all cores busy. Task: a unit of work; may (or may not) run concurrently. http://channel9.msdn.com/posts/DanielMoth/ParallelFX-Task-and-friends/
  • 49. Task Just like a future, a task takes an Action<T> – Task t = Task.Create(DoSomeWork); Overloads exist :) Fires off immediately. To wait on completion – t.Wait(); Unlike the thread pool, task manager will use as many threads as there are cores
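Future.Create and Task.Create are CTP-era factory methods. In the released framework, a future is a Task<TResult> read through its Result property, and tasks are typically started with Task.Run or Task.Factory.StartNew. The sketch below assumes those shipped APIs; Square is a hypothetical stand-in for the computation.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical computation used only for this example.
static int Square(int x) => x * x;

// A "future": the value is computed on the thread pool and read later.
Task<int> future = Task.Run(() => Square(12));

// A plain task: a unit of work with no result.
Task work = Task.Run(() => Console.WriteLine("working"));
work.Wait(); // block until the task has completed

Console.WriteLine(future.Result); // prints 144
```

Reading Result blocks the caller until the value is available, which is the shipped counterpart of getting f.Value on a future.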
  • 50. Parallel LINQ (PLINQ). Parallel evaluation in LINQ to Objects and LINQ to XML. Features: IParallelEnumerable<T>; ParallelEnumerable.AsParallel static method.
  • 51. Example IEnumerable<T> data = ...; var q = data.AsParallel() .Where(x => p(x)) .OrderBy(x => k(x)) .Select(x => f(x)); foreach (var e in q) a(e);
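A concrete, runnable instance of the query shape above, with a simple predicate, key, and projection filled in as assumptions (the slide's p, k, and f are left unspecified). It keeps the even numbers, orders them, and squares them.

```csharp
using System;
using System.Linq;

int[] result = Enumerable.Range(1, 10)
    .AsParallel()
    .Where(x => x % 2 == 0)   // stands in for p(x)
    .OrderBy(x => x)          // stands in for k(x)
    .Select(x => x * x)       // stands in for f(x)
    .ToArray();

Console.WriteLine(string.Join(", ", result)); // prints 4, 16, 36, 64, 100
```

Without the OrderBy step, PLINQ makes no promise about result order, so the ordering operator is what keeps the output deterministic here.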
  • 52. Part III Interprocess communication with PureMPI.NET
  • 53. Message Passing Interface. An API for general-purpose IPC. Works across cores & machines. C++ and Fortran. Some Intel libraries support it explicitly. http://www.mcs.anl.gov/research/projects/mpich2/
  • 54. PureMPI.NET A free library available at http://purempi.net Uses WCF endpoints for communication Uses MPI syntax Features A library DLL for WCF functionality An EXE for easy deployment over network
  • 55. How it works Your computers run a service that connects them together Your program exposes WCF endpoints You use the MPI interfaces to communicate
  • 56. Communicator & Rank. A communicator (comm) is a group of computers. In most scenarios, you would have one group: MPI_COMM_WORLD. Useful for determining whether we are the …
  • 57. Main static void Main(string[] args) { using (ProcessorGroup processors = new ProcessorGroup("MPIEnvironment", MpiProcess)) { processors.Start(); processors.WaitForCompletion(); } } Notes: "MPIEnvironment" is defined in app.config; the ProcessorGroup runs MpiProcess on all machines; Start() starts each one; WaitForCompletion() waits on all.
  • 58. Sending & Receiving Blocking or non-blocking methods Send/Receive (blocking) Begin|End Send/Receive (async) Invoked on the comm
  • 59. Send/Receive static void MpiProcess(IDictionary<string, Comm> comms) { Comm comm = comms["MPI_COMM_WORLD"]; if (comm.Rank == 0) { string msg = comm.Receive<string>(1, string.Empty); Console.WriteLine("Got " + msg); } else if (comm.Rank == 1) { comm.Send(0, string.Empty, "Hello"); } } Notes: get a default comm from the dictionary; rank 0 gets a message from 1 (blocking); rank 1 sends a message to 0 (also blocking).
  • 60. Extras Can use async ops Can send to all (Broadcast) Can distribute work and then collect it (Gather/Scatter)
  • 61. Thank You!