Parallel and Asynchronous Programming -  ITProDevConnections 2012 (Greek)
An overview of the Task Parallel Library and its use in the Pithos for Windows client

Published in: Technology

  1. Parallel and Asynchronous Programming
      Or how we built a Dropbox clone without a PhD in Astrophysics
      Panagiotis Kanavos, DotNetZone Moderator, pkanavos@gmail.com
  2. Why
      • Processors are getting smaller
      • Networks are getting worse
      • Operating Systems demand it
      • Only a subset of the code can run in parallel
  3. Processors are getting smaller
      • Once, a single-threaded process could use 100% of the CPU
      • 16% MAX on a quad-core LAPTOP with HyperThreading
      • 8% MAX on an 8-core server
  4. What we used to have
      • Hand-coded threads and synchronization
      • BackgroundWorker
        - Heavy, cumbersome, single threaded, inadequate progress reporting
      • EAP: From event to event
        - Complicated, loss of continuity
      • APM: BeginXXX/EndXXX
        - Cumbersome, imagine socket programming with Begin/End! or rather ...
  5. Why I stopped blogging
      • Asynchronous Pipes with APM
  6. The problem with threads
      • Collisions
        - Reduced throughput
        - Deadlocks
      • Solution: Limit the number of threads
        - ThreadPools
        - Extreme: Stackless Python
      • Solution: Copy data instead of sharing access
        - Extreme: Immutable programming
  7. Why should I care about threads?
      • How can I speed up my algorithm?
      • Which parts can run in parallel?
      • How can I partition my data?
  8. Example: Revani
  9. Synchronous Revani
      • Beat the yolks with 2/3 of the sugar until fluffy
      • Beat the whites with 1/3 of the sugar to a stiff meringue
      • Add half the meringue to the yolk mixture
      • Mix semolina with flour and ground coconut
      • Add the rest of the meringue and mix
      • Mix and pour into a cake pan
      • Bake in a pre-heated oven at 170 °C for 20-25 mins
      • Allow to cool
      • Prepare the syrup: boil water, sugar and lemon for 3 mins
      • Pour the warm syrup over the revani
      • Sprinkle with ground coconut
  10. Parallel Revani
      • Beat yolks
      • Beat whites
      • Add half mixture
      • Mix semolina
      • Add rest of meringue
      • Mix
      • Pour in cake pan
      • Bake
      • Prepare syrup
      • Pour syrup
      • Sprinkle
  11. What we have now
      • Support for multiple concurrency scenarios
      • Overall improvements in threading
      • Highly concurrent collections
  12. Scenarios
      • Faster processing of large data
        - Number crunching
      • Execute long operations
      • Serve a high volume of requests
        - Social sites, Web sites, Billing, Log aggregators
      • Tasks with frequent blocking
        - REST clients, IT management apps
  13. Scenario Classification
      • Data Parallelism
      • Task Parallelism
      • Asynchronous programming
      • Agents/Actors
      • Dataflows
  14. Data Parallelism – Recipe
      • Partition the data
      • Implement the algorithm in a function
      • TPL creates the necessary tasks
      • The tasks are assigned to threads
      • I DON'T have to define the number of Tasks/Threads!
  15. Data Parallelism - Tools
      • Parallel.For / Parallel.ForEach
      • PLINQ
      • Partitioners
  16. Parallel class Methods
      • Parallel execution of lambdas
      • Blocking calls!
      • We specify
        - Cancellation Token
        - Maximum number of Threads
        - Task Scheduler
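The three settings listed on the slide map onto `ParallelOptions`. The following is a minimal sketch rather than code from the talk: the class name, the method name and the limit of 4 are assumptions chosen for illustration. It sums squares with `Parallel.ForEach`, using per-task local state so that the shared total is touched only once per task.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelOptionsSketch
{
    // Sums the squares of the inputs in parallel.
    // The call blocks until every iteration has finished.
    public static long SumOfSquares(IEnumerable<int> source, CancellationToken token)
    {
        var options = new ParallelOptions
        {
            CancellationToken = token,              // the loop throws OperationCanceledException when cancelled
            MaxDegreeOfParallelism = 4,             // assumed upper bound on concurrent iterations
            TaskScheduler = TaskScheduler.Default   // the thread-pool scheduler
        };

        long total = 0;
        Parallel.ForEach(
            source,
            options,
            () => 0L,                                               // initial per-task subtotal
            (item, loopState, local) => local + (long)item * item,  // no shared state touched here
            local => Interlocked.Add(ref total, local));            // merge once per task
        return total;
    }
}
```

A call such as `ParallelOptionsSketch.SumOfSquares(Enumerable.Range(1, 100), CancellationToken.None)` returns only after the whole loop has completed, which is the "blocking calls" point of the slide.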
  17. PLINQ
      • LINQ Queries
      • Potentially multiple threads
      • Parallel operators
      • Unordered results
      • Beware of races
          List<int> list = new List<int>();
          var q = src.AsParallel()
                     .Select(x => { list.Add(x); return x; })
                     .Where(x => true)
                     .Take(100);
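The query on the slide races because several threads call `List<int>.Add` on the same list. One way to avoid that, sketched below under the assumption that the goal is simply to collect results, is to keep the lambdas free of side effects and let PLINQ merge the output itself. `PlinqSketch` and `FirstEvens` are illustrative names, not part of the original deck.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PlinqSketch
{
    // Returns the first `count` even numbers in source order
    // without touching shared state inside the query.
    public static List<int> FirstEvens(IEnumerable<int> source, int count)
    {
        return source
            .AsParallel()
            .AsOrdered()               // opt in to source ordering; PLINQ is unordered by default
            .Where(x => x % 2 == 0)    // pure predicate, safe on any thread
            .Take(count)
            .ToList();                 // PLINQ merges the partial results for us
    }
}
```

Dropping `AsOrdered()` keeps the query race-free but returns some `count` matching elements rather than the first ones.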
  18. What it can't do
      • Doesn't use SSE instructions
      • Doesn't use the GPU
      • Isn't using the CPU at 100%
  19. Scenarios
      • Data Parallelism
      • Task Parallelism
      • Asynchronous programming
      • Agents/Actors
      • Dataflows
  20. Task Parallelism – Recipe
      • Break the problem into steps
      • Convert each step to a function
      • Combine steps with Continuations
      • TPL assigns tasks to threads as needed
      • I DON'T have to define the number of Tasks/Threads!
      • Cancellation of the entire task chain
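The recipe can be sketched as a chain of continuations that share one `CancellationToken`. This is a minimal illustration with made-up steps (double, add one, format); `ContinuationSketch` and `RunPipeline` are hypothetical names.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ContinuationSketch
{
    // Each step is a small function; ContinueWith wires them together.
    // The same token is passed to every link so the whole chain can be cancelled.
    public static Task<string> RunPipeline(int input, CancellationToken token)
    {
        Task<int> doubled = Task.Run(() => input * 2, token);

        Task<int> incremented = doubled.ContinueWith(
            t => t.Result + 1,
            token,
            TaskContinuationOptions.OnlyOnRanToCompletion,
            TaskScheduler.Default);

        return incremented.ContinueWith(
            t => "result=" + t.Result,
            token,
            TaskContinuationOptions.OnlyOnRanToCompletion,
            TaskScheduler.Default);
    }
}
```

No thread is created explicitly here; the scheduler decides where each step runs.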
  21. The Improvements
      • Tasks wherever code blocks
      • Cancellation
      • Lazy Initialization
      • Progress Reporting
      • Synchronization Contexts
  22. Cancellation
      • Problem: How do you cancel multiple tasks without leaving trash behind?
      • Solution: Everyone monitors a CancellationToken
        - TPL cancels subsequent Tasks or Parallel operations
        - Created by a CancellationTokenSource
        - Can execute code when Cancel is called
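A minimal sketch of the pattern follows, assuming a single worker and a trivial cleanup flag; the names `CancellationSketch` and `CancelAndCleanUp` are illustrative. It shows the three pieces from the slide: the source that creates the token, the worker that monitors it, and a callback that runs when `Cancel` is called.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CancellationSketch
{
    // Returns true when the callback ran and the worker ended in the Canceled state.
    public static bool CancelAndCleanUp()
    {
        using (var cts = new CancellationTokenSource())
        {
            CancellationToken token = cts.Token;

            bool cleanedUp = false;
            token.Register(() => cleanedUp = true);   // runs when Cancel is called

            Task worker = Task.Run(() =>
            {
                for (int i = 0; i < 1000; i++)
                {
                    token.ThrowIfCancellationRequested();   // cooperative check
                    Thread.Sleep(10);                       // stand-in for real work
                }
            }, token);

            cts.Cancel();

            try { worker.Wait(); }
            catch (AggregateException) { /* wraps the cancellation exception */ }

            return cleanedUp && worker.IsCanceled;
        }
    }
}
```

Passing the token to `Task.Run` as well as checking it inside the loop is what lets the task finish as Canceled rather than Faulted.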
  23. Progress Reporting
      • Problem: How do you update the UI from inside a task?
      • Solution: Using an IProgress<T> object
        - Out-of-the-box Progress<T> updates the current Synchronization Context
        - Any type can be a message
        - Replace with our own implementation
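As a sketch of the last point, the worker below depends only on `IProgress<int>`, so the caller can pass either the built-in `Progress<T>` or a custom implementation. `ListProgress<T>` is a hypothetical implementation written for this example; it records messages synchronously, which is handy in tests and console tools.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical IProgress<T> implementation that stores every report.
public sealed class ListProgress<T> : IProgress<T>
{
    private readonly List<T> _items = new List<T>();
    private readonly object _gate = new object();

    public void Report(T value) { lock (_gate) _items.Add(value); }

    public List<T> Snapshot() { lock (_gate) return new List<T>(_items); }
}

public static class ProgressSketch
{
    // Reports a percentage after each step; knows nothing about the UI.
    public static async Task<int> ProcessAsync(int steps, IProgress<int> progress)
    {
        for (int i = 1; i <= steps; i++)
        {
            await Task.Delay(1);                                      // stand-in for real work
            if (progress != null) progress.Report(i * 100 / steps);
        }
        return steps;
    }
}
```

In a WPF or WinForms application the caller would typically create `new Progress<int>(handler)` on the UI thread instead, so that the handler runs on that thread's synchronization context.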
  24. Lazy Initialization
      • Calculate a value only when needed
      • Lazy<T>(Func<T> …)
      • Synchronous or Asynchronous calculation
        - Lazy.Value
        - Lazy.GetValueAsync<T>()
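A minimal sketch of both flavours follows. The synchronous case uses `Lazy<T>.Value` directly. For the asynchronous case this sketch assumes a `Lazy<Task<T>>` wrapped in a hand-written `GetValueAsync` method, since that is one common way to get the behaviour the slide describes; the class `ExpensiveSettings` and its members are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class ExpensiveSettings
{
    private int _loads;
    private readonly Lazy<int> _sync;
    private readonly Lazy<Task<int>> _async;

    public ExpensiveSettings()
    {
        _sync = new Lazy<int>(() => Load());                          // computed on first .Value
        _async = new Lazy<Task<int>>(() => Task.Run(() => Load()));   // started on first request
    }

    // How many times the expensive calculation actually ran.
    public int Loads { get { return Volatile.Read(ref _loads); } }

    public int Value { get { return _sync.Value; } }

    public Task<int> GetValueAsync() { return _async.Value; }

    private int Load()
    {
        Interlocked.Increment(ref _loads);
        return 42;   // stand-in for an expensive calculation
    }
}
```

Every caller of `GetValueAsync` receives the same task, so the calculation runs at most once per lazy instance however many callers await it.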
  25. Synchronization Context
      • Since .NET 2.0!
      • Hides WinForms, WPF, ASP.NET
        - SynchronizationContext.Post/Send instead of Dispatcher.Invoke etc.
        - Synchronous and asynchronous versions
      • Automatically created by the environment
        - SynchronizationContext.Current
      • We can create our own
        - E.g. for a command-line application
  26. Scenarios
      • Data Parallelism
      • Task Parallelism
      • Asynchronous programming
      • Agents/Actors
      • Dataflows
  27. Async/Await
      • Support at the language level
      • Debugging support
      • Exception Handling
      • After await, execution returns to the original "thread"
        - Beware of servers and libraries
      • Does NOT always execute asynchronously
        - Only when a task is encountered or the thread yields
        - Task.Yield
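The last bullet can be illustrated with two small methods, both hypothetical. The first awaits a task that is already complete, so it never suspends and finishes on the caller's thread. The second calls `Task.Yield`, which forces the remainder of the method to be scheduled as a continuation.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class AwaitSketch
{
    // Runs to completion synchronously: the awaited task is already finished.
    public static async Task<int> AlreadyCompleteAsync(ConcurrentQueue<string> log)
    {
        log.Enqueue("start");
        int value = await Task.FromResult(42);   // no suspension happens here
        log.Enqueue("end");
        return value;
    }

    // The part before the first await still runs on the caller's thread,
    // but Task.Yield makes the rest run later as a continuation.
    public static async Task<int> YieldingAsync(ConcurrentQueue<string> log)
    {
        log.Enqueue("start");
        await Task.Yield();
        log.Enqueue("end");
        return 42;
    }
}
```

The task returned by `AlreadyCompleteAsync` is already completed when the call returns, which is why marking a method `async` does not by itself move any work off the calling thread.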
  28. Asynchronous Retry
      private static async Task<T> Retry<T>(Func<T> func, int retryCount)
      {
          while (true)
          {
              try
              {
                  // Run the synchronous delegate on the thread pool and await it
                  var result = await Task.Run(func);
                  return result;
              }
              catch
              {
                  // Out of retries: rethrow the last exception
                  if (retryCount == 0) throw;
                  retryCount--;
              }
          }
      }
  29. More Goodies - Collections
      • Highly concurrent
      • Thread-safe
      • Not only for TPL/PLINQ
      • Producer/Consumer scenarios
  30. Concurrent Collections - 2
      • ConcurrentQueue
      • ConcurrentStack
      • ConcurrentDictionary
  31. The Odd one - ConcurrentBag
      • Duplicates allowed
      • List per Thread
      • Reduced collisions for each thread's Add/Take
      • BAD for Producer/Consumer
  32. Concurrent Collections - Gotchas
      • NOT faster than plain collections in low-concurrency scenarios
      • DO NOT consume less memory
      • DO NOT provide thread-safe enumeration
      • DO NOT ensure atomic operations on content
      • DO NOT fix unsafe code
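The "atomic operations" gotcha is easiest to see with a counter. Each individual call on a `ConcurrentDictionary` is thread-safe, but a read followed by a write is two calls, and another thread can slip in between them. The sketch below, with hypothetical names, uses `AddOrUpdate` so that the read-modify-write happens as one operation.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class HitCounterSketch
{
    // Increments one shared counter from many threads.
    // Expects increments > 0, otherwise the key is never created.
    public static int CountInParallel(int increments)
    {
        var counts = new ConcurrentDictionary<string, int>();

        Parallel.For(0, increments, i =>
        {
            // Unsafe alternative: counts["hits"] = counts["hits"] + 1;
            // The indexer get and set are separate operations, so updates can be lost.
            counts.AddOrUpdate("hits", 1, (key, current) => current + 1);
        });

        return counts["hits"];
    }
}
```

The update delegate may run more than once under contention, so it should stay free of side effects.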
  33. Also in .NET 4
      • Visual Studio 2012
      • Async Targeting package
      • System.Net.HttpClient package
  34. Other Technologies
      • F# async
      • C++ Parallel Patterns Library
      • C++ Concurrency Runtime
      • C++ Agents
      • C++ AMP
  35. • Object storage similar to Amazon S3/Azure Blob storage
      • A Service of Synnefo – IaaS by GRNet
      • Written in Python
      • Clients for Web, Windows, iOS, Android, Linux
      • Versioning, Permissions, Sharing
  36. Synnefo
  37. Pithos API
      • REST API based on CloudFiles by Rackspace
        - Compatible with CyberDuck etc.
      • Block storage
      • Uploads only using blocks
      • Uses Merkle Hashing
  38. Pithos Client for Windows
      • Multiple accounts per machine
      • Synchronize local folder to a Pithos account
      • Detect local changes and upload
      • Detect server changes and download
      • Calculate Merkle Hash for each file
  39. The Architecture
      UI              Core            Networking    Storage
      WPF             File Agent      CloudFiles    SQLite
      MVVM            Poll Agent      HttpClient    SQL Server Compact
      Caliburn.Micro  Network Agent
                      Status Agent
  40. Technologies
      • .NET 4, due to Windows XP compatibility
      • Visual Studio 2012 + Async Targeting Pack
      • UI - Caliburn.Micro
      • Concurrency - TPL, Parallel, Dataflow
      • Network – HttpClient
      • Hashing - OpenSSL – Faster than the native provider for hashing
      • Storage - NHibernate, SQLite/SQL Server Compact
      • Logging - log4net
  41. The challenges
      • Handle potentially hundreds of file events
      • Hashing of many/large files
      • Multiple slow calls to the server
      • Unreliable network
      • And yet it shouldn't hang
      • Update the UI with enough information
  42. Events Handling
      • Use producer/consumer pattern
      • Store events in ConcurrentQueue
      • Process ONLY after idle timeout
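The three bullets can be sketched as a small batching class. This is an illustration of the idea rather than the Pithos client's actual code: `IdleBatcher`, its method names and the use of `DateTime.UtcNow` as the clock are all assumptions. Producers enqueue paths as events arrive, and the consumer drains the queue only once no new event has been seen for the idle period.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public sealed class IdleBatcher
{
    private readonly ConcurrentQueue<string> _events = new ConcurrentQueue<string>();
    private long _lastEventTicks = DateTime.UtcNow.Ticks;

    // Producer side: call this from the file system watcher callback.
    public void Post(string path)
    {
        _events.Enqueue(path);
        Interlocked.Exchange(ref _lastEventTicks, DateTime.UtcNow.Ticks);
    }

    // Consumer side: completes once the queue is non-empty and has been quiet for `idle`.
    public async Task<List<string>> DrainWhenIdleAsync(TimeSpan idle, CancellationToken token)
    {
        while (true)
        {
            var last = new DateTime(Interlocked.Read(ref _lastEventTicks), DateTimeKind.Utc);
            TimeSpan quiet = DateTime.UtcNow - last;
            if (quiet >= idle && !_events.IsEmpty) break;

            // Sleep for the remainder of the idle window, or a full window if nothing is queued.
            TimeSpan wait = quiet >= idle ? idle : idle - quiet;
            await Task.Delay(wait, token);
        }

        var batch = new List<string>();
        string path;
        while (_events.TryDequeue(out path)) batch.Add(path);
        return batch;
    }
}
```

Waiting for a quiet period means a burst of events for the same file, such as a large copy in progress, is handled as one batch instead of once per event.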
  43. Merkle Hashing
      • Why I hate Game of Thrones
      • Asynchronous reading of blocks
      • Parallel Hashing of each block
      • Use of OpenSSL for its SSE support
      • Concurrency Throttling
      • Beware of memory consumption!
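A rough sketch of parallel block hashing with throttling follows. It is not the Pithos algorithm: the use of SHA-256 from the .NET class library (the slides use OpenSSL), the rule of promoting an unpaired node unchanged, and the in-memory byte array are all simplifying assumptions, and `MerkleSketch` is a hypothetical name. The leaf hashes are computed in parallel with a bounded degree of parallelism, then combined pairwise into a root.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Threading.Tasks;

public static class MerkleSketch
{
    // maxParallelism must be -1 (unlimited) or a positive number.
    public static byte[] RootHash(byte[] data, int blockSize, int maxParallelism)
    {
        if (blockSize <= 0) throw new ArgumentOutOfRangeException("blockSize");

        int blockCount = Math.Max(1, (data.Length + blockSize - 1) / blockSize);
        var level = new byte[blockCount][];

        // Leaves: hash each block in parallel, throttled to limit CPU and memory pressure.
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallelism };
        Parallel.For(0, blockCount, options, i =>
        {
            int offset = i * blockSize;
            int length = Math.Min(blockSize, data.Length - offset);
            using (var sha = SHA256.Create())       // one hasher per iteration; instances are not thread-safe
                level[i] = sha.ComputeHash(data, offset, length);
        });

        // Inner nodes: hash the concatenation of each pair until one hash remains.
        while (level.Length > 1)
        {
            var next = new byte[(level.Length + 1) / 2][];
            for (int i = 0; i < next.Length; i++)
            {
                byte[] left = level[2 * i];
                if (2 * i + 1 < level.Length)
                {
                    using (var sha = SHA256.Create())
                        next[i] = sha.ComputeHash(left.Concat(level[2 * i + 1]).ToArray());
                }
                else
                {
                    next[i] = left;   // assumed rule: an unpaired node moves up unchanged
                }
            }
            level = next;
        }
        return level[0];
    }
}
```

A real client would read blocks from disk asynchronously instead of holding the whole file in memory, which is where the slide's warning about memory consumption applies.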
  44. Multiple slow calls
      • Each call a task
      • Concurrent REST calls per account and share
      • Task.WhenAll to process results
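The pattern can be sketched as follows. The `listAsync` delegate stands in for a REST call, such as listing one container, and is purely hypothetical, as are the class and method names. All calls are started first and the results are processed together once `Task.WhenAll` completes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class WhenAllSketch
{
    // Starts one call per container, then sums the item counts.
    public static async Task<int> TotalItemsAsync(
        IEnumerable<string> containers,
        Func<string, Task<int>> listAsync)
    {
        // ToArray forces the Select to run, so every call is in flight before we await.
        Task<int>[] calls = containers.Select(listAsync).ToArray();

        int[] counts = await Task.WhenAll(calls);
        return counts.Sum();
    }
}
```

The total time is then close to that of the slowest call rather than the sum of all calls.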
  45. Unreliable network
      • Use System.Net.Http.HttpClient
      • Store blocks in a cache folder
      • Check and reuse orphans
      • Asynchronous Retry of calls
  46. Resilience to crashes
      • Use Transactional NTFS if available
        - Thanks MS for killing it!
      • Update a copy and File.Replace otherwise
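The fallback in the second bullet can be sketched like this. The `.tmp` and `.bak` suffixes and the names `SafeWriteSketch` and `WriteSafely` are assumptions made for the example. The new content is written to a copy first, so a crash mid-write leaves the original file untouched, and `File.Replace` then swaps the copy in while keeping the previous version as a backup.

```csharp
using System;
using System.IO;

public static class SafeWriteSketch
{
    public static void WriteSafely(string path, string contents)
    {
        if (path == null) throw new ArgumentNullException("path");

        string temp = path + ".tmp";     // assumed naming convention
        string backup = path + ".bak";   // assumed naming convention

        // Step 1: write the new version next to the original.
        File.WriteAllText(temp, contents);

        // Step 2: swap it in. The old version survives as the backup file.
        if (File.Exists(path))
            File.Replace(temp, path, backup);
        else
            File.Move(temp, path);       // first write: nothing to replace yet
    }
}
```

Keeping the temporary file in the same folder as the target keeps both on the same volume, which `File.Replace` requires.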
  47. Should not hang
      • Use of independent agents
      • Asynchronous operations wherever possible
  48. Provide sufficient user feedback
      • Use WPF, MVVM
      • Use Progress to update the UI
  49. Next Steps
      • Create Windows 8 Desktop and WinRT client
      • Use Reactive Framework
      VOLUNTEERS WANTED
  50. Clever Tricks
      • Avoid Side Effects
      • Use Functional Style
      • Clean Coding
      • THE BIG SECRET:
        - Use existing, tested algorithms
      • IEEE, ACM Journals and libraries
  51. YES TPL
      • Simplify asynchronous or parallel code
      • Use out-of-the-box libraries
      • Scenarios that SUIT Task or Data Parallelism
  52. NO TPL
      • To accelerate "bad" algorithms
      • To "accelerate" database access
        - Use proper SQL and Indexes!
        - Avoid Cursors
      • Reporting DBs, Data Warehouse, OLAP Cubes
  53. When TPL is not enough
      • Functional languages like F#, Scala
      • Distributed Frameworks like Hadoop, {m}brace
  54. Books
      • C# 5 in a Nutshell, O'Reilly
      • Parallel Programming with .NET, Microsoft
      • Pro Parallel Programming with C#, Wiley
      • Concurrent Programming on Windows, Pearson
      • The Art of Concurrency, O'Reilly
  55. Useful Links
      • Parallel FX Team: http://blogs.msdn.com/b/pfxteam/
      • IEEE Computer Society: http://www.computer.org
      • ACM: http://www.acm.org
