Over the years, software development has relied on increasing processor clock speeds to achieve better performance. For better or worse, the trend has shifted to adding more processing cores instead. Generally speaking, software development hasn’t adjusted to account for this transition. As a result, many applications don’t take full advantage of the underlying platform and don’t perform as well as they could. To take advantage of multi-core and multi-processor systems, we need to change the way we write code to include parallelization.
On .NET we have not ignored this challenge. In fact we have not just one but several patterns for how to do asynchronous programming; that is, dealing with I/O and similar high latency operations without blocking threads. Most often there is both a synchronous (i.e. blocking transparently) and an asynchronous (i.e. latency-explicit) way of doing things. The problem is that these current patterns are very disruptive to program structure, leading to exceedingly complex and error prone code or (more commonly) developers giving up and using the blocking approach, taking a responsiveness and performance hit instead.
The goal should be to bring the asynchronous development experience as close to the synchronous paradigm as possible, without letting go of the ability to handle the asynchrony-specific situations. Asynchrony should be explicit and non-transparent, but in a very lightweight and non-disruptive manner. Composability, abstraction and control structures should all work as simply and intuitively as with synchronous code.
In data parallel operations, the source collection is partitioned so that multiple threads can operate on different segments concurrently.
The TPL supports data parallelism through the System.Threading.Tasks.Parallel class. This class provides method-based parallel implementations of for and foreach loops (For and For Each in Visual Basic). You write the loop logic for a Parallel.For or Parallel.ForEach loop much as you would write a sequential loop. You do not have to create threads or queue work items. In basic loops, you do not have to take locks. The TPL handles all the low-level work for you. The following code example shows a simple foreach loop and its parallel equivalent.
When a parallel loop runs, the TPL partitions the data source so that the loop can operate on multiple parts concurrently. Behind the scenes, the task scheduler partitions the task based on system resources and workload. When possible, the scheduler redistributes work among multiple threads and processors if the workload becomes unbalanced.
The TPL does not synchronize access to shared data for you.
It is best to use as little shared data as possible.
Performance issues
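To make the comparison between a sequential loop and its parallel counterpart concrete, here is a minimal sketch. The class and method names are hypothetical and chosen only for illustration. Because the loop body writes to a shared total, the parallel version uses Interlocked.Add, which illustrates the point about shared data above.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelLoopSketch
{
    // Sequential baseline: sums the squares of the input with a plain foreach.
    public static long SumSquaresSequential(int[] values)
    {
        long total = 0;
        foreach (int v in values)
        {
            total += (long)v * v;
        }
        return total;
    }

    // Parallel equivalent: the body runs on several threads, so the shared
    // total is updated atomically with Interlocked.Add instead of a plain +=.
    public static long SumSquaresParallel(int[] values)
    {
        long total = 0;
        Parallel.ForEach(values, v =>
        {
            Interlocked.Add(ref total, (long)v * v);
        });
        return total;
    }
}
```

Both methods return the same sum for the same input. For a body this small the parallel version is unlikely to be faster, since the atomic update on shared state dominates the work.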
More efficient and more scalable use of system resources. Behind the scenes, tasks are queued to the ThreadPool, which has been enhanced with algorithms (like hill-climbing) that determine and adjust to the number of threads that maximizes throughput. This makes tasks relatively lightweight, and you can create many of them to enable fine-grained parallelism. To complement this, widely-known work-stealing algorithms are employed to provide load-balancing.
More programmatic control than is possible with a thread or work item. Tasks and the framework built around them provide a rich set of APIs that support waiting, cancellation, continuations, robust exception handling, detailed status, custom scheduling, and more.
A Task is like an Action delegate running in the background. You can spawn new tasks, wait for a task to complete, cancel one, schedule continuation tasks, and so on.
A Task<T> is like a Func<T> delegate running in the background, eventually producing (in the future) a value of type T. Again, continuation functions can be scheduled, and so on.
The number of Task instances that are created behind the scenes by Invoke is not necessarily equal to the number of delegates that are provided. The TPL may employ various optimizations, especially with large numbers of delegates.
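The Invoke referred to above is presumably Parallel.Invoke, which runs a set of delegates, possibly in parallel, and returns when all of them have finished. A minimal sketch, with hypothetical class and helper names:

```csharp
using System.Threading.Tasks;

public static class ParallelInvokeSketch
{
    // Runs three independent delegates, possibly in parallel, and returns
    // once all of them have finished. Each delegate writes to its own array
    // slot, so no synchronization is needed.
    public static int[] RunAll()
    {
        var results = new int[3];
        Parallel.Invoke(
            () => results[0] = Fib(10),
            () => results[1] = Fib(15),
            () => results[2] = Fib(20));
        return results;
    }

    // Deliberately naive recursive Fibonacci, used only as sample work.
    static int Fib(int n) => n < 2 ? n : Fib(n - 1) + Fib(n - 2);
}
```

The caller only sees that all three delegates have completed. How many Task instances were used to run them is an implementation detail.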
A task that returns a value is represented by the System.Threading.Tasks.Task<TResult> class, which inherits from Task.
When you create a task, you give it a user delegate that encapsulates the code that the task will execute. The delegate can be expressed as a named delegate, an anonymous method, or a lambda expression. Lambda expressions can contain a call to a named method, as shown in the following example.
You can also use the StartNew method to create and start a task in one operation. This is the preferred way to create and start tasks if creation and scheduling do not have to be separated, as shown in the following example.
All tasks returned from TAP methods must be “hot.” If a TAP method internally uses a Task’s constructor to instantiate the task to be returned, the TAP method must call Start on the Task object prior to returning it. Consumers of a TAP method may safely assume that the returned task is “hot,” and should not attempt to call Start on any Task returned from a TAP method. Calling Start on a “hot” task will result in an InvalidOperationException (this check is handled automatically by the Task class).
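The two ways of creating a task, and the rule about not calling Start on a hot task, can be sketched as follows. The class name and the Square helper are hypothetical.

```csharp
using System;
using System.Threading.Tasks;

public static class TaskCreationSketch
{
    // Named method called from the lambda expressions below.
    static int Square(int x) => x * x;

    // Creation and scheduling kept separate: the task is "cold" until Start().
    public static int CreateThenStart()
    {
        var task = new Task<int>(() => Square(7));
        task.Start();
        return task.Result; // blocks until the value is available
    }

    // Creation and scheduling in one step: the returned task is already "hot".
    public static int CreateAndStart()
    {
        Task<int> task = Task.Factory.StartNew(() => Square(7));
        return task.Result;
    }

    // Calling Start on a hot task throws InvalidOperationException.
    public static bool StartOnHotTaskThrows()
    {
        Task<int> task = Task.Factory.StartNew(() => Square(7));
        try { task.Start(); return false; }
        catch (InvalidOperationException) { return true; }
    }
}
```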
Also, in this example, because the tasks are of type System.Threading.Tasks.Task<TResult>, they each have a public Result property that contains the result of the computation. The tasks run asynchronously and may complete in any order. If Result is accessed before the computation completes, the property will block the thread until the value is available.
Every task receives an integer ID that uniquely identifies it in an application domain and that is accessible by using the Id property. The ID is useful for viewing task information in the Visual Studio debugger Parallel Stacks and Parallel Tasks windows.
In asynchronous programming, it is very common for one asynchronous operation, on completion, to invoke a second operation and pass data to it. Traditionally, this has been done by using callback methods. In the Task Parallel Library, the same functionality is provided by continuation tasks. A continuation task (also known just as a continuation) is an asynchronous task that is invoked by another task, which is known as the antecedent, when the antecedent completes.
In the Task Parallel Library, a task whose ContinueWith method is invoked is called the antecedent task and the task that is defined in the ContinueWith method is called the continuation.
Most APIs that create tasks provide overloads that accept a TaskCreationOptions parameter. By specifying one of these options, you instruct the task scheduler as to how to schedule the task on the thread pool. The following table lists the various task creation options.
Element | Description
None | The default option when no option is specified. The scheduler uses its default heuristics to schedule the task.
PreferFairness | Specifies that the task should be scheduled so that tasks created sooner will be more likely to be executed sooner, and tasks created later will be more likely to execute later.
LongRunning | Specifies that the task represents a long-running operation.
AttachedToParent | Specifies that a task should be created as an attached child of the current Task, if one exists.
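The antecedent and continuation relationship described above can be sketched in a few lines. The class name and the values are hypothetical.

```csharp
using System.Threading.Tasks;

public static class ContinuationSketch
{
    public static string Run()
    {
        // Antecedent: produces a number on a thread-pool thread.
        Task<int> antecedent = Task.Factory.StartNew(() => 21);

        // Continuation: scheduled when the antecedent completes and handed
        // the antecedent task, so it can read the result without a callback.
        Task<string> continuation =
            antecedent.ContinueWith(t => "answer = " + (t.Result * 2));

        return continuation.Result; // blocks until the continuation finishes
    }
}
```

Reading t.Result inside the continuation does not block, because the continuation only runs after the antecedent has completed.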
When user code that is running in a task creates a new task and does not specify the AttachedToParent option, the new task is not synchronized with the outer task in any special way. Such a task is called a detached nested task. The following example shows a task that creates one detached nested task.
The outer task does not wait for the nested task to complete.
When user code that is running in a task creates a task with the AttachedToParent option, the new task is known as a child task of the originating task, which is known as the parent task. You can use the AttachedToParent option to express structured task parallelism, because the parent task implicitly waits for all child tasks to complete. The following example shows a task that creates one child task:
Category | Nested Tasks | Attached Child Tasks
Outer task (parent) waits for inner tasks to complete. | No | Yes
Parent propagates exceptions thrown by children (inner tasks). | No | Yes
Status of parent (outer task) dependent on status of child (inner task). | No | Yes
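As a sketch of the attached child case: the parent below creates one child with AttachedToParent, and waiting on the parent therefore also waits for the child. The class name and the 100 ms delay are illustrative assumptions.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class ChildTaskSketch
{
    // Returns true if the child had finished by the time Wait() on the
    // parent returned.
    public static bool ParentWaitsForAttachedChild()
    {
        bool childDone = false;
        Task parent = Task.Factory.StartNew(() =>
        {
            // Attached child: the parent does not complete until it does.
            Task.Factory.StartNew(() =>
            {
                Thread.Sleep(100);
                childDone = true;
            }, TaskCreationOptions.AttachedToParent);
        });
        parent.Wait();
        return childDone;
    }
}
```

The parent is started with Task.Factory.StartNew on purpose. Task.Run creates its task with TaskCreationOptions.DenyChildAttach, so a child created inside it would behave like a detached nested task.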
In the Task classes, cancellation involves cooperation between the user delegate, which represents a cancelable operation, and the code that requested the cancellation.
You can terminate the operation by using one of these options:
1. By simply returning from the delegate. In many scenarios this is sufficient; however, a task instance that is "canceled" in this way transitions to the RanToCompletion state, not to the Canceled state.
2. By throwing an OperationCanceledException and passing it the token on which cancellation was requested. The preferred way to do this is to use the ThrowIfCancellationRequested method. A task that is canceled in this way transitions to the Canceled state, which the calling code can use to verify that the task responded to its cancellation request.
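The second option can be sketched as follows. The class name, loop bound and sleep interval are hypothetical. The same token is passed both to the delegate and to StartNew, so that the task ends in the Canceled state.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CancellationSketch
{
    public static TaskStatus RunAndCancel()
    {
        var cts = new CancellationTokenSource();
        CancellationToken token = cts.Token;

        Task task = Task.Factory.StartNew(() =>
        {
            for (int i = 0; i < 1000; i++)
            {
                // Throws OperationCanceledException carrying this token
                // once cancellation has been requested.
                token.ThrowIfCancellationRequested();
                Thread.Sleep(10);
            }
        }, token);

        cts.Cancel();
        try { task.Wait(); }
        catch (AggregateException) { /* cancellation surfaces here when waiting */ }
        return task.Status;
    }
}
```

The calling code can then check for TaskStatus.Canceled to confirm that the task responded to the request.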
Providing the progress interface at the time of the asynchronous method’s invocation helps to eliminate race conditions that result from incorrect usage where event handlers incorrectly registered after the invocation of the operation may miss updates. More importantly, it enables varying implementations of progress to be utilized, as determined by the consumer. The consumer may, for example, only care about the latest progress update, or may want to buffer them all, or may simply want to invoke an action for each update, or may want to control whether the invocation is marshaled to a particular thread; all of this may be achieved by utilizing a different implementation of the interface, each of which may be customized to the particular consumer’s need. As with cancellation, TAP implementations should only provide an IProgress<T> parameter if the API supports progress notifications.
An instance of Progress<T> exposes a ProgressChanged event, which is raised every time the asynchronous operation reports a progress update. The ProgressChanged event is raised on whatever SynchronizationContext was captured when the Progress<T> instance was instantiated (if no context was available, a default context is used, targeting the ThreadPool). Handlers may be registered with this event; a single handler may also be provided to the Progress<T> instance’s constructor (this is purely for convenience, and behaves just as would an event handler for the ProgressChanged event). Progress updates are raised asynchronously so as to avoid delaying the asynchronous operation while event handlers are executing. Another IProgress<T> implementation could choose to apply different semantics.
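To illustrate how the consumer chooses the progress semantics, here is a sketch of an alternative IProgress<T> implementation that buffers every update synchronously. BufferingProgress<T> and CountToAsync are hypothetical names invented for this example and are not part of the framework.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ProgressSketch
{
    // Hypothetical consumer-side implementation: stores every update at the
    // moment it is reported, instead of raising events like Progress<T>.
    public sealed class BufferingProgress<T> : IProgress<T>
    {
        private readonly List<T> _updates = new List<T>();

        public void Report(T value)
        {
            lock (_updates) { _updates.Add(value); }
        }

        public List<T> Snapshot()
        {
            lock (_updates) { return new List<T>(_updates); }
        }
    }

    // Hypothetical TAP-style method: the progress sink is supplied by the
    // caller at invocation time, so no update can be missed.
    public static Task<int> CountToAsync(int n, IProgress<int> progress)
    {
        return Task.Factory.StartNew(() =>
        {
            for (int i = 1; i <= n; i++)
            {
                if (progress != null) progress.Report(i);
            }
            return n;
        });
    }
}
```

A caller that prefers event-style notification could pass a Progress<int> instead, without any change to CountToAsync.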
Previously, the thread pool had a single queue on which all work items were enqueued and from which they were dequeued in FIFO order. Worker threads picked up their work items from this one queue. In .NET 4.0, this has been improved by introducing a local queue for every worker thread, in addition to the global queue.
Tasks created by a program thread are queued on the global queue. The task scheduler dequeues tasks from the global queue in FIFO order and distributes them to the local queues of the worker threads. Each worker thread dequeues tasks from its own local queue in LIFO order. With local queues, worker threads running on different processors avoid the contention that normally occurs in a single-queue thread pool. Worker threads pick up tasks in LIFO order on the assumption that the “last-in” task is still hot in the cache; this gives no guarantee of task ordering, but better performance.
a consumer may then choose whether to wrap an invocation of that synchronous method into a Task for their own purposes of offloading the work to another thread and/or to achieve parallelism.
Because LINQ queries say what to compute without overspecifying how to execute it, the runtime can reason about a query’s execution and optimize it, for example by partitioning the input set, running Where clauses in parallel over the chunks of data, and merging the results afterwards. All you have to do is sprinkle “AsParallel” on the source collection of the query. The reason this doesn’t happen automagically is that, though not recommended at all, queries can have side effects, and auto-parallelizing their execution would change the semantics of existing code. Side effects will kill you, right…
In a parallel setting (which the user has requested explicitly using AsParallel) you lose preservation of input ordering by default. For example, from x in Enumerable.Range(1,10) where x % 2 == 0 select x normally results in 2, 4, 6, 8, 10. In PLINQ any order can occur, as the data can be partitioned to be filtered (among other operations) on multiple cores, unless you explicitly ask to preserve the original order by calling AsOrdered (xs.AsParallel().AsOrdered()). That is obviously less efficient than a plain unordered AsParallel query (but potentially, and hopefully, still faster than the non-parallel one).
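The query from the paragraph above, in its unordered and ordered parallel forms, might look like this. The class and method names are hypothetical.

```csharp
using System.Linq;

public static class PlinqSketch
{
    // Unordered: the even numbers come back in whatever order the
    // partitions happen to produce them.
    public static int[] EvensUnordered() =>
        (from x in Enumerable.Range(1, 10).AsParallel()
         where x % 2 == 0
         select x).ToArray();

    // AsOrdered preserves the order of the source at some extra cost.
    public static int[] EvensOrdered() =>
        (from x in Enumerable.Range(1, 10).AsParallel().AsOrdered()
         where x % 2 == 0
         select x).ToArray();
}
```

Both methods return the same five numbers. Only the ordered variant guarantees the sequence 2, 4, 6, 8, 10.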
APM – Asynchronous Programming Model
One of our design goals for the Task Parallel Library is to integrate well into existing asynchronous mechanisms in the .NET Framework. The .NET Framework 1.0 saw the introduction of the IAsyncResult pattern, otherwise known as the Asynchronous Programming Model (APM) pattern, or the Begin/End pattern. The .NET Framework 2.0 then brought with it the event-based asynchronous pattern (EAP). The new TAP deprecates both of its predecessors, while at the same time providing the ability to easily build migration routines from the APM and EAP to TAP.
1. The event-based asynchronous pattern relies on an instance MethodNameAsync method which returns void, accepts the same parameters as the synchronous MethodName method, and initiates the asynchronous operation.
2. Prior to initiating the asynchronous operation, event handlers are registered with events on the same instance, and these events are then raised to provide progress and completion notifications. The event handlers are typically custom delegate types that utilize event argument types that are, or that are derived from, ProgressChangedEventArgs and AsyncCompletedEventArgs.
One of the most common concurrency-related patterns in the .NET Framework is the Asynchronous Programming Model (APM), which typically manifests as a BeginXx method that kicks off an asynchronous operation and returns an IAsyncResult, along with an EndXx method that accepts an IAsyncResult and returns the computed value.
Implementing this for computationally intensive asynchronous operations can be done with System.Threading.Tasks.Task<TResult> (named Future<T> in early previews of the library), as Task<TResult> derives from System.Threading.Tasks.Task, and Task implements IAsyncResult.
The APM pattern requires two methods: BeginXxx and EndXxx. With the Task-based Asynchronous Pattern (TAP) there is only one method, which returns a Task<TResult>.
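A migration routine from the APM to TAP can be sketched with TaskFactory.FromAsync, which wraps a Begin/End pair in a single task. The example uses Stream.BeginRead and Stream.EndRead as the APM pair; the class and method names are hypothetical.

```csharp
using System.IO;
using System.Threading.Tasks;

public static class FromAsyncSketch
{
    // Wraps the APM pair Stream.BeginRead/EndRead in one Task<int> whose
    // result is the number of bytes read into the buffer.
    public static Task<int> ReadTask(Stream stream, byte[] buffer)
    {
        return Task<int>.Factory.FromAsync(
            stream.BeginRead, stream.EndRead,
            buffer, 0, buffer.Length, null);
    }
}
```

The caller deals with a single Task<int> and never touches the IAsyncResult directly.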
Collection classes that are thread safe and scalable. Multiple threads can safely add and remove items from these collections without requiring additional synchronization in code.
Do not use collections from .NET Framework 1.0. We recommend the concurrent collection classes in the .NET Framework 4 because they provide not only the type safety of the .NET Framework 2.0 collection classes, but also more efficient and more complete thread safety than the .NET Framework 1.0 collections provide.
Some of the concurrent collection types use lightweight synchronization mechanisms such as SpinLock, SpinWait, SemaphoreSlim, and CountdownEvent, which are new in the .NET Framework 4. These synchronization types typically use busy spinning for brief periods before they put the thread into a true Wait state. When wait times are expected to be very short, spinning is far less computationally expensive than waiting, which involves an expensive kernel transition. For collection classes that use spinning, this efficiency means that multiple threads can add and remove items at a very high rate. For more information about spinning vs. blocking, see SpinLock and SpinWait.
The ConcurrentQueue<T> and ConcurrentStack<T> classes do not use locks at all. Instead, they rely on Interlocked operations to achieve thread safety.
ConcurrentBag: a thread-safe, unordered collection of objects.
ManualResetEventSlim: a lightweight version of ManualResetEvent that can only be used for intra-process communication.
SemaphoreSlim: a lightweight version of Semaphore that restricts concurrent access to resources.
ReaderWriterLockSlim
On the Slim versions in general: "In the .NET Framework version 4, you can use the System.Threading.ManualResetEventSlim class for better performance when wait times are expected to be very short, and when the event does not cross a process boundary"
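As a small illustration of adding and removing items without extra synchronization, the sketch below fills a ConcurrentQueue<int> from a parallel loop and then drains it. The class and method names are hypothetical.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class ConcurrentCollectionSketch
{
    // Several threads enqueue concurrently with no explicit lock; the
    // method then drains the queue and returns how many items it held.
    public static int FillQueue(int itemCount)
    {
        var queue = new ConcurrentQueue<int>();
        Parallel.For(0, itemCount, i => queue.Enqueue(i));

        int drained = 0;
        int item;
        while (queue.TryDequeue(out item))
        {
            drained++;
        }
        return drained;
    }
}
```

No item is lost, but the order in which the items were enqueued depends on how the parallel loop was scheduled.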
Lock-free updates using SpinWait and Interlocked.Exchange.
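One way such a lock-free update could look is sketched below. Note the substitution: the sketch uses Interlocked.CompareExchange rather than a plain exchange, because a conditional update (here, "raise to the maximum") needs compare-and-swap. All names are hypothetical.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class LockFreeSketch
{
    // Atomically raises 'target' to at least 'candidate' without a lock.
    public static void UpdateMax(ref int target, int candidate)
    {
        var spinner = new SpinWait();
        while (true)
        {
            int seen = Volatile.Read(ref target);
            if (candidate <= seen) return; // nothing to do
            // Succeeds only if no other thread changed 'target' since the read.
            if (Interlocked.CompareExchange(ref target, candidate, seen) == seen) return;
            spinner.SpinOnce(); // lost the race: back off briefly and retry
        }
    }

    // Finds the maximum of the input with concurrent lock-free updates.
    public static int MaxOf(int[] values)
    {
        int max = int.MinValue;
        Parallel.ForEach(values, v => UpdateMax(ref max, v));
        return max;
    }
}
```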
Notice first how similar it is to the synchronous code. The highlighted parts are added, the rest stays the same. The control flow is completely unaltered, and there are no callbacks in sight. That doesn’t mean that there are no callbacks, but the compiler takes care of creating and signing them up, as we shall see.
By adding the async contextual keyword to the method definition, we are able to use the await keyword on our WebClient.DownloadStringTaskAsync method call.
When the user clicks this button, the new method (Task<string> WebClient.DownloadStringTaskAsync(string)) is called, which returns a Task<string>. Because of the await keyword, if that task has not yet completed, execution returns to the caller at this point. This means that our UI is not blocked while the web page is downloaded. Instead, the method “awaits” at this point and lets the WebClient do its thing asynchronously.
When the WebClient finishes downloading the string, the user interface’s synchronization context will automatically be used to “pick up” where it left off, and the result of the Task<string> returned from DownloadStringTaskAsync is automatically unwrapped and set into the content variable. At this point, we can use that and set our text box content.
At the API level, the way to achieve waiting without blocking is to provide callbacks. For Tasks, this is achieved through methods like ContinueWith. Language-based asynchrony support hides callbacks by allowing asynchronous operations to be awaited within normal control flow, with compiler-generated code targeting this same API-level support.
Consider another example of downloading multiple files from the web asynchronously. In this case, all of the asynchronous operations have homogeneous result types, and access to the results is simple:
string[] pages = await Task.WhenAll(from url in urls select DownloadStringAsync(url));
async
Conventionally, asynchronous methods use the Async suffix to indicate that execution can continue after the method has returned. An async method provides a convenient way to do things that take a long time without blocking the calling thread. The thread that calls an async method can continue without having to wait until the method finishes.
await
An await expression is only allowed within an asynchronous method. Commonly, a method marked with the async modifier contains at least one await expression. Such a method runs synchronously until it reaches the first await expression on a task that has not yet completed, at which point execution is suspended until the task completes.
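A self-contained sketch of the Task.WhenAll pattern follows. DownloadStringAsync here is a hypothetical stand-in that waits briefly and returns fake content, so the example needs no network access; a real implementation would call an actual download API.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class AwaitSketch
{
    // Hypothetical stand-in for a real download: waits briefly, then
    // returns fake content derived from the URL.
    static async Task<string> DownloadStringAsync(string url)
    {
        await Task.Delay(10);
        return "content of " + url;
    }

    // Starts one download per URL and awaits them all together. The
    // results arrive in the same order as the input URLs.
    public static async Task<string[]> DownloadAllAsync(IEnumerable<string> urls)
    {
        string[] pages = await Task.WhenAll(
            from url in urls select DownloadStringAsync(url));
        return pages;
    }
}
```

Both methods follow the Async suffix convention and return to their caller at the first await on an incomplete task.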
Partition your problem into enough tasks to keep each processor busy throughout the operation, but not more than necessary to keep each processor busy.
Partition your problem in a way to place the most work possible into each task.
Parallelization is something that should be handled with care and forethought, added by design, and not just introduced casually.
Parallel isn’t Always Faster – sometimes the overhead involved with determining whether something should be parallelized actually takes longer than just running the process sequentially. As a corollary, it is possible to add too much parallelization. When there aren’t enough resources available to gain benefits from parallelization then the added overhead will decrease performance.
Writing to Shared Memory from multiple threads amplifies the potential for race conditions. Additionally, the overhead associated with locks and synchronization can hamper performance.
Thread-Affinity still matters, so when using technologies that impose restrictions requiring some code to run on a specific thread you may not be able to perform certain actions in a task without changing scheduler settings.