Parallel Programming with Visual Studio 2010 and the .NET Framework 4
Stephen Toub
Microsoft Corporation
October 2009
Agenda
- Why Parallelism, Why Now?
- Difficulties w/ Visual Studio 2008 & .NET 3.5
- Solutions w/ Visual Studio 2010 & .NET 4
  - Parallel LINQ
  - Task Parallel Library
  - New Coordination & Synchronization Primitives
  - New Parallel Debugger Windows
  - New Profiler Concurrency Visualizations
Moore’s Law
“The number of transistors incorporated in a chip will approximately double every 24 months.”
-- Gordon Moore, Intel Co-Founder
http://www.intel.com/pressroom/kits/events/moores_law_40th/
Moore’s Law: Alive and Well?
- The number of transistors doubles every two years…
- More than 1 billion transistors in 2006!
[Chart: transistor count vs. year]
http://upload.wikimedia.org/wikipedia/commons/2/25/Transistor_Count_and_Moore%27s_Law_-_2008_1024.png
Moore’s Law: Feel the Heat!
[Chart: power density (W/cm2), log scale from 1 to 10,000, vs. year from ’70 to ’10. The 8080, 386, 486, and Pentium® processors climb past "hot plate" levels, trending toward nuclear reactor, rocket nozzle, and sun’s-surface power densities.]
Intel Developer Forum, Spring 2004 - Pat Gelsinger
Moore’s Law: But Different
- Frequencies will NOT get much faster!
- Maybe 5 to 10% every year or so, a few more times…
- And these modest gains would make the chips A LOT hotter!
http://www.tomshw.it/cpu.php?guide=20051121
The Manycore Shift
“[A]fter decades of single core processors, the high volume processor industry has gone from single to dual to quad-core in just the last two years. Moore’s Law scaling should easily let us hit the 80-core mark in mainstream processors within the next ten years and quite possibly even less.”
-- Justin Rattner, CTO, Intel (February 2007)
“If you haven’t done so already, now is the time to take a hard look at the design of your application, determine what operations are CPU-sensitive now or are likely to become so soon, and identify how those places could benefit from concurrency.”
-- Herb Sutter, C++ Architect at Microsoft (March 2005)
I'm convinced… now what?
- Multithreaded programming is “hard” today
  - Doable by only a subgroup of senior specialists
  - Parallel patterns are not prevalent, well known, nor easy to implement
  - So many potential problems
- Businesses have little desire to “go deep”
  - Best devs should focus on business value, not concurrency
- Need simple ways to allow all devs to write concurrent code
Example: “Race Car Drivers”

    IEnumerable<RaceCarDriver> drivers = ...;
    var results = new List<RaceCarDriver>();
    foreach (var driver in drivers)
    {
        if (driver.Name == queryName &&
            driver.Wins.Count >= queryWinCount)
        {
            results.Add(driver);
        }
    }
    results.Sort((b1, b2) => b1.Age.CompareTo(b2.Age));
Manual Parallel Solution

    IEnumerable<RaceCarDriver> drivers = ...;
    var results = new List<RaceCarDriver>();
    int partitionsCount = Environment.ProcessorCount;
    int remainingCount = partitionsCount;
    var enumerator = drivers.GetEnumerator();
    try
    {
        using (var done = new ManualResetEvent(false))
        {
            for (int i = 0; i < partitionsCount; i++)
            {
                ThreadPool.QueueUserWorkItem(delegate
                {
                    while (true)
                    {
                        RaceCarDriver driver;
                        lock (enumerator)
                        {
                            if (!enumerator.MoveNext()) break;
                            driver = enumerator.Current;
                        }
                        if (driver.Name == queryName &&
                            driver.Wins.Count >= queryWinCount)
                        {
                            lock (results) results.Add(driver);
                        }
                    }
                    if (Interlocked.Decrement(ref remainingCount) == 0) done.Set();
                });
            }
            done.WaitOne();
            results.Sort((b1, b2) => b1.Age.CompareTo(b2.Age));
        }
    }
    finally
    {
        if (enumerator is IDisposable) ((IDisposable)enumerator).Dispose();
    }
PLINQ Solution: just add .AsParallel()

    var results = from driver in drivers.AsParallel()
                  where driver.Name == queryName &&
                        driver.Wins.Count >= queryWinCount
                  orderby driver.Age ascending
                  select driver;
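To make the slide's query runnable end-to-end, here is a small sketch. The RaceCarDriver type and the sample data are hypothetical, invented to match how the slide's query uses them (Name, Age, and a Wins collection).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type matching the slide's usage.
class RaceCarDriver
{
    public string Name;
    public int Age;
    public List<string> Wins = new List<string>();
}

class Program
{
    static void Main()
    {
        string queryName = "Ayrton";
        int queryWinCount = 2;

        var drivers = new List<RaceCarDriver>
        {
            new RaceCarDriver { Name = "Ayrton", Age = 34, Wins = { "Monaco", "Spa", "Imola" } },
            new RaceCarDriver { Name = "Ayrton", Age = 29, Wins = { "Monza", "Suzuka" } },
            new RaceCarDriver { Name = "Alain",  Age = 38, Wins = { "Estoril" } },
        };

        // Same query as the slide: only .AsParallel() differs from the serial version.
        // The orderby clause makes the final output sorted by age even though the
        // filtering ran in parallel.
        var results = from driver in drivers.AsParallel()
                      where driver.Name == queryName &&
                            driver.Wins.Count >= queryWinCount
                      orderby driver.Age ascending
                      select driver;

        foreach (var d in results)
            Console.WriteLine("{0} ({1})", d.Name, d.Age); // Ayrton (29), then Ayrton (34)
    }
}
```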
Visual Studio 2010: Tools, Programming Models, Runtimes
[Diagram of the parallel stack. Tools: Visual Studio IDE, Parallel Debugger Tool Windows, Profiler Concurrency Analysis. Managed programming models (.NET Framework 4): Parallel LINQ, Task Parallel Library, Data Structures, running over the ThreadPool's Task Scheduler and Resource Manager. Native programming models (Visual C++ 10): Parallel Pattern Library, Agents Library, Data Structures, running over the Concurrency Runtime's Task Scheduler and Resource Manager. Operating System: Windows threads, UMS threads.]
Parallel Extensions
- What is it?
  - Pure .NET libraries
    - No compiler changes necessary
    - mscorlib.dll, System.dll, System.Core.dll
  - Lightweight, user-mode runtime
    - Key ThreadPool enhancements
  - Supports imperative and declarative, data and task parallelism
    - Declarative data parallelism (PLINQ)
    - Imperative data and task parallelism (Task Parallel Library)
    - New coordination/synchronization constructs
- Why do we need it?
  - Supports parallelism in any .NET language
  - Delivers reduced concept count and complexity, better time to solution
  - Begins to move parallelism capabilities from concurrency experts to domain experts
- How do we get it?
  - Built into the core of .NET 4
  - Debugging and profiling support in Visual Studio 2010
Architecture
[Diagram: the C#, VB, C++, F#, and other .NET compilers emit IL against the libraries. PLINQ Execution Engine: query analysis; data partitioning (chunk, range, hash, striped, repartitioning, custom); operator types (map, filter, sort, search, reduce, group, join, …); merging (sync and async, order preserving, buffered, inverted). Task Parallel Library: loop replacements, imperative task parallelism, scheduling. Coordination Data Structures: thread-safe collections, synchronization types, coordination types. Everything executes on threads across processors 1..p.]
Language Integrated Query (LINQ)
[Diagram: C#, Visual Basic, and other languages sit atop the .NET Standard Query Operators, which target LINQ-enabled data sources: LINQ to Objects (objects), LINQ to XML (XML), and LINQ-enabled ADO.NET (LINQ to SQL, LINQ to Datasets, LINQ to Entities) over relational data.]
Writing a LINQ-to-Objects Query
- Two ways to write queries
  - Comprehensions
    - Syntax extensions to C# and Visual Basic
  - APIs
    - Used as extension methods on IEnumerable<T>
    - System.Linq.Enumerable class
- Compiler converts the former into the latter
- API implementation does the actual work

    var q = from x in Y
            where p(x)
            orderby x.f1
            select x.f2;

    var q = Y.Where(x => p(x)).OrderBy(x => x.f1).Select(x => x.f2);

    var q = Enumerable.Select(
                Enumerable.OrderBy(
                    Enumerable.Where(Y, x => p(x)),
                    x => x.f1),
                x => x.f2);
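A quick way to convince yourself the comprehension and API forms are equivalent is to run both over the same data. This is a minimal sketch; Y, p, f1, and f2 from the slide are replaced with concrete stand-ins (an int array, a filter, identity key, and a scaling projection):

```csharp
using System;
using System.Linq;

class Program
{
    static void Main()
    {
        int[] Y = { 5, 1, 4, 2, 3 };

        // Comprehension syntax...
        var q1 = from x in Y where x > 1 orderby x select x * 10;

        // ...is compiled down to exactly these extension-method calls.
        var q2 = Y.Where(x => x > 1).OrderBy(x => x).Select(x => x * 10);

        Console.WriteLine(string.Join(",", q1)); // 20,30,40,50
        Console.WriteLine(string.Join(",", q2)); // 20,30,40,50
    }
}
```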
LINQ Query Operators
In .NET 4, ~50 operators w/ ~175 overloads:
Aggregate(3), All(1), Any(2), AsEnumerable(1), Average(20), Cast(1), Concat(1), Contains(2), Count(2), DefaultIfEmpty(2), Distinct(2), ElementAt(1), ElementAtOrDefault(1), Empty(1), Except(2), First(2), FirstOrDefault(2), GroupBy(8), GroupJoin(2), Intersect(2), Join(2), Last(2), LastOrDefault(2), LongCount(2), Max(22), Min(22), OfType(1), OrderBy(2), OrderByDescending(2), Range(1), Repeat(1), Reverse(1), Select(2), SelectMany(4), SequenceEqual(2), Single(2), SingleOrDefault(2), Skip(1), SkipWhile(2), Sum(20), Take(1), TakeWhile(2), ThenBy(2), ThenByDescending(2), ToArray(1), ToDictionary(4), ToList(1), ToLookup(4), Union(2), Where(2), Zip(1)

    var operators = from method in typeof(Enumerable).GetMethods(
                        BindingFlags.Public | BindingFlags.Static | BindingFlags.DeclaredOnly)
                    group method by method.Name into methods
                    orderby methods.Key
                    select new { Name = methods.Key, Count = methods.Count() };
Query Operators, cont.
- Tree of operators
  - Producers
    - No input
    - Examples: Range, Repeat
  - Consumer/producers
    - Transform input stream(s) into output stream
    - Examples: Select, Where, Join, Skip, Take
  - Consumers
    - Reduce to a single value
    - Examples: Aggregate, Min, Max, First
- Many are unary while others are binary
- Data-intensive bulk transformations
[Diagram: an operator tree of Select, Join, and two Where nodes]
Implementation of a Query Operator
- What might an implementation look like?

    public static IEnumerable<TSource> Where<TSource>(
        this IEnumerable<TSource> source, Func<TSource, bool> predicate)
    {
        if (source == null || predicate == null)
            throw new ArgumentNullException();
        foreach (var item in source)
        {
            if (predicate(item)) yield return item;
        }
    }

- Does it have to be this way? What if we could do this in… parallel?!

    public static IEnumerable<TSource> Where<TSource>(
        this IEnumerable<TSource> source, Func<TSource, bool> predicate)
    {
        ...
    }
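As a thought experiment, here is one naive way to answer the "in parallel?" question by hand. This is a sketch only, not PLINQ's actual implementation (which is far more sophisticated): it filters on multiple workers via Parallel.ForEach, loses source ordering, and assumes the predicate is thread-safe and free of side effects.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

static class NaiveParallel
{
    // Sketch: a hand-rolled parallel Where over an IList<T>.
    // Output order is unspecified; 'predicate' must be thread-safe.
    public static IEnumerable<T> WhereParallel<T>(IList<T> source, Func<T, bool> predicate)
    {
        var results = new ConcurrentBag<T>(); // thread-safe collection for the survivors
        Parallel.ForEach(source, item =>
        {
            if (predicate(item)) results.Add(item);
        });
        return results;
    }
}

class Program
{
    static void Main()
    {
        var evens = NaiveParallel.WhereParallel(
            Enumerable.Range(0, 100).ToList(), x => x % 2 == 0);
        Console.WriteLine(evens.Count()); // 50
    }
}
```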
Parallel LINQ (PLINQ)
- Utilizes parallel hardware for LINQ queries
- Abstracts away most parallelism details
  - Partitions and merges data intelligently
- Supports all .NET Standard Query Operators
  - Plus a few knobs
- Works for any IEnumerable<T>
  - Optimizations for other types (T[], IList<T>)
  - Supports custom partitioning (Partitioner<T>)
- Built on top of the rest of Parallel Extensions
Programming Model
- Minimal impact to existing LINQ programming model
  - AsParallel extension method
- ParallelEnumerable class
  - Implements the Standard Query Operators, but for ParallelQuery<T>

    public static ParallelQuery<T> AsParallel<T>(this IEnumerable<T> source);

    public static ParallelQuery<TSource> Where<TSource>(
        this ParallelQuery<TSource> source, Func<TSource, bool> predicate);
Writing a PLINQ Query
- Two ways to write queries
  - Comprehensions
    - Syntax extensions to C# and Visual Basic
  - APIs
    - Used as extension methods on ParallelQuery<T>
    - System.Linq.ParallelEnumerable class
- Compiler converts the former into the latter
- As with serial LINQ, API implementation does the actual work

    var q = from x in Y.AsParallel()
            where p(x)
            orderby x.f1
            select x.f2;

    var q = Y.AsParallel().Where(x => p(x)).OrderBy(x => x.f1).Select(x => x.f2);

    var q = ParallelEnumerable.Select(
                ParallelEnumerable.OrderBy(
                    ParallelEnumerable.Where(Y.AsParallel(), x => p(x)),
                    x => x.f1),
                x => x.f2);
PLINQ Knobs
Additional extension methods:
- WithDegreeOfParallelism
- AsOrdered
- WithCancellation
- WithMergeOptions
- WithExecutionMode

    var results = from driver in drivers.AsParallel().WithDegreeOfParallelism(4)
                  where driver.Name == queryName &&
                        driver.Wins.Count >= queryWinCount
                  orderby driver.Age ascending
                  select driver;

    var results = from driver in drivers.AsParallel().AsOrdered()
                  where driver.Name == queryName &&
                        driver.Wins.Count >= queryWinCount
                  orderby driver.Age ascending
                  select driver;
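The knobs compose through ordinary method chaining. A sketch combining several of them (the CancellationTokenSource and the Range-based data are illustrative, not from the slides):

```csharp
using System;
using System.Linq;
using System.Threading;

class Program
{
    static void Main()
    {
        var cts = new CancellationTokenSource();

        var q = Enumerable.Range(0, 1000)
                          .AsParallel()
                          .AsOrdered()                  // preserve source order in the output
                          .WithDegreeOfParallelism(4)   // cap at 4 concurrent workers
                          .WithCancellation(cts.Token)  // observe external cancellation
                          .WithMergeOptions(ParallelMergeOptions.NotBuffered) // stream results
                          .Where(x => x % 2 == 0)
                          .Select(x => x * x);

        try
        {
            Console.WriteLine(q.First()); // 0, because AsOrdered keeps source order
        }
        catch (OperationCanceledException)
        {
            // Raised if cts.Cancel() is called while the query is executing.
        }
    }
}
```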
Partitioning
- Input to a single operator is partitioned into p disjoint subsets
- Operators are replicated across the partitions
- Example: from x in A where p(x) …
- Partitions execute in (almost) complete isolation
[Diagram: input A split across Tasks 1..n, each running its own copy of where p(x)]
Partitioning: Load Balancing
[Diagram: static (range) scheduling assigns fixed element ranges to CPU0..CPUN up front, so uneven work skews completion times; dynamic scheduling hands elements out as CPUs become free]
Partitioning: Algorithms
Several partitioning schemes built in:
- Chunk
  - Works with any IEnumerable<T>
  - Single enumerator shared; chunks handed out on-demand
- Range
  - Works only with IList<T>
  - Input divided into contiguous regions, one per partition
- Stripe
  - Works only with IList<T>
  - Elements handed out round-robin to each partition
- Hash
  - Works with any IEnumerable<T>
  - Elements assigned to partition based on hash code
- Custom partitioning available through Partitioner<T>
  - Partitioner.Create available for tighter control over built-in partitioning schemes
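Partitioner.Create is the hook for that tighter control. A minimal sketch: feeding an array to PLINQ through a load-balancing partitioner instead of the default static range partitioning (the data here is illustrative):

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

class Program
{
    static void Main()
    {
        int[] data = Enumerable.Range(1, 1000).ToArray();

        // loadBalance: true => chunks are handed out to workers on demand,
        // rather than the array being statically range-partitioned up front.
        var partitioner = Partitioner.Create(data, loadBalance: true);

        // A partitioner can be consumed directly by PLINQ via AsParallel.
        long sum = partitioner.AsParallel().Sum(x => (long)x);
        Console.WriteLine(sum); // 500500
    }
}
```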
Operator Fusion
- Naïve approach: partition and merge for each operator
  - Example: (from x in D.AsParallel() where p(x) select x*x*x).Sum();
  - Partition and merge mean synchronization => scalability bottleneck
- Instead, we can fuse operators together:
  - Minimizes number of partitioning/merging steps necessary
[Diagram: the naïve plan partitions D, merges after where p(x), repartitions for select x^3, merges again, then repartitions for Sum(), across Tasks 1..n. The fused plan partitions D once and runs where p(x), select x^3, and Sum() as a single pipeline in each task, with one final merge.]
Merging
- Pipelined: separate consumer thread
  - Default for GetEnumerator(), and hence foreach loops
  - AutoBuffered, NoBuffering
  - Access to data as it's available
  - But more synchronization overhead
- Stop-and-go: consumer helps
  - Sorts, ToArray, ToList, etc.
  - FullyBuffered
  - Minimizes context switches
  - But higher latency and more memory
- Inverted: no merging needed
  - ForAll extension method
  - Most efficient by far
  - But not always applicable; requires side-effects
[Diagram: worker threads feeding a consumer thread under each merge strategy]
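The inverted merge is just ForAll. A sketch contrasting it with a pipelined foreach; the side effect in both cases is adding to a thread-safe bag (illustrative data, not from the slides):

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

class Program
{
    static void Main()
    {
        // Pipelined merge: results funnel back to the consuming thread.
        var merged = new ConcurrentBag<int>();
        foreach (var x in ParallelEnumerable.Range(0, 100).Where(i => i % 2 == 0))
            merged.Add(x);

        // Inverted merge: no merge step at all; the side effect runs on
        // whichever worker produced the element.
        var inverted = new ConcurrentBag<int>();
        ParallelEnumerable.Range(0, 100)
                          .Where(i => i % 2 == 0)
                          .ForAll(x => inverted.Add(x));

        Console.WriteLine("{0} {1}", merged.Count, inverted.Count); // 50 50
    }
}
```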
Parallelism Blockers
- Ordering not guaranteed

    int[] values = new int[] { 0, 1, 2 };
    var q = from x in values.AsParallel() select x * 2;
    int[] scaled = q.ToArray(); // == { 0, 2, 4 }?

- Exceptions (System.AggregateException)

    object[] data = new object[] { "foo", null, null };
    var q = from x in data.AsParallel() select x.ToString();

- Thread affinity

    controls.AsParallel().ForAll(c => c.Size = ...);

- Operations with < 1.0 speedup

    IEnumerable<int> input = …;
    var doubled = from x in input.AsParallel() select x * 2;

- Side effects and mutability are serious issues
  - Most queries do not use side effects, but it's possible…

    Random rand = new Random();
    var q = from i in Enumerable.Range(0, 10000).AsParallel()
            select rand.Next();
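Exceptions thrown by query delegates on worker threads surface wrapped in an AggregateException when the query executes. A sketch of how the slide's null example behaves at runtime:

```csharp
using System;
using System.Linq;

class Program
{
    static void Main()
    {
        object[] data = { "foo", null, null };
        var q = from x in data.AsParallel() select x.ToString();

        try
        {
            q.ToArray(); // execution is deferred; this forces it
        }
        catch (AggregateException ae)
        {
            // Each faulting element contributes an inner exception;
            // here they are NullReferenceExceptions from null.ToString().
            foreach (var inner in ae.InnerExceptions)
                Console.WriteLine(inner.GetType().Name); // NullReferenceException
        }
    }
}
```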
Task Parallel Library: Loops
- Loops are a common source of work
  - Can be parallelized when iterations are independent
    - Body doesn't depend on mutable state / synchronization used
- Synchronous
  - All iterations finish, regularly or exceptionally
- Lots of knobs
  - Breaking, task-local state, custom partitioning, cancellation, scheduling, degree of parallelism
- Visual Studio 2010 profiler support (as with PLINQ)

    for (int i = 0; i < n; i++) work(i);
    foreach (T e in data) work(e);

    Parallel.For(0, n, i => work(i));
    Parallel.ForEach(data, e => work(e));
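A sketch of a few of those knobs in action: degree of parallelism via ParallelOptions, early exit via ParallelLoopState.Break, and task-local state for a lock-free parallel sum (the loop bodies are illustrative):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        // Knob 1: cap the degree of parallelism.
        var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };

        // Knob 2: breaking. Break() lets iterations below the current
        // index complete, but stops new higher iterations from starting.
        Parallel.For(0, 1000, options, (i, state) =>
        {
            if (i == 500) state.Break();
        });

        // Knob 3: task-local state. Each worker accumulates a private
        // subtotal; subtotals are combined once per worker at the end.
        long total = 0;
        Parallel.For(0, 1000,
            () => 0L,                                          // per-task init
            (i, state, subtotal) => subtotal + i,              // body
            subtotal => Interlocked.Add(ref total, subtotal)); // combine
        Console.WriteLine(total); // 499500
    }
}
```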
Task Parallel Library: Statements
- Sequence of statements
  - When independent, can be parallelized
- Synchronous (same as loops)
- Under the covers
  - May use Parallel.For, may use Tasks

    StatementA();
    StatementB();
    StatementC();

    Parallel.Invoke(
        () => StatementA(),
        () => StatementB(),
        () => StatementC());
Task Parallel Library: Tasks
- System.Threading.Tasks
  - Task
    - Represents an asynchronous operation
    - Supports waiting, cancellation, continuations, …
    - Parent/child relationships
    - 1st-class debugging support in Visual Studio 2010
  - Task<TResult> : Task
    - Tasks that return results
  - TaskCompletionSource<TResult>
    - Create Task<TResult>s to represent other operations
  - TaskScheduler
    - Represents a scheduler that executes tasks
    - Extensible
    - TaskScheduler.Default => ThreadPool
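A sketch of the core surface area those bullets describe: starting a Task<TResult>, chaining a continuation, and wrapping an externally completed operation with TaskCompletionSource (the computed values are illustrative). Task.Factory.StartNew is used because this is the .NET 4-era API.

```csharp
using System;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        // Task<TResult>: an asynchronous operation producing a value,
        // scheduled on the ThreadPool via TaskScheduler.Default.
        Task<int> sum = Task.Factory.StartNew(() =>
        {
            int s = 0;
            for (int i = 1; i <= 100; i++) s += i;
            return s;
        });

        // A continuation runs when its antecedent completes.
        Task<string> report = sum.ContinueWith(t => "sum=" + t.Result);

        // TaskCompletionSource: expose any operation as a Task<TResult>
        // and complete it manually.
        var tcs = new TaskCompletionSource<bool>();
        tcs.SetResult(true);

        Console.WriteLine(report.Result);   // sum=5050 (Result blocks until done)
        Console.WriteLine(tcs.Task.Result); // True
    }
}
```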
ThreadPool in .NET 3.5
[Diagram: the program thread enqueues work items 1..6 into a single global queue; worker threads all dequeue from it. Thread management: starvation detection, idle thread retirement.]

ThreadPool in .NET 4
[Diagram: the program thread enqueues tasks into a lock-free global queue; in addition, each worker thread 1..p has its own local work-stealing queue for tasks it spawns, and idle workers steal from other workers' queues. Thread management: starvation detection.]
