Toub parallelism tour_oct2009
Presentation Transcript

  • http://go.microsoft.com/?linkid=9692084
  • Parallel Programming with Visual Studio 2010 and the .NET Framework 4
    Stephen Toub
    Microsoft Corporation
    October 2009
  • Agenda
    Why Parallelism, Why Now?
    Difficulties w/ Visual Studio 2008 & .NET 3.5
    Solutions w/ Visual Studio 2010 & .NET 4
    Parallel LINQ
    Task Parallel Library
    New Coordination & Synchronization Primitives
    New Parallel Debugger Windows
    New Profiler Concurrency Visualizations
  • Moore’s Law
    “The number of transistors incorporated in a chip will approximately double every 24 months.”
    Gordon Moore
    Intel Co-Founder
    http://www.intel.com/pressroom/kits/events/moores_law_40th/
  • Moore’s Law: Alive and Well?
    The number of transistors doubles every two years…
    More than 1 billion transistors in 2006!
    http://upload.wikimedia.org/wikipedia/commons/2/25/Transistor_Count_and_Moore%27s_Law_-_2008_1024.png
  • Moore’s Law: Feel the Heat!
    [Chart: power density (W/cm²), log scale from 1 to 10,000, versus year (’70 to ’10) for the 8080, 386, 486, and Pentium® processors, with reference levels for a hot plate, a nuclear reactor, a rocket nozzle, and the Sun’s surface.]
    Intel Developer Forum, Spring 2004 - Pat Gelsinger
  • Moore’s Law: But Different
    Frequencies will NOT get much faster!
    Maybe 5 to 10% every year or so, a few more times…
    And these modest gains would make the chips A LOT hotter!
    http://www.tomshw.it/cpu.php?guide=20051121
  • The Manycore Shift
    “[A]fter decades of single core processors, the high volume processor industry has gone from single to dual to quad-core in just the last two years. Moore’s Law scaling should easily let us hit the 80-core mark in mainstream processors within the next ten years and quite possibly even less.”-- Justin Rattner, CTO, Intel (February 2007)
    “If you haven’t done so already, now is the time to take a hard look at the design of your application, determine what operations are CPU-sensitive now or are likely to become so soon, and identify how those places could benefit from concurrency.”-- Herb Sutter, C++ Architect at Microsoft (March 2005)
  • I'm convinced… now what?
    Multithreaded programming is “hard” today
    Doable by only a subgroup of senior specialists
    Parallel patterns are not prevalent, well known, nor easy to implement
    So many potential problems
    Businesses have little desire to “go deep”
    Best devs should focus on business value, not concurrency
    Need simple ways to allow all devs to write concurrent code
  • Example: “Race Car Drivers”
    IEnumerable<RaceCarDriver> drivers = ...;
    var results = new List<RaceCarDriver>();
    foreach (var driver in drivers)
    {
        if (driver.Name == queryName &&
            driver.Wins.Count >= queryWinCount)
        {
            results.Add(driver);
        }
    }
    results.Sort((b1, b2) => b1.Age.CompareTo(b2.Age));
  • Manual Parallel Solution
    IEnumerable<RaceCarDriver> drivers = ...;
    var results = new List<RaceCarDriver>();
    int partitionsCount = Environment.ProcessorCount;
    int remainingCount = partitionsCount;
    var enumerator = drivers.GetEnumerator();
    try {
        using (var done = new ManualResetEvent(false)) {
            for (int i = 0; i < partitionsCount; i++) {
                ThreadPool.QueueUserWorkItem(delegate {
                    while (true) {
                        RaceCarDriver driver;
                        lock (enumerator) {
                            if (!enumerator.MoveNext()) break;
                            driver = enumerator.Current;
                        }
                        if (driver.Name == queryName &&
                            driver.Wins.Count >= queryWinCount) {
                            lock (results) results.Add(driver);
                        }
                    }
                    if (Interlocked.Decrement(ref remainingCount) == 0) done.Set();
                });
            }
            done.WaitOne();
            results.Sort((b1, b2) => b1.Age.CompareTo(b2.Age));
        }
    }
    finally { if (enumerator is IDisposable) ((IDisposable)enumerator).Dispose(); }
  • PLINQ Solution
    var results = from driver in drivers.AsParallel()
                  where driver.Name == queryName &&
                        driver.Wins.Count >= queryWinCount
                  orderby driver.Age ascending
                  select driver;
  • Visual Studio 2010: Tools, Programming Models, Runtimes
    [Diagram: the parallel-computing stack.
    Tools: Visual Studio IDE, Parallel Debugger Tool Windows, Profiler Concurrency Analysis.
    Managed programming models (.NET Framework 4): Parallel LINQ, Task Parallel Library, Data Structures.
    Native programming models (Visual C++ 10): Parallel Pattern Library, Agents Library, Data Structures.
    Managed runtime: ThreadPool with Task Scheduler and Resource Manager.
    Native runtime: Concurrency Runtime with Task Scheduler and Resource Manager.
    Operating system: Windows Threads, UMS Threads.]
  • Parallel Extensions
    What is it?
    Pure .NET libraries
    No compiler changes necessary
    mscorlib.dll, System.dll, System.Core.dll
    Lightweight, user-mode runtime
    Key ThreadPool enhancements
    Supports imperative and declarative, data and task parallelism
    Declarative data parallelism (PLINQ)
    Imperative data and task parallelism (Task Parallel Library)
    New coordination/synchronization constructs
    Why do we need it?
    Supports parallelism in any .NET language
    Delivers reduced concept count and complexity, better time to solution
    Begins to move parallelism capabilities from concurrency experts to domain experts
    How do we get it?
    Built into the core of .NET 4
    Debugging and profiling support in Visual Studio 2010
  • Architecture
    [Diagram: a .NET program (compiled to IL by the C#, VB, C++, F#, or another .NET compiler) runs over three layers, which execute on threads across processors 1..p.
    PLINQ Execution Engine: query analysis of declarative queries; data partitioning (chunk, range, hash, striped, repartitioning, custom); operator types (map, filter, sort, search, reduce, group, join); merging (sync and async, order preserving, buffered, inverted).
    Task Parallel Library: loop replacements, imperative task parallelism, scheduling.
    Coordination Data Structures: thread-safe collections, synchronization types, coordination types.]
  • Language Integrated Query (LINQ)
    [Diagram: C#, Visual Basic, and other languages sit atop the .NET Standard Query Operators, which target LINQ-enabled data sources: LINQ to Objects (objects), LINQ to XML (XML), and LINQ-enabled ADO.NET (LINQ to SQL, LINQ to DataSets, LINQ to Entities) over relational data.]
  • Writing a LINQ-to-Objects Query
    Two ways to write queries
    Comprehensions
    Syntax extensions to C# and Visual Basic
    APIs
    Used as extension methods on IEnumerable<T>
    System.Linq.Enumerable class
    Compiler converts the former into the latter
    API implementation does the actual work
    var q = from x in Y where p(x) orderby x.f1 select x.f2;
    var q = Y.Where(x => p(x)).OrderBy(x => x.f1).Select(x => x.f2);
    var q = Enumerable.Select(
        Enumerable.OrderBy(
            Enumerable.Where(Y, x => p(x)),
            x => x.f1),
        x => x.f2);
  • LINQ Query Operators
    • In .NET 4, ~50 operators w/ ~175 overloads
    Aggregate(3)
    All(1)
    Any(2)
    AsEnumerable(1)
    Average(20)
    Cast(1)
    Concat(1)
    Contains(2)
    Count(2)
    DefaultIfEmpty(2)
    Distinct(2)
    ElementAt(1)
    ElementAtOrDefault(1)
    Empty(1)
    Except(2)
    First(2)
    FirstOrDefault(2)
    GroupBy(8)
    GroupJoin(2)
    Intersect(2)
    Join(2)
    Last(2)
    LastOrDefault(2)
    LongCount(2)
    Max(22)
    Min(22)
    OfType(1)
    OrderBy(2)
    OrderByDescending(2)
    Range(1)
    Repeat(1)
    Reverse(1)
    Select(2)
    SelectMany(4)
    SequenceEqual(2)
    Single(2)
    SingleOrDefault(2)
    Skip(1)
    SkipWhile(2)
    Sum(20)
    Take(1)
    TakeWhile(2)
    ThenBy(2)
    ThenByDescending(2)
    ToArray(1)
    ToDictionary(4)
    ToList(1)
    ToLookup(4)
    Union(2)
    Where(2)
    Zip(1)
    var operators = from method in typeof(Enumerable).GetMethods(
                        BindingFlags.Public | BindingFlags.Static | BindingFlags.DeclaredOnly)
                    group method by method.Name into methods
                    orderby methods.Key
                    select new { Name = methods.Key, Count = methods.Count() };
  • Query Operators, cont.
    Tree of operators
    Producers
    No input
    Examples: Range, Repeat
    Consumer/producers
    Transform input stream(s) into output stream
    Examples: Select, Where, Join, Skip, Take
    Consumers
    Reduce to a single value
    Examples: Aggregate, Min, Max, First
    Many are unary while others are binary
    • Data-intensive bulk transformations

    [Diagram: an operator tree with Select at the root consuming a Join, which in turn combines the outputs of two Where operators.]
  • Implementation of a Query Operator
    What might an implementation look like?
    Does it have to be this way?
    What if we could do this in… parallel?!
    public static IEnumerable<TSource> Where<TSource>(
        this IEnumerable<TSource> source,
        Func<TSource, bool> predicate)
    {
        if (source == null || predicate == null)
            throw new ArgumentNullException();
        foreach (var item in source)
        {
            if (predicate(item)) yield return item;
        }
    }
    public static IEnumerable<TSource> Where<TSource>(
        this IEnumerable<TSource> source,
        Func<TSource, bool> predicate)
    {
        ...
    }
  • Parallel LINQ (PLINQ)
    Utilizes parallel hardware for LINQ queries
    Abstracts away most parallelism details
    Partitions and merges data intelligently
    Supports all .NET Standard Query Operators
    Plus a few knobs
    Works for any IEnumerable<T>
    Optimizations for other types (T[], IList<T>)
    Supports custom partitioning (Partitioner<T>)
    Built on top of the rest of Parallel Extensions
  • Programming Model
    Minimal impact to existing LINQ programming model
    AsParallel extension method
    ParallelEnumerable class
    Implements the Standard Query Operators, but for ParallelQuery<T>
    public static ParallelQuery<T> AsParallel<T>(this IEnumerable<T> source);
    public static ParallelQuery<TSource> Where<TSource>(
        this ParallelQuery<TSource> source,
        Func<TSource, bool> predicate);
  • Writing a PLINQ Query
    Two ways to write queries
    Comprehensions
    Syntax extensions to C# and Visual Basic
    APIs
    Used as extension methods on ParallelQuery<T>
    System.Linq.ParallelEnumerable class
    Compiler converts the former into the latter
    As with serial LINQ, API implementation does the actual work
    var q = from x in Y.AsParallel() where p(x) orderby x.f1 select x.f2;
    var q = Y.AsParallel().Where(x => p(x)).OrderBy(x => x.f1).Select(x => x.f2);
    var q = ParallelEnumerable.Select(
        ParallelEnumerable.OrderBy(
            ParallelEnumerable.Where(Y.AsParallel(), x => p(x)),
            x => x.f1),
        x => x.f2);
  • PLINQ Knobs
    Additional Extension Methods
    WithDegreeOfParallelism
    AsOrdered
    WithCancellation
    WithMergeOptions
    WithExecutionMode
    var results = from driver in drivers.AsParallel().WithDegreeOfParallelism(4)
                  where driver.Name == queryName &&
                        driver.Wins.Count >= queryWinCount
                  orderby driver.Age ascending
                  select driver;
    var results = from driver in drivers.AsParallel().AsOrdered()
                  where driver.Name == queryName &&
                        driver.Wins.Count >= queryWinCount
                  orderby driver.Age ascending
                  select driver;
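Beyond WithDegreeOfParallelism and AsOrdered, the WithCancellation knob wires a CancellationToken into a query. The sketch below is illustrative (the class and method names are invented here, and the token is cancelled up front just to make the outcome deterministic); a cancelled query surfaces as an OperationCanceledException.

```csharp
using System;
using System.Linq;
using System.Threading;

class PlinqCancellationSample
{
    // Hypothetical helper: returns true if the query observed the cancelled token.
    static bool RunCancelled()
    {
        var cts = new CancellationTokenSource();
        cts.Cancel(); // cancel before executing, for a deterministic demo
        try
        {
            // WithCancellation makes PLINQ poll the token and abort the query.
            Enumerable.Range(0, 1000000).AsParallel()
                .WithCancellation(cts.Token)
                .Select(i => i * 2)
                .ToArray();
            return false;
        }
        catch (OperationCanceledException)
        {
            return true;
        }
    }

    static void Main()
    {
        Console.WriteLine(RunCancelled() ? "Query cancelled" : "Query completed");
    }
}
```

In real code the token would come from a CancellationTokenSource that another thread cancels, e.g. in response to a user clicking Stop.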
  • Partitioning
    • Input to a single operator is partitioned into p disjoint subsets
    • Operators are replicated across the partitions
    • Example
    from x in A where p(x) …
    • Partitions execute in (almost) complete isolation
    [Diagram: input A is split into partitions, with tasks 1..n each running its own copy of where p(x).]
  • Partitioning: Load Balancing
    [Diagram: with static scheduling (range), CPUs 0..N are each assigned a fixed block of work items up front, so one unlucky CPU can end up holding the expensive items while the others sit idle; with dynamic scheduling, items are handed out on demand, keeping all CPUs evenly loaded.]
  • Partitioning: Algorithms
    Several partitioning schemes built-in
    Chunk
    Works with any IEnumerable<T>
    Single enumerator shared; chunks handed out on-demand
    Range
    Works only with IList<T>
    Input divided into contiguous regions, one per partition
    Stripe
    Works only with IList<T>
    Elements handed out round-robin to each partition
    Hash
    Works with any IEnumerable<T>
    Elements assigned to partition based on hash code
    Custom partitioning available through Partitioner<T>
    Partitioner.Create available for tighter control over built-in partitioning schemes
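As a concrete example of the built-in schemes, Partitioner.Create(fromInclusive, toExclusive) produces contiguous index ranges that can be handed straight to Parallel.ForEach; a minimal sketch (the array and its contents are invented for illustration):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class RangePartitionSample
{
    static void Main()
    {
        var squares = new long[1000];
        // Partitioner.Create(0, length) yields contiguous [from, to) ranges,
        // so each task loops over a whole slice instead of paying a delegate
        // invocation per element.
        Parallel.ForEach(Partitioner.Create(0, squares.Length), range =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
                squares[i] = (long)i * i;
        });
        Console.WriteLine(squares[999]); // 998001
    }
}
```

Range partitioning like this is a good fit when per-element work is small and uniform; for irregular work, the on-demand chunk scheme balances load better.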
  • Operator Fusion
    • Naïve approach: partition and merge for each operator
    • Example: (from x in D.AsParallel() where p(x) select x*x*x).Sum();
    • Partition and merge mean synchronization => scalability bottleneck
    • Instead, we can fuse operators together:
    • Minimizes number of partitioning/merging steps necessary
    [Diagram: the naïve plan partitions D, runs where p(x) across tasks 1..n, merges, re-partitions for select x³, merges again, and re-partitions for Sum(); the fused plan partitions D once and runs where p(x), select x³, and Sum() back-to-back inside each task before a single final merge.]
  • Merging
    Pipelined: separate consumer thread
    Default for GetEnumerator()
    And hence foreach loops
    AutoBuffered, NoBuffering
    Access to data as it's available
    But more synchronization overhead
    Stop-and-go: consumer helps
    Sorts, ToArray, ToList, etc.
    FullyBuffered
    Minimizes context switches
    But higher latency and more memory
    Inverted: no merging needed
    ForAll extension method
    Most efficient by far
    But not always applicable
    Requires side-effects
    [Diagram: pipelined merging dedicates one consumer thread fed by the producing threads; stop-and-go merging buffers all partitions' output before the consumer reads it; inverted merging (ForAll) has each thread act on its own partition's results with no merge step at all.]
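The inverted strategy can be sketched with ForAll; note the action runs concurrently across partitions, so the "side effect" it performs must be thread-safe (here, adding to a ConcurrentBag chosen for illustration):

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

class ForAllSample
{
    static void Main()
    {
        var evens = new ConcurrentBag<int>();
        // ForAll skips merging entirely: each partition invokes the action
        // directly on its own results, which is why it is the most efficient
        // option but requires a thread-safe action.
        Enumerable.Range(1, 100).AsParallel()
            .Where(i => i % 2 == 0)
            .ForAll(i => evens.Add(i));
        Console.WriteLine(evens.Count); // 50
    }
}
```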
  • Parallelism Blockers
    Ordering not guaranteed
    Exceptions
    Thread affinity
    Operations with < 1.0 speedup
    Side effects and mutability are serious issues
    Most queries do not use side effects, but it’s possible…
    int[] values = new int[] { 0, 1, 2 };
    var q = from x in values.AsParallel() select x * 2;
    int[] scaled = q.ToArray(); // == { 0, 2, 4 }?
    System.AggregateException
    object[] data = new object[] { "foo", null, null };
    var q = from x in data.AsParallel() select x.ToString();
    controls.AsParallel().ForAll(c => c.Size = ...);
    IEnumerable<int> input = ...;
    var doubled = from x in input.AsParallel() select x * 2;
    Random rand = new Random();
    var q = from i in Enumerable.Range(0, 10000).AsParallel()
            select rand.Next();
  • Task Parallel Library: Loops
    Loops are a common source of work
    Can be parallelized when iterations are independent
    Body doesn’t depend on mutable state / synchronization used
    Synchronous
    All iterations finish, regularly or exceptionally
    Lots of knobs
    Breaking, task-local state, custom partitioning, cancellation, scheduling, degree of parallelism
    Visual Studio 2010 profiler support (as with PLINQ)
    for (int i = 0; i < n; i++) work(i);
    foreach (T e in data) work(e);

    Parallel.For(0, n, i => work(i));
    Parallel.ForEach(data, e => work(e));
  • Task Parallel Library: Statements
    Sequence of statements
    When independent, can be parallelized
    Synchronous (same as loops)
    Under the covers
    May use Parallel.For, may use Tasks
    StatementA();
    StatementB();
    StatementC();
    Parallel.Invoke(
        () => StatementA(),
        () => StatementB(),
        () => StatementC());
  • Task Parallel Library: Tasks
    System.Threading.Tasks
    Task
    Represents an asynchronous operation
    Supports waiting, cancellation, continuations, …
    Parent/child relationships
    1st-class debugging support in Visual Studio 2010
    Task<TResult> : Task
    Tasks that return results
    TaskCompletionSource<TResult>
    Create Task<TResult>s to represent other operations
    TaskScheduler
    Represents a scheduler that executes tasks
    Extensible
    TaskScheduler.Default => ThreadPool
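A minimal sketch of Task<TResult> and continuations, using the .NET 4-era Task.Factory.StartNew pattern (the summation workload is invented for illustration):

```csharp
using System;
using System.Threading.Tasks;

class TaskSample
{
    static void Main()
    {
        // Task<TResult>.Factory.StartNew queues the work to the default
        // scheduler (the ThreadPool) and returns a handle to the result.
        Task<int> sum = Task<int>.Factory.StartNew(() =>
        {
            int total = 0;
            for (int i = 1; i <= 100; i++) total += i;
            return total;
        });

        // ContinueWith chains a follow-up task that runs once sum completes;
        // reading report.Result blocks until the whole chain has finished.
        Task<string> report = sum.ContinueWith(t => "Sum = " + t.Result);
        Console.WriteLine(report.Result); // Sum = 5050
    }
}
```

Waiting, cancellation via CancellationToken, and parent/child relationships all hang off this same Task object.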
  • ThreadPool in .NET 3.5
    [Diagram: the program thread enqueues work items 1..6 into a single global queue; worker threads dequeue and execute them.]
    Thread Management:
    • Starvation Detection
    • Idle Thread Retirement
  • ThreadPool in .NET 4
    [Diagram: the program thread enqueues tasks into a lock-free global queue; each worker thread also keeps a local work-stealing queue for the tasks it spawns, and idle workers steal work from other threads' local queues.]
    Thread Management:
    • Starvation Detection
    • Idle Thread Retirement
    • Hill-climbing
  • New Primitives
    Public, and used throughout PLINQ and TPL
    Address many of today’s core concurrency issues
    Thread-safe, scalable collections
    IProducerConsumerCollection<T>
    ConcurrentQueue<T>
    ConcurrentStack<T>
    ConcurrentBag<T>
    ConcurrentDictionary<TKey,TValue>
    Phases and work exchange
    Barrier
    BlockingCollection<T>
    CountdownEvent
    Partitioning
    {Orderable}Partitioner<T>
    Partitioner.Create
    Exception handling
    AggregateException
    Initialization
    Lazy<T>
    LazyInitializer.EnsureInitialized<T>
    ThreadLocal<T>
    Locks
    ManualResetEventSlim
    SemaphoreSlim
    SpinLock
    SpinWait
    Cancellation
    CancellationToken{Source}
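Several of these primitives compose naturally; a minimal producer/consumer sketch using BlockingCollection<T> over its default ConcurrentQueue<T> backing store (the item counts are invented for illustration):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class ProducerConsumerSample
{
    static void Main()
    {
        // BlockingCollection<T> wraps a ConcurrentQueue<T> by default.
        // GetConsumingEnumerable blocks until items arrive and completes
        // once CompleteAdding has been called and the collection drains.
        using (var items = new BlockingCollection<int>())
        {
            var producer = Task.Factory.StartNew(() =>
            {
                for (int i = 1; i <= 9; i++) items.Add(i);
                items.CompleteAdding();
            });

            int sum = 0;
            foreach (int item in items.GetConsumingEnumerable()) sum += item;
            producer.Wait();
            Console.WriteLine(sum); // 45
        }
    }
}
```

Passing a different IProducerConsumerCollection<T> (e.g. ConcurrentStack<T>) to the constructor changes the ordering discipline without touching the consuming code.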
  • What Can I Do with These Cores?
    Offload
    Free up your UI
    Go faster whenever you can
    Parallelize the parallelizable
    Do more
    Use more data to get better results
    Add more features
    Speculate
    Pre-fetch, Pre-process
    Evaluate multiple solutions
  • Performance Tips
    Compute intensive and/or large data sets
    Work done should be at least 1,000s of cycles
    Measure, and combine/optimize as necessary
    Use the Visual Studio concurrency profiler
    Look for common anti-patterns: load imbalance, lock convoys, etc.
    Parallelize fine-grained but not too fine-grained
    e.g., parallelize the outer loop, unless N is too small to offer enough parallelism
    In that case, consider parallelizing only the inner loop, or both
    Consider unrolling
    Do not be gratuitous in task creation
    Lightweight, but still requires object allocation, etc.
    Prefer isolation & immutability over synchronization
    Synchronization => !Scalable
    Try to avoid shared state
    Have realistic expectations
  • Amdahl’s Law
    Theoretical maximum speedup determined by amount of sequential code
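The law can be made concrete in a few lines; here p and the core counts are illustrative values, not figures from the slide:

```csharp
using System;

class AmdahlSample
{
    // Amdahl's law: speedup on n processors when a fraction p of the
    // work is parallelizable is S(n) = 1 / ((1 - p) + p / n).
    static double Speedup(double p, int n)
    {
        return 1.0 / ((1 - p) + p / n);
    }

    static void Main()
    {
        double p = 0.9; // 90% of the work parallelizes
        foreach (int n in new[] { 2, 4, 8, 16 })
            Console.WriteLine("n = {0,2}: {1:F2}x", n, Speedup(p, n));
        // As n grows without bound, speedup approaches 1 / (1 - p) = 10x
        // here, no matter how many cores are available.
    }
}
```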
  • To Infinity And Beyond…
    The “Manycore Shift” is happening
    Parallelism in your code is inevitable
    Visual Studio 2010 and .NET 4 will help
    Parallel Computing Dev Center
    http://msdn.com/concurrency
    Download Beta 2 (“go-live” license)
    http://go.microsoft.com/?linkid=9692084
    Team Blogs
    Managed: http://blogs.msdn.com/pfxteam
    Native: http://blogs.msdn.com/nativeconcurrency
    Tools: http://blogs.msdn.com/visualizeconcurrency
    Forums
    http://social.msdn.microsoft.com/Forums/en-US/category/parallelcomputing
    We love feedback!
  • © 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
    The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.