Patterns For Parallel Computing

Presentation delivered at Microsoft Architect Council on 2009.06.11 by David Chou

  • SETI@Home stats (today; change in last 24 hours):
      Teams: 55,848; 12 (active: 15,817; 4)
      Users: 977,698; 291 (active: 148,334; -65)
      Hosts: 2.34e+6; 930 (active: 238,234; -256)
      Total credit: 4.89e+10; 4.97e+7
      Recent average: 6.31e+7; -1,352,173
      Total FLOPs: 4.221e+22; 4.298e+19
  • Source: Cal Henderson, Chief Architect, Flickr
  • Source: Aber Whitcomb, Co-Founder and CTO, MySpace; Jim Benedetto, SVP Technical Operations, MySpace
  • Source: John Rothschild, VP of Technology, Facebook
  • Source: Jeffrey Dean and Sanjay Ghemawat, Google
  • Source: Werner Vogels, CTO, Amazon
  • Deployed at MySpace for messaging infrastructure
  • Deployed in AdCenter for massive log processing


  • 1. Patterns for Parallel Computing
    David Chou
  • 2. > Outline
    An architectural conversation
    Design Principles
    Microsoft Platform
  • 3. > Concepts
    Why is this interesting?
    Amdahl’s law (1967)
    Multi-core processors
    High-performance computing
    Distributed architecture
    Web–scale applications
    Cloud computing
     Paradigm shift!
  • 4. > Concepts
    Parallel Computing == ??
    Simultaneous multi-threading (Intel HyperThreading, IBM Cell microprocessor for PS3, etc.)
    Operating system multitasking (cooperative, preemptive; symmetric multi-processing, etc.)
    Server load-balancing & clustering (Oracle RAC, Windows HPC Server, etc.)
    Grid computing (SETI@home, Sun Grid, DataSynapse, Digipede, etc.)
    Asynchronous programming (AJAX, JMS, MQ, event-driven, etc.)
    Multi-threaded & concurrent programming (java.lang.Thread, System.Threading.Thread, Cilk, LabVIEW, etc.)
    Massively parallel processing (MapReduce, Hadoop, Dryad, etc.)
     Elements and best practices in all of these
  • 5. > Patterns
    Types of Parallelism
    Bit-level parallelism (microprocessors)
    Instruction-level parallelism (compilers)
    Multiprocessing, multi-tasking (operating systems)
    HPC, clustering (servers)
    Multi-threading (application code)
    Data parallelism (massive distributed databases)
    Task parallelism (concurrent distributed processing)
     Focus is moving “up” the technology stack…
  • 6. > Patterns > HPC, Clustering
    Clustering Infrastructure for High Availability
  • 7. > Patterns > HPC, Clustering
    High-Performance Computing
    [Diagram: Web/App Server nodes]
  • 8. > Patterns > HPC, Clustering > Example
    Infrastructure and Application Footprint
    7 Internet data centers & 3 CDN partnerships
    120+ websites, 1000s of apps and 2500 databases
    20-30+ Gbits/sec Web traffic; 500+ Gbits/sec download traffic
    2007 stats (
    #9 ranked domain in U.S.; 54.0M UU for 36.0% reach
    #5 site worldwide; reaching 287.3M UU
    15K req/sec, 35K concurrent connections on 80 servers
    600 vroots, 350 IIS Web apps & 12 app pools
    Windows Server 2008, SQL Server 2008, IIS7, ASP.NET 3.5
    2007 stats (Windows Update):
    350M UScans/day, 60K ASP.NET req/sec, 1.5M concurrent connections
    50B downloads for CY 2006
    Update Egress – MS, Akamai, Level3 & Limelight (50-500+ Gbits/sec)
  • 9. > Patterns > Multi-threading
    Multi-threaded programming
    [Diagram: Execution Time]
  • 10. > Patterns > Multi-threading
    Typically, functional decomposition into individual threads
    But, explicit concurrent programming brings complexities
    Managing threads, semaphores, monitors, dead-locks, race conditions, mutual exclusion, synchronization, etc.
    Moving towards implicit parallelism
    Integrating concurrency & coordination into mainstream programming languages
    Developing tools to ease development
    Encapsulating parallelism in reusable components
    Raising the semantic level: new approaches
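
The contrast on this slide, between explicit thread management and parallelism encapsulated in a reusable component, can be sketched in a few lines. This is only an illustration: the function names (`square`, `explicit_sum_of_squares`, `implicit_sum_of_squares`) are hypothetical, and Python's standard `threading` and `concurrent.futures` modules stand in for whatever platform is actually in use.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def square(n: int) -> int:
    """The unit of work; a stand-in for any independent computation."""
    return n * n


def explicit_sum_of_squares(numbers: list[int], num_threads: int = 4) -> int:
    """Explicit style: create threads by hand and guard shared state with a lock."""
    total = 0
    lock = threading.Lock()

    def worker(chunk: list[int]) -> None:
        nonlocal total
        partial = sum(square(n) for n in chunk)
        with lock:  # without the lock, concurrent updates to total could race
            total += partial

    # Split the input into one interleaved slice per thread.
    chunks = [numbers[i::num_threads] for i in range(num_threads)]
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def implicit_sum_of_squares(numbers: list[int], num_threads: int = 4) -> int:
    """Higher-level style: the executor owns thread creation and scheduling."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return sum(pool.map(square, numbers))
```

Both functions return the same result; the second has no visible threads, locks, or joins, which is the sense in which the parallelism is encapsulated.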
  • 11. > Patterns > Multi-threading > Example
    Web Browser
    2007 stats:
    +30M searches processed / day
    25M UU/month in US, +46M worldwide
    +7B images uploaded
    +300K unique websites link to content
    #31 top 50 sites in US
    #41 top 100 sites worldwide
    18th largest ad supported site in US
    [Diagram: multiple Content Pods]
    Scaling the performance:
    Browser handles concurrency
    Centralized lookup
    Horizontal partitioning of distributed content
  • 12. > Patterns > Data Parallelism
    Data Parallelism
    Loop-level parallelism
    Focuses on distributing the data across different parallel computing nodes
    Denormalization, sharding, horizontal partitioning, etc.
    Each processor performs the same task on different pieces of distributed data
    Emphasizes the distributed (parallelized) nature of the data
    Ideal for data that is read more than written (scale vs. consistency)
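
A minimal sketch of the idea that each processor performs the same task on a different piece of the data. The names (`partition`, `count_long_words`, `data_parallel_count`) are hypothetical, and a thread pool stands in for separate computing nodes; for CPU-bound work in Python a process pool or a real cluster would be the more likely choice.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(records: list[str], num_shards: int) -> list[list[str]]:
    """Horizontally partition records into num_shards interleaved slices."""
    return [records[i::num_shards] for i in range(num_shards)]


def count_long_words(shard: list[str]) -> int:
    """The same task, applied independently to each shard."""
    return sum(1 for word in shard if len(word) > 4)


def data_parallel_count(records: list[str], num_shards: int = 3) -> int:
    """Run the same task on every shard concurrently, then combine the results."""
    shards = partition(records, num_shards)
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        return sum(pool.map(count_long_words, shards))
```

Because the shards are only read, the workers need no coordination until the final sum, which is why this pattern suits read-heavy data.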
  • 13. > Patterns > Data Parallelism
    Parallelizing Data in Distributed Architecture
    [Diagram: Web/App Server nodes]
  • 14. > Patterns > Data Parallelism > Example
    2007 stats:
    Serve 40,000 photos / second
    Handle 100,000 cache operations / second
    Process 130,000 database queries / second
    Scaling the “read” data:
    Data denormalization
    Database replication and federation
    Vertical partitioning
    Central cluster for index lookups
    Large data sets horizontally partitioned as shards
    Grow by binary hashing of user buckets
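
The combination of a central lookup with horizontally partitioned shards can be sketched as follows. This is a generic illustration, not the production design described on the slide: the `ShardRouter` class, its in-memory dictionaries, and the SHA-256 placement rule are all assumptions made for the example, and the "binary hashing of user buckets" growth scheme is not modeled.

```python
import hashlib


class ShardRouter:
    """Hypothetical central lookup that maps each user to exactly one shard."""

    def __init__(self, num_shards: int) -> None:
        self.num_shards = num_shards
        # Stands in for the central index cluster: user_id -> shard number.
        self.lookup: dict[str, int] = {}
        # Each shard holds the full data for the users assigned to it.
        self.shards: list[dict[str, list[str]]] = [{} for _ in range(num_shards)]

    def shard_for(self, user_id: str) -> int:
        """Return the user's shard, assigning one by stable hash on first use."""
        if user_id not in self.lookup:
            digest = hashlib.sha256(user_id.encode("utf-8")).digest()
            self.lookup[user_id] = int.from_bytes(digest[:8], "big") % self.num_shards
        return self.lookup[user_id]

    def add_photo(self, user_id: str, photo: str) -> None:
        """Write goes to the single shard that owns this user."""
        shard = self.shards[self.shard_for(user_id)]
        shard.setdefault(user_id, []).append(photo)

    def photos(self, user_id: str) -> list[str]:
        """Read touches only the owning shard, never the others."""
        return self.shards[self.shard_for(user_id)].get(user_id, [])
```

Keeping the mapping in an explicit lookup, rather than recomputing the hash on every request, leaves room to move a user to another shard later by updating one entry.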
  • 15. > Patterns > Data Parallelism > Example
    2007 stats:
    115B pageviews/month
    5M concurrent users @ peak
    +3B images, mp3, videos
    +10M new images/day
    160 Gbit/sec peak bandwidth
    Scaling the “write” data:
    MyCache: distributed dynamic memory cache
    MyRelay: inter-node messaging transport handling +100K req/sec, directs reads/writes to any node
    MySpace Distributed File System: geographically redundant distributed storage providing massive concurrent access to images, mp3, videos, etc.
    MySpace Distributed Transaction Manager: broker for all non-transient writes to databases/SAN, multi-phase commit across data centers
  • 16. > Patterns > Data Parallelism > Example
    2009 stats:
    +200B pageviews/month
    >3.9T feed actions/day
    +300M active users
    >1B chat messages/day
    100M search queries/day
    >6B minutes spent/day (ranked #2 on Internet)
    +20B photos, +2B/month growth
    600,000 photos served / sec
    25TB log data / day processed through Scribe
    120M queries /sec on memcache
    Scaling the “relational” data:
    Keeps data normalized, randomly distributed, accessed at high volumes
    Uses “shared nothing” architecture
  • 17. > Patterns > Task Parallelism
    Task Parallelism
    Functional parallelism
    Focuses on distributing execution processes (threads) across different parallel computing nodes
    Each processor executes a different thread (or process) on the same or different data
    Communication takes place usually to pass data from one thread to the next as part of a workflow
    Emphasizes the distributed (parallelized) nature of the processing (i.e. threads)
    Need to design how to compose partial output from concurrent processes
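
The workflow style of task parallelism, where each thread runs a different function and passes its output to the next, can be sketched as a small pipeline. The helper names (`stage`, `run_pipeline`) and the sentinel object are hypothetical; standard-library queues stand in for whatever inter-node transport a distributed system would use.

```python
import queue
import threading

_DONE = object()  # sentinel that tells the next stage to stop


def stage(func, inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Run one pipeline stage: apply func to each item and pass the result on."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # propagate shutdown downstream
            return
        outbox.put(func(item))


def run_pipeline(items, funcs):
    """Chain one thread per function; each thread performs a different task."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)

    # Compose the final output by draining the last queue in arrival order.
    results = []
    while True:
        out = queues[-1].get()
        if out is _DONE:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results
```

For example, `run_pipeline([" a ", " b "], [str.strip, str.upper])` runs the strip and upper-case stages on separate threads. With one thread per stage and FIFO queues, output order matches input order; adding several workers per stage would require an explicit reordering or merge step.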
  • 18. > Patterns > Task Parallelism > Example
    2007 stats:
    +20 petabytes of data processed / day by +100K MapReduce jobs
    1 petabyte sort took ~6 hours on ~4K servers replicated onto ~48K disks
    +200 GFS clusters, each at 1-5K nodes, handling +5 petabytes of storage
    ~40 GB/sec aggregate read/write throughput across the cluster
    +500 servers for each search query < 500ms
    Scaling the process:
    MapReduce: parallel processing framework
    BigTable: structured hash database
    Google File System: massively scalable distributed storage
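
For readers unfamiliar with the MapReduce programming model, the following single-machine sketch shows its three phases on a word count. It is a toy illustration only, not Google's implementation: the function names are hypothetical, and a thread pool stands in for the cluster that would run map and reduce tasks in a real deployment.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(document: str) -> list[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(mapped: list[list[tuple[str, int]]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(item: tuple[str, list[int]]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single count."""
    key, values = item
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    """Run map tasks in parallel, group by key, then run reduce tasks in parallel."""
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_phase, documents))
        groups = shuffle(mapped)
        reduced = list(pool.map(reduce_phase, groups.items()))
    return dict(reduced)
```

Map tasks are independent per document and reduce tasks are independent per key, so both phases parallelize without shared state; only the shuffle step moves data between them.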
  • 19. > Design Principles
    Parallelism for Speedup
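    Amdahl’s law (1967): $\dfrac{1}{(1 - P) + \frac{P}{N}}$
    Amdahl’s speedup: $\text{Max. Speedup} \le \dfrac{p}{1 + f \cdot (p - 1)}$
    Gustafson’s law (1988): $S(P) = P - \alpha \cdot (P - 1)$
    Gustafson’s speedup: $S = a(n) + p \cdot (1 - a(n))$
    Karp-Flatt metric (1990): $e = \dfrac{\frac{1}{\varphi} - \frac{1}{p}}{1 - \frac{1}{p}}$
    Speedup: $S_p = \dfrac{T_1}{T_p}$
    Efficiency: $E_p = \dfrac{S_p}{p} = \dfrac{T_1}{p \, T_p}$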
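
As a worked check of the metrics on this slide, the sketch below computes each one directly. The function and parameter names are hypothetical; the conventions assumed are that Amdahl's law takes the parallelizable fraction of the work and the processor count, Gustafson's law takes the serial fraction and the processor count, and the Karp-Flatt metric takes a measured speedup and the processor count.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Amdahl's law: 1 / ((1 - P) + P / N) for a fixed problem size."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / processors)


def gustafson_speedup(serial_fraction: float, processors: int) -> float:
    """Gustafson's law: P - alpha * (P - 1) for a problem that scales with P."""
    return processors - serial_fraction * (processors - 1)


def speedup(t1: float, tp: float) -> float:
    """Observed speedup: single-processor time divided by p-processor time."""
    return t1 / tp


def efficiency(t1: float, tp: float, processors: int) -> float:
    """Efficiency: speedup per processor, T1 / (p * Tp)."""
    return t1 / (processors * tp)


def karp_flatt(measured_speedup: float, processors: int) -> float:
    """Karp-Flatt metric: experimentally determined serial fraction (needs p > 1)."""
    if processors <= 1:
        raise ValueError("Karp-Flatt metric requires more than one processor")
    inverse_p = 1.0 / processors
    return (1.0 / measured_speedup - inverse_p) / (1.0 - inverse_p)
```

With 90% of the work parallelizable on 10 processors, `amdahl_speedup(0.9, 10)` is about 5.26, while `gustafson_speedup(0.1, 10)` is 9.1. Feeding the Amdahl result back into `karp_flatt` recovers the 10% serial fraction, which is a useful consistency check between the formulas.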
  • 20. > Design Principles
    Parallelism for Scale-out
    Sequential  Parallel
    Convert sequential and/or single-machine program into a form in which it can be executed in a concurrent, potentially distributed environment
    Over-decompose for scaling
    Structured multi-threading with a data focus
    Relax sequential order to gain more parallelism
    Ensure atomicity of unordered interactions
    Consider data as well as control flow
    Careful data structure & locking choices to manage contention
    Use parallel data structures
    Minimize shared data and synchronization
    Continuous optimization
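
Two of these principles, over-decomposing the work and minimizing shared data, can be shown together in a small histogram example. This is an illustrative sketch with hypothetical names (`chunked`, `count_chunk`, `parallel_histogram`): each task builds a private counter, and results are merged once at the end instead of updating one shared structure under a lock.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def chunked(items: list, chunk_size: int) -> list[list]:
    """Split items into consecutive chunks of at most chunk_size elements."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def count_chunk(chunk: list) -> Counter:
    """Each task builds its own private Counter, so no locking is needed."""
    return Counter(chunk)


def parallel_histogram(items: list, workers: int = 4, chunks_per_worker: int = 4) -> Counter:
    """Over-decompose into more chunks than workers, then merge partial results."""
    if not items:
        return Counter()
    num_chunks = workers * chunks_per_worker
    chunk_size = max(1, -(-len(items) // num_chunks))  # ceiling division
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_chunk, chunked(items, chunk_size)))
    total = Counter()
    for partial in partials:  # the only point where results are combined
        total.update(partial)
    return total
```

Creating more chunks than workers lets the pool balance uneven chunks automatically, and because counting is order-independent, relaxing the sequential order costs nothing in correctness.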
  • 21. > Design Principles > Example
    Principles for Scalable Service Design (Werner Vogels, CTO, Amazon)
    Controlled concurrency
    Controlled parallelism
    Decompose into small well-understood building blocks
    Failure tolerant
    Local responsibility
    Recovery built-in
  • 22. > Microsoft Platform
    Parallel computing on the Microsoft platform
    Concurrent Programming (.NET 4.0 Parallel APIs)
    Distributed Computing (CCR & DSS Runtime, Dryad)
    Cloud Computing (Azure Services Platform)
    Grid Computing (Windows HPC Server 2008)
    Massive Data Processing (SQL Server “Madison”)
     Components spanning a spectrum of computing models
  • 23. > Microsoft Platform > Concurrent Programming
    .NET 4.0 Parallel APIs
    Task Parallel Library (TPL)
    Parallel LINQ (PLINQ)
    Data Structures
    Diagnostic Tools
  • 24. > Microsoft Platform > Distributed Computing
    CCR & DSS Toolkit
    Concurrency & Coordination Runtime
    Decentralized Software Services
    Supporting multi-core and concurrent applications by facilitating asynchronous operations
    Dealing with concurrency, exploiting parallel hardware and handling partial failure
    Supporting robust, distributed applications based on a light-weight state-driven service model
    Providing service composition, event notification, and data isolation
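
The general idea of coordinating asynchronous operations while tolerating partial failure can be sketched with Python's standard `asyncio` module. This is not the CCR or DSS API; the `fetch` coroutine, its simulated delay, and the `gather_with_partial_failure` helper are hypothetical stand-ins chosen for the illustration.

```python
import asyncio


async def fetch(name: str, delay: float, fail: bool = False) -> str:
    """Hypothetical asynchronous operation that may fail after a delay."""
    await asyncio.sleep(delay)
    if fail:
        raise RuntimeError(f"{name} failed")
    return f"{name} ok"


async def gather_with_partial_failure(jobs):
    """Run all jobs concurrently and separate successes from failures."""
    results = await asyncio.gather(
        *(fetch(*job) for job in jobs),
        return_exceptions=True,  # a failed job does not cancel the others
    )
    succeeded = [r for r in results if not isinstance(r, Exception)]
    failed = [str(r) for r in results if isinstance(r, Exception)]
    return succeeded, failed
```

A caller would run it with `asyncio.run(gather_with_partial_failure([("a", 0.01), ("b", 0.01, True)]))` and then decide how to handle the failed operations, for example by retrying them or returning a degraded result.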
  • 25. > Microsoft Platform > Distributed Computing
    General-purpose execution environment for distributed, data-parallel applications
    Automated management of resources, scheduling, distribution, monitoring, fault tolerance, accounting, etc.
    Concurrency and mutual exclusion semantics transparency
    Higher-level and domain-specific language support
  • 26. > Microsoft Platform > Cloud Computing
    Azure Services Platform
    Internet-scale, highly available cloud fabric
    Auto-provisioning 64-bit compute nodes on Windows Server VMs
    Massively scalable distributed storage (table, blob, queue)
    Massively scalable and highly consistent relational database
  • 27. > Microsoft Platform > Grid Computing
    Windows HPC Server
    #10 fastest supercomputer in the world (
    30,720 cores
    180.6 teraflops
    77.5% efficiency
    Image multicasting-based parallel deployment of cluster nodes
    Fault tolerance with failover clustering of head node
    Policy-driven, NUMA-aware, multicore-aware, job scheduler
    Inter-process distributed communication via MS-MPI
  • 28. > Microsoft Platform > Massive Data Processing
    SQL Server “Madison”
    Massively parallel processing (MPP) architecture
    +500TB to PB’s databases
    “Ultra Shared Nothing” design
    IO and CPU affinity within symmetric multi-processing (SMP) nodes
    Multiple physical instances of tables w/ dynamic re-distribution
    Distribute / partition large tables across multiple nodes
    Replicate small tables
    Replicate + distribute medium tables
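
The table placement rules on this slide (distribute large tables, replicate small ones) are what allow joins to run locally on each node. The sketch below illustrates that idea in general terms; it is not SQL Server code, and the table contents, column names (`sale_id`, `store_id`, `region`), and helper functions are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def distribute(rows: list[dict], key: str, num_nodes: int) -> list[list[dict]]:
    """Hash-partition the rows of a large table across num_nodes nodes."""
    nodes: list[list[dict]] = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[hash(row[key]) % num_nodes].append(row)
    return nodes


def local_join(fact_rows: list[dict], dimension: dict) -> list[dict]:
    """Join one node's slice of the large table against its local replica."""
    return [
        {**row, "region": dimension[row["store_id"]]}
        for row in fact_rows
        if row["store_id"] in dimension
    ]


def parallel_join(sales: list[dict], stores: dict, num_nodes: int = 3) -> list[dict]:
    """Distribute the large table, replicate the small one, and join per node."""
    slices = distribute(sales, "sale_id", num_nodes)
    replicas = [dict(stores) for _ in range(num_nodes)]  # one full copy per node
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        parts = list(pool.map(local_join, slices, replicas))
    return [row for part in parts for row in part]
```

Because every node holds a complete copy of the small table, no rows need to move between nodes during the join; only the per-node results are concatenated at the end.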
  • 29. > Resources
    For More Information
    Architect Council Website (
    This series (
    .NET 4.0 Parallel APIs (
    CCR & DSS Toolkit (
    Dryad (
    Azure Services Platform (
    SQL Server “Madison” (
    Windows HPC Server 2008 (
  • 30. Thank you!
    © 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
    The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.