Patterns For Parallel Computing
Presentation delivered at Microsoft Architect Council on 2009.06.11 by David Chou

Published in: Technology

  • SETI@Home stats (stat: today; change in last 24 hours):
      Teams: 55,848; 12
      Active teams: 15,817; 4
      Users: 977,698; 291
      Active users: 148,334; -65
      Hosts: 2.34e+6; 930
      Active hosts: 238,234; -256
      Total credit: 4.89e+10; 4.97e+7
      Recent average: 6.31e+7; -1,352,173
      Total FLOPs: 4.221e+22; 4.298e+19
  • Source: Cal Henderson, Chief Architect, Flickr
  • Source: Cal Henderson, Chief Architect, Flickr
  • Source: Aber Whitcomb, Co-Founder and CTO, MySpace; Jim Benedetto, SVP Technical Operations, MySpace
  • Source: Jeff Rothschild, VP of Technology, Facebook
  • Source: Jeffrey Dean and Sanjay Ghemawat, Google
  • Source: Werner Vogels, CTO, Amazon
  • Deployed at MySpace for messaging infrastructure
  • Deployed in AdCenter for massive log processing
  • Transcript

    • 1. Patterns for Parallel Computing
      David Chou
    • 2. > Outline
      An architectural conversation
      Design Principles
      Microsoft Platform
    • 3. > Concepts
      Why is this interesting?
      Amdahl’s law (1967)
      Multi-core processors
      High-performance computing
      Distributed architecture
      Web–scale applications
      Cloud computing
       Paradigm shift!
    • 4. > Concepts
      Parallel Computing == ??
      Simultaneous multi-threading (Intel HyperThreading, IBM Cell microprocessor for PS3, etc.)
      Operating system multitasking (cooperative, preemptive; symmetric multi-processing, etc.)
      Server load-balancing & clustering (Oracle RAC, Windows HPC Server, etc.)
      Grid computing (SETI@home, Sun Grid, DataSynapse, DigiPede, etc.)
      Asynchronous programming (AJAX, JMS, MQ, event-driven, etc.)
      Multi-threaded & concurrent programming (java.lang.Thread, System.Threading.Thread, Cilk, LabVIEW, etc.)
      Massively parallel processing (MapReduce, Hadoop, Dryad, etc.)
       Elements and best practices in all of these
    • 5. > Patterns
      Types of Parallelism
      Bit-level parallelism (microprocessors)
      Instruction-level parallelism (compilers)
      Multiprocessing, multi-tasking (operating systems)
      HPC, clustering (servers)
      Multi-threading (application code)
      Data parallelism (massive distributed databases)
      Task parallelism (concurrent distributed processing)
       Focus is moving “up” the technology stack…
    • 6. >Patterns > HPC, Clustering
      Clustering Infrastructure for High Availability
    • 7. >Patterns > HPC, Clustering
      High-Performance Computing
      [Diagram: Web/App Server nodes]
    • 8. >Patterns > HPC, Clustering > Example
      Infrastructure and Application Footprint
      7 Internet data centers & 3 CDN partnerships
      120+ websites, 1000s of apps and 2500 databases
      20-30+ Gbits/sec Web traffic; 500+ Gbits/sec download traffic
      2007 stats (
      #9 ranked domain in U.S.; 54.0M UU for 36.0% reach
      #5 site worldwide; reaching 287.3M UU
      15K req/sec, 35K concurrent connections on 80 servers
      600 vroots, 350 IIS Web apps & 12 app pools
      Windows Server 2008, SQL Server 2008, IIS7, ASP.NET 3.5
      2007 stats (Windows Update):
      350M UScans/day, 60K ASP.NET req/sec, 1.5M concurrent connections
      50B downloads for CY 2006
      Update Egress – MS, Akamai, Level3 & Limelight (50-500+ Gbits/sec)
    • 9. >Patterns > Multi-threading
      Multi-threaded programming
      [Diagram axis: Execution Time]
    • 10. >Patterns > Multi-threading
      Typically, functional decomposition into individual threads
      But, explicit concurrent programming brings complexities
      Managing threads, semaphores, monitors, dead-locks, race conditions, mutual exclusion, synchronization, etc.
      Moving towards implicit parallelism
      Integrating concurrency & coordination into mainstream programming languages
      Developing tools to ease development
      Encapsulating parallelism in reusable components
      Raising the semantic level: new approaches
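The move from explicit thread management toward parallelism encapsulated in reusable components can be illustrated with a small sketch. It is not taken from the presentation: `square`, `explicit_squares` and `pooled_squares` are hypothetical names, and Python's standard `concurrent.futures` pool merely stands in for the kind of reusable component the slide describes.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


def square(x):
    """Hypothetical unit of work used by both styles below."""
    return x * x


def explicit_squares(values):
    """Explicit style: the caller creates threads and guards shared state."""
    results = {}
    lock = threading.Lock()

    def worker(index, value):
        result = square(value)
        with lock:  # mutual exclusion around the shared dictionary
            results[index] = result

    threads = [
        threading.Thread(target=worker, args=(i, v))
        for i, v in enumerate(values)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every worker before reading the results
    return [results[i] for i in range(len(values))]


def pooled_squares(values, workers=4):
    """Encapsulated style: the pool owns thread creation and scheduling."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(square, values))  # map preserves input order
```

Both functions return the same list; the second leaves thread lifecycle and result collection to the library, which is the direction the slide points to.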
    • 11. >Patterns > Multi-threading > Example
      Web Browser
      2007 stats:
      +30M searches processed / day
      25M UU/month in US, +46M worldwide
      +7B images uploaded
      +300K unique websites link to content
      #31 top 50 sites in US
      #41 top 100 sites worldwide
      18th largest ad supported site in US
      [Diagram: multiple Content Pods]
      Scaling the performance:
      Browser handles concurrency
      Centralized lookup
      Horizontal partitioning of distributed content
    • 12. >Patterns > Data Parallelism
      Data Parallelism
      Loop-level parallelism
      Focuses on distributing the data across different parallel computing nodes
      Denormalization, sharding, horizontal partitioning, etc.
      Each processor performs the same task on different pieces of distributed data
      Emphasizes the distributed (parallelized) nature of the data
      Ideal for data that is read more than written (scale vs. consistency)
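As a rough illustration of "the same task on different pieces of distributed data", the sketch below splits a list into chunks and applies one function to each chunk in parallel. The names `partition`, `chunk_sum_of_squares` and `parallel_sum_of_squares` are hypothetical, and a thread pool on one machine is only a stand-in for separate computing nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, parts):
    """Split items into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def chunk_sum_of_squares(chunk):
    """The same task, applied by every worker to its own chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(items, workers=4):
    """Distribute the data, run the same task per chunk, combine the results."""
    chunks = partition(items, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(chunk_sum_of_squares, chunks))
    return sum(partials)
```

In CPython, threads do not speed up CPU-bound work because of the global interpreter lock; swapping in `ProcessPoolExecutor` keeps the same structure while using multiple processes.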
    • 13. >Patterns > Data Parallelism
      Parallelizing Data in Distributed Architecture
      [Diagram: multiple Web/App Server nodes]
    • 14. >Patterns > Data Parallelism > Example
      2007 stats:
      Serve 40,000 photos / second
      Handle 100,000 cache operations / second
      Process 130,000 database queries / second
      Scaling the “read” data:
      Data denormalization
      Database replication and federation
      Vertical partitioning
      Central cluster for index lookups
      Large data sets horizontally partitioned as shards
      Grow by binary hashing of user buckets
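One possible reading of horizontally partitioned shards that grow by hashing user buckets is sketched below. This is an assumption about the general technique, not the site's actual scheme: `stable_hash` and `shard_for` are hypothetical helpers, and the choice of SHA-256 is arbitrary.

```python
import hashlib


def stable_hash(user_id):
    """Deterministic hash of a user id (built-in hash() is salted for strings)."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).hexdigest()
    return int(digest, 16)


def shard_for(user_id, shard_count):
    """Route a user to one of `shard_count` horizontal partitions."""
    return stable_hash(user_id) % shard_count
```

With modulo routing, doubling the shard count from n to 2n sends each user either to the same shard number or to that number plus n, so each old shard splits into exactly two and no data moves between unrelated shards.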
    • 15. >Patterns > Data Parallelism > Example
      2007 stats:
      115B pageviews/month
      5M concurrent users @ peak
      +3B images, mp3, videos
      +10M new images/day
      160 Gbit/sec peak bandwidth
      Scaling the “write” data:
      MyCache: distributed dynamic memory cache
      MyRelay: inter-node messaging transport handling +100K req/sec, directs reads/writes to any node
      MySpace Distributed File System: geographically redundant distributed storage providing massive concurrent access to images, mp3, videos, etc.
      MySpace Distributed Transaction Manager: broker for all non-transient writes to databases/SAN, multi-phase commit across data centers
    • 16. >Patterns > Data Parallelism > Example
      2009 stats:
      +200B pageviews/month
      >3.9T feed actions/day
      +300M active users
      >1B chat messages/day
      100M search queries/day
      >6B minutes spent/day (ranked #2 on Internet)
      +20B photos, +2B/month growth
      600,000 photos served / sec
      25TB log data / day processed through Scribe
      120M queries /sec on memcache
      Scaling the “relational” data:
      Keeps data normalized, randomly distributed, accessed at high volumes
      Uses “shared nothing” architecture
    • 17. >Patterns > Task Parallelism
      Task Parallelism
      Functional parallelism
      Focuses on distributing execution processes (threads) across different parallel computing nodes
      Each processor executes a different thread (or process) on the same or different data
      Communication takes place usually to pass data from one thread to the next as part of a workflow
      Emphasizes the distributed (parallelized) nature of the processing (i.e. threads)
      Need to design how to compose partial output from concurrent processes
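The workflow described above, where different threads perform different tasks and pass data from one to the next, can be sketched as a small pipeline. This is an illustrative assumption rather than anything shown in the presentation: `stage`, `run_pipeline` and the `_DONE` sentinel are hypothetical names.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def stage(func, inbox, outbox):
    """Run one pipeline stage: apply func to each item and pass it on."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # propagate shutdown to the next stage
            return
        outbox.put(func(item))


def run_pipeline(items, funcs):
    """Run each function in `funcs` on its own thread, chained by queues."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)
    results = []
    while True:  # compose the output of the final stage
        out = queues[-1].get()
        if out is _DONE:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results
```

For example, `run_pipeline([" a ", "b "], [str.strip, str.upper])` runs stripping and upper-casing as two concurrent stages. Because each stage is a single thread reading a FIFO queue, output order matches input order.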
    • 18. >Patterns > Task Parallelism > Example
      2007 stats:
      +20 petabytes of data processed / day by +100K MapReduce jobs
      1 petabyte sort took ~6 hours on ~4K servers replicated onto ~48K disks
      +200 GFS clusters, each at 1-5K nodes, handling +5 petabytes of storage
      ~40 GB/sec aggregate read/write throughput across the cluster
      +500 servers for each search query < 500ms
      Scaling the process:
      MapReduce: parallel processing framework
      BigTable: structured hash database
      Google File System: massively scalable distributed storage
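The MapReduce programming model can be imitated on one machine to show its three steps: map, shuffle (group by key) and reduce. The sketch below is a toy word count under that assumption; it is not Google's implementation, and `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical names.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(document):
    """Map: emit (word, 1) pairs for one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(mapped):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(item):
    """Reduce: combine the values collected for one key."""
    key, values = item
    return key, sum(values)


def word_count(documents, workers=4):
    """Run map and reduce steps in parallel, with a shuffle in between."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, documents))
        groups = shuffle(mapped)
        reduced = pool.map(reduce_phase, groups.items())
        return dict(reduced)
```

Each map call sees only its own document and each reduce call sees only one key, which is what lets a real framework spread both phases across many servers.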
    • 19. > Design Principles
      Parallelism for Speedup
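      Amdahl’s law (1967): $\dfrac{1}{(1 - P) + \frac{P}{N}}$
      Amdahl’s speedup: $\text{Max. Speedup} \le \dfrac{p}{1 + f \cdot (p - 1)}$
      Gustafson’s law (1988): $S(P) = P - \alpha \cdot (P - 1)$
      Gustafson’s speedup: $S = a(n) + p \cdot (1 - a(n))$
      Karp-Flatt metric (1990): $e = \dfrac{\frac{1}{\varphi} - \frac{1}{p}}{1 - \frac{1}{p}}$
      Speedup: $S_p = \dfrac{T_1}{T_p}$
      Efficiency: $E_p = \dfrac{S_p}{p} = \dfrac{T_1}{p \, T_p}$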
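The formulas on this slide can be evaluated directly. The sketch below uses their standard definitions, with the assumption that P is the parallelizable fraction of the work, N (or p) is the number of processors, the serial fraction is what Gustafson's law subtracts, and T1 and Tp are the run times on one and on p processors. The function names are hypothetical.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Amdahl's law: 1 / ((1 - P) + P / N) for a fixed problem size."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / processors)


def gustafson_speedup(serial_fraction, processors):
    """Gustafson's law: P - alpha * (P - 1) when the problem scales with P."""
    return processors - serial_fraction * (processors - 1)


def karp_flatt(speedup, processors):
    """Karp-Flatt metric: experimentally determined serial fraction e."""
    return (1.0 / speedup - 1.0 / processors) / (1.0 - 1.0 / processors)


def efficiency(t1, tp, processors):
    """Efficiency: speedup T1 / Tp divided by the processor count."""
    return (t1 / tp) / processors
```

As a worked case, a program that is 90% parallelizable on 10 processors gets an Amdahl speedup of 1 / (0.1 + 0.09), about 5.26, and feeding that speedup into the Karp-Flatt metric recovers the serial fraction 0.1. Under Gustafson's assumptions the same serial fraction gives 10 - 0.1 * 9 = 9.1.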
    • 20. > Design Principles
      Parallelism for Scale-out
      Sequential  Parallel
      Convert sequential and/or single-machine program into a form in which it can be executed in a concurrent, potentially distributed environment
      Over-decompose for scaling
      Structured multi-threading with a data focus
      Relax sequential order to gain more parallelism
      Ensure atomicity of unordered interactions
      Consider data as well as control flow
      Careful data structure & locking choices to manage contention
      Use parallel data structures
      Minimize shared data and synchronization
      Continuous optimization
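Two of the principles above, over-decomposing the work and minimizing shared data and synchronization, can be combined in one small sketch. It is an illustration under stated assumptions, not the presenter's code: `count_chunk` and `parallel_histogram` are hypothetical names, and `items` is assumed to be a sliceable sequence.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_chunk(chunk):
    """Each worker builds a private Counter, so no lock is needed here."""
    return Counter(chunk)


def parallel_histogram(items, workers=4, pieces=16):
    """Over-decompose into more pieces than workers, then merge the results."""
    pieces = max(1, min(pieces, len(items)))
    chunks = [items[i::pieces] for i in range(pieces)]  # strided split
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_chunk, chunks):
            total.update(partial)  # the only merge step, done in one thread
    return total
```

Because counting is unordered, the strided split relaxes the sequential order without changing the result, and having more pieces than workers lets the pool balance uneven chunks.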
    • 21. >Design Principles > Example
      Principles for Scalable Service Design (Werner Vogels, CTO, Amazon)
      Controlled concurrency
      Controlled parallelism
      Decompose into small well-understood building blocks
      Failure tolerant
      Local responsibility
      Recovery built-in
    • 22. > Microsoft Platform
      Parallel computing on the Microsoft platform
      Concurrent Programming (.NET 4.0 Parallel APIs)
      Distributed Computing (CCR & DSS Runtime, Dryad)
      Cloud Computing (Azure Services Platform)
      Grid Computing (Windows HPC Server 2008)
      Massive Data Processing (SQL Server “Madison”)
       Components spanning a spectrum of computing models
    • 23. > Microsoft Platform > Concurrent Programming
      .NET 4.0 Parallel APIs
      Task Parallel Library (TPL)
      Parallel LINQ (PLINQ)
      Data Structures
      Diagnostic Tools
    • 24. > Microsoft Platform > Distributed Computing
      CCR & DSS Toolkit
      Concurrency & Coordination Runtime
      Decentralized Software Services
      Supporting multi-core and concurrent applications by facilitating asynchronous operations
      Dealing with concurrency, exploiting parallel hardware and handling partial failure
      Supporting robust, distributed applications based on a light-weight state-driven service model
      Providing service composition, event notification, and data isolation
    • 25. > Microsoft Platform > Distributed Computing
      General-purpose execution environment for distributed, data-parallel applications
      Automated management of resources, scheduling, distribution, monitoring, fault tolerance, accounting, etc.
      Concurrency and mutual exclusion semantics transparency
      Higher-level and domain-specific language support
    • 26. > Microsoft Platform > Cloud Computing
      Azure Services Platform
      Internet-scale, highly available cloud fabric
      Auto-provisioning 64-bit compute nodes on Windows Server VMs
      Massively scalable distributed storage (table, blob, queue)
      Massively scalable and highly consistent relational database
    • 27. > Microsoft Platform > Grid Computing
      Windows HPC Server
      #10 fastest supercomputer in the world (
      30,720 cores
      180.6 teraflops
      77.5% efficiency
      Image multicasting-based parallel deployment of cluster nodes
      Fault tolerance with failover clustering of head node
      Policy-driven, NUMA-aware, multicore-aware, job scheduler
      Inter-process distributed communication via MS-MPI
    • 28. > Microsoft Platform > Massive Data Processing
      SQL Server “Madison”
      Massively parallel processing (MPP) architecture
      +500 TB to PB-scale databases
      “Ultra Shared Nothing” design
      IO and CPU affinity within symmetric multi-processing (SMP) nodes
      Multiple physical instances of tables w/ dynamic re-distribution
      Distribute / partition large tables across multiple nodes
      Replicate small tables
      Replicate + distribute medium tables
    • 29. > Resources
      For More Information
      Architect Council Website (
      This series (
      .NET 4.0 Parallel APIs (
      CCR & DSS Toolkit (
      Dryad (
      Azure Services Platform (
      SQL Server “Madison” (
      Windows HPC Server 2008 (
    • 30. Thank you!
      © 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
      The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.