Architecting Cloudy Applications

  • 1.
    Architecting Cloudy Applications
    David Chou
    david.chou@microsoft.com
    blogs.msdn.com/dachou
  • 2.
    > Introduction
    Size matters
    Facebook (2009): +200B pageviews/month; >3.9T feed actions/day; +300M active users; >1B chat messages/day; 100M search queries/day; >6B minutes spent/day (ranked #2 on the Internet); +20B photos, +2B/month growth; 600,000 photos served/sec; 25 TB of log data/day processed through Scribe; 120M queries/sec on memcache
    Twitter (2009): 600 requests/sec; avg 200-300 connections/sec, peak at 800; MySQL handles 2,400 requests/sec; 30+ processes for handling odd jobs; a request is processed in 200 milliseconds in Rails; average time spent in the database is 50-100 milliseconds; +16 GB of memcached
    Google (2007): +20 petabytes of data processed/day by +100K MapReduce jobs; 1 petabyte sort took ~6 hours on ~4K servers, replicated onto ~48K disks; +200 GFS clusters, each at 1-5K nodes, handling +5 petabytes of storage; ~40 GB/sec aggregate read/write throughput across the cluster; +500 servers for each search query (< 500 ms); >1B views/day on YouTube (2009)
    MySpace (2007): 115B pageviews/month; 5M concurrent users at peak; +3B images, mp3s, videos; +10M new images/day; 160 Gbit/sec peak bandwidth
    Flickr (2007): +4B queries/day; +2B photos served; ~35M photos in squid cache; ~2M photos in squid's RAM; 38k req/sec to memcached (12M objects); 2 PB raw storage; +400K photos added/day
    Source: multiple articles, High Scalability, http://highscalability.com/
  • 3.
    > Introduction
    Cloud levels the playing field
    2007: founded by 6 people
    2008: $29M funding from VC
    2009: revenue of $270M; $180M funding from Digital Sky Technologies
    2010: 1,000+ employees; $300M funding from Google and Softbank
    Active unique players: 75M monthly; 60M daily; 1M daily 4 days after launch; 10M after 60 days
    Hosted in Amazon Web Services: 12,000 EC2 nodes; 3 Gigabits/sec of traffic between FarmVille and Facebook (at peak); caching cluster serves another 1.5 Gigabits/sec to the application
    Source: “How FarmVille Scales to Harvest 75 Million Players a Month”, 2010.02.08, Todd Hoff, http://highscalability.com/blog/2010/2/8/how-farmville-scales-to-harvest-75-million-players-a-month.html
  • 4.
    > Introduction
    Cloud computing
    Characteristics: on-demand self-service; broad network access; resource pooling; rapid elasticity; measured service
    Service models: Software as a Service; Platform as a Service; Infrastructure as a Service
    Deployment models: private cloud; community cloud; public cloud; hybrid cloud
    “Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model promotes availability and is composed of five essential characteristics, three service models, and four deployment models.”
    Source: The NIST Definition of Cloud Computing, Version 15, 2009.10.07, Peter Mell and Tim Grance, http://csrc.nist.gov/groups/SNS/cloud-computing/cloud-def-v15.doc
  • 5.
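    > Introduction
    Service delivery models
    Layer            (On-Premise)   Infrastructure (as a Service)   Platform (as a Service)   Software (as a Service)
    Applications     You manage     You manage                      You manage                Managed by vendor
    Data             You manage     You manage                      You manage                Managed by vendor
    Runtime          You manage     You manage                      Managed by vendor         Managed by vendor
    Middleware       You manage     You manage                      Managed by vendor         Managed by vendor
    O/S              You manage     You manage                      Managed by vendor         Managed by vendor
    Virtualization   You manage     Managed by vendor               Managed by vendor         Managed by vendor
    Servers          You manage     Managed by vendor               Managed by vendor         Managed by vendor
    Storage          You manage     Managed by vendor               Managed by vendor         Managed by vendor
    Networking       You manage     Managed by vendor               Managed by vendor         Managed by vendor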
  • 6.
    > Architecting for Scale > Vertical Scaling
    Traditional scale-up architecture
    Common characteristics: synchronous processes; sequential units of work; tight coupling; stateful; pessimistic concurrency; clustering for HA; vertical scaling
    [Diagram labels: units of work, web, app server, data store]
  • 7.
    > Architecting for Scale > Vertical Scaling
    Traditional scale-up architecture
    To scale, get bigger servers: expensive; has scaling limits; inefficient use of resources
    [Diagram labels: web, app server, data store]
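The "has scaling limits" bullet can be made concrete with Amdahl's law, which the editor's notes cite as the picture source for this part of the deck. The following is a minimal sketch, not taken from the slides: the function name `amdahl_speedup` and the 95% parallel fraction are illustrative assumptions.

```python
def amdahl_speedup(parallel_fraction: float, n: int) -> float:
    """Speedup predicted by Amdahl's law when a fraction of the work
    can be spread over n processing units (hypothetical helper)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part never shrinks, so it caps the achievable speedup.
    return 1.0 / (serial_fraction + parallel_fraction / n)


if __name__ == "__main__":
    # Assumed workload: 95% of the work parallelizes.
    for units in (1, 2, 8, 64, 1024):
        print(units, round(amdahl_speedup(0.95, units), 2))
```

Under that assumption the speedup can never exceed 1 / 0.05 = 20, no matter how large the single server becomes.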
  • 8.
    > Architecting for Scale > Vertical Scaling
    Traditional scale-up architecture
    When problems occur: bigger failure impact
    [Diagram labels: web, app server, data store]
  • 9.
    > Architecting for Scale > Vertical Scaling
    Traditional scale-up architecture
    When problems occur: bigger failure impact; more complex recovery
    [Diagram labels: web, app server, data store]
  • 10.
    > Architecting for Scale > Fundamental Concepts
    CAP (Consistency, Availability, Partition tolerance) Theorem
    At most two of these properties can be guaranteed for any shared-data system
    Consistency + Availability: high data integrity
      Examples: single site, cluster database, LDAP, xFS file system, etc.
      Traits: 2-phase commit, data replication, etc.
    Consistency + Partition tolerance:
      Examples: distributed database, distributed locking, etc.
      Traits: pessimistic locking, minority partition unavailable, etc.
    Availability + Partition tolerance: high scalability
      Examples: distributed cache, DNS, etc.
      Traits: optimistic locking, expiration/leases, etc.
    Source: “Towards Robust Distributed Systems”, Dr. Eric A. Brewer, UC Berkeley
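The "optimistic locking" trait listed for availability plus partition tolerance can be sketched as a version check at write time. This is an illustrative in-memory example under stated assumptions: `VersionedStore`, `VersionConflict`, and `increment` are hypothetical names, not part of any product mentioned in the deck.

```python
import threading


class VersionConflict(Exception):
    """Raised when a writer's expected version is stale (hypothetical)."""


class VersionedStore:
    """In-memory stand-in for a store that supports conditional writes."""

    def __init__(self):
        self._data = {}                # key -> (version, value)
        self._lock = threading.Lock()  # guards only the local dict

    def read(self, key):
        with self._lock:
            return self._data.get(key, (0, None))

    def write(self, key, value, expected_version):
        with self._lock:
            current_version, _ = self._data.get(key, (0, None))
            if current_version != expected_version:
                # Someone else wrote first: reject instead of blocking.
                raise VersionConflict(key)
            self._data[key] = (current_version + 1, value)
            return current_version + 1


def increment(store, key, max_attempts=5):
    """Read, modify, and conditionally write; re-read on conflict."""
    for _ in range(max_attempts):
        version, value = store.read(key)
        try:
            return store.write(key, (value or 0) + 1, version)
        except VersionConflict:
            continue
    raise RuntimeError("too many conflicting writers")
```

No lock is held between the read and the write, which is the point of the technique: writers proceed without waiting on each other and only pay a cost when a conflict actually happens.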
  • 16.
    > Architecting for Scale > Horizontal Scaling
    Use more pieces, not bigger pieces
    LEGO 7778 Midi-scale Millennium Falcon: 9.3 x 6.7 x 3.2 inches (L/W/H); 356 pieces
    LEGO 10179 Ultimate Collector's Millennium Falcon: 33 x 22 x 8.3 inches (L/W/H); 5,195 pieces
  • 18.
    > Architecting for Scale > Horizontal Scaling
    Scale-out architecture
    Common characteristics: small logical units of work; loosely-coupled processes; stateless; event-driven design; optimistic concurrency; partitioned data; redundancy fault-tolerance; re-try-based recoverability
    [Diagram labels: web, app server, data store]
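The "re-try-based recoverability" characteristic can be sketched as a small retry wrapper with exponential backoff. This is one common way to do it, offered as an assumption rather than the presenter's method; `TransientError` and `call_with_retries` are hypothetical names.

```python
import random
import time


class TransientError(Exception):
    """Marks a failure that is worth retrying (hypothetical)."""


def call_with_retries(operation, max_attempts=4, base_delay=0.01,
                      sleep=time.sleep):
    """Run operation(); on TransientError wait and try again.

    The delay doubles on each attempt and includes random jitter so that
    many clients do not retry in lockstep.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay))
```

Retrying is only safe when the wrapped operation can be repeated without harm, which is why the deck pairs this characteristic with stateless design and, later, idempotent operations.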
  • 19.
    > Architecting for Scale > Horizontal Scaling
    Scale-out architecture
    To scale, add more servers, not bigger servers
    [Diagram: multiple web / app server / data store stacks]
  • 20.
    > Architecting for Scale > Horizontal Scaling
    Scale-out architecture
    When problems occur: smaller failure impact; higher perceived availability
    [Diagram: multiple web / app server / data store stacks]
  • 21.
    > Architecting for Scale > Horizontal Scaling
    Scale-out architecture
    When problems occur: smaller failure impact; higher perceived availability; simpler recovery
    [Diagram: multiple web / app server / data store stacks]
  • 22.
    > Architecting for Scale > Horizontal Scaling
    Scale-out architecture + distributed computing
    Scalable performance at extreme scale: asynchronous processes; parallelization; smaller footprint; optimized resource usage; reduced response time; improved throughput
    [Diagram: multiple web / app server / data store stacks; labels: parallel tasks, async tasks, perceived response time]
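The "parallelization" and "reduced response time" points can be illustrated by fanning independent calls out to a thread pool. This is a minimal sketch assuming three independent page fragments; `fetch_fragment`, the fragment names, and the 50 ms delay are invented for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_fragment(name, delay=0.05):
    """Stand-in for a remote call; the sleep simulates network latency."""
    time.sleep(delay)
    return f"{name}-data"


def build_page_sequential(names):
    # Total latency is roughly the sum of all the calls.
    return [fetch_fragment(name) for name in names]


def build_page_parallel(names):
    # Total latency is roughly that of the slowest single call.
    with ThreadPoolExecutor(max_workers=max(1, len(names))) as pool:
        return list(pool.map(fetch_fragment, names))


if __name__ == "__main__":
    fragments = ["profile", "feed", "ads"]
    for build in (build_page_sequential, build_page_parallel):
        start = time.perf_counter()
        build(fragments)
        print(build.__name__, round(time.perf_counter() - start, 3))
```

Both functions return the same result in the same order; only the elapsed time seen by the caller differs.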
  • 23.
    > Architecting for Scale > Horizontal Scaling
    Scale-out architecture + distributed computing
    When problems occur: smaller units of work; decoupling shields impact
    [Diagram: multiple web / app server / data store stacks]
  • 24.
    > Architecting for Scale > Horizontal Scaling
    Scale-out architecture + distributed computing
    When problems occur: smaller units of work; decoupling shields impact; even simpler recovery
    [Diagram: multiple web / app server / data store stacks]
  • 25.
    > Architecting for Scale > Cloud Architecture Patterns
    LiveJournal (from Brad Fitzpatrick, then Founder at LiveJournal, 2007)
    [Diagram labels: Web Frontend, Apps & Services, Partitioned Data, Distributed Cache, Distributed Storage]
  • 26.
    > Architecting for Scale > Cloud Architecture Patterns
    Flickr (from Cal Henderson, then Director of Engineering at Yahoo, 2007)
    [Diagram labels: Web Frontend, Apps & Services, Distributed Storage, Distributed Cache, Partitioned Data]
  • 27.
    > Architecting for Scale > Cloud Architecture Patterns
    SlideShare (from John Boutelle, CTO at SlideShare, 2008)
    [Diagram labels: Web Frontend, Apps & Services, Distributed Cache, Partitioned Data, Distributed Storage]
  • 28.
    > Architecting for Scale > Cloud Architecture Patterns
    Twitter (from John Adams, Ops Engineer at Twitter, 2010)
    [Diagram labels: Web Frontend, Apps & Services, Partitioned Data, Queues, Async Processes, Distributed Cache, Distributed Storage]
  • 29.
    > Architecting for Scale > Cloud Architecture Patterns
    Facebook (from Jeff Rothschild, VP Technology at Facebook, 2009)
    2010 stats (Source: http://www.facebook.com/press/info.php?statistics)
    People: +500M active users; 50% of active users log on in any given day; people spend +700B minutes/month
    Activity on Facebook: +900M objects that people interact with; +30B pieces of content shared/month
    Global reach: +70 translations available on the site; ~70% of users outside the US; +300K users helped translate the site through the translations application
    Platform: +1M developers from +180 countries; +70% of users engage with applications/month; +550K active applications; +1M websites have integrated with Facebook Platform; +150M people engage with Facebook on external websites/month
    [Diagram labels: Web Frontend, Apps & Services, Distributed Cache, Parallel Processes, Partitioned Data, Async Processes, Distributed Storage]
  • 30.
    > Architecting for Scale
    Fundamental concepts
    Vertical scaling still works
  • 31.
    > Architecting for Scale
    Fundamental concepts
    Horizontal scaling for cloud computing
    Small pieces, loosely coupled
    Distributed computing best practices: asynchronous processes (event-driven design); parallelization; idempotent operations (handle duplicates); de-normalized, partitioned data (sharding); shared nothing architecture; optimistic concurrency; fault-tolerance by redundancy and replication; etc.
  • 32.
    > Architecting for Scale > Fundamental Concepts
    Asynchronous processes & parallelization
    Defer work as late as possible: return to user as quickly as possible; event-driven design (instead of request-driven)
    Cloud computing friendly: distributes work to more servers (divide & conquer); smaller resource usage/footprint; smaller failure surface; decouples process dependencies
    Windows Azure platform services: Queue Service; AppFabric Service Bus; inter-node communication
    [Diagram labels: Web Role, Worker Role, Queues, Service Bus]
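The web role / queue / worker role arrangement on this slide can be sketched with an in-process queue. This is a local stand-in built on Python's standard library, not the Windows Azure Queue Service API; `web_role`, `worker_role`, `run_demo`, and the message shape are hypothetical.

```python
import queue
import threading


def web_role(work_queue, order_id):
    """Accept the request, defer the real work, and return immediately."""
    work_queue.put({"order_id": order_id})
    return "accepted"


def worker_role(work_queue, results):
    """Drain the queue in the background until a None sentinel arrives."""
    while True:
        message = work_queue.get()
        if message is None:
            work_queue.task_done()
            break
        results.append(f"processed {message['order_id']}")
        work_queue.task_done()


def run_demo(order_ids, worker_count=3):
    work_queue = queue.Queue()
    results = []
    workers = [
        threading.Thread(target=worker_role, args=(work_queue, results))
        for _ in range(worker_count)
    ]
    for worker in workers:
        worker.start()
    responses = [web_role(work_queue, order_id) for order_id in order_ids]
    for _ in workers:
        work_queue.put(None)  # one stop signal per worker
    for worker in workers:
        worker.join()
    return responses, sorted(results)
```

The caller gets "accepted" before any order is processed, and adding workers raises throughput without changing the web-facing code, which is the decoupling the slide describes.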
  • 33.
    > Architecting for Scale > Fundamental Concepts
    Partitioned data
    Shared nothing architecture: transaction locality (partition based on an entity that is the “atomic” target of the majority of transactional processing); loosened referential integrity (avoid distributed transactions across shard and entity boundaries); design for dynamic redistribution and growth of data (elasticity)
    Cloud computing friendly: divide & conquer; size growth with virtually no limits; smaller failure surface
    Windows Azure platform services: Table Storage Service; SQL Azure
    [Diagram labels: Web Role, Worker Role, Queues, Relational Database (x3), read, write]
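Transaction locality can be sketched by routing every record for one entity to the same shard. This is a minimal illustration assuming the user ID is the partition entity; `shard_for` and `ShardedStore` are hypothetical names, and the plain dictionaries stand in for separate databases.

```python
import hashlib


def shard_for(entity_id: str, shard_count: int) -> int:
    """Map an entity ID to a shard index using a stable hash."""
    digest = hashlib.sha256(entity_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """Keeps all records for one user on a single shard."""

    def __init__(self, shard_count):
        self.shards = [dict() for _ in range(shard_count)]

    def _shard(self, user_id):
        return self.shards[shard_for(user_id, len(self.shards))]

    def put(self, user_id, record):
        self._shard(user_id).setdefault(user_id, []).append(record)

    def get(self, user_id):
        return self._shard(user_id).get(user_id, [])
```

Simple modulo routing like this moves most keys when the shard count changes, so it does not by itself satisfy the slide's "dynamic redistribution" point; a lookup directory or consistent hashing is a common way to address that.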
  • 34.
    > Architecting for Scale > Fundamental Concepts
    Idempotent operations
    Repeatable processes: allow duplicates (additive); allow re-tries (overwrite); reject duplicates (optimistic locking); stateless design
    Cloud computing friendly: resiliency
    Windows Azure platform services: Queue Service; AppFabric Service Bus
    [Diagram labels: Worker Role, Service Bus]
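Two of the styles on this slide, "allow re-tries (overwrite)" and "reject duplicates", can be sketched as follows. The example assumes each message carries a unique ID; the `Account` class and its methods are hypothetical, and the duplicate check here uses a set of seen IDs rather than the optimistic locking the slide mentions.

```python
class Account:
    """Toy message handler that tolerates redelivered messages."""

    def __init__(self):
        self.balance = 0
        self.profile = {}
        self._seen_message_ids = set()

    def set_email(self, email):
        # Overwrite style: applying the same message twice leaves the
        # same final state, so a retry is harmless.
        self.profile["email"] = email

    def deposit(self, message_id, amount):
        # Reject-duplicates style: an increment is not naturally
        # idempotent, so remember which messages were already applied.
        if message_id in self._seen_message_ids:
            return False
        self._seen_message_ids.add(message_id)
        self.balance += amount
        return True
```

A queue that may deliver a message more than once can then be consumed safely: a redelivered deposit is ignored and a redelivered profile update changes nothing.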
  • 35.
    > Architecting for Scale > Fundamental Concepts
    Hybrid architectures
    Scale-out (horizontal): BASE (Basically Available, Soft state, Eventually consistent); availability first, best effort; aggressive (optimistic); shared nothing; favor extreme size; e.g., user requests, data collection & processing, etc.
    Scale-up (vertical): ACID (Atomicity, Consistency, Isolation, Durability); focus on “commit”; conservative (pessimistic); transactional; favor accuracy/consistency; e.g., BI & analytics, financial processing, etc.
    Most distributed systems employ both approaches
  • 36.
    Thank you!
    David Chou
    david.chou@microsoft.com
    blogs.msdn.com/dachou
    ©2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
    The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Editor's Notes

  • #2 Microsoft's Windows Azure platform is a virtualized and abstracted application platform that can be used to build highly scalable and reliable applications, with Java. The environment consists of a set of services such as NoSQL table storage, blob storage, queues, relational database service, internet service bus, access control, and more. Java applications can be built using these services via Web services APIs, and your own Java Virtual Machine, without worrying about the underlying server OS and infrastructure. Highlights of this session will include: an overview of the Windows Azure environment; how to develop and deploy Java applications in Windows Azure; how to architect horizontally scalable applications in Windows Azure.
  • #8 Picture source: http://en.wikipedia.org/wiki/Amdahl%27s_law
  • #12 To build for big scale – use more of the same pieces, not bigger pieces; though a different approach may be needed
  • #20 Source: http://danga.com/words/2007_06_usenix/usenix.pdf
  • #21 Source: http://highscalability.com/blog/2007/11/13/flickr-architecture.html
  • #22 Source: http://www.slideshare.net/jboutelle/scalable-web-architectures-w-ruby-and-amazon-s3
  • #23 Sources: http://www.slideshare.net/netik/billions-of-hits-scaling-twitter and http://highscalability.com/blog/2009/6/27/scaling-twitter-making-twitter-10000-percent-faster.html
  • #24 Source: http://highscalability.com/blog/2009/10/12/high-performance-at-massive-scale-lessons-learned-at-faceboo-1.html
  • #25 Picture source: http://pdp.protopak.net/Belltheous90/DeathStarII.gif