Scale as a Competitive Advantage


Deck presented at the 2010 SOA & Cloud Symposium

Published in: Technology

  1. Scale as a Competitive Advantage
     David Chou
  2. The age of “big data”
     2009: 600K photos served /sec
     2010: ~1PB / 60 minutes (projected)
     2008: ~1B views / day
     Source: Wired Magazine: Issue 16.07, 2008.06.23; illustration by Marian Bantjes
  3. “More is different”
     Infinite storage. Clouds of processors. Our ability to capture, warehouse, and understand massive amounts of data is changing science, medicine, business, and technology. As our collection of facts and figures grows, so will the opportunity to find answers to fundamental questions. Because in the era of big data, more isn't just more. More is different.
     Source: Wired Magazine: Issue 16.07, 2008.06.23
  4. “The future belongs to the companies and people that turn data into products”
     Source: “What is data science?”, An O’Reilly Radar Report, 2010.06.02, Mike Loukides
  5. Working with data at scale
     45M tweets: pattern visualization in minutes
     #justinbieber cluster
     #teaparty cluster
     … “political world has more connective tissue than of-the-moment entertainment”
     Source: “Data science democratized”, 2010.07.01, Mac Slocum
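The slide does not say how the clusters in the 45M-tweet visualization were computed. One common starting point, shown here as a hedged sketch, is to count how often two hashtags appear in the same tweet; the weighted pairs can then feed a graph-clustering or visualization tool. The `cooccurrence` function and the sample tweets are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations
from typing import Iterable

HASHTAG = re.compile(r"#(\w+)")

def cooccurrence(tweets: Iterable[str]) -> Counter:
    """Count how often each pair of hashtags appears together in one tweet."""
    pairs: Counter = Counter()
    for text in tweets:
        # Normalize case and de-duplicate tags within a tweet; sort so (a, b) and (b, a) match.
        tags = sorted({tag.lower() for tag in HASHTAG.findall(text)})
        for a, b in combinations(tags, 2):
            pairs[(a, b)] += 1
    return pairs

# Hypothetical sample data, not from the source.
sample = [
    "Rally today #teaparty #election",
    "#Election results are in #TeaParty",
    "#justinbieber concert tonight",
]
for (a, b), count in cooccurrence(sample).most_common():
    print(a, b, count)
```

At tens of millions of tweets the counting would likely be partitioned across machines, but the per-tweet logic stays the same.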
  6. Big data needs big processing
     Facebook (2009)
       +200B pageviews /month
       >3.9T feed actions /day
       +300M active users
       >1B chat messages /day
       100M search queries /day
       >6B minutes spent /day (ranked #2 on Internet)
       +20B photos, +2B/month growth
       600,000 photos served /sec
       25TB log data /day processed through Scribe
       120M queries /sec on memcache
     Twitter (2009)
       600 requests /sec
       avg 200-300 connections /sec; peak at 800
       MySQL handles 2,400 requests /sec
       30+ processes for handling odd jobs
       process a request in 200 milliseconds in Rails
       average time spent in the database is 50-100 milliseconds
       +16 GB of memcached
     Google (2007)
       +20 petabytes of data processed /day by +100K MapReduce jobs
       1 petabyte sort took ~6 hours on ~4K servers replicated onto ~48K disks
       +200 GFS clusters, each at 1-5K nodes, handling +5 petabytes of storage
       ~40 GB /sec aggregate read/write throughput across the cluster
       +500 servers for each search query < 500ms
     YouTube (2009)
       >1B views /day
     Myspace (2007)
       115B pageviews /month
       5M concurrent users @ peak
       +3B images, mp3, videos
       +10M new images /day
       160 Gbit/sec peak bandwidth
     Flickr (2007)
       +4B queries /day
       +2B photos served
       ~35M photos in squid cache
       ~2M photos in squid’s RAM
       38k req/sec to memcached (12M objects)
       2 PB raw storage
       +400K photos added /day
     Source: multiple articles, High Scalability
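Period totals like these are easier to compare as average per-second rates. The sketch below converts a few of the Facebook figures; it assumes a 30-day month and decimal units (1 TB = 10^12 bytes), and the helper name is hypothetical. Averages understate peak load, which is what a system actually has to be provisioned for.

```python
# Convert period totals from the slide into average per-second rates.
# Assumptions (not from the slide): a 30-day month and decimal units.
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_MONTH = 30 * SECONDS_PER_DAY  # assumed 30-day month

def per_second(count: float, period_seconds: int) -> float:
    """Average rate: the total spread evenly over the period."""
    return count / period_seconds

facebook_pageviews = per_second(200e9, SECONDS_PER_MONTH)  # +200B pageviews /month
facebook_chat = per_second(1e9, SECONDS_PER_DAY)           # >1B chat messages /day
scribe_bytes = per_second(25e12, SECONDS_PER_DAY)          # 25 TB of log data /day

print(f"pageviews/sec ~ {facebook_pageviews:,.0f}")
print(f"chat messages/sec ~ {facebook_chat:,.0f}")
print(f"log ingest ~ {scribe_bytes * 8 / 1e9:.2f} Gbit/sec")
```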
  7. Bing Maps
     Big data collection and processing
     flying planes over nearly every inch of the United States
     on-road photos
     45-degree low-altitude aerial photos
     high-altitude plane photos
     satellite photos
     10% done (August 2010)
     previous “all USA” flight image-gathering exercise took 10 years
     5PB storage and thousands of servers in one container
     Source: “Map Wars (visiting Bing’s imaging center)”, 2010.08.10, Robert Scoble
  8. Cloud computing
     Characteristics
       On-demand self-service
       Broad network access
       Resource pooling
       Rapid elasticity
       Measured service
     Service models
       Software as a service
       Platform as a service
       Infrastructure as a service
     Deployment models
       Private cloud
       Community cloud
       Public cloud
       Hybrid cloud
     “Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model promotes availability and is composed of five essential characteristics, three service models, and four deployment models.”
     Source: The NIST Definition of Cloud Computing, Version 15, 2009.10.07, Peter Mell and Tim Grance
  9. Cloud levels the playing field
     2007
       founded by 6 people
     2008
       $29M funding from VC
     2009
       revenue: $270M
       $180M funding from Digital Sky Technologies
     2010
       1,200+ employees
       $300M funding from Google and SoftBank
     Active unique players
       215M monthly; 10% of world internet population (updated 2010.10); 60M daily
       1M daily 4 days after launch; 10M after 60 days
       3B neighborhood connections
     Cloud infrastructure
       12,000 Amazon EC2 nodes
       adding 1,000 servers per week (updated 2010.10)
       moving 1PB data per day (updated 2010.10)
       3 Gigabits/sec of traffic between FarmVille and Facebook (at peak)
       caching cluster serves another 1.5 Gigabits/sec to the application
     Sources: “How FarmVille Scales to Harvest 75 Million Players a Month”, 2010.02.08, Todd Hoff
       “Zynga Moves 1 Petabyte Of Data Daily; Adds 1,000 Servers A Week”, 2010.09.22, Leena Rao
  10. Cloud as a platform
      Utility computing
        on-demand infrastructure
        self-provisioning and servicing
        rapid elasticity
        economies of scale
        operational expenditures
      Infrastructure-as-a-Service
      Service delivery model
      … but cloud computing != cloud hosting
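Rapid elasticity usually comes down to a rule that turns observed load into a target instance count. The following is a minimal sketch of a target-utilization rule; the function name, the 70% utilization target, the per-instance capacity, and the minimum/maximum bounds are illustrative assumptions, not any provider's autoscaling API.

```python
import math

def desired_instances(load_rps: float, capacity_per_instance_rps: float,
                      target_utilization: float = 0.7,
                      min_instances: int = 2, max_instances: int = 100) -> int:
    """Return how many instances keep each one near target_utilization at the current load."""
    if capacity_per_instance_rps <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("capacity must be positive and utilization in (0, 1]")
    needed = math.ceil(load_rps / (capacity_per_instance_rps * target_utilization))
    # Clamp to a floor (for redundancy) and a ceiling (for cost control).
    return max(min_instances, min(max_instances, needed))

# Hypothetical capacity of 100 requests/sec per instance.
for load in (50, 600, 2400):
    print(load, "req/sec ->", desired_instances(load, capacity_per_instance_rps=100))
```

For example, at Twitter's 2009 rate of 600 requests /sec and the assumed 100 requests /sec per instance, the rule asks for 9 instances; scaling in happens automatically when load drops, which is where the pay-per-use savings come from.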
  11. Cloud as a platform
      Native cloud applications
        horizontal scaling (scale-out)
        parallelization
        shared-nothing architecture
        partitioned data (sharding)
        multi-tenancy
        failure resilient (or fail-in-place)
        service-oriented
        federated composition
      Platform-as-a-Service
      Application development model
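Partitioned data (sharding) and shared-nothing architecture go together: each key is routed to exactly one shard, and shards share no state. Below is a minimal sketch of hash-based routing, assuming an in-memory dict per shard as a stand-in for a separate database node; `ShardedStore` and its methods are hypothetical names.

```python
import hashlib
from typing import Dict, List, Optional

class ShardedStore:
    """Toy shared-nothing store: each shard is an independent dict standing in for a database node."""

    def __init__(self, num_shards: int) -> None:
        if num_shards < 1:
            raise ValueError("need at least one shard")
        self.shards: List[Dict[str, str]] = [{} for _ in range(num_shards)]

    def shard_for(self, key: str) -> int:
        # Use a stable hash: Python's built-in hash() for str is randomized per process.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key: str, value: str) -> None:
        self.shards[self.shard_for(key)][key] = value

    def get(self, key: str) -> Optional[str]:
        return self.shards[self.shard_for(key)].get(key)

store = ShardedStore(num_shards=4)
for user_id in range(100):
    store.put(f"user:{user_id}", f"profile-{user_id}")
print([len(shard) for shard in store.shards])  # keys per shard, roughly even
```

One design caveat: with plain modulo placement, changing the shard count remaps most keys, which is why systems that reshard often use consistent hashing or a lookup directory instead.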
  12. Service delivery models
      (you = you manage; vendor = managed by vendor)
      Layer            On-Premise   Infrastructure     Platform           Software
                                    (as a Service)     (as a Service)     (as a Service)
      Applications     you          you                you                vendor
      Data             you          you                you                vendor
      Runtime          you          you                vendor             vendor
      Middleware       you          you                vendor             vendor
      O/S              you          you                vendor             vendor
      Virtualization   you          vendor             vendor             vendor
      Servers          you          vendor             vendor             vendor
      Storage          you          vendor             vendor             vendor
      Networking       you          vendor             vendor             vendor
  13. Use more pieces, not bigger pieces
      LEGO 7778 Midi-scale Millennium Falcon
        9.3 x 6.7 x 3.2 inches (L/W/H)
        356 pieces
      LEGO 10179 Ultimate Collector's Millennium Falcon
        33 x 22 x 8.3 inches (L/W/H)
        5,195 pieces
  15. Live Journal (from Brad Fitzpatrick, then Founder at Live Journal, 2007)
      Web Frontend
      Apps & Services
      Partitioned Data
      Distributed Cache
      Distributed Storage
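The “Distributed Cache” tier in these architectures is commonly used in a cache-aside (lazy-loading) pattern: read from the cache, fall back to the data tier on a miss, then populate the cache. The sketch below uses an in-process dictionary with a time-to-live as a stand-in for memcached; `TTLCache`, `cache_aside_get`, and the `load_from_db` callback are hypothetical names, and a real deployment would also need an invalidation strategy on writes.

```python
import time
from typing import Callable, Dict, Optional, Tuple

class TTLCache:
    """In-process stand-in for a distributed cache: a dict whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._items: Dict[str, Tuple[float, object]] = {}

    def get(self, key: str) -> Optional[object]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._items[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key: str, value: object) -> None:
        self._items[key] = (time.monotonic() + self.ttl, value)

def cache_aside_get(cache: TTLCache, key: str,
                    load_from_db: Callable[[str], object]) -> object:
    """Try the cache first; on a miss, load from the data tier and populate the cache."""
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)
        cache.set(key, value)
    return value
```

Because `None` signals a miss in this sketch, a key whose stored value is legitimately `None` would be reloaded on every read.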
  16. Flickr (from Cal Henderson, then Director of Engineering at Yahoo, 2007)
      Web Frontend
      Apps & Services
      Distributed Storage
      Distributed Cache
      Partitioned Data
  17. SlideShare (from John Boutelle, CTO at SlideShare, 2008)
      Web Frontend
      Apps & Services
      Distributed Cache
      Partitioned Data
      Distributed Storage
  18. Twitter (from John Adams, Ops Engineer at Twitter, 2010)
      Web Frontend
      Apps & Services
      Partitioned Data
      Queues
      Async Processes
      Distributed Cache
      Distributed Storage
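Twitter's diagram adds Queues and Async Processes: the request path enqueues slow work and returns, while background workers drain the queue. Below is a minimal in-process sketch with Python's `queue.Queue` and threads; in production the queue would be a separate service, and the `delivered:` job result and function names are placeholder assumptions.

```python
import queue
import threading
from typing import List, Optional

def worker(jobs: "queue.Queue[Optional[str]]", results: List[str], lock: threading.Lock) -> None:
    """Background process: handle jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        try:
            if job is None:
                return
            # Placeholder for slow work such as delivering a tweet to followers' timelines.
            outcome = f"delivered:{job}"
            with lock:
                results.append(outcome)
        finally:
            jobs.task_done()

def handle_requests(tweets: List[str], num_workers: int = 3) -> List[str]:
    jobs: "queue.Queue[Optional[str]]" = queue.Queue()
    results: List[str] = []
    lock = threading.Lock()
    workers = [threading.Thread(target=worker, args=(jobs, results, lock))
               for _ in range(num_workers)]
    for w in workers:
        w.start()
    for tweet in tweets:
        jobs.put(tweet)  # the request path only enqueues, then returns to the user
    for _ in workers:
        jobs.put(None)   # one stop sentinel per worker
    for w in workers:
        w.join()
    return results       # completion order is not guaranteed

print(sorted(handle_requests(["t1", "t2", "t3"])))
```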
  19. Facebook (from Jeff Rothschild, VP Technology at Facebook, 2009)
      Web Frontend
      Apps & Services
      Distributed Cache
      Parallel Processes
      Partitioned Data
      Async Processes
      Distributed Storage
      2010 stats (Source: …)
        People
          +500M active users
          50% of active users log on in any given day
          people spend +700B minutes /month
        Activity on Facebook
          +900M objects that people interact with
          +30B pieces of content shared /month
        Global Reach
          +70 translations available on the site
          ~70% of users outside the US
          +300K users helped translate the site through the translations application
        Platform
          +1M developers from +180 countries
          +70% of users engage with applications /month
          +550K active applications
          +1M websites have integrated with Facebook Platform
          +150M people engage with Facebook on external websites /month
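“Parallel Processes” over partitioned data often take a scatter-gather shape: send the same query to every partition concurrently, then merge the partial results under a time budget (slide 6 cites Google using 500+ servers per search query in under 500 ms). This sketch is illustrative only; the in-memory partitions, the 0.5-second per-wait timeout, and the function names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical inverted-index partitions; each would live on a different server.
Partition = Dict[str, List[str]]
PARTITIONS: List[Partition] = [
    {"cloud": ["doc1"], "scale": ["doc2"]},
    {"cloud": ["doc7", "doc9"]},
    {"scale": ["doc4"], "data": ["doc5"]},
]

def search_partition(partition: Partition, term: str) -> List[str]:
    """Query one partition (a remote call in a real system)."""
    return partition.get(term, [])

def scatter_gather(term: str, partitions: List[Partition], deadline_s: float = 0.5) -> List[str]:
    """Fan the query out to all partitions in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        futures = [pool.submit(search_partition, p, term) for p in partitions]
        merged: List[str] = []
        for future in futures:
            # Each wait is capped at deadline_s; result() raises TimeoutError if a partition is slower.
            merged.extend(future.result(timeout=deadline_s))
    return sorted(merged)

print(scatter_gather("cloud", PARTITIONS))
```

A production version would more likely return partial results than fail outright when one partition is slow.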
  20. Cloud computing as a new paradigm
      Scale-out architecture + distributed computing
        small logical units of work
        loosely-coupled processes
        stateless
        event-driven design
        optimistic concurrency
        partitioned data
        redundancy and fault tolerance
        retry-based recoverability
        parallel tasks
      [Diagram: six parallel web / app server / data store stacks, plus async tasks]
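Optimistic concurrency and retry-based recoverability often work together: a writer reads a value with its version, writes back only if the version is unchanged, and retries with backoff when another writer got there first. The sketch below is a single-process illustration; `VersionedStore`, `ConflictError`, `with_retries`, and `increment` are hypothetical names, and in a real data store the version check and write must happen atomically on the server (for example, as a conditional update).

```python
import random
import time
from typing import Callable, Dict, Tuple

class ConflictError(Exception):
    """Raised when a write's expected version no longer matches (another writer won)."""

class VersionedStore:
    """Toy store where every value carries a version counter. Not thread-safe: illustration only."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[int, int]] = {}

    def read(self, key: str) -> Tuple[int, int]:
        return self._data.get(key, (0, 0))  # (version, value)

    def write(self, key: str, expected_version: int, value: int) -> None:
        current_version, _ = self.read(key)
        if current_version != expected_version:
            raise ConflictError(key)
        self._data[key] = (current_version + 1, value)

def with_retries(op: Callable[[], None], attempts: int = 5, base_delay: float = 0.01) -> None:
    """Run op, retrying on conflict with jittered exponential backoff."""
    for attempt in range(attempts):
        try:
            op()
            return
        except ConflictError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) * random.random())

def increment(store: VersionedStore, key: str) -> None:
    def op() -> None:
        version, value = store.read(key)       # read value and version together
        store.write(key, version, value + 1)   # succeeds only if nobody wrote in between
    with_retries(op)
```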
  21. Strategic advantages of cloud computing
      cost reduction
      time to market
      pay by use
      ability to scale
  22. What’s next?
      Data
        data federation
        data purification
        data democratization
        derived intelligence
      Process
        Web as a platform
        federated applications
        adaptive agents
  23. Thank you!
      David Chou
      © 2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
      The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.