Scale as a Competitive Advantage
Deck presented at the 2010 SOA & Cloud Symposium

Published in: Technology
  • Microsoft's Windows Azure platform is a virtualized and abstracted application platform that can be used to build highly scalable and reliable applications, with Java. The environment consists of a set of services such as NoSQL table storage, blob storage, queues, relational database service, internet service bus, access control, and more. Java applications can be built using these services via Web services APIs, and your own Java Virtual Machine, without worrying about the underlying server OS and infrastructure. Highlights of this session will include:
    • An overview of the Windows Azure environment
    • How to develop and deploy Java applications in Windows Azure
    • How to architect horizontally scalable applications in Windows Azure
  • http://highscalability.com/blog/2010/2/8/how-farmville-scales-to-harvest-75-million-players-a-month.html
    http://techcrunch.com/2010/09/22/zynga-moves-1-petabyte-of-data-daily-adds-1000-servers-a-week/
  • To build for big scale – use more of the same pieces, not bigger pieces; though a different approach may be needed
    Pictures source:
    http://lego.wikia.com/wiki/10179_Ultimate_Collector%27s_Millennium_Falcon
    http://lego.wikia.com/wiki/7778_Midi-scale_Millennium_Falcon
  • Source: http://danga.com/words/2007_06_usenix/usenix.pdf
  • Source: http://highscalability.com/blog/2007/11/13/flickr-architecture.html
  • Source: http://www.slideshare.net/jboutelle/scalable-web-architectures-w-ruby-and-amazon-s3
  • Source: http://www.slideshare.net/netik/billions-of-hits-scaling-twitter
    Source: http://highscalability.com/blog/2009/6/27/scaling-twitter-making-twitter-10000-percent-faster.html
  • Source: http://highscalability.com/blog/2009/10/12/high-performance-at-massive-scale-lessons-learned-at-faceboo-1.html
  • Transcript

    • 1. Scale as a Competitive Advantage
      David Chou
      david.chou@microsoft.com
      blogs.msdn.com/dachou
    • 2. The age of “big data”
      2009: 600K photos served /sec
      2010: ~1PB / 60 minutes
      (projected)
      2008: ~1B views / day
      Source: Wired Magazine: Issue 16.07, 2008.06.23; illustration by Marian Bantjes
      http://www.wired.com/science/discoveries/magazine/16-07/pb_intro
    • 3. “More is different”
      Infinite storage. Clouds of processors. Our ability to capture, warehouse, and understand massive amounts of data is changing science, medicine, business, and technology. As our collection of facts and figures grows, so will the opportunity to find answers to fundamental questions. Because in the era of big data, more isn't just more. More is different.
      Source: Wired Magazine: Issue 16.07, 2008.06.23
      http://www.wired.com/science/discoveries/magazine/16-07/pb_intro
    • 4. “The future belongs to the companies and people that turn data into products”
      Source: “What is data science?”, An O’Reilly Radar Report, 2010.06.02, Mike Loukides
      http://radar.oreilly.com/2010/06/what-is-data-science.html
    • 5. Working with data at scale
      45M tweets pattern visualization in minutes
      #justinbieber cluster
      #teaparty cluster
      …. “political world has more connective tissue than of-the-moment entertainment”
      Source: “Data science democratized”, 2010.07.01, Mac Slocum
      http://radar.oreilly.com/2010/07/data-science-democratized.html
    • 6. Big data needs big processing
      Facebook (2009)
      +200B pageviews /month
      >3.9T feed actions /day
      +300M active users
      >1B chat messages /day
      100M search queries /day
      >6B minutes spent /day (ranked #2 on Internet)
      +20B photos, +2B/month growth
      600,000 photos served /sec
      25TB log data /day processed thru Scribe
      120M queries /sec on memcache
      Twitter (2009)
      600 requests /sec
      avg 200-300 connections /sec; peak at 800
      MySQL handles 2,400 requests /sec
      30+ processes for handling odd jobs
      process a request in 200 milliseconds in Rails
      average time spent in the database is 50-100 milliseconds
      +16 GB of memcached
      Google (2007)
      +20 petabytes of data processed /day by +100K MapReduce jobs
      1 petabyte sort took ~6 hours on ~4K servers replicated onto ~48K disks
      +200 GFS clusters, each at 1-5K nodes, handling +5 petabytes of storage
      ~40 GB /sec aggregate read/write throughput across the cluster
      +500 servers for each search query < 500ms
      >1B views / day on YouTube (2009)
      MySpace (2007)
      115B pageviews /month
      5M concurrent users @ peak
      +3B images, mp3, videos
      +10M new images/day
      160 Gbit/sec peak bandwidth
      Flickr (2007)
      +4B queries /day
      +2B photos served
      ~35M photos in squid cache
      ~2M photos in squid’s RAM
      38k req/sec to memcached (12M objects)
      2 PB raw storage
      +400K photos added /day
      Source: multiple articles, High Scalability
      http://highscalability.com/
    • 7. Bing Maps
      Big data collection and processing
      flying planes over nearly every inch of the United States
      on road photos
      45-degree low-altitude aerial photos
      high altitude plane photos
      satellite photos
      10% done (August 2010)
      previous “all USA” flight image gathering exercise took 10 years
      5PB storage and thousands of servers in one container
      Source: “Map Wars (visiting Bing’s imaging center)”, 2010.08.10, Robert Scoble
      http://scobleizer.com/2010/08/10/map-wars-visiting-bings-imaging-center/
    • 8. Cloud computing
      Characteristics
      On-demand self-service
      Broad network access
      Resource pooling
      Rapid elasticity
      Measured service
      Service models
      Software as a service
      Platform as a service
      Infrastructure as a service
      Deployment models
      Private cloud
      Community cloud
      Public cloud
      Hybrid cloud
      “Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model promotes availability and is composed of five essential characteristics, three service models, and four deployment models.”
      Source: The NIST Definition of Cloud Computing, Version 15, 2009.10.07, Peter Mell and Tim Grance
      http://csrc.nist.gov/groups/SNS/cloud-computing/cloud-def-v15.doc
    • 9. Cloud levels the playing field
      2007
      founded by 6 people
      2008
      $29M funding from VC
      2009
      revenue - $270M
      $180M funding from Digital Sky Technologies
      2010
      1,200+ employees
      $300M funding from Google and Softbank
      Active unique players
      215M monthly; 10% of world internet population (updated 2010.10); 60M daily
      1M daily 4 days after launch; 10M after 60 days
      3B neighborhood connections
      Cloud infrastructure
      12,000 Amazon EC2 nodes
      Adding 1,000 servers per week (updated 2010.10)
      Moving 1PB data per day (updated 2010.10)
      3 Gigabits/sec of traffic between FarmVille and Facebook (at peak)
      caching cluster serves another 1.5 Gigabits/sec to the application
      Source(s): “How FarmVille Scales to Harvest 75 Million Players a Month”, HighScalability.com, 2010.02.08, Todd Hoff
      “Zynga Moves 1 Petabyte Of Data Daily; Adds 1,000 Servers A Week”, TechCrunch.com, 2010.09.22, Leena Rao
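To put the "1PB data per day" figure next to the Gigabits/sec figures on the same slide, the daily volume can be converted into a sustained rate. This is a rough worked calculation, not something from the deck: it assumes a decimal petabyte (10^15 bytes) and a perfectly even load over 24 hours, and the function name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60
PETABYTE = 10**15  # assumption: decimal petabyte, not 2**50 bytes


def sustained_rate_bytes_per_sec(bytes_per_day):
    """Average rate needed to move a daily volume, assuming an even load."""
    return bytes_per_day / SECONDS_PER_DAY


rate = sustained_rate_bytes_per_sec(1 * PETABYTE)
gigabytes_per_sec = rate / 10**9
gigabits_per_sec = rate * 8 / 10**9

print(f"{gigabytes_per_sec:.2f} GB/s, {gigabits_per_sec:.1f} Gbit/s")
# prints "11.57 GB/s, 92.6 Gbit/s"
```

Real traffic is bursty, so peak rates would be higher than this average.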
    • 10. Cloud as a platform
      Utility computing
      on-demand infrastructure
      self-provisioning and servicing
      rapid elasticity
      economy of scale
      operational expenditures
      Infrastructure-as-a-Service
      Service delivery model
      … but cloud computing != cloud hosting
    • 11. Cloud as a platform
      Native cloud applications
      horizontal scaling (scale-out)
      parallelization
      shared-nothing architecture
      partitioned data (sharding)
      multi-tenancy
      failure resilient (or fail-in-place)
      service-oriented
      federated composition
      Platform-as-a-Service
      Application development model
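The "partitioned data (sharding)" and "shared-nothing architecture" items above can be illustrated with a minimal sketch. It assumes hash-based routing over a fixed number of shards, with plain in-memory dictionaries standing in for independent database nodes. The names `shard_for` and `ShardedStore` are hypothetical and are not taken from the deck or from any of the systems it cites.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """Each shard is an independent dict; shards share no state."""

    def __init__(self, shard_count: int):
        self.shards = [dict() for _ in range(shard_count)]

    def put(self, key: str, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(shard_count=4)
for user_id in ("alice", "bob", "carol", "dave"):
    store.put(user_id, {"name": user_id})

print(store.get("carol"))
print([len(shard) for shard in store.shards])
```

Modulo routing is the simplest choice, but changing the shard count remaps most keys. Schemes such as consistent hashing or a lookup directory are common alternatives when shards are added often.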
    • 12. Service delivery models
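      Layer          | On-Premise | Infrastructure (as a Service) | Platform (as a Service) | Software (as a Service)
      Applications   | You manage | You manage                    | You manage              | Managed by vendor
      Data           | You manage | You manage                    | You manage              | Managed by vendor
      Runtime        | You manage | You manage                    | Managed by vendor       | Managed by vendor
      Middleware     | You manage | You manage                    | Managed by vendor       | Managed by vendor
      O/S            | You manage | You manage                    | Managed by vendor       | Managed by vendor
      Virtualization | You manage | Managed by vendor             | Managed by vendor       | Managed by vendor
      Servers        | You manage | Managed by vendor             | Managed by vendor       | Managed by vendor
      Storage        | You manage | Managed by vendor             | Managed by vendor       | Managed by vendor
      Networking     | You manage | Managed by vendor             | Managed by vendor       | Managed by vendor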
    • 13. Use more pieces, not bigger pieces
      LEGO 7778 Midi-scale Millennium Falcon
      • 9.3 x 6.7 x 3.2 inches (L/W/H)
      • 356 pieces
      LEGO 10179 Ultimate Collector's Millennium Falcon
      • 33 x 22 x 8.3 inches (L/W/H)
      • 5,195 pieces
    • LiveJournal (from Brad Fitzpatrick, then Founder at LiveJournal, 2007)
      Web Frontend
      Apps & Services
      Partitioned Data
      Distributed Cache
      Distributed Storage
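A distributed cache sitting between the application tier and the data tier is commonly used in a cache-aside fashion: read from the cache first, fall back to the data store on a miss, then populate the cache. The sketch below is an illustration under that assumption, not a description of how LiveJournal implemented it. A local dict stands in for a memcached-style cluster, and `CacheAside` and `load_user` are hypothetical names.

```python
class CacheAside:
    """Read-through helper: check the cache, fall back to the store on a miss."""

    def __init__(self, load_from_store):
        self._cache = {}              # stand-in for a distributed cache
        self._load = load_from_store  # function that reads the backing store
        self.misses = 0

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        self.misses += 1
        value = self._load(key)
        self._cache[key] = value
        return value

    def invalidate(self, key):
        """Drop a cached entry, for example after the stored value changes."""
        self._cache.pop(key, None)


database = {"user:1": "alice", "user:2": "bob"}  # pretend backing store


def load_user(key):
    return database.get(key)


cache = CacheAside(load_user)
first = cache.get("user:1")   # miss: goes to the store
second = cache.get("user:1")  # hit: served from the cache
print(first, second, cache.misses)
```

Invalidating on write keeps the sketch short. Real deployments also have to decide on expiry times and on how to handle concurrent writers.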
    • 16. Flickr (from Cal Henderson, then Director of Engineering at Yahoo, 2007)
      Web Frontend
      Apps & Services
      Distributed Storage
      Distributed Cache
      Partitioned Data
    • 17. SlideShare (from Jonathan Boutelle, CTO at SlideShare, 2008)
      Web Frontend
      Apps & Services
      Distributed Cache
      Partitioned Data
      Distributed Storage
    • 18. Twitter (from John Adams, Ops Engineer at Twitter, 2010)
      Web Frontend
      Apps & Services
      Partitioned Data
      Queues
      Async Processes
      Distributed Cache
      Distributed Storage
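The "Queues" and "Async Processes" boxes suggest a pattern in which the web tier enqueues work and returns quickly while background workers drain the queue. A minimal single-process sketch of that idea follows. It uses Python's standard `queue` and `threading` modules as a stand-in for a real message queue and separate worker processes, and `run_workers` is a hypothetical helper, not anything from Twitter's stack.

```python
import queue
import threading

_STOP = object()  # sentinel that tells a worker to exit


def run_workers(jobs, handler, worker_count=4):
    """Push jobs onto a queue and let worker threads process them.

    Assumes handler does not raise; results arrive in no particular order.
    """
    work = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            job = work.get()
            try:
                if job is _STOP:
                    return
                outcome = handler(job)
                with lock:
                    results.append(outcome)
            finally:
                work.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for job in jobs:
        work.put(job)
    for _ in threads:
        work.put(_STOP)  # one sentinel per worker
    work.join()
    for thread in threads:
        thread.join()
    return results


squares = run_workers(range(5), lambda n: n * n)
print(sorted(squares))  # prints [0, 1, 4, 9, 16]
```

Because the producer only enqueues, slow jobs do not hold up request handling, and capacity can be added by running more workers.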
    • 19. Facebook (from Jeff Rothschild, VP Technology at Facebook, 2009)
      2010 stats (Source: http://www.facebook.com/press/info.php?statistics)
      People
      +500M active users
      50% of active users log on in any given day
      people spend +700B minutes /month
      Activity on Facebook
      +900M objects that people interact with
      +30B pieces of content shared /month
      Global Reach
      +70 translations available on the site
      ~70% of users outside the US
      +300K users helped translate the site through the translations application
      Platform
      +1M developers from +180 countries
      +70% of users engage with applications /month
      +550K active applications
      +1M websites have integrated with Facebook Platform
      +150M people engage with Facebook on external websites /month
      Web Frontend
      Apps & Services
      Distributed Cache
      Parallel Processes
      Partitioned Data
      Async Processes
      Distributed Storage
    • 20. Cloud computing as a new paradigm
      Scale-out architecture + distributed computing
      small logical units of work
      loosely-coupled processes
      stateless
      event-driven design
      optimistic concurrency
      partitioned data
      redundancy and fault tolerance
      retry-based recoverability
      parallel tasks
      [Diagram: six parallel stacks, each with web, app server, and data store tiers]
      async tasks
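Two of the items on this slide, optimistic concurrency and retry-based recoverability, work together: a writer reads a value with its version, attempts a conditional write, and retries if another writer got there first. The sketch below shows this under simplifying assumptions. It uses an in-memory store with integer versions, and `VersionedStore`, `VersionConflict`, and `update_with_retry` are hypothetical names rather than any particular cloud service's API.

```python
import threading


class VersionConflict(Exception):
    """Raised when the stored version no longer matches the one that was read."""


class VersionedStore:
    def __init__(self):
        self._data = {}                # key -> (version, value)
        self._lock = threading.Lock()  # guards only the check-and-write step

    def read(self, key):
        with self._lock:
            return self._data.get(key, (0, None))

    def write(self, key, expected_version, value):
        with self._lock:
            current_version, _ = self._data.get(key, (0, None))
            if current_version != expected_version:
                raise VersionConflict(key)
            self._data[key] = (current_version + 1, value)


def update_with_retry(store, key, transform, max_attempts=5):
    """Read, transform, and conditionally write; retry on conflict."""
    for _ in range(max_attempts):
        version, value = store.read(key)
        try:
            store.write(key, version, transform(value))
            return
        except VersionConflict:
            continue  # someone else wrote first: re-read and try again
    raise VersionConflict(f"gave up on {key!r} after {max_attempts} attempts")


store = VersionedStore()
update_with_retry(store, "counter", lambda v: (v or 0) + 1)
update_with_retry(store, "counter", lambda v: (v or 0) + 1)
print(store.read("counter"))  # prints (2, 2)
```

No writer holds a lock across its read and write, so contention costs a retry rather than blocking. That trade-off tends to suit workloads where conflicts are rare.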
    • 21. Strategic advantages of cloud computing
      cost reduction
      time to market
      pay by use
      ability to scale
    • 22. What’s next?
      Data
      data federation
      data purification
      data democratization
      derived intelligence
      Process
      Web as a platform
      federated applications
      adaptive agents
    • 23. Thank you!
      David Chou
      david.chou@microsoft.com
      blogs.msdn.com/dachou
      © 2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
      The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
