Best Practices for Large-Scale Web Sites

Loading...

Flash Player 9 (or above) is needed to view presentations.
We have detected that you do not have it on your computer. To install it, go here.

0 comments

Post a comment

    Post a comment
    Embed Video
    Edit your comment Cancel

    1 Favorite

    Best Practices for Large-Scale Web Sites - Presentation Transcript

    1. Best Practices for Large-Scale Web Sites Lessons from Ebay Brian Ko
    2. Ebay
      • 276,000,000 registered users
      • stores over 2 Petabytes of data
      • over 1 billion page views per day
      • 113 million items for sale in over 50,000 categories
      • 2 billion Photos
      1
    3. Ebay
      • 300+ features per quarter
      • Rolls 100,000+ lines of code every two weeks
      • In 39 countries, in 7 languages, 24x7x365
      • 48 Billion SQL executions/day!
      • In Year 2008
      2
    4. Design goal
      • Scalability
        • – Resource usage should increase linearly (or better!) with load
        • – Design for 10x growth in data, traffic, users, etc.
      • Availability
        • – Resilience to failure
        • – Graceful degradation
        • – Recoverability from failure
      3
    5. Design Goal
      • Latency
        • – User experience, data latency
      • Manageability
        • – Simplicity, Maintainability
        • – Provide diagnostics
      • Cost
        • – Development effort and complexity
        • – Operational cost (TCO)
      5
    6. Architecture consideration
      • Partition everything
        • – “ you eat an elephant only one bite at a time”
      • Asynchrony for everywhere
        • – “ Good things come to those who wait”
      • Automate everything
        • – “ Automation will save time and eliminate human errors…”
      • Assume everything fails
        • – “ Be Prepared”
      4
    7. Partition Everything Split
      • Split every problem into manageable chunks
        • – “ If you can’t split it, you can’t scale it”
        • – By data, load, and/or usage pattern
        • – For example, there are 1000’s of databases
      6
    8. Partition Everything Motivation
      • Scalability: can scale horizontally and independently
      • Availability: can isolate failures
      • Manageability: can decouple different segments and functional areas
      • Cost: can use less expensive hardware
      7
    9. Partition Everything Databases
      • Functional Segmentation
        • – Segment databases into functional areas – user, item, transaction, product, account, feedback
        • – Over 1000 logical databases on over 400 physical hosts
      • Horizontal Split
        • – Split (or “shard” ) databases horizontally along primary access path.
      8
    10. Partition Everything Databases
      • No Database Transactions
      • eBay’s transaction policy
        • – Absolutely no client side transactions, two-phase commit, etc.
        • – Auto-commit for vast majority of DB writes
      • Consistency is not always required or possible
        • – To guarantee availability and partition-tolerance, we are forced to trade off consistency (Brewer’s CAP Theorem)
      9
    11. Partition Everything Databases
      • Consistency without transactions
        • – Careful ordering of DB operations
        • – Eventual consistency through asynchronous event or reconciliation batch
      10
    12. Partition Everything Application Tier
      • Over 17,000 application servers in 220 pools
      • Functional Segmentation
        • – Segment functions into separate application pools
        • – Allows for parallel development, deployment, and monitoring
        • – Minimizes DB / resource dependencies
      • Horizontal Split
        • – Within pool, all application servers are created equal
      11
    13. Partition Everything Application Tier
      • User session flow moves through multiple application pools
      • Absolutely no session state
      • Transient state maintained by
        • URL, Cookie, Scratch database
      12
    14. Async Everywhere
      • Prefer Asynchronous Processing
        • – Where possible, integrate disparate components asynchronously
      • Motivations
        • – Scalability: can scale components independently
        • – Availability
          • • Can decouple availability state
          • • Can retry operations
        • – Latency
          • • Can significantly improve user experience latency at cost of data/execution latency
          • • Can allocate more time to processing than user would tolerate
        • – Cost: can spread peak load over time
      13
    15. Async Everywhere Batch
      • Scheduled offline batch process appropriate for
        • Infrequent, periodic, or scheduled processing
        • Non-incremental computation (a.k.a. “Full Table Scan”)
      • Examples
        • Import data (catalogs, currency, etc.)
        • Generate recommendations (items, products, searches, etc.)
        • Process items at end of auction
      14
    16. Automate Everything Motivation
      • Scalability
        • Can scale with machines, not humans
      • Availability / Latency
        • Can adapt to changing environment more rapidly
      • Cost
        • Machines are far less expensive than humans
        • Can learn / improve / adjust over time without manual effort
      15
    17. Automate Everything Deployment
      • Challenge
        • Need to deploy the application to over 17,000 application servers at the same time
      • Solution
        • Deploy Application in advance with the new feature switch turned off
        • Turn on the switch through automatic process on target date.
        • Make the roll back easier.
      16
    18. Assume Everything Fails
      • Build all systems to be tolerant of failure
        • Assume every operation will fail and every resource will be unavailable
        • Rapid failure detection and recovery
        • Do as much as possible during failure
      • Motivation
        • Availability
      17
    19. Assume Everything Fails
      • Rollback
        • Absolutely no changes to the site which cannot be undone (!)
      • Failure Detection
        • Real-time application state monitoring: exceptions and operational alerts
        • “Resource slow” is often far more challenging than “resource down”
      18
    20. Assume Everything Fails Graceful Degradation
      • Application “marks down” the resource
        • Stops making calls to it and sends alert
      • Non-critical functionality is removed or ignored
      • Critical functionality is retried or deferred
        • Failover to alternate resource
        • Defer processing to async event
      • Explicit “markup”
        • Allows resource to be restored and brought online in a controlled way
      19
    21. Summary
      • Partition everything
      • Asynchrony for everywhere
      • Automate everything
      • Assume everything fails
      20
    22. The End 5 minutes of question time starts now!
    23. Questions 4 minutes left!
    24. Questions 3 minutes left!
    25. Questions 2 minutes left!
    26. Questions 1 minute left!
    27. Questions 30 seconds left!
    28. Questions TIME IS UP!
    SlideShare Zeitgeist 2009

    + Craig DicksonCraig Dickson Nominate

    custom

    352 views, 1 favs, 2 embeds more stats

    This is a lightning presentation given by Brian Ko more

    More info about this document

    © All Rights Reserved

    Go to text version

    • Total Views 352
      • 346 on SlideShare
      • 6 from embeds
    • Comments 0
    • Favorites 1
    • Downloads 15
    Most viewed embeds
    • 5 views on http://craigsdickson.me
    • 1 views on http://craigsdickson.com

    more

    All embeds
    • 5 views on http://craigsdickson.me
    • 1 views on http://craigsdickson.com

    less

    Flagged as inappropriate Flag as inappropriate
    Flag as inappropriate

    Select your reason for flagging this presentation as inappropriate. If needed, use the feedback form to let us know more details.

    Cancel
    File a copyright complaint
    Having problems? Go to our helpdesk?

    Categories

    Tags