Your SlideShare is downloading. ×
Best Practices for Large-Scale Web Sites
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×

Introducing the official SlideShare app

Stunning, full-screen experience for iPhone and Android

Text the download link to your phone

Standard text messaging rates apply

Best Practices for Large-Scale Web Sites

7,143
views

Published on

This is a lightning presentation given by Brian Ko summarizing a session he attended at JavaOne 2009 on how to build very large scale websites. …

This is a lightning presentation given by Brian Ko summarizing a session he attended at JavaOne 2009 on how to build very large scale websites.

Published in: Technology, Business

0 Comments
14 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
7,143
On Slideshare
0
From Embeds
0
Number of Embeds
5
Actions
Shares
0
Downloads
197
Comments
0
Likes
14
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. Best Practices for Large-Scale Web Sites Lessons from Ebay Brian Ko
  • 2. Ebay
    • 276,000,000 registered users
    • stores over 2 Petabytes of data
    • over 1 billion page views per day
    • 113 million items for sale in over 50,000 categories
    • 2 billion Photos
    1
  • 3. Ebay
    • 300+ features per quarter
    • Rolls 100,000+ lines of code every two weeks
    • In 39 countries, in 7 languages, 24x7x365
    • 48 Billion SQL executions/day!
    • In Year 2008
    2
  • 4. Design goal
    • Scalability
      • – Resource usage should increase linearly (or better!) with load
      • – Design for 10x growth in data, traffic, users, etc.
    • Availability
      • – Resilience to failure
      • – Graceful degradation
      • – Recoverability from failure
    3
  • 5. Design Goal
    • Latency
      • – User experience, data latency
    • Manageability
      • – Simplicity, Maintainability
      • – Provide diagnostics
    • Cost
      • – Development effort and complexity
      • – Operational cost (TCO)
    5
  • 6. Architecture consideration
    • Partition everything
      • – “ you eat an elephant only one bite at a time”
    • Asynchrony for everywhere
      • – “ Good things come to those who wait”
    • Automate everything
      • – “ Automation will save time and eliminate human errors…”
    • Assume everything fails
      • – “ Be Prepared”
    4
  • 7. Partition Everything Split
    • Split every problem into manageable chunks
      • – “ If you can’t split it, you can’t scale it”
      • – By data, load, and/or usage pattern
      • – For example, there are 1000’s of databases
    6
  • 8. Partition Everything Motivation
    • Scalability: can scale horizontally and independently
    • Availability: can isolate failures
    • Manageability: can decouple different segments and functional areas
    • Cost: can use less expensive hardware
    7
  • 9. Partition Everything Databases
    • Functional Segmentation
      • – Segment databases into functional areas – user, item, transaction, product, account, feedback
      • – Over 1000 logical databases on over 400 physical hosts
    • Horizontal Split
      • – Split (or “shard” ) databases horizontally along primary access path.
    8
  • 10. Partition Everything Databases
    • No Database Transactions
    • eBay’s transaction policy
      • – Absolutely no client side transactions, two-phase commit, etc.
      • – Auto-commit for vast majority of DB writes
    • Consistency is not always required or possible
      • – To guarantee availability and partition-tolerance, we are forced to trade off consistency (Brewer’s CAP Theorem)
    9
  • 11. Partition Everything Databases
    • Consistency without transactions
      • – Careful ordering of DB operations
      • – Eventual consistency through asynchronous event or reconciliation batch
    10
  • 12. Partition Everything Application Tier
    • Over 17,000 application servers in 220 pools
    • Functional Segmentation
      • – Segment functions into separate application pools
      • – Allows for parallel development, deployment, and monitoring
      • – Minimizes DB / resource dependencies
    • Horizontal Split
      • – Within pool, all application servers are created equal
    11
  • 13. Partition Everything Application Tier
    • User session flow moves through multiple application pools
    • Absolutely no session state
    • Transient state maintained by
      • URL, Cookie, Scratch database
    12
  • 14. Async Everywhere
    • Prefer Asynchronous Processing
      • – Where possible, integrate disparate components asynchronously
    • Motivations
      • – Scalability: can scale components independently
      • – Availability
        • • Can decouple availability state
        • • Can retry operations
      • – Latency
        • • Can significantly improve user experience latency at cost of data/execution latency
        • • Can allocate more time to processing than user would tolerate
      • – Cost: can spread peak load over time
    13
  • 15. Async Everywhere Batch
    • Scheduled offline batch process appropriate for
      • Infrequent, periodic, or scheduled processing
      • Non-incremental computation (a.k.a. “Full Table Scan”)
    • Examples
      • Import data (catalogs, currency, etc.)
      • Generate recommendations (items, products, searches, etc.)
      • Process items at end of auction
    14
  • 16. Automate Everything Motivation
    • Scalability
      • Can scale with machines, not humans
    • Availability / Latency
      • Can adapt to changing environment more rapidly
    • Cost
      • Machines are far less expensive than humans
      • Can learn / improve / adjust over time without manual effort
    15
  • 17. Automate Everything Deployment
    • Challenge
      • Need to deploy the application to over 17,000 application servers at the same time
    • Solution
      • Deploy Application in advance with the new feature switch turned off
      • Turn on the switch through automatic process on target date.
      • Make the roll back easier.
    16
  • 18. Assume Everything Fails
    • Build all systems to be tolerant of failure
      • Assume every operation will fail and every resource will be unavailable
      • Rapid failure detection and recovery
      • Do as much as possible during failure
    • Motivation
      • Availability
    17
  • 19. Assume Everything Fails
    • Rollback
      • Absolutely no changes to the site which cannot be undone (!)
    • Failure Detection
      • Real-time application state monitoring: exceptions and operational alerts
      • “Resource slow” is often far more challenging than “resource down”
    18
  • 20. Assume Everything Fails Graceful Degradation
    • Application “marks down” the resource
      • Stops making calls to it and sends alert
    • Non-critical functionality is removed or ignored
    • Critical functionality is retried or deferred
      • Failover to alternate resource
      • Defer processing to async event
    • Explicit “markup”
      • Allows resource to be restored and brought online in a controlled way
    19
  • 21. Summary
    • Partition everything
    • Asynchrony for everywhere
    • Automate everything
    • Assume everything fails
    20
  • 22. The End 5 minutes of question time starts now!
  • 23. Questions 4 minutes left!
  • 24. Questions 3 minutes left!
  • 25. Questions 2 minutes left!
  • 26. Questions 1 minute left!
  • 27. Questions 30 seconds left!
  • 28. Questions TIME IS UP!