Building Web Apps for a LOT of Users

  • Hello
  • All generalizations are lies, but there is a shift in the type of web apps we’re building today.
    We are moving from a read-mostly to a read-write web. The old web displayed content to the user and had limited points of interaction (shopping carts, etc.). The new web invites us to interact everywhere, which means that in general we’re going to see a larger proportion of writes relative to reads.
    We’re also moving from web apps that are largely private (TODO: improve this argument) to social apps with a lot of “cross user” behavior, which as we will see is notoriously hard to scale.
    And lastly, a lot of the software we previously ran on our “PC” now runs in datacenters hosted by another company.
  • You may be aware of...
    Image - edge cache
    Search engine - read-only
    Visitors today - not real number
  • Read -> Increase -> Write
    Writes are the bottleneck (minimal sketch below)
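A minimal sketch of the single-counter starting point (illustrative only; the Counter class and its lock are assumptions, not code from the talk): every +1 is a read-modify-write against one shared value, so all front ends queue up behind the same write path.

```python
import threading

class Counter:
    """One shared counter: the whole site funnels its +1s through here."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()   # serializes every increment

    def increment(self):
        with self._lock:                # read -> increase -> write
            self._value += 1

    def read(self):
        return self._value

counter = Counter()
counter.increment()        # every visit does this
print(counter.read())      # 1 here; 112 in the talk's example
```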
  • Split
    How do we implement this?
  • First write
    Shard selection - random is fine
  • Inbox, contacts, labels - by user
    IM conversation - shared between two users
  • Doclist - by user
    Documents shared - by document (shard-selection sketch below)
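A hedged sketch of shard selection (NUM_SHARDS, random_shard and shard_by_key are illustrative names, not the talk's API): counter writes can go to any shard picked at random, while per-user data (inbox, contacts, labels, doclist) hashes the user id, and shared documents hash the document id, so everything about one entity lands on one shard.

```python
import hashlib
import random

NUM_SHARDS = 1000

def random_shard() -> int:
    """Counter writes: any shard will do, so pick one at random."""
    return random.randrange(NUM_SHARDS)

def shard_by_key(key: str) -> int:
    """Deterministic shard for a user id or document id."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

print(random_shard())               # a counter increment goes here
print(shard_by_key("user:42"))      # user 42's inbox, contacts, labels, doclist
print(shard_by_key("doc:abc123"))   # a shared document is sharded by its own id
```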
  • Sum - need fan out
  • Side story
    Serial fan out
    Latency = O(n) with fan out size
  • Parallel fan out
    Latency = O(1) with fan out size
    Threads are no good: 1000 req/s with 1 s latency * 1000-shard fan out = 1 million threads needed
    Async I/O and an async RPC library
    Futures (see the sketch below)
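A sketch of parallel read fan-out with async I/O and futures, using Python's asyncio as a stand-in for the async RPC library the talk alludes to; read_shard is a made-up placeholder for an RPC to one counter shard.

```python
import asyncio
import random

NUM_SHARDS = 1000

async def read_shard(shard: int) -> int:
    """Pretend RPC: returns the partial count held by one shard."""
    await asyncio.sleep(random.uniform(0.01, 0.05))   # simulated network latency
    return random.randrange(100)                      # simulated shard value

async def read_total() -> int:
    # Issue all shard reads at once: total latency is roughly that of the
    # slowest shard, not the sum of all of them -- O(1) in the fan-out size,
    # and no thread is parked per outstanding call.
    futures = [read_shard(s) for s in range(NUM_SHARDS)]
    values = await asyncio.gather(*futures)
    return sum(values)

print(asyncio.run(read_total()))
```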
  • Write - your shard
    Fan out - by follower
  • Writes scale by adding new shards
  • Reads - still hit all shards
    Reads are faster and need no locks
    Still a limit to scale
    We’ll get to that, but it’s not our biggest problem...
  • TADA! Let’s have a look at our availability
  • Availability as we grow
    99% uptime per machine (1% downtime = 14 mins/day, 7 hours/month, 3 days/year)
  • “What’s the probability all shards are down?”
    Writes look good
    1000 nines uptime
  • “What’s the probability that no shard is down?”
    50 shards - 60% - retry
    500 shards - 0.6% - ouch
    1000 shards - 0.004% - impossible
    Probability -> inevitability (worked numbers below)
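The numbers on the slide drop out of a one-line calculation, assuming each shard is independently up 99% of the time (a simplification, but it reproduces the curve):

```python
# A fan-out read only succeeds when *no* shard is down,
# which happens with probability 0.99 ** n.
for n in (50, 500, 1000):
    p_all_up = 0.99 ** n
    print(f"{n:5d} shards: {p_all_up:.4%} chance that every shard is up")
# Roughly 60% at 50 shards (a retry helps), 0.6% at 500, 0.004% at 1000:
# failure stops being a probability and becomes an inevitability.
```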
  • Write -> Replica
    Read across 5 replicas
  • Same curve, much higher up (see the replica math below)
    Same approach as GFS and BigTable
    Would work
    Each shard has little data, so...
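A back-of-the-envelope version of the replicated curve, under the simplifying assumption that replicas fail independently and each is up 99% of the time (the slide's figures come from its own model, but the shape is the same):

```python
def fanout_availability(shards: int, replicas: int, machine_up: float = 0.99) -> float:
    """A shard read fails only if all of its replicas are down."""
    p_shard_up = 1 - (1 - machine_up) ** replicas   # at least one replica answers
    return p_shard_up ** shards                     # every shard answers

for n in (1, 10, 100, 1000):
    print(f"{n:5d} shards: {fanout_availability(n, replicas=5):.10f}")
# Same curve, much higher up: a single 5-replica shard is ~10 nines here, and
# even a 1000-shard fan-out stays around 7 nines under these assumptions.
```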
  • Cache at each shard
    Each shard - return full sum
  • Disregard cache refresh for a moment
    Same uptime as writes
    The cache-refresh error rate limits scale (but that limit is very high!)
  • Twitter availability - cheap shot
    People blame Rails
    Caching was their problem
  • “Cache” -> “Consistency”
  • A series of writes
    One read
    Which write do we get?
  • Simple definition: This is “consistent”
  • “Eventual” consistency - stop writing and eventually you read the last write
    We never stop writing
    So what is our consistency model?
  • Multiple shards
    The system-wide state we return was never assumed to be exact
    It has an interesting property -> example
  • We get a request
    Cache 299
  • Cache 52
  • Get some requests
    52 -> 56
  • Get some requests
    299 -> 302
  • Detail on one request
    Bump up 200 -> 201
    Get back 552 (= 201 + 299 + 52)
    Cached: 299, 52
    Look at that: it’s the last write! (worked sketch below)
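The same request in code, using the slide's numbers (a toy sketch: in the real system the cached values sit behind RPCs, not in a local dict). The front end owns the shard holding 200 and serves the other two shards from cache.

```python
# Our shard holds 200; the cached values for the other shards are 299 and 52,
# even though those shards have since moved on to 302 and 56.
own_shard = {"value": 200}
cached_other_shards = {"B": 299, "C": 52}

def increment_and_read_total() -> int:
    own_shard["value"] += 1                                        # bump up: 200 -> 201
    return own_shard["value"] + sum(cached_other_shards.values())  # 201 + 299 + 52

print(increment_and_read_total())   # 552: stale for the other shards,
                                    # yet it always reflects our own last write
```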
  • Consistent in our own shard
    “Shard local” consistency model
    Same as AppEngine’s data store
  • Write message -> see it immediately
    Others trickle in as those caches refresh
  • Shopping cart can’t be inconsistent
  • Withdrawing money from PayPal can’t be inconsistent either
  • TADA! Let’s have a look at our availability
  • “Completely different”
    Front end logs
    Analysers sum up
    Push to front end
    “Not so different” - two new tricks
  • Detach the work from fulfilling the request
    Appends are cheap
    Shard the logs
    Shard the analysers (pipeline sketch below)
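A toy sketch of the log/analyser pipeline (all names are illustrative, not from the talk): while serving a request the front end only appends to its own log shard; a background analyser sums the logs and pushes a fresh count out to every front end.

```python
from collections import defaultdict

logs = defaultdict(list)    # one log shard per front end
pushed_counts = {}          # the value each front end currently serves

def handle_request(front_end: str) -> int:
    logs[front_end].append(1)                 # cheap append, off the critical path
    return pushed_counts.get(front_end, 0)    # serve whatever was last pushed

def analyser_pass(front_ends) -> None:
    total = sum(len(entries) for entries in logs.values())   # read & sum the logs
    for fe in front_ends:                                    # write fan-out: push the new count;
        pushed_counts[fe] = total                            # a down front end is simply skipped

front_ends = ["fe1", "fe2", "fe3"]
for fe in front_ends * 4:
    handle_request(fe)
analyser_pass(front_ends)
print(handle_request("fe1"))    # 12: pushed by the analyser, not computed per request
```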
  • Write fan out
    Same availability as read fan out, but it isn’t critical
    If a front end is down - skip it and it serves an out-of-date value
  • The “bought together”, “searched for”, and similar sections of Amazon’s web pages are of course the result of massive log analysis. They are not computed as part of the request but are instead produced in the backend and pushed to the shards serving those product pages.
Building Web Apps for a LOT of Users

    1. Building Web Apps for a LOT of Users. Jon Tirsen, Google
    2. read-mostly -> read-write, private -> social, “personal” computing -> “cloud” computing
    3. We’ve had 112 visitors today!
    4. Scalability
    5. Counter +1 112 Front End
    6. Counter +1 112 Front End Front End Front End
    7. Counter +1 112 Front End Front End Front End Front End Front End Front End
    8. Counter +1 112 Front End Front End Front End Front End Front End Front End Front End Front End Front End
    9. Counter Counter Counter Front End Front End Front End Front End Front End Front End Front End Front End Front End
    10. Counter Counter Counter Write Front End Shard by: random
    11. Counter Counter Counter Front End Read fan out
    12. Front end Counter Counter Counter Blocking reads
    13. Front end Counter Counter Counter Async reads
    14. Counter Counter Counter Front End Write
    15. Counter Counter Counter Front End Read
    16. Availability
    17. Availability 1 Number of shards 1000
    18. Availability 99% 1 Number of shards 1000
    19. Availability 99% = 14 mins/day, 7 hours/month, 3 days/year 1 Number of shards 1000
    20. Availability Writing 99% 1 Number of shards 1000
    21. Availability Writing 99% 1 Number of shards 1000
    22. Availability 99.9999...% Writing 99% 1 Number of shards 1000
    23. Read fan out Availability 1 Number of shards 1000
    24. Read fan out Availability 1 Number of shards 1000
    25. 99% Read fan out Availability 1 Number of shards 1000
    26. 99% Read fan out Availability 60% 1 Number of shards 1000
    27. 99% Read fan out Availability 60% 0.6% 1 Number of shards 1000
    28. 99% Read fan out Availability 60% 0.6% 0.004% 1 Number of shards 1000
    29. Update Counter Counter Counter Replicas Replica Replica Replicas Replica Replicas Replica Replica Replica Write Reads Front End
    30. Availability Read fan out to 5 replicas 1 Number of shards 1000
    31. Availability 8 nines Read fan out to 5 replicas 1 Number of shards 1000
    32. 8 nines 6 nines Availability Read fan out to 5 replicas 1 Number of shards 1000
    33. 8 nines 5 nines 6 nines Availability Read fan out to 5 replicas 1 Number of shards 1000
    34. 8 nines 5 nines 99.99% 6 nines Availability Read fan out to 5 replicas 1 Number of shards 1000
    35. Read & Cache Counter Counter Counter Read & Write Front End
    36. Availability 99.9999...% Read fan out with caching 99% 1 Number of shards 1000
    37. Consistency
    38. Read Probability Write Write Write Write
    39. Read Probability Write Write Write Write
    40. Read Probability Write Write Write Write
    41. 200 299 52 Cached: 299 Front End
    42. 200 299 52 Cached: 299, 52 Front End
    43. 200 299 56 Cached: 299, 52 Front End
    44. 200 302 56 Cached: 299, 52 Front End
    45. 201 302 56 Cached: 299, 52 +1 Front End
    46. 201 302 56 Cached: 299, 52 +1 552 Front End
    47. 201 302 56 Cached: 299, 52 +1 201 + 52 + 299 = 552 Front End
    48. A different solution...
    49. Read & sum Analyser Analyser Analysers Logs Logs Logs Push new count Append Front End Front End Front End
    50. Front End Counter Front End Log Analyser
    51. Analyser Push new count Front End Front End Front End Write fan out
    52. Sharding Fan out: writes, reads Replication/Caching Shard selection: random, by user, by document Consistency model: eventual, shard local Sync -> Async
    53. Sharding Fan out: writes, reads Replication/Caching Shard selection: random, by user, by document Consistency model: eventual, shard local Sync -> Async Questions? tirsen@google.com