Scalability
Slide notes
  • ComScore numbers show that we have more traffic than all Google properties combined. ComScore estimates 1 in 6 web pages viewed in Spain is from Tuenti. ComScore numbers are lower than our internal measurements.
  • This is what makes web programming different from application programming: what matters is how much the system can do in a given period of time, not how much time it needs to do one thing. Below a reasonable response-time threshold, I care about how far out to the right I can get on the curve.
  • “Scalability” is a property of a system architecture. Generally speaking, a system is scalable if it can continue to perform acceptably well as load increases. The load level at which performance becomes unacceptable is the capacity of the system.
  • Many users trying to access a resource at the same time.
  • If I add resources, I should be able to shift the curve to the right. If the system is linearly scalable, it should be able to handle 2x the requests with 2x the machines.
  • Performance graph from Tuenti from October
  • Split the resource into two, then send half the load to one and half to the other.
  • The two resources should perform independently, such that the performance curve for the entire system is the sum of the curves for each resource.
  • These are the two major caveats. The former is fundamentally a design question, and is essentially a data architecture question. The latter is generally simpler to address, and I’ll discuss it a bit later.
  • These are the two major caveats. The former is fundamentally a design question, and is essentially a data architecture question. The latter is generally simpler to address – but adds some constant overhead to each response such that performance is not simply a sum of two curves for independent systems.
  • This is a very simple example of comments on a profile. I really only need 3 queries: post (insert) a comment on a user’s profile, get the list of comments posted to a user’s profile, and delete a comment from a user’s profile. I’m going to give up on getting a list of comments written by a user – might be nice, but isn’t critical.
  • The solution is to partition by user. You need a way to map a user to a partition (hash function, lookup table, etc). Each partition contains data for a set of users – and all the data for each of those users. If user A is on partition 1, all comments on user A’s profile can be found on partition 1, and none of those comments are stored on any other partition. This imposes some costs: 1) determining the partition of a user, i.e. computing some partitioning function (hash, lookup table, etc); 2) since comments WRITTEN by a given user might be spread across many partitions, we’re unable to delete all such comments (we can’t even look them up without querying all partitions), and we can’t have any kind of foreign key to enforce that all comments have valid authors. The only solution is to enforce this when we actually access the author information. In practice, this doesn’t add much overhead – presumably when we want to display a comment, we also want to display basic info about its author (such as name). If we’re unable to find that basic info, the author probably doesn’t exist, and at that point we can delete the comment. Slightly more logic is needed to account for this possibility and execute the delete, but it’s not too costly. In fact, it’s constant overhead per comment – and presumably the number of comments we display per request is constant with respect to the rate of requests, so it’s constant overhead per request.
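The partition-by-user scheme described in this note can be sketched roughly as follows. This is a minimal illustration: `NUM_PARTITIONS`, the helper names, and the connection objects are assumptions, not Tuenti's actual code.

```python
# Sketch of hash-based user partitioning (illustrative names and sizes).

NUM_PARTITIONS = 16

def partition_for_user(user_id: int) -> int:
    # All comments on this user's profile live on this one partition.
    return user_id % NUM_PARTITIONS

def get_profile_comments(user_id, connections):
    # One query against exactly one partition answers the
    # "comments on a profile" query from the example.
    conn = connections[partition_for_user(user_id)]
    return conn.query(
        "SELECT author_id, comment FROM Comments WHERE user_id = %s",
        (user_id,),
    )
```

A lookup table would serve equally well as the mapping function; a pure hash avoids the extra storage but makes rebalancing users across partitions harder.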
  • The two resources should perform independently, such that the performance curve for the entire system is the sum of the curves for each resource. The two costs I pointed out on the previous slide add overhead to every request, but this overhead is constant with respect to the rate of requests.
  • If you need to look up comments by author, this can be achieved by maintaining a second table that is partitioned and indexed by author. Querying one partition can get you a list of all comments written by a user, but to get the content you’ll still need to query the primary partitions for each item – which can be expensive.
  • Every time a comment is written or deleted, you’ll have to write into both the author partition and the user-profile partition – a constant expense. You’ll have a constant overhead of storage: every byte in the AuthoredComments partition is duplicated data. Selecting by author is still very expensive unless you duplicate the entirety of the comment data, which is a significant storage cost. This duplication won’t make deletion any faster – deletion in the worst case could still require hitting every partition. This solution could be appropriate for some workloads, but it has a number of drawbacks that make me inclined not to choose it.
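The double write this alternative requires can be sketched as below. The partition objects and their `insert()` method are hypothetical; a real system would also need to handle partial-write failures between the two inserts.

```python
# Sketch of the double-write for the author-partitioned index table.

def post_comment(user_id, author_id, comment, partitions, author_partitions):
    # Primary write: the comment lives on the profile owner's partition.
    primary = partitions[user_id % len(partitions)]
    comment_id = primary.insert("Comments", user_id=user_id,
                                author_id=author_id, comment=comment)
    # Secondary write: a pointer row on the author's partition, so that
    # "comments written by this author" is a single-partition query.
    secondary = author_partitions[author_id % len(author_partitions)]
    secondary.insert("AuthoredComments", author_id=author_id,
                     user_id=user_id, comment_id=comment_id)
    return comment_id
```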
  • What I’ve previously described is the partitioning technique applied to databases. We also use analogous methods for scaling our web server and cache tiers.
  • The load balancer is a single point through which requests run – subject to contention and failure.
  • Load balancer is a single point through which many requests are sent, making it a possible point of contention.
  • Applying analogous partitioning techniques on web server tier.
  • Traditionally, one way to partition web server tier is with RRDNS to split requests into two LBs, but …
  • … we have AJAX. JavaScript and XML are just technologies; “Asynchronous” is what’s important – the shift in thinking from web browsing as serial, page-by-page navigation to more fluid navigation that’s wholly contained within the same HTML page. I’m not going to go much into implementation details – there are a lot of them, and talking about cross-browser compatibility isn’t fun or interesting. Focus on the approaches: what we’ve learned from scaling on the server side can be applied to the client side.
  • Using AJAX in application design allows steps 1–6 to collapse a bit – we can play with them; things don’t have to happen in such a serial order.
  • This doesn’t eliminate the single point of contention at the login/auth/home-page-load tier, but it does push that bottleneck further back.
  • However, we have lots of dynamic content, and we heavily use memcache as the storage tier for that content (backed by MySQL instances for persistence) …
  • But that means we have a ton of data in cache, and thus a large number of cache servers are needed to store it. That makes for a large cache tier behind our tier of server farms. What are the problems with that? What happens when a web server physically (and logically, at the network level) at one end of the internal network needs data cached at the other end? It’s a long way to go, crossing (and congesting) a ton of intermediary links in the process. All that data crossing in the middle requires powerful switches and large links (even if you have a ring or some other more exotic network architecture) …
  • The solution is to partition the cache, then route page requests from clients directly to farms that are physically/logically near the partitions holding the data needed to respond. The net effect is that fewer cache requests need to cross the network to get their data – instead they are routed to the cache partition immediately behind the web farm. This saves internal network traffic by reducing the hops the data has to take: instead of 1 byte passing over 4 links (web → web rack switch → center switch → cache rack switch → cache), it passes over 2 links (web → web rack switch → cache), a 50% savings. Less aggregate network traffic means less switching/link capacity is required. Fewer hops also means less latency. In practice, the latter is quite clear …
  • The global cache tier is an unpartitioned cache – the cache server holding the data is as likely to be on the other side of the network as it is to be in the same rack as the web server making the request. The partitioned cache is separated into farms – each request is routed from the client (by picking a farm in JavaScript) to a web server farm that is likely to be close to the cache farm where most of the data needed for the page is cached. The savings is ~40% and will grow as the size and complexity of the network increases.
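The hop arithmetic behind the savings claim, spelled out. This is the idealized model from the note; the measured savings was ~40% rather than the ideal 50%.

```python
# Links a byte crosses to reach cached data, per the note above.
unpartitioned_links = 4  # web -> web rack switch -> center switch -> cache rack switch -> cache
partitioned_links = 2    # web -> web rack switch -> cache (partition sits right behind the farm)

savings = 1 - partitioned_links / unpartitioned_links  # 0.5 in the ideal case
```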
  • Performance graph from Tuenti, from October.
  • In December, we managed to flatten the line – but traded ~10 ms in best-case performance. A good illustration of trading response time for scalability. Note: the fit might not be the most appropriate, flattened by some outliers in the low-load range – but we are clearly reaching higher load at a comparable level of performance in December, although there are some poorly performing outliers at high load as well.
  • The February graph looks really good – we continued to flatten the line and won back the 10 ms cost we paid in December. The data set is also less noisy than the previous two, indicating the system was more stable.
  • Further improvements in April – shifting the best case down another 10 ms while maintaining the slope. The data set is again quite clean, indicating a very stable system.
  • We deliver 25k pages/second at peak, but nearly 100k static files/second.
  • It’s a competitive market – only two providers (Akamai and Limelight) are financially very healthy, and Limelight is losing money if you consider investments.
  • Transcript

    • 1. Scalability. Erik Schultink. International Week of Tech Innovation – 21 Apr, 2010
    • 2. What is Tuenti.com?
    • 3. Tuenti.com
      Started 2007
      1:6 pages, 1:10 minutes
      Based in Madrid
      ~130 employees, 60 engineers
    • 4.
    • 5. Intro
      What is a scalable system?
    • 6. Scalability is throughput, not response time
    • 7. What is a scalable system?
      response
      time
      requests/second
    • 8. The Problem: Concurrency
      25k pageviews/second at peak
    • 9. What is a scalable system?
      response
      time
      requests/second
      code / architecture
      machines
      Variables:
    • 10. What is a scalable system?
      response
      time
      requests/second
      code / architecture
      machines
      Variables:
    • 11. What is a scalable system?
      response
      time
      x machines
      2x machines
      requests/second
    • 12.
    • 13. The Database TIER
    • 14. The Solution: Partition
    • 15. The Solution: Partition
    • 16. The Solution: Partition
    • 17. Technologies
      MySQL
      simple RDBMS
      InnoDB
      Memcache
      Lighttpd
      PHP
    • 18. The Solution: Partition
      Work must be structured such that each resource can complete it independently
      Overhead to divide workload
    • 19. Data architecture
      Look at queries you perform.
      Divide data such that each query can be answered by querying no more than 1 partition.
    • 20. Comments on a profile
      Comments (user_id, author_id, comment)
      Post a comment on a user’s profile
      Get list of comments on a user’s profile
      Delete a comment from a user’s profile
      Give up for now:
      Comments written by a user
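The three queries from this slide might look like the following against the `Comments(user_id, author_id, comment)` table. This is a sketch: the `comment_id` column used for deletion is an assumed surrogate key that the slide's schema elides, and the `%s` placeholders follow the MySQL driver style.

```python
# Sketch of the three profile-comment queries from the slide.

POST_COMMENT = (
    "INSERT INTO Comments (user_id, author_id, comment) "
    "VALUES (%s, %s, %s)"
)
GET_PROFILE_COMMENTS = (
    "SELECT author_id, comment FROM Comments WHERE user_id = %s"
)
# comment_id is an assumption; the slide's schema shows no key column.
DELETE_COMMENT = (
    "DELETE FROM Comments WHERE user_id = %s AND comment_id = %s"
)
```

Note that every query filters on `user_id`, which is what makes partitioning by user work: each can be answered by exactly one partition.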
    • 21. Comments on a profile
      Partition by user
      Costs:
      Determining partition of a user
      constant
      Consistency check on access that author still exists
      linear on number of comments to display
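The "consistency check on access" cost can be sketched like this (hypothetical helper names): each displayed comment triggers one constant-cost author lookup, and a missing author triggers a lazy delete of the orphaned comment.

```python
# Sketch of the lazy consistency check: validate the author on display,
# and garbage-collect comments whose author no longer exists.

def render_comments(comments, get_user, delete_comment):
    rendered = []
    for c in comments:
        author = get_user(c["author_id"])   # one lookup per displayed comment
        if author is None:
            delete_comment(c)               # author gone: drop the orphan
            continue
        rendered.append((author["name"], c["comment"]))
    return rendered
```

Since the number of comments shown per page is bounded, this overhead is constant per request.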
    • 22. The Solution: Partition
      Constant
      overhead
    • 23. Alternative Solution
      Partition by user, duplicate by author
      Comments(user_id, author_id, comment)
      AuthoredComments(author_id, user_id, comment_id)
    • 24. Alternative Solution
      Comments(user_id, author_id, comment)
      AuthoredComments(author_id, user_id, comment_id)
      Costs:
      double writes
      extra storage
      delete by author still very expensive
    • 25. The Web SERVER Tier
    • 26. Traditional Systems Architecture
      www.tuenti.com
      Load Balancer
      Web server farm
      Web server farm
      Web server farm
    • 27. Concurrency
    • 28. The Solution: Partition
    • 29. Traditional Systems Architecture
      www.tuenti.com
      12.45.34.179
      12.45.34.178
      Load Balancer
      Load Balancer
      Web server farm
      Web server farm
      Web server farm
      Web server farm
    • 30. AJAX
      What is AJAX?
      “Asynchronous JavaScript and XML”
      Paradigm for client-server interaction
      Change state on client, without loading a complete HTML page
    • 31. Traditional HTML Browsing
      User clicks link
      Browser sends request
      Server receives, parses request, generates response
      Browser receives response and begins rendering
      Dependent objects (images, js, css) load and render
      Page appears
    • 32. AJAX Browsing
      User clicks link
      Browser sends request
      Server receives, parses request, generates response
      Browser receives response and begins rendering
      Dependent objects (images, js, css) load and render
      Page appears
    • 33. How does Tuenti use AJAX?
      Only pageloads are login and home page
      Loader pulls in all JS/CSS
      Afterwards stay within one HTML page, rotating canvas area content
    • 34. Balancing Load
      Top-level requests to www.tuenti.com
      Each request tells client which farm it should be using, based on a mapping
      Mapping can be changed to balance load, perform maintenance, etc
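A minimal sketch of such a mapping. The farm hostnames match the diagrams on the following slides, but the modulo rule is an illustrative stand-in for whatever mapping function is actually used.

```python
# Sketch of client-to-farm mapping (illustrative rule, not Tuenti's).

FARMS = ["wwwb1.tuenti.com", "wwwb2.tuenti.com",
         "wwwb3.tuenti.com", "wwwb4.tuenti.com"]

def farm_for_user(user_id: int, farms=FARMS) -> str:
    # A pure function of user_id keeps all of a user's requests on the
    # same farm, near the cache partition that holds their data.
    return farms[user_id % len(farms)]
```

Because the mapping lives server-side, changing it (for rebalancing or maintenance) only requires handing clients a different farm on their next top-level request.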
    • 35. Client-side Routing
      www.tuenti.com
      wwwb3.tuenti.com
      wwwb2.tuenti.com
      wwwb1.tuenti.com
      wwwb4.tuenti.com
      Load Balancer
      Load Balancer
      Load Balancer
      Load Balancer
      Web server farm
      Web server farm
      Web server farm
      Web server farm
      Linearly scalable …
    • 36. Client-side Routing
      www.tuenti.com
      wwwb3.tuenti.com
      wwwb2.tuenti.com
      wwwb1.tuenti.com
      wwwb4.tuenti.com
      Load Balancer
      Load Balancer
      Load Balancer
      Load Balancer
      Web server farm
      Web server farm
      Web server farm
      Web server farm
      Linearly scalable … except for top level
    • 37. Client-side Routing
      www.tuenti.com
      wwwb3.tuenti.com
      wwwb2.tuenti.com
      wwwb1.tuenti.com
      wwwb4.tuenti.com
      Load Balancer
      Load Balancer
      Load Balancer
      Load Balancer
      Web server farm
      Web server farm
      Web server farm
      Web server farm
      lots of content creation
      = lots of dynamic data
    • 38. Client-side Routing
      www.tuenti.com
      wwwb3.tuenti.com
      wwwb2.tuenti.com
      wwwb1.tuenti.com
      wwwb4.tuenti.com
      Load Balancer
      Load Balancer
      Load Balancer
      Load Balancer
      Web server farm
      Web server farm
      Web server farm
      Web server farm
      Cache Farm
      lots of dynamic data
      = lots of cache
      = internal network traffic
    • 39. Client-side Routing
      www.tuenti.com
      wwwb3.tuenti.com
      wwwb2.tuenti.com
      wwwb1.tuenti.com
      wwwb4.tuenti.com
      Load Balancer
      Load Balancer
      Load Balancer
      Load Balancer
      Web server farm
      Web server farm
      Web server farm
      Web server farm
      Cache Farm
      Cache Farm
      Cache Farm
      Cache Farm
      Partition cache
      Route requests to a farm near cache needed to respond
    • 40. Internal network savings
    • 41. SERVER-SIDE GAIN?
    • 42.
    • 43.
    • 44.
    • 45.
    • 46. CONTENT DELIVERY
    • 47. Image Serving
      Tuenti serves ~2.5 billion images/day
      At peak, this is >6 Gbps and >70k hits/sec
      We use CDNs
    • 48. What is a CDN?
      Content Delivery Network
    • 49. What is a CDN?
      Examples: Akamai, Limelight
      also dozens more, including Amazon
      Big distributed, object cache
      Pay per use
      either per request, per TB transfer, or per peak Mbps
    • 50. What is a CDN?
      Advantages:
      Outsource dev and infrastructure
      Geographically distributed
      Economies of scale
      Disadvantages:
      High cost
      Less control and transparency
      Commitments
    • 51. What affects image load time?
      Client internet connection
      Response time of CDN
      CDN cache hit rate
    • 52. What affects image load time?
      Client internet connection
      Response time of CDN
      CDN cache hit rate
    • 53.
    • 54. Monitor Performance from Client
      Closer to performance experienced by end-user
      Only way to get view of network issues faced by users (ie last mile)
    • 55.
    • 56. How to fix slow ISP?
      Choose better transit provider
      Set-up peering (or get CDN too)
      Traffic management
    • 57. What affects image load time?
      Client internet connection
      Response time of CDN
      CDN cache hit rate
    • 58.
    • 59.
    • 60. Quality of End-User Experience
      vs.
      Cost
    • 61. We use multiple CDNs, and shift content based on price/performance.
    • 62. Know your content
    • 63. Know your content
    • 64. Know your content
    • 65. Know your content
      30
      75
      200
    • 66. Know your content
      600
    • 67. Know your content
      120
    • 68. Know your content
    • 69. Pre-fetch Content
      Exploit predictable user behavior
      Ex: clicking to next photo in an album
      Simple solution – load next image hidden
      Client browser will cache it (next response < 100 ms)
      Increase tolerance for slow response time
    • 70. Pre-fetch Content
      More complex solution
      Pre-fetch next canvas (full html), render in background – rotate in on Next
      Even more complex
      Instantiate HTML template w/ data on client
      Pre-fetch data X photos in advance, render Y templates in advance with this data
    • 71. Pre-fetch Content
      Problems:
      Rendering still takes time
      Increases browser load
      Need to set cache headers correctly
    • 72. Image delivery
      Small images: High request, low volume
      Most cost-effective to cache in memory
      Large images: High volume, low requests, greater tolerance for latency
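A toy classifier for the policy this slide describes. The thresholds are made-up and purely illustrative; the point is that request rate and object size pull toward different tiers.

```python
# Illustrative serving-tier decision: small, hot images are cheapest from
# memory; large or cold images go to disk/CDN, trading latency for bandwidth.

def serving_tier(size_bytes: int, requests_per_day: int) -> str:
    if size_bytes < 50 * 1024 and requests_per_day > 10_000:
        return "memory-cache"
    return "disk/CDN"
```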
    • 73. What affects image load time?
      Client internet connection
      Response time of CDN
      CDN cache hit rate
    • 74. Monitor Performance from Client
      cold servers online
    • 75. More
      jobs.tuenti.com
      dev.tuenti.com
    • 76. Q & A
