Last.fm vs. Xbox


David Singleton
last.fm/user/underpangs
twitter.com/dsingleton

Given at DIBI, Newcastle, April 2010.

http://dibiconference.com/

  2. Music discovery powered by Scrobbling
      Personalised radio, social network, events, a “wikipedia of music”
      High traffic
        Monthly visitors: 40 million
        Monthly page views: 500,000 million
  3. Last.fm Architecture
      Load balancer → HTTP Cache → Web Server → Object Cache → Database
  4. Xbox Live Platform, millions of users
      Last.fm Radio App
      Built by Microsoft
      Powered by our API
      Launched alongside Facebook & Twitter
  5. Last.fm vs. Xbox
  6. So, what’s there to talk about?
      Good Things™
        New users. It’s really cool.
      Bad Things™
        Lots of new users, traffic spikes
        A very important, high-profile launch
      How did Last.fm approach this?
  7. Xbox Live: 15 million users
      Assuming a 10% take-up rate = 1,500,000 users
      Per user:
        startup: 5 requests
        + starting radio: 5 requests
        + 15 minutes of radio: 60 requests
      1 hour of radio = 5 + 5 + (4 × 60) = 250 requests per user
        (an hour of radio per user is a rough, averaged guess)
      1,500,000 users × 250 = 375,000,000 requests over 24 hours
      Assuming an even distribution = roughly 4,500 requests/second
      Likely peaking at more than triple = 15,000 requests/second
      Last.fm: 2,000 requests/second
        Estimated maximum capacity of 3,500 requests/second,
        based on the number of servers and the Apache configuration
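The estimate on slide 7 is plain arithmetic, so it is easy to rerun with different assumptions. A minimal Python sketch using the slide's own inputs; the function name is illustrative, and the flat 3× peak factor stands in for the slide's "more than triple" guess:

```python
# Back-of-envelope launch estimate, using the inputs from slide 7.
SECONDS_PER_DAY = 24 * 60 * 60

def estimate_launch_load(platform_users, take_up, requests_per_user, peak_factor):
    """Return (total daily requests, average req/s, peak req/s)."""
    users = platform_users * take_up
    daily_requests = users * requests_per_user
    average_rps = daily_requests / SECONDS_PER_DAY  # assumes an even spread over 24 hours
    peak_rps = average_rps * peak_factor
    return daily_requests, average_rps, peak_rps

# 5 (startup) + 5 (start radio) + 4 x 60 (four 15-minute blocks of radio) per hour
requests_per_user = 5 + 5 + 4 * 60
daily, average, peak = estimate_launch_load(15_000_000, 0.10, requests_per_user, 3)

capacity_rps = 3_500  # estimated maximum capacity from the slide
print(f"requests/day: {daily:,.0f}")
print(f"average: {average:,.0f} req/s, peak: {peak:,.0f} req/s")
print(f"peak vs capacity: {peak / capacity_rps:.1f}x")
```

Run as-is, it gives about 4,340 req/s on average and about 13,000 req/s at peak, roughly 3.7 times the 3,500 req/s capacity; the slide rounds these up to 4,500 and 15,000.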
  8. Oh fuck
  9. What next?
      Picked a metric: requests per second
      Estimated traffic increase vs. capacity
      Selected our goals:
        Serve requests faster
        Reduce the number of requests
  10. Profiling traffic
      Used traffic generated by beta testing
      Web server request logs
        A common, widely supported format
        Hundreds of existing tools
      We generated some stats using AWK...
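The talk generated its stats with AWK; the same per-method counting is sketched below in Python, swapped in here for illustration. It assumes (hypothetically) that each API request in the log carries its method in a `method=` query parameter, e.g. `/2.0/?method=track.getInfo`, and the sample lines are made up:

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Matches the quoted request field of a Common Log Format line: "GET /path HTTP/1.1"
REQUEST_RE = re.compile(r'"[A-Z]+ (\S+) HTTP/[\d.]+"')

def count_api_methods(log_lines):
    """Count API calls per method, assuming the method is in a `method=` query parameter."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue  # skip malformed or non-request lines
        query = parse_qs(urlsplit(match.group(1)).query)
        for method in query.get("method", []):
            counts[method] += 1
    return counts

# Made-up sample lines in Common Log Format.
sample = [
    '10.0.0.1 - - [10/Apr/2010:12:00:00 +0100] "GET /2.0/?method=track.getInfo&track=x HTTP/1.1" 200 512',
    '10.0.0.2 - - [10/Apr/2010:12:00:01 +0100] "GET /2.0/?method=artist.getImages&artist=y HTTP/1.1" 200 2048',
    '10.0.0.1 - - [10/Apr/2010:12:00:02 +0100] "GET /2.0/?method=track.getInfo&track=z HTTP/1.1" 200 498',
]

for method, calls in count_api_methods(sample).most_common():
    print(f"{calls:>7}  {method}")
```

Only what is in the logged URL is visible this way; parameters sent in a POST body never reach the access log.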
  11. Which API requests were made?
        Calls  Method
        71638  track.getInfo
        53941  artist.getImages
        15150  radio.getPlaylist
         7308  library.getArtists
         5020  user.getRecentStations
         4979  ads.getVideos
         4205  radio.tune
         3155  track.love
         1507  artist.getInfo
         1258  user.getRecommendedArtists
         1135  user.getInfo
         1130  geo.getTopArtists
         1128  radio.gamerStations
         1102  tag.getTopArtists
         1021  track.ban
         1006  user.getLovedTracks
          340  library.addArtist
          206  auth.getMobileSession
  12. Which API requests were made?
  13. Raw data from beta
        Calls  Method                      Total time  Avg time
        53941  artist.getImages                 19647      0.36
        71638  track.getInfo                    15789      0.22
        15150  radio.getPlaylist                 6962      0.46
         7308  library.getArtists                2402      0.33
         4979  ads.getVideos                     1810      0.36
         5020  user.getRecentStations            1674      0.33
         1102  tag.getTopArtists                 1488      1.35
         1258  user.getRecommendedArtists        1457      1.16
         4205  radio.tune                         923      0.22
         1130  geo.getTopArtists                  575      0.51
         1507  artist.getInfo                     440      0.29
         1128  radio.gamerStations                298      0.26
         1006  user.getLovedTracks                271      0.27
         1135  user.getInfo                       171      0.15
          206  auth.getMobileSession               38      0.19
          136  user.signUp                         32      0.24
          123  user.terms                          16      0.13
         3155  track.love                           0      0.00
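In the table above, Total works out to roughly Calls × Average, i.e. the total time spent in each method, and the rows are sorted by it. That ordering is what matters for capacity: a cheap method called very often (track.getInfo) costs far more overall than a slow, rare one (tag.getTopArtists). A minimal Python sketch of that aggregation, assuming each log line has already been reduced to a hypothetical `(method, seconds)` pair. Note that the Common Log Format has no time-taken field; in Apache it has to be added to the log format (e.g. `%D`, microseconds).

```python
from collections import defaultdict

def summarise(records):
    """Aggregate (method, seconds) pairs into (calls, method, total, average) rows,
    sorted by total time so the most expensive methods overall come first."""
    calls = defaultdict(int)
    total = defaultdict(float)
    for method, seconds in records:
        calls[method] += 1
        total[method] += seconds
    rows = [(calls[m], m, total[m], total[m] / calls[m]) for m in calls]
    return sorted(rows, key=lambda row: row[2], reverse=True)

# Made-up sample: a fast but frequent call vs. a slow but rare one.
records = [("track.getInfo", 0.2)] * 20 + [("tag.getTopArtists", 1.3)] * 2

print(f"{'Calls':>5}  {'Method':<20}{'Total':>8}{'Average':>9}")
for n, method, tot, avg in summarise(records):
    print(f"{n:>5}  {method:<20}{tot:>8.2f}{avg:>9.2f}")
```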
  14. How long did each method take?
  15. Why so many track.getInfo calls?
      A tiny UI tweak...
      ...responsible for 25% of calls.
      Arrggghhhhhh
      Added that information to a sensible API call
      Microsoft kindly updated the app
  16. What next?
  17. What about the getImages calls?
      Powers an artist slideshow visualisation
      Results of this call won’t change often
        Set an HTTP cache timeout
        Set caching on a few other calls too
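Setting an HTTP cache timeout comes down to sending a `Cache-Control: max-age=...` header, so that a shared cache (the HTTP Cache layer in slide 3's architecture) can answer repeat requests without reaching the web servers. A minimal Python sketch; the per-method lifetimes in `CACHE_TTL` are hypothetical, not Last.fm's actual values:

```python
# Hypothetical per-method cache lifetimes, in seconds. Methods not listed are not cached.
CACHE_TTL = {
    "artist.getImages": 24 * 60 * 60,  # artist images rarely change
    "artist.getInfo": 60 * 60,
}

def cache_headers(api_method):
    """Return the HTTP caching headers to send for a given API method."""
    ttl = CACHE_TTL.get(api_method)
    if ttl is None:
        # Personalised or fast-changing responses: ask caches not to store them.
        return {"Cache-Control": "no-store"}
    # "public" allows shared caches (e.g. a reverse proxy) to store the response.
    return {"Cache-Control": f"public, max-age={ttl}"}

print(cache_headers("artist.getImages"))   # {'Cache-Control': 'public, max-age=86400'}
print(cache_headers("radio.getPlaylist"))  # {'Cache-Control': 'no-store'}
```

Because the API method and its arguments live in the query string, a URL-keyed reverse proxy caches each distinct request separately.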
  18. Cached Requests
  19. Request generation
      (Same Calls / Method / Total / Average data as the table on slide 13)
  20. kcachegrind
        http://kcachegrind.sourceforge.net
      webgrind
        http://code.google.com/p/webgrind/
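kcachegrind and webgrind visualise call-graph profiles in the cachegrind format (for PHP, typically produced by Xdebug). As an illustration only, the same idea in Python is the standard library's `cProfile`, shown here profiling a single request; `handle_request` and `load_artist` are hypothetical stand-ins:

```python
import cProfile
import io
import pstats

def load_artist(name):
    # Hypothetical stand-in for a database or object-cache lookup.
    return {"name": name, "images": [f"{name}-{i}.jpg" for i in range(1000)]}

def handle_request(name):
    # Hypothetical request handler whose cost we want to break down.
    artist = load_artist(name)
    return len(artist["images"])

profiler = cProfile.Profile()
profiler.enable()
handle_request("Example Artist")
profiler.disable()

# Print the five most expensive functions by cumulative time.
out = io.StringIO()
stats = pstats.Stats(profiler, stream=out)
stats.sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```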
  21. What happens if things break?
      Simulated failing calls
      Highlighted essential calls
      Acted as a dry-run for launch day failures
      Informed our backup plans
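One way to simulate failing calls is to wrap each API method so that any chosen set of them can be switched off, then watch which client features still work; the calls whose failure breaks the app are the essential ones. A minimal Python sketch; the `can_fail` decorator, the `FAILING` set and `SimulatedFailure` are hypothetical names:

```python
import functools

FAILING = set()  # API methods currently forced to fail during a test run

class SimulatedFailure(Exception):
    """Raised instead of a real response when a method is switched off."""

def can_fail(api_method):
    """Decorator: make the handler fail whenever `api_method` is in FAILING."""
    def wrap(handler):
        @functools.wraps(handler)
        def inner(*args, **kwargs):
            if api_method in FAILING:
                raise SimulatedFailure(f"{api_method} disabled for failure testing")
            return handler(*args, **kwargs)
        return inner
    return wrap

@can_fail("artist.getImages")
def get_images(artist):
    return [f"{artist}.jpg"]

def slideshow(artist):
    """A client feature that treats images as optional."""
    try:
        return get_images(artist)
    except SimulatedFailure:
        return []  # degrade gracefully: skip the slideshow instead of failing

FAILING.add("artist.getImages")
print(slideshow("Example Artist"))  # []
```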
  22. Only essential requests
  23. Prepare for the worst
      Unexpected problems we’ve had:
        Servers overheating (twice)
        Hardware (almost) stolen from data centres
        Power outage in the office
  24. Backup plans, AKA the “Kill List”
        Plan                                 Effect                          Severity
        Disable radio DB-backing             Faster calls                    Minor
        Disable Flash Player                 Save 200 req/sec                Major
        Drop non-essential Xbox API calls    Reduce Xbox traffic by 0-50%    Extreme
        Drop X% of radio tune calls          Reduce Xbox traffic by X%       Nuclear
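The Kill List works like a set of switches that can be flipped during an incident. A minimal Python sketch of its last two rows: dropping non-essential Xbox calls, and shedding a percentage of radio tune calls. The `ESSENTIAL` set, the `from_xbox` flag and the `KillList` class are illustrative assumptions, not Last.fm's implementation:

```python
import random

# Hypothetical set of calls the Xbox app cannot work without.
ESSENTIAL = {"auth.getMobileSession", "radio.tune", "radio.getPlaylist"}

class KillList:
    def __init__(self):
        self.drop_non_essential = False  # "Extreme"
        self.tune_drop_percent = 0       # "Nuclear": 0-100

    def should_serve(self, api_method, from_xbox, rng=random.random):
        """Decide whether to serve a request under the current switches."""
        if not from_xbox:
            return True  # these switches only shed Xbox traffic
        if self.drop_non_essential and api_method not in ESSENTIAL:
            return False
        if api_method == "radio.tune" and rng() * 100 < self.tune_drop_percent:
            return False  # refuse a new radio session; existing ones carry on
        return True

kill_list = KillList()
kill_list.tune_drop_percent = 25
served = sum(kill_list.should_serve("radio.tune", from_xbox=True) for _ in range(10_000))
print(f"served {served} of 10000 radio.tune calls")
```

With the percentage at 25, about three quarters of radio.tune calls are served. A dropped request would typically get a quick error response that the app can handle, rather than reaching the database at all.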
  25. Communication
  26. Last.fm: Launch Day (When traffic attacks)
  27. How did it go?
      Our estimate was about 50% too high
      Didn’t exceed capacity (but got quite close)
      Profiling and caching were essential, or we would have gone down
  28. What did we learn?
      Use timezones to roll out slowly
      Traffic will follow daily trends
      Live monitoring is essential
      Backup plans are comforting
      Pre-fill caches before launch
  29. So, how does this help me?
  30. 1. Estimate
      Choose your metric
      Estimate launch traffic
      Compare against capacity
      Make performance targets
      Know your limitations
  31. 2. Profile requests
      Start with a sample of traffic
      Extract data for your metric
      Visualise the results
      Identify expensive requests for your metric
      Use profiling tools on individual requests
  32. 3. Optimise
      Reduce the number of requests
      Set the right HTTP caching headers
        Combine with a reverse web proxy
        Prime caches for common calls
      Use an object cache
      Avoid language-level optimisation
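An object cache (memcached, for example) sits between the web servers and the database. Each lookup checks the cache first, goes to the database only on a miss, and stores the result with an expiry. Priming means running those lookups for the most common calls before traffic arrives. A minimal in-process Python sketch; `ObjectCache` and `fetch_from_db` are hypothetical stand-ins for a real cache server and a database query:

```python
import time

class ObjectCache:
    """Tiny cache-aside helper with per-entry expiry (a stand-in for e.g. memcached)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self.clock()
        hit = self.entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]                   # fresh cache hit
        value = loader(key)                 # miss or expired: go to the database
        self.entries[key] = (now + self.ttl, value)
        return value

    def prime(self, keys, loader):
        """Pre-fill the cache for common keys before launch."""
        for key in keys:
            self.get_or_load(key, loader)

db_queries = []

def fetch_from_db(key):
    db_queries.append(key)                  # record each (expensive) database hit
    return f"row for {key}"

cache = ObjectCache(ttl_seconds=300)
cache.prime(["artist:Example Artist", "tag:rock"], fetch_from_db)
cache.get_or_load("artist:Example Artist", fetch_from_db)  # served from the cache
print(len(db_queries))  # 2: only the priming queries reached the database
```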
  33-35. Web Request → Load balancer → HTTP Cache → Web Server → Object Cache → Database
  36. 4. Plan for failure
      Simulate failures
      Know your weak spots
      Prepare backup plans
      Communicate with users and partners
  37. 5. Launch it!
      Roll out slowly, if you can
      Set up live monitoring
      If something goes wrong:
        Don’t panic
        Keep people updated
      Have some champagne on ice
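Live monitoring on launch day can start as simply as counting requests per second over a sliding window and flagging when the rate nears capacity. A minimal Python sketch; the `RateMonitor` class, the window length and the 80% alert threshold are assumptions for illustration, and the default 3,500 req/s capacity is the estimate from slide 7:

```python
from collections import deque

class RateMonitor:
    """Track requests per second over a sliding time window."""

    def __init__(self, window_seconds=10.0, capacity_rps=3_500, alert_fraction=0.8):
        self.window = window_seconds
        self.capacity = capacity_rps
        self.alert_fraction = alert_fraction
        self.timestamps = deque()

    def record(self, timestamp):
        """Register one request at `timestamp` (in seconds)."""
        self.timestamps.append(timestamp)
        # Drop requests that have fallen out of the window.
        while self.timestamps and self.timestamps[0] <= timestamp - self.window:
            self.timestamps.popleft()

    def rate(self):
        return len(self.timestamps) / self.window

    def near_capacity(self):
        return self.rate() >= self.capacity * self.alert_fraction

monitor = RateMonitor(window_seconds=1.0, capacity_rps=100)
for i in range(90):
    monitor.record(i / 90)  # 90 requests spread over one second
print(monitor.rate(), monitor.near_capacity())  # 90.0 True
```

Keeping one timestamp per request is fine for a sketch; at tens of thousands of requests per second, per-second counters would use less memory.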
  38. 1. Start with an estimate
      2. Profile your traffic
      3. Make optimisations
      4. Prepare for the worst
      5. Launch it!
  39. Last.fm vs. Xbox
      Questions?
      David Singleton
      last.fm/user/underpangs
      twitter.com/dsingleton
