Performance metrics for a social network



Presentation on Fashiolista's usage of New Relic, StatsD/Graphite and PgFouine to stay on top of load times.

See the blog post at

Published in: Technology

  • Fashiolista
  • Users of Fashiolista install the so-called “love button”. While browsing the web, they can use this button to add their favourite fashion finds to Fashiolista.
  • Once they click the button, we figure out the relevant image on the page and allow them to add it to their profile.
  • The find is added to their profile and other people can follow the items they love.
  • Over the past 2 years things have moved along rapidly. Currently we’re the second largest fashion community worldwide, with close to 1 million members and massive monthly engagement. So, a quick check: who in this room is a member of Fashiolista?
  • So, let’s focus on the tech side of things: what Fashiolista is powered by.
  • On to the topic of this talk: tracking the right things to optimize your web application. Now, optimizing a social network is hard. I won’t go into the techniques we use at Fashiolista for speeding up the application; today we’ll focus on the metrics that let us find the bottlenecks.
  • Why does it matter? We’ve consistently seen massive growth in pageviews after speeding up the site. You can often add 20% to the number of pageviews just by making the site faster. Once you have an initial audience, making your product work well is really powerful.
  • We can divide most of our measurement tools into 3 categories: development, system health and page level, each answering its own specific questions.
  • Some of the tools we use. We care mainly about the page level. CPU on the database is interesting, but tells you more about when you’ll run into system limits. While working on optimizing your application you want to focus on the page level. How fast are the applications used to generate this page? Did we wait on the database, or was it Solr?
  • Small interlude, because I’m really happy with the Django debug toolbar. I used to work with Symfony and Django borrowed this, but it’s awesome. It shows duplicate queries, StatsD functional reporting and cache calls.
  • A few tools are really slick and definitely deserve a little demo: New Relic, Graphite and PgFouine. Explain their use cases.
  • New Relic gives you the full drilldown, starting from the frontend to the app to the components. The Apdex score roughly tells you the share of users who had a good experience (page generated under 500ms).
  • Useful to see how your CDN is doing its job. We’ve recently switched from Akamai to CloudFront, which seems to work quite well in most countries.
  • At the page level you get a lot of cool information. You see the load times per component, such as the time spent querying the DB (the entity_love table in this case) or the time spent querying memcached (which is quite substantial, about 20ms). In addition you see the development over time, which is great for spotting recently introduced problems.
  • New Relic also offers database-level components: the tables under most load. The awesome bit is that it relates this to the pages causing the load. It again shows the development over time. For fun, try dropping an index and you’ll see it pop up here immediately.
  • That nicely brings me to the deploy overview of New Relic. Every deploy gives you a nice change report, showing what happened to the speed before and after your deploy. This allows you to quickly spot mistakes landing in production.
  • Shows the average response time before and after your deploy. Also shows things like memory or CPU utilization, which will pick up major mistakes in those areas.
  • New Relic does pretty much the same thing for background tasks as for views. You can zoom into the specific components. If your autoscaling suddenly boots up twice the number of task workers, New Relic tells you why.
  • So it’s clear I’m excited about New Relic. It’s an awesome tool and helped us a lot with scaling the site. However, sometimes you have questions about your data which New Relic can’t answer. We stick all the metrics we can think of in Graphite. Now, New Relic is really slick. Graphite is designed by engineers and, well, looks like this.
  • But it tracks everything you throw at it and has a very powerful querying interface. Graphite is a data analysis tool, not a dashboard. For instance, in this case it shows calls per database server.
  • It is a data tool, though, and not a dashboard: it has a really techy interface. You use * for wildcards and call functions to run on your data. Quite ugly.
  • You can also retrieve data similar to the New Relic load time breakdowns, with the added advantage that Graphite is a free tool.
  • It also tracks functional parts of pages, so we see which part of the page is slowing down load times.
  • We track things like load time per functionality. We track all database calls. We track 90th percentile load times. Adding new measurements is super easy.
  • Lastly, we’re using PgFouine. It’s an awesome tool to get a complete understanding of what your database is actually doing.
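
As a rough illustration of the kind of report PgFouine produces, the sketch below groups queries by their normalized form and sums the time spent on each. It is not PgFouine's actual implementation: the sample data, the `normalize` and `report` helpers and the `auth_user` table are assumptions made up for this example.

```python
import re
from collections import defaultdict

# Hypothetical sample of (query, duration in ms) pairs, as they might be
# parsed from a PostgreSQL log. The values are invented for illustration.
SAMPLE = [
    ("SELECT * FROM entity_love WHERE user_id = 12", 1.5),
    ("SELECT * FROM entity_love WHERE user_id = 98", 2.5),
    ("SELECT * FROM auth_user WHERE username = 'anna'", 40.0),
]


def normalize(query):
    """Replace literal values so that similar queries group together."""
    query = re.sub(r"'[^']*'", "''", query)  # string literals
    query = re.sub(r"\b\d+\b", "0", query)   # standalone numeric literals
    return re.sub(r"\s+", " ", query).strip()


def report(samples):
    """Return (normalized query, call count, total ms), slowest total first."""
    totals = defaultdict(lambda: [0, 0.0])
    for query, duration_ms in samples:
        entry = totals[normalize(query)]
        entry[0] += 1
        entry[1] += duration_ms
    rows = [(q, count, total) for q, (count, total) in totals.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


for normalized, count, total_ms in report(SAMPLE):
    print(f"{total_ms:8.1f} ms  {count:4d} calls  {normalized}")
```

Sorting by total time rather than by single-query duration is what surfaces a cheap query that is repeated many times, which a slow-query log alone would miss.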
  • Performance metrics for a social network

    1. Performance metrics for a social network
    2. About Me
        • Thierry Schellenbach
        • Founder/CTO Fashiolista
        • Author of Django Facebook
        • Github/tschellenbach
        • Blog:
        • @tschellenbach
    3. Global Fashion Discovery
    4. 5.000.000+ 8.000.000+
    5. Growth: 2nd largest fashion community
        • 1mln members
        • 17mln loves/month
        • 94mln non-bot pageviews
    6. Powered By
        • Django/Python
        • PostgreSQL
        • Solr
        • Redis
        • Celery
        • AWS/Ubuntu
        • Nginx/Gunicorn/Supervisor
    7. Sexy Metrics driven optimization. Hard Because
        • All content is personalized
        • Activity is clustered around a few users (>100k followers)
        • Individual users are insanely active (7 hours in a day is normal)
        • Social network, can’t easily shard data
    8. Speed is a Feature
    9. Metrics across the board
        • Development
            – Spot things early on, wrong usage of ORM etc
        • System Health
            – Is my DB healthy, my Redis cluster etc
        • Page level
            – Why is my page slow
            – What is the average speed of the components (DB, Redis, Solr etc)
    10. Tools we use (Development, System Health, Page Level): Cloudwatch, New Relic, Debug toolbar (cache calls, Graphite timings, queries and their explains, duplicate query detection), Munin, Graphite, Nagios, DB slow log, Redis slow log, Integration Tests, PgFouine
    11. Development: StatsD, Duplicates, Cache Calls
    12. Today’s Presentation
        • New Relic
            – Dashboard, high level insights
        • Graphite
            – Stash all data, query it any way you want
            – Tool, not a dashboard
        • PgFouine
            – Understand what keeps your DB busy
    13. New Relic
        • Frontend -> App -> Components (DB, Solr, etc.)
        • Breaks page performance down into its components
        • Tracks deploys and compares before and after
    14. Are you Supported?
        • Languages: Ruby, Java, .NET, PHP, Python
        • Pip install newrelic
        • Edit the .ini
        • Add the WSGI middleware
        • Wait for Magic
    15. End user load times
        • Drill down all the way to Database calls
        • The purple line is our app, the rest frontend
        • Chart labels: Frontend (97%), App
    16. Global page loads
    17. Page Level
        • Average frontend performance per page
        • Click to view App level breakdown
        • Chart labels: Page. Not URL. To App Level
    18. Drill down / App overview
        • Chart labels: History, Memcached, DB Query
    19. Database
        • See which tables are under most load
        • See which pages cause the load
        • Development over time
    20. Deploys
    21. Deploys part Two
        • Chart labels: Response Time Pre & Post, Memory Utilization
    22. Background Task
        • Chart label: Number of Task calls (sample)
    23. Graphite Insights
        • New Relic has the overview, Graphite the detail
        • Open Source!
        • Throw data at it via UDP
        • Popularized by Etsy (see for link)
    24. It’s Complicated
    25. Tracks Everything
    26. Setup
        • Track using StatsD
            – Support for (PHP, Python, Ruby, Node, Java)
        • Hierarchy (python example)
        • get.<app>.<view>.<component>
          with request.timings('get.user.profile_page.sql'): print('database query here')
    27. Data tool / Not a dashboard
        • Wildcards
            – get.<app>.<view>.*.upper_90
            – get.<app>.*.redis.zadd.upper_90
            – limit(sortByMaxima(get.<app>.<view>.*.upper_90), 4)
    28. /style/<user>/ performance
        • Chart labels: Memcached Slowdown, ZADD, Set Many
    29. Including Functional parts of Pages
        • “More like this” part is tracked
        • Solr & Redis Cache
    30. What we Track
        • Load time per bit of functionality
        • Database calls per DB
        • 90th percentile load times
        • Task broker roundtrip times
        • Facebook API calls
    31. PgFouine
        • Run on samples of all queries (say 5m)
        • Not just slow queries
        • Repeating a simple query many times is also wrong, PgFouine finds it
        • See Instagram’s fabric snippet
    32. PgFouine Continued
        • Queries that took up the most time (N)
        • Spots issues with many small queries
        • Normalized
        • Compare multiple reports
    33. PgFouine Tips
        • My colleague wrote a fast C++ version
        • Also look at:
        • pg_stat_statements
        • pgBadger
    34. Concluding
        • New Relic
            – Dashboard, high level insights
        • Graphite
            – Stash all data, query it any way you want
            – Tool, not a dashboard
        • PgFouine
            – Understand what keeps your DB busy
    35. Q&A
        • We’re searching for Django developers & Linux system administrators!
        • source projects:
        • Try Django Facebook!
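
The timing hierarchy from the Setup slide (get.<app>.<view>.<component>) can be sketched as a small context manager that sends a StatsD timer over UDP. This is a minimal sketch, not Fashiolista's `request.timings` implementation: the `timing` and `format_timing` names and the local StatsD address are assumptions made for this example.

```python
import socket
import time
from contextlib import contextmanager

# Assumption: a StatsD daemon listening on its default UDP port on localhost.
STATSD_ADDR = ("127.0.0.1", 8125)


def format_timing(name, elapsed_ms):
    """Build a StatsD timer line of the form <name>:<milliseconds>|ms."""
    return f"{name}:{elapsed_ms:.0f}|ms"


@contextmanager
def timing(name, addr=STATSD_ADDR):
    """Time the wrapped block and send the result to StatsD via UDP."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            sock.sendto(format_timing(name, elapsed_ms).encode("ascii"), addr)
        except OSError:
            pass  # a metrics failure should never break the request
        finally:
            sock.close()


# Hypothetical usage, following the get.<app>.<view>.<component> hierarchy.
with timing("get.user.profile_page.sql"):
    time.sleep(0.01)  # stands in for the database query
```

Because UDP is fire-and-forget, the timed code does not wait for the metrics server, which keeps the overhead of adding a new measurement low.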