Lessons from Branch's launch

  • Hi, I’m Andrew. (Introduction.)
  • What is Branch, anyway? Think of a dinner-table conversation on the web. It sits in between a blog post, written by one person, and a forum open to anyone. We’re trying to create a place for thoughtful communication. So, how did we build it?
  • We’re running a pretty standard Rails app on Heroku, with Postgres for our DB and Redis on EC2 for caching and miscellaneous key-value storage. CoffeeScript and Haml for the frontend. How do we monitor it? (Note to self: add EC2.)
  • A few tools. We use StatsD on EC2, paired with Datadog, for most of our app stats. Etsy developed StatsD as a metrics aggregator: we send all our metrics from logs and internal events to StatsD, where they get aggregated and sent on to Datadog. Datadog makes it very easy to define and track custom metrics and alerts. To keep tabs on the DB, we use New Relic through Heroku. To keep an eye on async tasks, we use the Resque web app (more on that later).
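As a rough illustration of what "sending a metric to StatsD" means on the wire, here is a minimal Ruby sketch using only the standard library. It is not Branch's code: the metric names, host, and port are assumptions (8125 is the conventional StatsD port), and a real app would more likely use a StatsD client gem.

```ruby
require "socket"

# Format one metric in the StatsD line protocol: "<name>:<value>|<type>".
# Common types: "c" (counter), "ms" (timer), "g" (gauge).
def statsd_packet(name, value, type)
  "#{name}:#{value}|#{type}"
end

# Send one metric to a StatsD daemon over UDP. Host and port are assumptions.
# UDP is connectionless, so this never waits for an acknowledgement.
def send_metric(name, value, type, host: "127.0.0.1", port: 8125)
  socket = UDPSocket.new
  socket.send(statsd_packet(name, value, type), 0, host, port)
ensure
  socket&.close
end

if __FILE__ == $PROGRAM_NAME
  # Hypothetical metric names, for illustration only.
  send_metric("waitlist.signups", 1, "c")           # count one signup
  send_metric("branch.create.duration", 320, "ms")  # time one request
end
```

Because the transport is UDP, instrumenting a hot code path this way costs very little, which is part of why StatsD suits "send everything" style metrics.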
  • We learned a lot during our launch. Tonight I’m going to talk about the three biggest takeaways we got from it. These aren’t Branch-specific lessons; they should be applicable to any consumer-facing Rails app.
  • Probably stating the obvious, but we can’t predict the future as much as we might like to believe. The week before launch we thought about what to optimize. We set up a waitlist system so we could throttle the number of people using our site. What about the people who are using our site? “Creating branches!” Since we’re granting hundreds of people access, that will cause a bunch of load, so let’s optimize that. Then we opened the floodgates.
  • New Relic ++. The ability to drill down into your database load made optimizing so much easier. New Relic catalogues every request and database transaction. We can examine the stack traces for slow queries and view response time geographically. Beyond just which transactions are slow: which components of those transactions are slow? Are you stuck in Ruby, or making too many SQL calls? So, as we looked at our DB load during the first few days...
  • This is what we set up to throttle load, but it was regularly making up more than 20% of our database operations! Why? What does validates_uniqueness_of do? It queries the table for an existing match, so if there is no index, Postgres does close to a full table scan. In the end, the public-facing actions that we weren’t throttling were the biggest load. On the first day we didn’t even invite anyone, since we were struggling just to keep up with waitlist signups. The solution?
  • We needed to optimize those validations. One specific culprit was the email uniqueness check, so we added a functional index on the email column, using the lowercase function. Sidenote: Active Record doesn’t support creating functional indexes, hence the explicit SQL. This also means we needed to store our schema as a structure.sql dump instead of schema.rb in order to reconstruct it. So that was the first thing we learned: optimizing based on assumptions is going to be wrong sometimes, so be ready for that and have the tools to deal with it.
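The actual statement from the slide is not in these notes, so what follows is only a sketch of what a functional index on the lowercased email might look like. The table name (users), index name, and migration class name are assumptions. The migration is written in the Rails 3 style of the time; newer Rails versions require a versioned superclass such as ActiveRecord::Migration[7.0].

```ruby
# Raw SQL for a unique index on lower(email).
# Table and index names are hypothetical.
CREATE_LOWER_EMAIL_INDEX =
  "CREATE UNIQUE INDEX index_users_on_lower_email ON users (lower(email))"
DROP_LOWER_EMAIL_INDEX =
  "DROP INDEX index_users_on_lower_email"

# Only define the migration when Active Record is loaded (inside a Rails app).
if defined?(ActiveRecord::Migration)
  class AddLowerEmailIndexToUsers < ActiveRecord::Migration
    def up
      execute CREATE_LOWER_EMAIL_INDEX
    end

    def down
      execute DROP_LOWER_EMAIL_INDEX
    end
  end
end

# To dump the schema as db/structure.sql instead of db/schema.rb,
# set this in config/application.rb:
#   config.active_record.schema_format = :sql
```

A case-insensitive uniqueness validation compares lowercased values, so an index on the plain column would not help it; the index has to be on the same expression the query uses.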
  • Everyone loves async, right? A lot of our app happens asynchronously at Branch: everything from sending emails to pushing realtime updates to firing off tweets. Sidenote: I’ll give you a rundown of how we handle that.
  • We used DelayedJob until a few weeks before launch. DelayedJob is great and super easy to work with, but in the end we wanted more granular control over our background tasks, since a lot of stuff happens there: different queues, priorities, complete visibility into them, failed jobs, etc.
  • Being able to see into the queues individually (and individually kill them) saved our asses more than once, especially examining the failed queue in detail: which jobs are failing, how frequently, and whether there are stuck workers. Resque runs on Redis, was created by @defunkt, and has a great community and a host of plugins.
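For readers who have not used Resque, a job is just a class that names its queue and implements a class-level perform method. The sketch below uses a hypothetical job and queue name, and it returns a string in place of real work so it can run without the resque gem or Redis.

```ruby
# A Resque job: a plain class with a @queue and a self.perform method.
class SendInviteEmail
  @queue = :invites # hypothetical queue name

  # Resque calls this with the arguments that were passed to enqueue.
  def self.perform(user_id)
    # Real work (loading the user, sending mail) would go here.
    "invite sent to user #{user_id}"
  end
end

# With the resque gem loaded and Redis running, a job is queued with:
#   Resque.enqueue(SendInviteEmail, 42)
#
# A worker is told which queues to process, and checks them in the
# order listed, which is how priorities are expressed:
#   QUEUES=realtime,invites rake resque:work
```

Putting different kinds of work on separate queues is what makes it possible to inspect, prioritize, or clear one of them without touching the rest.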
  • When thinking about how to optimize Branch before launch, we mostly concentrated on user response time. So we ended up making a bunch of actions asynchronous. We even have quite a few background jobs which queue other background jobs. If they’re processed in the background, who cares if they’re fast?
  • A big lesson for us is that you can’t think like that, especially when so much of your app is async, and especially at peak activity times like when we would send batches of invites. How?
  • There are a bunch of great blog posts and gems to do this (autoscaling workers), but be careful!
  • They are not fire and forget. When you have a lot of workers doing silly things in your DB, the result can be catastrophic.
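One way to read "have a sensible upper bound to the number of workers" is as a cap in whatever code decides how far to scale. The sketch below is an assumption about how that could look, not Branch's autoscaler: the thresholds are made-up tuning values, and the call that actually resizes the worker pool (for example through Heroku's API) is deliberately left out.

```ruby
# Decide how many workers to run for the current backlog.
# jobs_per_worker, min_workers and max_workers are hypothetical tuning values.
def desired_workers(pending_jobs, jobs_per_worker: 25, min_workers: 1, max_workers: 10)
  needed = (pending_jobs.to_f / jobs_per_worker).ceil
  # The cap is what protects the database: a huge backlog still results in
  # at most max_workers processes running queries at the same time.
  needed.clamp(min_workers, max_workers)
end

desired_workers(0)       # => 1
desired_workers(100)     # => 4
desired_workers(10_000)  # => 10
```

Without the upper bound, a burst of queued jobs turns directly into a burst of database connections, which is the "catastrophic" case described above.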
  • Rails is great: it gives us a lot of convenience and makes it easy to develop quickly. Active Record abstracts your database for you, but that’s a double-edged sword.
  • The naive thinking: I can run this migration, even on production data, and it won’t be THAT painful. I even think I’m being efficient with find_each. It will be. Running a migration like this on prod can brick the app for ??? (how long?).
  • There’s probably a SQL ninja here who could accomplish this in one call, but we can all agree this is much better than the first version. The ORM makes it easy to ignore SQL completely, but that will bite you. For instance...
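The migrations from the slides are not reproduced in these notes, so the following is only an illustration of the general idea: issuing a few set-based statements instead of one statement per row. The table and column names are hypothetical, and an INSERT ... SELECT that keeps the whole operation inside the database is another option.

```ruby
# Build multi-row INSERT statements, one per slice of rows, instead of
# one INSERT per row. Values are coerced with Integer() so nothing
# unescaped ends up in the SQL string.
def batched_insert_sql(table, columns, rows, batch_size)
  rows.each_slice(batch_size).map do |slice|
    values = slice.map { |row| "(#{row.map { |v| Integer(v) }.join(', ')})" }.join(", ")
    "INSERT INTO #{table} (#{columns.join(', ')}) VALUES #{values}"
  end
end

# Hypothetical data: ten (user_id, group_id) pairs.
rows = (1..10).map { |user_id| [user_id, 1] }

# Row by row this would be 10 statements; batched by 5 it is 2.
statements = batched_insert_sql("memberships", %w[user_id group_id], rows, 5)
puts statements
```

Each statement is a round trip to the database, so the number of statements, rather than the number of rows, tends to dominate how long a data migration holds up production.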
  • At first, this sounds super convenient.
  • Especially with big objects! God forbid you do any type of validation on those serialized hashes, since they don’t support indexes. We were storing Twitter token objects, and updating or changing the tables, or moving the data, became EXTREMELY expensive. You have to deserialize in memory!
  • If your object is so important, maybe it needs its own table? If you don’t have to store it, don’t! Hstore, for sure!
  • Hstore is Postgres’s key-value data type, its way of storing hashes. It’s not supported in Rails natively yet, but there are gems that will allow you to use it. Much faster! Those are just two examples of ways to shoot yourself in the foot with Rails. Stay alert!
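To make the "you have to deserialize in memory" point concrete, here is a small standard-library sketch. The rows and field names are made up; the array stands in for a table whose text column holds a YAML-serialized hash.

```ruby
require "yaml"

# Stand-in for a table with a serialized text column (hypothetical data).
ROWS = [
  { id: 1, token: YAML.dump("provider" => "twitter", "screen_name" => "alice") },
  { id: 2, token: YAML.dump("provider" => "twitter", "screen_name" => "bob") }
].freeze

# To the database the column is an opaque string, so finding a row by a
# value inside it means loading rows and parsing each one in Ruby.
def find_by_screen_name(rows, name)
  rows.find { |row| YAML.safe_load(row[:token])["screen_name"] == name }
end

match = find_by_screen_name(ROWS, "bob")
puts match[:id]

# With an hstore column the same lookup can stay in SQL, for example
# (hypothetical table name):
#   SELECT id FROM tokens WHERE token -> 'screen_name' = 'bob';
```

The parsing cost grows with both the number of rows and the size of each object, which matches the experience with large token objects described above.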

Transcript

  • 1. Our experience launching a Ruby on Rails app Andrew Flockhart @andruflockhart aflock@branch.com
  • 2. What is Branch?
  • 3. Our stack • Rails • Postgres • Heroku • Amazon EC2
  • 4. Monitoring • StatsD + Datadog • New Relic • Resque
  • 5. Lessons from Launch • What to optimize? • Async ALL the things! (?) • Don’t let Rails make you dumb!
  • 6. Predicting your app’s future is tough Photo by FUNKYAH
  • 7. Finding the bottlenecks
  • 8. App load • We didn’t correctly predict where the hotspots would be • Validations were a large bottleneck
  • 9. Conclusion: Indexes are your friend • One SQL statement cut our DB load in half • Functional indexes++
  • 10. DelayedJob vs. Resque
  • 11. DelayedJob vs. Resque • DelayedJob is simple, plug and play, great for small volume of jobs
  • 12. DelayedJob vs. Resque • DelayedJob is simple, plug and play, great for small volume of jobs • Resque is more granular, allows more control
  • 13. DelayedJob vs. Resque • DelayedJob is simple, plug and play, great for small volume of jobs • Resque is more granular, allows more control • Different queues
  • 14. DelayedJob vs. Resque • DelayedJob is simple, plug and play, great for small volume of jobs • Resque is more granular, allows more control • Different queues • Priorities
  • 15. DelayedJob vs. Resque • DelayedJob is simple, plug and play, great for small volume of jobs • Resque is more granular, allows more control • Different queues • Priorities • Visibility
  • 16. Resque
  • 17. Background jobs are magic, right? Photo by Jenn and Tony Bot
  • 18. Background jobs are magic, right? Photo by Jenn and Tony Bot
  • 19. Background jobs are magic, right? Photo by Jenn and Tony Bot
  • 20. Over half of our database load was background tasks at times
  • 21. Over half of our database load was background tasks at times :0
  • 22. Autoscaling workers • Easy to do with Heroku’s API • Saves money • Your app is now responsive to load
  • 23. Be mindful of your background tasks!
  • 24. Be mindful of your background tasks! 1. Treat your jobs as users
  • 25. Be mindful of your background tasks! 1. Treat your jobs as users 2. Have a sensible upper bound to the # of workers
  • 26. Be mindful of your background tasks! 1. Treat your jobs as users 2. Have a sensible upper bound to the # of workers 3. Monitor, monitor, monitor
  • 27. Don’t let Rails make you lazy!
  • 28. Learn to speak SQL • It’s easy to write naive migrations with no repercussions while you’re developing • But who wants to do 500,000 insert statements?
  • 29. 4 is better than 500,000. So be conscious of what SQL Rails generates
  • 30. You can serialize objects into your tables
  • 31. You can serialize objects into your tables
  • 32. You can serialize objects into your tables. Sounds great!
  • 33. Serializing/Deserializing to/from text is expensive!
  • 34. Solution?
  • 35. Solution? • This is why we have relations between tables
  • 36. Solution? • This is why we have relations between tables • Don’t store anything extraneous
  • 37. Solution? • This is why we have relations between tables • Don’t store anything extraneous • Generate on the fly, or use something like hstore.
  • 38. Hstore is faster.
  • 39. In conclusion
  • 40. In conclusion • Expect your app to misbehave
  • 41. In conclusion • Expect your app to misbehave • Async isn’t an optimization cure-all
  • 42. In conclusion • Expect your app to misbehave • Async isn’t an optimization cure-all • Don’t let Rails make you lazy!
  • 43. Thanks! Psst! We’re hiring. aflock@branch.com Slides: bit.ly/branch_launch_lessons
  • 44. Sources • Autoscale workers: http://blog.leshill.org/blog/2011/04/03/using-resque-and-resque-scheduler-on-heroku.html • Using hstore with Rails: http://travisjeffery.com/b/2012/02/using-postgress-hstore-with-rails/ • These slides: bit.ly/branch_launch_lessons