
Rails performance at - Guillaume Luccisano



The 7 main actions we took to improve the Rails stack performance at

June 2011 - Guillaume Luccisano

Published in: Technology


  1. 1. The 7 main actions we took to improve the Rails stack performance at - Guillaume Luccisano - June 2011
  2. 2. Guillaume Luccisano • Working with Ruby for about 4 years now • Arrived from France 8 months ago • Ran the migration from Rails 2.1 to 2.3... • ... and to Rails 3 more recently
  3. 3. In the meantime... I was able to work on improving performance
  4. 4. I love it. And I have learned a ton of stuff to share with you today
  5. 5. First lesson: Guess what?
  6. 6. They lied to us: Rails 3 is slower than Rails 2
  7. 7. OK, troll is over. So what’s making our app slow?
  8. 8. First one is obvious: (no) SQL
  9. 9. Less obvious: Worker queue wait
  10. 10. External dependencies
  11. 11. Yes, it can: Memcached
  12. 12. That one, we love it, so ... Ruby
  13. 13. And its dark side: The garbage collector
  14. 14. Cool, we have made some good guesses, now what?
  15. 15. The Rails stack: 12 frontend servers. Nginx + cache layer with a magic conf from our experts
  16. 16. Talking to 24 app servers
  17. 17. 23 (not 24): one is currently dead
  18. 18. R.I.P. app2. Unicorn running with 21 app workers per box: 504 workers, yeah! Running Ruby Enterprise Edition with GC tuning
  19. 19. Running at 10% of capacity during normal time. We can easily hit 80K+ Rails requests per minute during peak time
  20. 20. One beefy master DB with 7 slaves (not used only by Rails). And only 2 memcached boxes for Rails
  21. 21. + some Mongo, Rabbit, etc... And HAProxy everywhere to make us failure resistant...
  22. 22. Cool, so are we finally going to optimize stuff or not?
  23. 23. Yes, the real first step is: Monitoring
  24. 24. You can’t improve performance in the dark
  25. 25. New Relic = Awesome
  26. 26. Ganglia = Awesome
  27. 27. Make yourself happy by improving things and instantly seeing the result
  28. 28. This is the kind of graph we are looking for. Still work to do, but it was going down!
  29. 29. Great, we are all set. But eh, what is fast btw?
  30. 30. Fast = Nothing
  31. 31. Less code = Easier to maintain = (often) Faster
  32. 32. I’m curious: What is a good average response time? 300ms 200ms 100ms 50ms
  33. 33. We were at 250+ms. We are now at 80ms. And there is still a ton to do!
  34. 34. How did we do that?
  35. 35. But keep in mind that every app is unique
  36. 36. 1) SQL • Tracked down slow queries, added indexes • Refactored bad queries • Sometimes, 2 queries are faster than one big one • Retrieved only the needed columns from the db • Less network, less Ruby object creation!
  37. 37. 2) C libraries - why? • We added a bunch of C libraries • Bypass Ruby memory management • Less garbage collection of Ruby objects • Raw C speed! • Easy to drop in
  38. 38. 2) C libraries - which? • Curb for HTTP (libcurl) (with real timeout) • Supports pipelining for multiple requests at the same time • Yajl, the fastest JSON library • Nokogiri, the fastest XML library • Snappy, super fast compression tool by Google
  39. 39. Yajl: the facts • ~3.5x faster than JSON.generate • ~1.9x faster than JSON.parse • ~4.5x faster than YAML.load • ~377.5x faster than YAML.dump • ~1.5x faster than Marshal.load • ~2x faster than Marshal.dump • All of this while taking less memory!
  40. 40. Snappy: the facts
      ruby-1.9.2-p180 :061 > f =
      :066 > f.length
       => 2504
      GC.start; Benchmark.measure { 1000.times { ActiveSupport::Gzip.compress(f) } }
       => 1.840000 0.010000 1.850000 ( 1.842741)
      GC.start; Benchmark.measure { 1000.times { Snappy.deflate(f) } }
       => 0.020000 0.000000 0.020000 ( 0.019659)
      ruby-1.9.2-p180 :064 > ActiveSupport::Gzip.compress(f).length
       => 971
      ruby-1.9.2-p180 :065 > Snappy.deflate(f).length
       => 1398
  41. 41. 3) Memcache • Upgraded our memcached to the latest version! • (Not the gem, the real memcached) • We got a 3x improvement!! • Switched everything to the memcached gem • Used and made by Twitter, uses the C libmemcached • 3.5 times faster than Dalli (the Ruby equivalent) on a simple get
  42. 42. 3) Memcache • Used more raw memcache objects • Avoids a useless Marshal dump • Yajl + Snappy + raw memcache = Win Combo • Removed huge get_multi calls (100+ items) • They can be slower than the equivalent SQL query! • Tuned memcached options
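
The "raw memcache objects" bullet can be illustrated in plain Ruby. This is only a minimal sketch under stated assumptions, not the code from the talk: the standard-library `JSON` and `Zlib` stand in for Yajl and Snappy, and the `FAKE_CACHE` hash is a hypothetical stand-in for a memcached client used in raw mode. The point is that the cache holds a plain compressed string instead of a Marshal dump.

```ruby
require "json"
require "zlib"

# Hypothetical stand-in for a memcached client that stores raw strings.
FAKE_CACHE = {}

# Serialize to JSON and compress, so the cache holds a plain string
# rather than a Marshal dump of Ruby objects.
def raw_write(key, value)
  FAKE_CACHE[key] = Zlib::Deflate.deflate(JSON.generate(value))
end

# Decompress and parse; returns nil on a cache miss.
def raw_read(key)
  payload = FAKE_CACHE[key]
  payload && JSON.parse(Zlib::Inflate.inflate(payload))
end

USER = { "id" => 42, "name" => "guillaume", "tags" => ["rails", "perf"] }.freeze
raw_write("user:42", USER)
puts raw_read("user:42") == USER
```

One trade-off of this approach: JSON only round-trips simple types (strings, numbers, arrays, hashes with string keys), so it suits plain data rather than arbitrary Ruby objects.
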
  43. 43. 4) Cache expiration • Removed a ton of after_save cache expiration • Using correct expiration times • Or using an auto-changing cache_key
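
The auto-changing cache key idea can be sketched without Rails. Everything named here is an assumption for illustration: `Post` is a hypothetical record type, `CACHE` is a hash standing in for memcached, and `cached_title` is a made-up expensive lookup. The key embeds `updated_at`, so an updated record is cached under a new key and no explicit expiration callback is needed.

```ruby
# Hypothetical record type; in Rails this role is played by a model
# that has an id and an updated_at timestamp.
Post = Struct.new(:id, :title, :updated_at) do
  # The key changes whenever updated_at changes, so stale entries are
  # never read again and can simply age out of the cache.
  def cache_key
    "posts/#{id}-#{updated_at.utc.strftime('%Y%m%d%H%M%S')}"
  end
end

CACHE = {} # hypothetical stand-in for a memcached client

# Pretend the upcase is an expensive computation worth caching.
def cached_title(post)
  CACHE[post.cache_key] ||= post.title.upcase
end

post = Post.new(1, "hello", Time.utc(2011, 6, 1, 12, 0, 0))
old_key = post.cache_key
puts cached_title(post)

post.title = "updated"
post.updated_at = Time.utc(2011, 6, 1, 12, 5, 0)
puts cached_title(post)          # recomputed under the new key
puts post.cache_key != old_key
```

The old entry is left behind rather than deleted, which is why this pattern is usually paired with a sensible expiration time or a cache that evicts least recently used entries.
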
  44. 44. 5) Switched to Unicorn • Like a Boss! Twitter and GitHub use it. • Fast bootup • Graceful restart • Reduced our queue wait to 0 • Our previous round-robin dispatch on our Mongrel cluster added up to 40ms of delay on average to each request.
  45. 45. 6) More GC tuning • Memory vs performance trade-off • export RUBY_HEAP_FREE_MIN=100000 • export RUBY_HEAP_SLOTS_GROWTH_FACTOR=1 • export RUBY_HEAP_MIN_SLOTS=800000 • export RUBY_HEAP_SLOTS_INCREMENT=200000 • We added a GC run after expensive requests: • We divided our time spent in GC during requests by 3
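
The slides do not show how the GC run after expensive requests was wired in, so the following is only a sketch of one possible shape, written as a Rack-style middleware in plain Ruby. The class name `GcAfterExpensiveRequest`, the `threshold_seconds` option, and the `gc_runs` counter are all hypothetical.

```ruby
# Hypothetical Rack-style middleware: the name, the threshold, and the
# counter are assumptions for illustration, not the talk's code.
class GcAfterExpensiveRequest
  attr_reader :gc_runs

  def initialize(app, threshold_seconds: 0.5)
    @app = app
    @threshold = threshold_seconds
    @gc_runs = 0
  end

  def call(env)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    response = @app.call(env)
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

    # Only pay for a full GC when the request was expensive enough to
    # have likely allocated a lot of objects.
    if elapsed > @threshold
      GC.start
      @gc_runs += 1
    end
    response
  end
end

slow_app = ->(_env) { sleep 0.02; [200, {}, ["slow"]] }
fast_app = ->(_env) { [200, {}, ["fast"]] }

slow = GcAfterExpensiveRequest.new(slow_app, threshold_seconds: 0.01)
fast = GcAfterExpensiveRequest.new(fast_app, threshold_seconds: 10)
slow.call({})
fast.call({})
puts "slow app GC runs: #{slow.gc_runs}, fast app GC runs: #{fast.gc_runs}"
```

In this simplified form the collection happens before the response is returned, so the client still waits for it. A real setup would more likely trigger the GC once the response has been written, so the pause lands between requests.
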
  46. 46. 7) Regain memory • Fewer objects = Faster garbage collection = happiness • Cleaned up our Gemfile and removed unused dependencies • aws-s3 gem = 30k Ruby objects in your stack • A blank Rails project (2.3 or 3.0) is =~ 100K objects • Cleaned up our codebase! Removed tons of old controllers/views
  47. 47. Regain memory: the facts • We refactored our translations system: we saved 50k useless objects: 10% garbage collection speed-up • Enough memory saved to add one more Unicorn worker
  48. 48. To go further...
  49. 49. • Create or find a lighter AWS S3 gem, using Curb! • Start using extra-light controllers à la Metal for some critical actions • Use Snappy to compress fragment caching • Give Kiji a try: a fork of REE (from Twitter) • Or switch the stack to Ruby 1.9 or to JRuby
  50. 50. • Do more memory profiling, with tools like memprof • Get a real non-blocking stack to handle several requests per worker • Try Goliath: a non-blocking Ruby framework • Try the MySQL NoSQL plugin (if only we were using MySQL!)
  51. 51. Bonus - Extra slides removed to save time
  52. 52. Curb: the facts
  53. 53. Memcache - Tune it! • Memcached has a bunch of options: • Auto failover and recovery • Noreply, Noblock • TCP nodelay, UDP • UDP for set and TCP for get? • Key verification • Binary protocol (but slower in Ruby, don’t use it :p) • and more... Play with them!
  54. 54. Clean up your before_filters • We created a speed_up! method to skip all before_filters on critical actions • speed_up! :only => ['critical', 'action']
  55. 55. find_in_batches
  56. 56. Set.include? instead of Array.include?
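
A small sketch of why this swap helps: `Array#include?` scans the elements one by one, while `Set#include?` is a hash lookup. The collection size and iteration count below are arbitrary assumptions, and the timings will vary by machine, so no numbers are claimed here.

```ruby
require "set"
require "benchmark"

IDS_ARRAY = (1..50_000).to_a
IDS_SET = IDS_ARRAY.to_set
NEEDLE = 50_000 # worst case for the array: the last element

array_time = Benchmark.realtime { 200.times { IDS_ARRAY.include?(NEEDLE) } }
set_time   = Benchmark.realtime { 200.times { IDS_SET.include?(NEEDLE) } }

# Both give the same answer; only the lookup cost differs.
puts IDS_ARRAY.include?(NEEDLE) == IDS_SET.include?(NEEDLE)
puts "Array: #{array_time.round(5)}s, Set: #{set_time.round(5)}s"
```

Building the set has its own cost, so the swap pays off when the same collection is checked many times.
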
  57. 57. URL helpers are slow: store them in a variable when you can, to avoid multiple calls
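
To make the tip concrete without a Rails app, here is a sketch with a hypothetical `profile_url` method standing in for a generated URL helper. It counts its own calls so the difference between calling it per row and calling it once is visible. `render_slow` and `render_fast` are made-up names.

```ruby
# Hypothetical stand-in for a Rails URL helper; it counts its calls so
# the difference is visible without a Rails app.
$helper_calls = 0
def profile_url(user_id)
  $helper_calls += 1
  "http://example.com/users/#{user_id}"
end

# Calls the helper once per row.
def render_slow(rows, user_id)
  rows.map { |row| "<a href=\"#{profile_url(user_id)}\">#{row}</a>" }
end

# Calls the helper once and reuses the resulting string.
def render_fast(rows, user_id)
  url = profile_url(user_id)
  rows.map { |row| "<a href=\"#{url}\">#{row}</a>" }
end

rows = %w[one two three]
render_slow(rows, 7)
slow_calls = $helper_calls
$helper_calls = 0
render_fast(rows, 7)
puts "slow: #{slow_calls} calls, fast: #{$helper_calls} call"
```
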
  58. 58. Use the bang! Like gsub!, to avoid new object creation
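
A short sketch of the difference: `gsub` allocates a new string, while `gsub!` modifies the receiver in place. The example string is made up. The last line shows a caveat worth knowing before switching: `gsub!` returns `nil` when nothing was replaced.

```ruby
# Unary plus makes sure we hold an unfrozen, mutable string.
title = +"rails  performance  tips"

copy = title.gsub("  ", " ")   # allocates a new String
same = title.gsub!("  ", " ")  # modifies title in place

puts copy.equal?(title)  # false: a different object
puts same.equal?(title)  # true: the very same object

# Caveat: gsub! returns nil when nothing was replaced, so avoid
# chaining further calls on its return value.
puts title.gsub!("zzz", "").inspect
```
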
  59. 59. Beware of symbol leaks: every symbol, and every string converted to a symbol, stays in memory forever => memory leak
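
One common way to avoid creating symbols from arbitrary input is to look the string up in a fixed table instead of calling `to_sym` on it. This mitigation is not described in the slides; the `SORT_ORDERS` table and `sort_order` method below are hypothetical names used only to sketch the idea.

```ruby
# Hypothetical whitelist of the only symbols the app expects from user input.
SORT_ORDERS = { "asc" => :asc, "desc" => :desc }.freeze

# Look the string up instead of calling to_sym on arbitrary input,
# so unknown strings never create new symbols.
def sort_order(param)
  SORT_ORDERS.fetch(param.to_s.downcase, :asc)
end

puts sort_order("DESC").inspect          # :desc
puts sort_order("anything-else").inspect # :asc (the default)
```
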
  60. 60. The Cloud computing era: Cloud is great, but dedicated hardware is still way faster. We measured a 3x improvement when switching Socialcam from h****u to our own cluster.
  61. 61. FIN
  62. 62. If you are awesome and want to tackle challenges on awesome products and systems used by millions of users every day: We are currently hiring awesome people. Fork me on GitHub: @kwi. Recruiting coordinator at JTV: Brooke (