The 7 main actions we took to improve
the Rails stack performance at Justin.tv



        Guillaume Luccisano - June 2011
Guillaume Luccisano

• Worked with Ruby for about 4 years now
• Arrived from France 8 months ago
• Ran the migration from Rails 2.1 to 2.3...
• ... and to Rails 3 more recently
In the meantime....


 I was able to work on
improving performance
I love it

And I have learned a ton of stuff to
    share with you today
First lesson



Guess what?
They lied to us


Rails 3 is slower
  than Rails 2
Ok, troll is over


So what’s making
 our app slow?
First one is obvious



    (no)   SQL
Less obvious



Worker queue wait
External dependencies
Yes, it can



Memcached
That one, we love it, so ...




     Ruby
And its dark side




The garbage collector
Cool, we have made some
good guesses, now what?
The Justin.tv Rails stack




            12 frontend servers



Nginx + cache layer with a magic conf from our experts
Talking to 24 app servers
24

23: one is currently dead
R.I.P app2




Unicorn running with 21 app workers per box
                    504 workers, yeah!

Running Ruby Enterprise Edition with GC tuning
Running at 10% of capacity at normal times


      We can hit 80K+ Rails requests
     per minute easily during peak time
One beefy master DB with 7 slaves
             (not used only by rails)




And only 2 memcached boxes for rails
+ some mongo, rabbit, etc...

  And haproxy everywhere
to make us failure resistant...
Cool, so are we finally going
 to optimize stuff or not?
Yes, the real first step is:



Monitoring
You can’t improve
performance in the dark
Newrelic
   =
Awesome
Ganglia
   =
Awesome
Make yourself happy by
improving things and seeing
    the result instantly
This is the kind of graph we are looking for




  Still work to do, but it was going down!
Great, we are all set

But eh, what is fast btw?
Fast
  =
Nothing
Less code
         =
Easier to maintain
         =
       (often)


     Faster
I’m curious
What is a good average response time?


               300ms
               200ms
               100ms
               50ms
We were at 250+ms

We are now at 80ms

And there is still a ton to do!
How did we do that?
But keep in mind that
 Every app is unique
1) SQL

• Tracked down slow queries, added indexes
• Refactored bad queries
  •   Sometimes, two queries are faster than one big one

• Retrieved only the needed columns from the db
  •   Less network traffic and less ruby object creation!
2) C libraries - why?

• We added a bunch of C libraries
• Bypass ruby memory management
 • less garbage collection of ruby objects
• Raw C speed!
• Easy to drop in
2) C libraries - which?

• Curb for HTTP (libcurl) (with real timeouts)
  •   Supports pipelining for multiple requests at the same time

• Yajl, the fastest JSON library
• Nokogiri, the fastest XML library
• Snappy, a super fast compression library by Google
Yajl: the facts
   ~3.5x faster than JSON.generate
    ~1.9x faster than JSON.parse
     ~4.5x faster than YAML.load
~377.5x faster than YAML.dump
    ~1.5x faster than Marshal.load
 ~2x faster than Marshal.dump

All of this while taking less memory!
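Numbers like these are easy to reproduce with the stdlib Benchmark module. A minimal sketch, using stdlib JSON and YAML as the baselines (with yajl-ruby installed you would swap in Yajl::Encoder.encode and Yajl::Parser.parse); the payload is made up:

```ruby
require 'json'
require 'yaml'
require 'benchmark'

# Sketch of how serialization speedups like the ones above are measured.
payload = { 'user' => 'guillaume', 'views' => 12_345,
            'tags' => %w[live rails perf] }

json_time = Benchmark.realtime { 10_000.times { JSON.generate(payload) } }
yaml_time = Benchmark.realtime { 10_000.times { YAML.dump(payload) } }

puts "JSON.generate: #{json_time.round(3)}s"
puts "YAML.dump:     #{yaml_time.round(3)}s"
```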
Snappy: the facts
ruby-1.9.2-p180 :061 > f = File.read('Gemfile')
ruby-1.9.2-p180 :066 > f.length
 => 2504

GC.start; Benchmark.measure { 1000.times { ActiveSupport::Gzip.compress(f) } }
=> 1.840000 0.010000 1.850000 ( 1.842741)
GC.start; Benchmark.measure { 1000.times { Snappy.deflate(f) } }
=> 0.020000 0.000000 0.020000 ( 0.019659)

ruby-1.9.2-p180 :064 > ActiveSupport::Gzip.compress(f).length
 => 971
ruby-1.9.2-p180 :065 > Snappy.deflate(f).length
 => 1398
3) Memcache

•   Upgraded our memcached to the latest version!
    •   (Not the gem, the real memcached)
    •   We got a 3x improvement!!



•   Switched everything to the memcached gem
    •   made and used by Twitter; uses the C libmemcached
    •   3.5 times faster than Dalli (the pure-Ruby equivalent) on a simple get
3) Memcache
• Used more raw memcache objects
  •   Avoids a useless Marshal dump
  •   Yajl + Snappy + raw memcache = win combo


• Removed huge get_multi calls         (100+ items)

  •   They can be slower than the equivalent SQL query!


• Tuned memcached options
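The raw-value pattern above can be sketched in a few lines: encode with a fast JSON library, compress, and store the resulting bytes raw so the client skips its Marshal round-trip. This is a minimal sketch, not the Justin.tv code: stdlib JSON and Zlib stand in for Yajl and Snappy so it runs anywhere, and a plain Hash stands in for the memcached client.

```ruby
require 'json'
require 'zlib'

# Sketch of the raw-value pattern: encode with a fast JSON encoder,
# compress, and store the bytes "raw" so the cache client skips Marshal.
# Stand-ins: stdlib JSON for Yajl, Zlib for Snappy, a Hash for memcached.
store = {}

def write_raw(store, key, object)
  store[key] = Zlib::Deflate.deflate(JSON.generate(object))
end

def read_raw(store, key)
  bytes = store[key] or return nil
  JSON.parse(Zlib::Inflate.inflate(bytes))
end

write_raw(store, 'channel:42', { 'name' => 'lirik', 'live' => true })
read_raw(store, 'channel:42')  # => {"name"=>"lirik", "live"=>true}
```

With the real gems you would swap in Yajl for the encoding, Snappy.deflate/inflate for the compression, and pass :raw-style options to the memcache client so it stores the string untouched.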
4) Cache expiration


• Removed a ton of after_save cache expirations
  •   Using correct expiration times
  •   Or using an auto-changing cache_key
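The auto-changing cache_key idea: bake the record's updated_at into the key, so saving the record yields a fresh key and the stale entry is simply never read again, with no after_save callback. Rails builds keys like this itself; here is a self-contained toy version (the Video class and key format are illustrative):

```ruby
# Toy sketch of an auto-changing cache key: updated_at is part of the key,
# so touching the record changes the key and old cache entries just expire.
class Video
  attr_reader :id, :updated_at

  def initialize(id)
    @id = id
    @updated_at = Time.now
  end

  def touch!
    @updated_at = Time.now
  end

  def cache_key
    "videos/#{id}-#{updated_at.to_f}"
  end
end

v = Video.new(7)
old_key = v.cache_key
sleep 0.01
v.touch!
v.cache_key == old_key  # => false: the save produced a fresh key
```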
5) Switched to Unicorn

•   Like a boss! Twitter and GitHub use it.
•   Fast bootup
•   Graceful restarts
•   Reduced our queue wait to 0
    •   Our previous round-robin dispatch on our mongrel cluster
        added up to 40ms of delay on average to each request.
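Unicorn is configured through a small Ruby DSL; a minimal unicorn.rb along these lines might look like the following. All values and paths are illustrative, not Justin.tv's actual config.

```ruby
# unicorn.rb -- illustrative sketch, values are not Justin.tv's real config
worker_processes 21   # 21 app workers per box, as above
preload_app true      # boot the app once in the master; workers fork fast
timeout 30            # kill workers stuck longer than this

before_fork do |server, worker|
  # On a USR2 rolling restart, quit the old master once the new one is up:
  # this is what makes restarts graceful (zero dropped requests).
  old_pid = "#{server.config[:pid]}.oldbin"
  if File.exist?(old_pid)
    begin
      Process.kill(:QUIT, File.read(old_pid).to_i)
    rescue Errno::ESRCH, Errno::ENOENT
    end
  end
end
```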
6) More GC tuning

•   Memory vs performance trade off
        •   export RUBY_HEAP_FREE_MIN=100000
        •   export RUBY_HEAP_SLOTS_GROWTH_FACTOR=1
        •   export RUBY_HEAP_MIN_SLOTS=800000
        •   export RUBY_HEAP_SLOTS_INCREMENT=200000



•   We added a GC run after expensive requests:
    •   It divided our time spent in GC during requests by 3
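"GC after expensive requests" can be sketched as a tiny Rack middleware: time each request and trigger a GC run once the application work is done. This is a simplified, in-band sketch with an invented class name and threshold; Unicorn's OobGC middleware does the same thing properly out-of-band, between requests.

```ruby
# Sketch: force a GC run after requests that took longer than a threshold,
# so expensive requests pay for their own garbage instead of a later one.
class GcAfterExpensiveRequests
  EXPENSIVE_THRESHOLD = 0.5 # seconds; illustrative value

  def initialize(app)
    @app = app
  end

  def call(env)
    started = Time.now
    response = @app.call(env)
    GC.start if Time.now - started > EXPENSIVE_THRESHOLD
    response
  end
end

# Usage with any Rack app:
app = GcAfterExpensiveRequests.new(->(env) { [200, {}, ['ok']] })
app.call({})  # => [200, {}, ["ok"]]
```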
7) Regain memory
•   Fewer objects = faster garbage collection = happiness
•   Cleaned up our Gemfile and removed unused
    dependencies
    •   The aws-s3 gem alone = 30k ruby objects in your stack
    •   A blank Rails project (2.3 or 3.0) is ~100K objects



•   Cleaned up our codebase! Removing tons of old
    controllers/views
Regain memory: the facts

   We refactored our translation system:

          We saved 50k useless objects:
          a 10% garbage collection speedup

Enough memory saved to add one more unicorn worker
To go further...
• Create or find a lighter aws-s3 gem, using Curb!
• Start using extra-light controllers à la Metal for
  some critical actions
• Use Snappy to compress fragment caches

• Give kiji a try, a Ruby fork of REE (from Twitter)
• Or switch the stack to Ruby 1.9 or to JRuby
•   Do more memory profiling, with tools like memprof


•   Get a real non-blocking stack to handle several requests
    per worker

    •   Try Goliath: a non-blocking Ruby framework


•   Try the MySQL NoSQL plugin (if only we were using MySQL!)
Bonus - Extra slides
removed to save time
Curb: the facts
Memcache - Tune it!
•   Memcached has a bunch of options:
    •   Auto failover and recovery

    •   Noreply, Noblock

    •   TCP nodelay, UDP

        •   UDP for set and TCP for get?

    •   Key verification

    •   Binary protocol (but slower in ruby, don’t use it :p)

    •   and more.... Play with them!
Clean up your before_filters
     We created a speed_up! method
to skip all before_filters on critical actions

      speed_up! :only => ['critical', 'action']
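The deck does not show how speed_up! is implemented; in a Rails 2.3 app it would presumably be a thin wrapper over skip_before_filter. Here is a self-contained toy version outside Rails, with an invented TinyController and invented filter names, just to show the shape of the idea:

```ruby
# Toy sketch of speed_up!: a controller keeps a list of before filters,
# and speed_up! empties that list for the selected actions. In real
# Rails 2.3 this would call skip_before_filter instead.
class TinyController
  @@filters = Hash.new { |h, k| h[k] = [:authenticate, :track_visit] }
  @@fast_actions = []

  def self.speed_up!(options)
    @@fast_actions.concat(options[:only])
  end

  def self.filters_for(action)
    @@fast_actions.include?(action) ? [] : @@filters[action]
  end
end

TinyController.speed_up! :only => ['critical', 'action']
TinyController.filters_for('critical')  # => []
TinyController.filters_for('show')      # => [:authenticate, :track_visit]
```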
find_in_batches
Set.include?
    instead of


Array.include?
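The reason: Set#include? is a hash lookup (O(1)) while Array#include? scans the whole array (O(n)). A minimal sketch of how to see the gap, using only the stdlib (sizes and counts are arbitrary):

```ruby
require 'set'
require 'benchmark'

# Set#include? is a constant-time hash lookup; Array#include? is a linear
# scan. On a hot path with thousands of elements the difference is dramatic.
ids_array = (1..10_000).to_a
ids_set   = ids_array.to_set

array_time = Benchmark.realtime { 1_000.times { ids_array.include?(9_999) } }
set_time   = Benchmark.realtime { 1_000.times { ids_set.include?(9_999) } }

puts "Array#include?: #{array_time.round(4)}s"
puts "Set#include?:   #{set_time.round(4)}s"
```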
URL helpers are slow
Store them in a variable when you can, to avoid repeated calls
Use the bang!
Like gsub!, to avoid new object creation
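A quick illustration: gsub returns a freshly allocated String every call, while gsub! mutates the receiver in place. Only safe when nothing else holds a reference to the original string.

```ruby
# gsub allocates a new String; gsub! mutates in place, allocating no
# result object (and returns nil when nothing changed).
title = "justin tv"

copy = title.gsub(' ', '.')  # new object allocated
title.gsub!(' ', '.')        # same object, mutated in place

copy.equal?(title)  # => false: gsub built a separate string
title               # => "justin.tv"
```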
beware of symbol leak


Every symbol, and every string converted to a symbol,
      stays in memory forever => memory leak
The Cloud computing era

Cloud is great, but dedicated hardware is still way faster

      We measured a 3x speedup when switching Socialcam
           from h****u to our own cluster.
FIN
If you are awesome and want to tackle challenges on
         awesome products and systems used
            by millions of users every day:

      We are currently hiring awesome people


              http://jobs.justin.tv

   guillaume@justin.tv     Fork me on GitHub: @kwi

Recruiting coordinator at JTV: Brooke (brooke@justin.tv)
Links / References

•   https://github.com/miyucy/snappy

•   http://toevolve.org/2011/04/03/http-request-performance.html

•   https://github.com/brianmario/yajl-ruby

•   https://github.com/fauna/memcached

•   http://unicorn.bogomips.org/

•   http://apidock.com/ruby/Set

•   http://engineering.twitter.com/2011/05/faster-ruby-kiji-update.html

•   https://github.com/postrank-labs/goliath

•   http://blogs.innodb.com/wp/2011/04/nosql-to-innodb-with-memcached/

•   https://github.com/ice799/memprof
