The 7 main actions we took to improve
the Rails stack performance at Justin.tv



        Guillaume Luccisano - June 2011
Guillaume Luccisano

• Worked with Ruby for about 4 years now
• Arrived from France 8 months ago
• Ran the migration from Rails 2.1 to 2.3...
• ... and to Rails 3 more recently
In the meantime....


 I was able to work on
improving performance
I love it

And I have learned a ton of stuff to
    share with you today
First lesson



Guess what?
They lied to us


Rails 3 is slower
  than Rails 2
Ok, troll is over


So what’s making
 our app slow?
First one is obvious



    (no)   SQL
Less obvious



Worker queue wait
External dependencies
Yes, it can



Memcached
That one, we love it, so ...




     Ruby
And its dark side




The garbage collector
Cool, we have made some
good guesses, now what?
The Justin.tv Rails stack




            12 frontend servers



Nginx + cache layer with a magic conf from our experts
Talking to 24 app servers
24

23: one is currently dead
R.I.P app2




Unicorn running with 21 app workers per box
                    504 workers, yeah!

Running Ruby Enterprise Edition with GC tuning
Running at 10% of capacity at normal times


      We can hit 80K+ Rails requests
     per minute easily during peak time
One beefy master DB with 7 slaves
             (not used only by rails)




And only 2 memcached boxes for rails
+ some mongo, rabbit, etc...

  And haproxy everywhere
to make us failure resistant...
Cool, so are we finally going
 to optimize stuff or not?
Yes, the real first step is:



Monitoring
You can’t improve
performance in the dark
Newrelic
   =
Awesome
Ganglia
   =
Awesome
Make yourself happy by
improving things and seeing
    the result instantly
This is the kind of graph we are looking for




  Still work to do, but it was going down!
Great, we are all set

But eh, what is fast btw?
Fast
  =
Nothing
Less code
         =
Easier to maintain
         =
       (often)


     Faster
I’m curious
What is a good average response time?


               300ms
               200ms
               100ms
               50ms
We were at 250+ms

We are now at 80ms

And there is still a ton to do!
How did we do that?
But keep in mind that
 Every app is unique
1) SQL

• Tracked down slow queries, added indexes
• Refactored bad queries
  •   Sometimes, two queries are faster than one big one

• Retrieved only the needed columns from the db
  •   Less network traffic and less ruby object creation!
2) C libraries - why?

• We added a bunch of C libraries
• Bypass ruby memory management
 • less garbage collection of ruby objects
• Raw C speed!
• Easy to drop in
2) C libraries - which?

• Curb for HTTP (libcurl) (with real timeouts)
  •   Supports pipelining for multiple requests at the same time

• Yajl, the fastest JSON library
• Nokogiri, the fastest XML library
• Snappy, a super fast compression library by Google
Yajl: the facts
   ~3.5x faster than JSON.generate
    ~1.9x faster than JSON.parse
     ~4.5x faster than YAML.load
~377.5x faster than YAML.dump
    ~1.5x faster than Marshal.load
 ~2x faster than Marshal.dump

All of this while taking less memory!
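Numbers like these are easy to reproduce with the stdlib Benchmark module. A minimal sketch, using stdlib JSON and YAML as the baselines (with yajl-ruby installed you would swap in Yajl::Encoder.encode and Yajl::Parser.parse); the payload is made up:

```ruby
require 'json'
require 'yaml'
require 'benchmark'

# Sketch of how serialization speedups like the ones above are measured.
payload = { 'user' => 'guillaume', 'views' => 12_345,
            'tags' => %w[live rails perf] }

json_time = Benchmark.realtime { 10_000.times { JSON.generate(payload) } }
yaml_time = Benchmark.realtime { 10_000.times { YAML.dump(payload) } }

puts "JSON.generate: #{json_time.round(3)}s"
puts "YAML.dump:     #{yaml_time.round(3)}s"
```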
Snappy: the facts
ruby-1.9.2-p180 :061 > f = File.read('Gemfile')
ruby-1.9.2-p180 :066 > f.length
 => 2504

GC.start; Benchmark.measure { 1000.times { ActiveSupport::Gzip.compress(f) } }
=> 1.840000 0.010000 1.850000 ( 1.842741)
GC.start; Benchmark.measure { 1000.times { Snappy.deflate(f) } }
=> 0.020000 0.000000 0.020000 ( 0.019659)

ruby-1.9.2-p180 :064 > ActiveSupport::Gzip.compress(f).length
 => 971
ruby-1.9.2-p180 :065 > Snappy.deflate(f).length
 => 1398
3) Memcache

•   Upgraded our memcached to the latest version!
    •   (Not the gem, the real memcached)
    •   We got a 3x improvement!!



•   Switched everything to the memcached gem
    •   made and used by Twitter; uses the C libmemcached
    •   3.5 times faster than Dalli (the pure-Ruby equivalent) on a simple get
3) Memcache
• Used more raw memcache objects
  •   Avoids a useless Marshal dump
  •   Yajl + Snappy + raw memcache = win combo


• Removed huge get_multi calls         (100+ items)

  •   They can be slower than the equivalent SQL query!


• Tuned memcached options
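The raw-value pattern above can be sketched in a few lines: encode with a fast JSON library, compress, and store the resulting bytes raw so the client skips its Marshal round-trip. This is a minimal sketch, not the Justin.tv code: stdlib JSON and Zlib stand in for Yajl and Snappy so it runs anywhere, and a plain Hash stands in for the memcached client.

```ruby
require 'json'
require 'zlib'

# Sketch of the raw-value pattern: encode with a fast JSON encoder,
# compress, and store the bytes "raw" so the cache client skips Marshal.
# Stand-ins: stdlib JSON for Yajl, Zlib for Snappy, a Hash for memcached.
store = {}

def write_raw(store, key, object)
  store[key] = Zlib::Deflate.deflate(JSON.generate(object))
end

def read_raw(store, key)
  bytes = store[key] or return nil
  JSON.parse(Zlib::Inflate.inflate(bytes))
end

write_raw(store, 'channel:42', { 'name' => 'lirik', 'live' => true })
read_raw(store, 'channel:42')  # => {"name"=>"lirik", "live"=>true}
```

With the real gems you would swap in Yajl for the encoding, Snappy.deflate/inflate for the compression, and pass :raw-style options to the memcache client so it stores the string untouched.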
4) Cache expiration


• Removed a ton of after_save cache expirations
  •   Using correct expiration times
  •   Or using an auto-changing cache_key
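The auto-changing cache_key idea: bake the record's updated_at into the key, so saving the record yields a fresh key and the stale entry is simply never read again, with no after_save callback. Rails builds keys like this itself; here is a self-contained toy version (the Video class and key format are illustrative):

```ruby
# Toy sketch of an auto-changing cache key: updated_at is part of the key,
# so touching the record changes the key and old cache entries just expire.
class Video
  attr_reader :id, :updated_at

  def initialize(id)
    @id = id
    @updated_at = Time.now
  end

  def touch!
    @updated_at = Time.now
  end

  def cache_key
    "videos/#{id}-#{updated_at.to_f}"
  end
end

v = Video.new(7)
old_key = v.cache_key
sleep 0.01
v.touch!
v.cache_key == old_key  # => false: the save produced a fresh key
```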
5) Switched to Unicorn

•   Like a boss! Twitter and GitHub use it.
•   Fast bootup
•   Graceful restarts
•   Reduced our queue wait to 0
    •   Our previous round-robin dispatch on our mongrel cluster
        added up to 40ms of delay on average to each request.
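Unicorn is configured through a small Ruby DSL; a minimal unicorn.rb along these lines might look like the following. All values and paths are illustrative, not Justin.tv's actual config.

```ruby
# unicorn.rb -- illustrative sketch, values are not Justin.tv's real config
worker_processes 21   # 21 app workers per box, as above
preload_app true      # boot the app once in the master; workers fork fast
timeout 30            # kill workers stuck longer than this

before_fork do |server, worker|
  # On a USR2 rolling restart, quit the old master once the new one is up:
  # this is what makes restarts graceful (zero dropped requests).
  old_pid = "#{server.config[:pid]}.oldbin"
  if File.exist?(old_pid)
    begin
      Process.kill(:QUIT, File.read(old_pid).to_i)
    rescue Errno::ESRCH, Errno::ENOENT
    end
  end
end
```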
6) More GC tuning

•   Memory vs performance trade off
        •   export RUBY_HEAP_FREE_MIN=100000
        •   export RUBY_HEAP_SLOTS_GROWTH_FACTOR=1
        •   export RUBY_HEAP_MIN_SLOTS=800000
        •   export RUBY_HEAP_SLOTS_INCREMENT=200000



•   We added a GC run after expensive requests:
    •   It divided our time spent in GC during requests by 3
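"GC after expensive requests" can be sketched as a tiny Rack middleware: time each request and trigger a GC run once the application work is done. This is a simplified, in-band sketch with an invented class name and threshold; Unicorn's OobGC middleware does the same thing properly out-of-band, between requests.

```ruby
# Sketch: force a GC run after requests that took longer than a threshold,
# so expensive requests pay for their own garbage instead of a later one.
class GcAfterExpensiveRequests
  EXPENSIVE_THRESHOLD = 0.5 # seconds; illustrative value

  def initialize(app)
    @app = app
  end

  def call(env)
    started = Time.now
    response = @app.call(env)
    GC.start if Time.now - started > EXPENSIVE_THRESHOLD
    response
  end
end

# Usage with any Rack app:
app = GcAfterExpensiveRequests.new(->(env) { [200, {}, ['ok']] })
app.call({})  # => [200, {}, ["ok"]]
```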
7) Regain memory
•   Fewer objects = faster garbage collection = happiness
•   Cleaned up our Gemfile and removed unused
    dependencies
    •   The aws-s3 gem alone = 30k ruby objects in your stack
    •   A blank Rails project (2.3 or 3.0) is ~100K objects



•   Cleaned up our codebase! Removing tons of old
    controllers/views
Regain memory: the facts

   We refactored our translation system:

          We saved 50k useless objects:
          a 10% garbage collection speedup

Enough memory saved to add one more unicorn worker
To go further...
• Create or find a lighter aws-s3 gem, using Curb!
• Start using extra-light controllers à la Metal for
  some critical actions
• Use Snappy to compress fragment caches

• Give kiji a try, a Ruby fork of REE (from Twitter)
• Or switch the stack to Ruby 1.9 or to JRuby
•   Do more memory profiling, with tools like memprof


•   Get a real non-blocking stack to handle several requests
    per worker

    •   Try Goliath: a non-blocking Ruby framework


•   Try the MySQL NoSQL plugin (if only we were using MySQL!)
Bonus - Extra slides
removed to save time
Curb: the facts
Memcache - Tune it!
•   Memcached has a bunch of options:
    •   Auto failover and recovery

    •   Noreply, Noblock

    •   TCP nodelay, UDP

        •   UDP for set and TCP for get?

    •   Key verification

    •   Binary protocol (but slower in ruby, don’t use it :p)

    •   and more.... Play with them!
Clean up your before_filters
     We created a speed_up! method
to skip all before_filters on critical actions

      speed_up! :only => ['critical', 'action']
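The deck does not show how speed_up! is implemented; in a Rails 2.3 app it would presumably be a thin wrapper over skip_before_filter. Here is a self-contained toy version outside Rails, with an invented TinyController and invented filter names, just to show the shape of the idea:

```ruby
# Toy sketch of speed_up!: a controller keeps a list of before filters,
# and speed_up! empties that list for the selected actions. In real
# Rails 2.3 this would call skip_before_filter instead.
class TinyController
  @@filters = Hash.new { |h, k| h[k] = [:authenticate, :track_visit] }
  @@fast_actions = []

  def self.speed_up!(options)
    @@fast_actions.concat(options[:only])
  end

  def self.filters_for(action)
    @@fast_actions.include?(action) ? [] : @@filters[action]
  end
end

TinyController.speed_up! :only => ['critical', 'action']
TinyController.filters_for('critical')  # => []
TinyController.filters_for('show')      # => [:authenticate, :track_visit]
```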
find_in_batches
Set.include?
    instead of


Array.include?
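The reason: Set#include? is a hash lookup (O(1)) while Array#include? scans the whole array (O(n)). A minimal sketch of how to see the gap, using only the stdlib (sizes and counts are arbitrary):

```ruby
require 'set'
require 'benchmark'

# Set#include? is a constant-time hash lookup; Array#include? is a linear
# scan. On a hot path with thousands of elements the difference is dramatic.
ids_array = (1..10_000).to_a
ids_set   = ids_array.to_set

array_time = Benchmark.realtime { 1_000.times { ids_array.include?(9_999) } }
set_time   = Benchmark.realtime { 1_000.times { ids_set.include?(9_999) } }

puts "Array#include?: #{array_time.round(4)}s"
puts "Set#include?:   #{set_time.round(4)}s"
```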
URL helpers are slow
Store them in a variable when you can, to avoid repeated calls
Use the bang!
Like gsub!, to avoid new object creation
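A quick illustration: gsub returns a freshly allocated String every call, while gsub! mutates the receiver in place. Only safe when nothing else holds a reference to the original string.

```ruby
# gsub allocates a new String; gsub! mutates in place, allocating no
# result object (and returns nil when nothing changed).
title = "justin tv"

copy = title.gsub(' ', '.')  # new object allocated
title.gsub!(' ', '.')        # same object, mutated in place

copy.equal?(title)  # => false: gsub built a separate string
title               # => "justin.tv"
```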
beware of symbol leak


Every symbol, and every string converted to a symbol,
      stays in memory forever => memory leak
The Cloud computing era

Cloud is great, but dedicated hardware is still way faster

      We measured a 3x speedup when switching Socialcam
           from h****u to our own cluster.
FIN
If you are awesome and want to tackle challenges on
         awesome products and systems used
            by millions of users every day:

      We are currently hiring awesome people


              http://jobs.justin.tv

   guillaume@justin.tv     Fork me on GitHub: @kwi

Recruiting coordinator at JTV: Brooke (brooke@justin.tv)
Links / References

•   https://github.com/miyucy/snappy

•   http://toevolve.org/2011/04/03/http-request-performance.html

•   https://github.com/brianmario/yajl-ruby

•   https://github.com/fauna/memcached

•   http://unicorn.bogomips.org/

•   http://apidock.com/ruby/Set

•   http://engineering.twitter.com/2011/05/faster-ruby-kiji-update.html

•   https://github.com/postrank-labs/goliath

•   http://blogs.innodb.com/wp/2011/04/nosql-to-innodb-with-memcached/

•   https://github.com/ice799/memprof
