Improving Running Components at Twitter
Upcoming SlideShare
Loading in...5
×
 

Improving Running Components at Twitter

on

  • 44,206 views

We will learn how a small team at Twitter keeps up with breakneck growth, by focusing on three performance milestones completed in the last year....

We will learn how a small team at Twitter keeps up with breakneck growth, by focusing on three performance milestones completed in the last year.

First, an upgraded caching strategy. Second, our new open source message queue. Third, optimizations made to the internal cache client. Real-life graphs will be viewed along the way, and we will discuss how the lessons learned can be applied to your own projects.

Statistics

Views

Total Views
44,206
Views on SlideShare
25,538
Embed Views
18,668

Actions

Likes
124
Downloads
1,109
Comments
3

45 Embeds 18,668

http://blog.evanweaver.com 16691
http://mireille.it 542
http://codejunkies.posterous.com 288
http://blogs.huihoo.com 219
http://ajasver.posterous.com 159
http://blog.huihoo.com 133
http://blog.estadao.com.br 122
http://www.caiwangqin.com 83
http://www.katkovonline.com 71
http://www.slideshare.net 70
http://static.slideshare.net 59
http://blog.caiwangqin.com 42
http://www.360doc.com 33
http://127.0.0.1 27
http://migh.info 26
http://fendou.org 23
http://190.244.144.2 18
http://translate.googleusercontent.com 5
http://linkeddata.uriburner.com 5
http://www.ibole.cn 5
http://blish.in.com 4
http://caiwangqin.com 3
http://localhost 3
http://webcache.googleusercontent.com 3
http://soaandwebservices.blogspot.com 3
http://feeds.feedburner.com 3
http://www.in.com 3
https://ra.sixapart.com 3
http://planetrubyonrails.net 2
http://prmachine.blogspot.com 2
https://daili.bz 2
http://www.brandhackers.com 2
http://74.125.43.132 2
http://www.e-presentations.us 1
http://209.85.229.132 1
http://conversation.ecairn.com 1
applewebdata://AD43A600-F6E9-4055-9C41-47F443D61C36 1
http://66.196.80.202 1
http://www.disinnovate.com 1
http://safe.tumblr.com 1
http://www.ibole.cn:88 1
http://www.hanrss.com 1
http://74.125.153.132 1
http://www.zhuaxia.com 1
http://131.253.14.66 1
More...

Accessibility

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
  • twitter tools
    Are you sure you want to
    Your message goes here
    Processing…
  • Great high-value content, good work. Very generously shared.

    Sam
    http://www.cross-country-running-shoes.net
    Are you sure you want to
    Your message goes here
    Processing…
  • Very useful stuff. It would be much better if providing more details or samples

    [Comment posted from http://blog.evanweaver.com/articles/2009/03/13/qcon-presentation/]
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

Improving Running Components at Twitter Improving Running Components at Twitter Presentation Transcript

  • Improving Running Components Evan Weaver Twitter, Inc. QCon London, 2009
  • Many tools: Rails C Scala Java MySQL
  • Rails front-end: rendering cache composition db querying
  • Middleware: Memcached Varnish (cache) Kestrel (MQ) comet server
  • Milestone 1: Cache policy Optimization plan: 1. stop working 2. share the work 3. work faster
  • Old
  • Everything runs from memory in Web 2.0.
  • First policy change: vector cache Stores arrays of tweet pkeys Write-through 99% hit rate
  • Second policy change: row cache Store records from the db (Tweets and users) Write-through 95% hit rate
  • Third policy change: fragment cache Stores rendered version of tweets for the API Read-through 95% hit rate
  • Fourth policy change: giving the page cache its own cache pool Generational keys Low hit rate (40%)
  • Visibility was lacking. Peep tool Dumps a live memcached heap
  • Cache only was living five hours mysql> select round(round(log10(3576669 - last_read_time) * 5, 0) / 5, 1) as log, round(avg(3576669 - last_read_time), -2) as freshness, count(*), rpad('', count(*) / 2000, '*') as bar from entries group by log order by log desc; +------+-----------+----------+------------------------------------------------------------------------------------------------------------------+ | log | freshness | count(*) | bar | +------+-----------+----------+------------------------------------------------------------------------------------------------------------------+ | NULL | 0| 13400 | ******* | | 6.6 | 3328300 | 940 | | | 6.2 | 1623200 | 1| | | 5.2 | 126200 | 1| | | 5.0 | 81100 | 343 | | | 4.8 | 64800 | 3200 | ** | | 4.6 | 34800 | 18064 | ********* | | 4.4 | 24200 | 96739 | ************************************************ | | 4.2 | 15700 | 212865 | ********************************************************************************************************** | | 4.0 | 10200 | 224703 | **************************************************************************************************************** | | 3.8 | 6500 | 158067 | ******************************************************************************* | | 3.6 | 4100 | 108034 | ****************************************************** | | 3.4 | 2600 | 82000 | ***************************************** | | 3.2 | 1600 | 65637 | ********************************* | | 3.0 | 1000 | 49267 | ************************* | | 2.8 | 600 | 34398 | ***************** | | 2.6 | 400 | 24322 | ************ | | 2.4 | 300 | 19865 | ********** | | 2.2 | 200 | 14810 | ******* | | 2.0 | 100 | 10108 | ***** | | 1.8 | 100 | 8002 | **** | | 1.6 | 0| 6479 | *** | | 1.4 | 0| 4014 | ** | | 1.2 | 0| 2297 | * | | 1.0 | 0| 1733 | * | | 0.8 | 0| 649 | | | 0.6 | 0| 710 | | | 0.4 | 0| 672 | | | 0.0 | 0| 319 | | +------+-----------+----------+------------------------------------------------------------------------------------------------------------------+
  • What does a timeline miss mean? Container union /home rebuild reads through your followings’ profiles
  • New
  • Milestone 2: Message queue A component with problems
  • Purpose in a web app: Move operations out of the synchronous request cycle Amortize load over time
  • Inauguration, 2009
  • Simplest MQ ever: Gives up constraints for scalability No strict ordering of jobs No shared state among servers Just like memcached Uses memcached protocol
  • First version was written in Ruby Ruby is “optimization- resistant” Mainly due to the GC
  • If the consumers could not keep pace, the MQ would fill up and crash Ported it to Scala for this reason
  • Good tooling for the Java GC: JConsole Yourkit
  • Poor tooling for the Ruby GC: Railsbench w/patches BleakHouse w/patches Valgrind/Memcheck MBARI 1.8.6 patches
  • Our Railsbench GC tunings 35% speed increase RUBY_HEAP_MIN_SLOTS=500000 RUBY_HEAP_SLOTS_INCREMENT=250000 RUBY_HEAP_SLOTS_GROWTH_FACTOR=1 RUBY_GC_MALLOC_LIMIT=50000000 RUBY_HEAP_FREE_MIN=4096
  • Situational decision: Scala is a flexible language (But libraries a bit lacking) We have experienced JVM engineers
  • Big rewrites fail...? Small rewrite: No new features added Well-defined interface Already went over the wire
  • Deployed to 1 MQ host Fixed regressions Eventually deployed to all hosts
  • Milestone 3: the memcached client Optimizing a critical path
  • Switched to libmemcached, a new C Memcached client We are now the biggest user and biggest 3rd-party contributor
  • Uses a SWIG Ruby binding I started a year or so ago Compatibility among memcached clients is critical
  • Twitter is big, and runs hot Flushing the cache would be catastrophic
  • Spent endless time on backwards compatibility A/B tested the new client over 3 months
  • MQ also benefitted Memcached can be a generic lightweight service protocol We also use Thrift and HTTP internally
  • So many RPCs! Sometimes 100s of Memcached round trips per request. “As a memory device gets larger, it tends to get slower.”
  • Performance hierarchy is supposed to look like:
  • At web scale, it looks more like:
  • End
  • Links: C tools: - Peep http://github.com/fauna/peep/ - Libmemcached http://tangent.org/552/libmemcached.html - Valgrind http://valgrind.org/ JVM tools: - Kestrel http://github.com/robey/kestrel/ - Smile http://github.com/robey/smile/ - Jconsole http://openjdk.java.net/tools/svc/jconsole/ - Yourkit http://www.yourkit.com/ Ruby tools: - BleakHouse http://github.com/fauna/bleak_house/ - Railsbench Ruby patches http://github.com/skaes/railsbench/ - MBARI Ruby patches http://github.com/brentr/matzruby/tree/v1_8_6_287-mbari General: - Danga stack http://www.danga.com/words/2005_oscon/oscon-2005.pdf - Seymour Cray quote http://books.google.com/books?client=safari&id=qM4Yzf8K9hwC&dq=rapid +development&q=cray&pgis=1 - Last.fm downtime http://blog.last.fm/2008/04/18/possible-lastfm-downtime
  • twitter.com/evan blog.evanweaver.com cloudbur.st