Just-In-Time Scalability: Agile Methods to Support Massive Growth

Eric Ries and Chris Hondl's MySQL conference presentation on Just-In-Time Scalability: http://startuplessonslearned.blogspot.com/2008/09/just-in-time-scalability.html

We all aspire to have scalability problems. We are going to talk about when we had scalability problems and the approach we used to solve them in a “Just In Time” way. I'm Eric and this is Chris. We are from IMVU, a site that has had the good fortune to get some traction in the market and has had to solve scalability problems. The two critical pieces of our approach are an agile methodology we call continuous ship and a defect prevention process we call the cluster immune system. We are going to talk about these pieces and then about how we applied them to solve a couple of scalability problems.

1. Just-In-Time Scalability: Agile Methods to Support Massive Growth
2. What is IMVU?
3. Behind the scenes...
<ul>
  <li>IMVU is LAMP, plus...
    <ul>
      <li>Perlbal</li>
      <li>Memcached</li>
      <li>Solr</li>
      <li>MogileFS</li>
      <li>plus... BuildBot, eAccelerator, Linux (Debian), memcached, Nagios, Perl, Roundup, rrd, Subversion, ADODB, b2evolution, Coppermine, feed2js, FreeTag, Incutio XML-RPC, jrcache, JSON-PHP, Magpie, osCommerce, phpBB, Phorum, SimpleTest, Selenium, Audiere, Boost, Cal3D, CFL, NSIS, Pixomatic, Python, pywin32, SCons, wxPython</li>
    </ul>
  </li>
</ul>
4. Before and After Architecture
<ul>
  <li>Before: We started with a small site, a mess of open source, and a small team that didn't know much about scaling.</li>
  <li>After: We ended with a large site, a medium-sized team, and an architecture that has scaled.</li>
</ul>
We never stopped. We used a roadmap and a compass, made weekly changes in direction, regularly shipped code on Wednesday to handle the next weekend's capacity constraints, and shipped new features the whole time.
5. Before and After Architecture (1/4) November
6. Before and After Architecture (2/4) December
7. Before and After Architecture (3/4) February
8. Before and After Architecture (4/4) May
9. Advance planning vs. fast response
<ul>
  <li>“Driving”
    <ul>
      <li>Continuously figure out what is going to go wrong soon</li>
      <li>Quickly fix it, without breaking something else</li>
      <li>Get feedback along the way</li>
    </ul>
  </li>
  <li>“Rocket ship”
    <ul>
      <li>Figure out in advance what is going to go wrong</li>
      <li>Build a plan that prevents those things from happening</li>
      <li>Execute your plan</li>
      <li>Get feedback when done</li>
    </ul>
  </li>
</ul>
10. Questions to ask
<ul>
  <li>“Driving”
    <ul>
      <li>How do you know you will be able to fix the problem in time?</li>
      <li>How can you be sure you won't cause collateral damage?</li>
      <li>How can you be sure you won't code yourself into a corner?</li>
    </ul>
  </li>
  <li>“Rocket ship”
    <ul>
      <li>Are you sure you know what is going to happen?</li>
      <li>Are you sure you can execute?</li>
      <li>Can you afford it?</li>
      <li>Do you need feedback?</li>
    </ul>
  </li>
</ul>
11. Continuous Ship
<ul>
  <li>Deploy new software quickly
    <ul><li>At IMVU, time from check-in to production = 20 minutes</li></ul>
  </li>
  <li>Tell a good change from a bad change (quickly)</li>
  <li>Revert a bad change quickly</li>
  <li>Work in small batches
    <ul><li>At IMVU, a large batch = 3 days' worth of work</li></ul>
  </li>
  <li>Break large projects down into small batches</li>
  <li>Don't have the same problem twice: fix the root cause of each class of problems</li>
</ul>
IMVU pushes code to production 20-30 times every day.
12. Cluster Immune System
<ul>
  <li>What it looks like to ship one piece of code to production:
    <ul>
      <li>Run tests locally (SimpleTest, Selenium)
        <ul><li>Everyone has a complete sandbox</li></ul>
      </li>
      <li>Continuous Integration Server (BuildBot)
        <ul>
          <li>All tests must pass or “shut down the line”</li>
          <li>Automatic feedback if the team is going too fast</li>
        </ul>
      </li>
      <li>Incremental deploy
        <ul>
          <li>Monitor cluster and business metrics in real time</li>
          <li>Reject changes that move metrics out of bounds</li>
        </ul>
      </li>
      <li>Alerting and predictive monitoring (Nagios)
        <ul>
          <li>Monitor all metrics that stakeholders care about</li>
          <li>If any metric goes out of bounds, wake somebody up</li>
          <li>Use historical trends to predict acceptable bounds</li>
        </ul>
      </li>
      <li>When customers see a failure:
        <ul>
          <li>Fix the problem for customers</li>
          <li>Improve your defenses at each level</li>
        </ul>
      </li>
    </ul>
  </li>
</ul>
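The incremental-deploy and predictive-monitoring steps above can be sketched as follows. This is a minimal Python sketch, not IMVU's actual tooling: the `deploy`, `revert`, and `read_metrics` callbacks are hypothetical stand-ins for real deploy scripts and monitoring, and deriving bounds as the historical mean plus or minus `k` standard deviations is an assumed, deliberately simple way to "use historical trends to predict acceptable bounds".

```python
import statistics
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Bounds:
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


def predict_bounds(history: List[float], k: float = 3.0) -> Bounds:
    """ASSUMPTION: acceptable range = historical mean +/- k population standard deviations."""
    mean = statistics.mean(history)
    spread = statistics.pstdev(history)
    return Bounds(mean - k * spread, mean + k * spread)


def incremental_deploy(
    servers: List[str],
    deploy: Callable[[str], None],                 # hypothetical: push the change to one server
    revert: Callable[[str], None],                 # hypothetical: roll that server back
    read_metrics: Callable[[], Dict[str, float]],  # hypothetical: current cluster/business metrics
    history: Dict[str, List[float]],
    k: float = 3.0,
) -> bool:
    """Roll out one server at a time; on any out-of-bounds metric, revert and reject the change."""
    bounds = {name: predict_bounds(samples, k) for name, samples in history.items()}
    deployed: List[str] = []
    for server in servers:
        deploy(server)
        deployed.append(server)
        for name, value in read_metrics().items():
            # Metrics with no history have no bounds and are not checked here.
            if name in bounds and not bounds[name].contains(value):
                for done in reversed(deployed):
                    revert(done)
                return False  # change rejected
    return True  # change reached every server
```

One caveat of this simple bounds model: a perfectly constant history has zero spread, so any deviation at all is treated as out of bounds; a real system would want a minimum band width or a model of daily and weekly trends.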
13. Case Study: Sharding
<ul>
  <li>Problem: Spread write queries across multiple databases</li>
  <li>Solution: Intercept and redirect queries based on SQL comments
    <ul>
      <li>Move one table or sub-system at a time
        <ul><li>In our experience, one engineer can horizontally partition one table or small sub-system in one week</li></ul>
      </li>
    </ul>
  </li>
  <li>New engineers figure this out in about 5 minutes:</li>
</ul>
<pre><code>// Before
db_query("INSERT INTO inventory (customers_id, products_id)
          VALUES ($customer_id, $product_id)");

// After: the SQL comment says which shard the query belongs to
db_query("/*shard customer://$customer_id */
          INSERT INTO inventory (customers_id, products_id)
          VALUES ($customer_id, $product_id)");
</code></pre>
<ul>
  <li>Learning: cross-shard joins and transactions aren't required</li>
</ul>
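The comment-based routing above can be sketched like this. It is a minimal Python sketch under assumptions the slide does not state: each entity type (here `customer`) has a fixed list of shard connections, a numeric ID picks its shard by simple modulo, and queries without a hint keep going to the original database. `ShardRouter`, `pools`, and `default` are hypothetical names; only the `/*shard customer://ID */` comment format comes from the slide.

```python
import re
from typing import Any, Dict, List

# Matches the routing hint from the slide, e.g. "/*shard customer://1234 */".
# Assumes IDs are numeric, which holds once PHP has interpolated $customer_id.
SHARD_HINT = re.compile(r"/\*\s*shard\s+(\w+)://(\d+)\s*\*/")


class ShardRouter:
    """Hypothetical query layer that picks a database from the shard comment in the SQL."""

    def __init__(self, pools: Dict[str, List[Any]], default: Any):
        self.pools = pools      # entity type -> list of shard connections
        self.default = default  # unsharded database for tables not yet partitioned

    def pick(self, sql: str) -> Any:
        match = SHARD_HINT.search(sql)
        if match is None:
            return self.default
        entity, key = match.group(1), int(match.group(2))
        pool = self.pools[entity]
        # ASSUMPTION: modulo placement; the slides don't say how IDs map to shards.
        return pool[key % len(pool)]

    def db_query(self, sql: str) -> Any:
        # The comment travels with the SQL; the database treats it as an ordinary comment.
        return self.pick(sql).execute(sql)
```

Modulo placement keeps the sketch short, but adding a shard moves most keys; a lookup table from ID (or ID range) to shard is a common alternative when the shard count is expected to grow.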
14. Case Study: Caching
<ul>
  <li>Problem: Cache frequently read data in memcached</li>
  <li>Solution: Intercept and cache queries based on SQL comments</li>
</ul>
<pre><code>// Read: the result is cached under the cache class named in the comment
db_query_cache(BUDDY_CACHE_TIME,
    "/*shard customer://$customer_id */
     /*cache-class customer://$customer_id/buddies */
     SELECT friend_id, buddy_order FROM customers_friends
     WHERE customers_id = $customer_id");

// Write: change the data, then flush the whole cache class
db_query("/*shard customer://$customer_id */
          DELETE FROM customers_friends
          WHERE customers_id = $customer_id
            AND friend_id = $friend_id");
db_flush_cacheclass("customer://$customer_id/buddies");
</code></pre>
<ul>
  <li>Learning: Flushing the cache is critical to users and performance
    <ul><li>When a customer spends $24.95, they want the benefits immediately</li></ul>
  </li>
  <li>Learning: Test the cache behavior for critical systems</li>
</ul>
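The read and flush calls above can be sketched as follows. This is a minimal Python sketch, not IMVU's code: a dict stands in for memcached, `run_query` is a hypothetical database callable, and the flush uses a per-class version number, a common memcached technique swapped in here because memcached cannot delete keys by prefix. The slides don't say how `db_flush_cacheclass` was actually implemented.

```python
import re
import time
from typing import Any, Callable, Dict, Tuple

# Matches "/*cache-class customer://42/buddies */" and captures the class name.
CACHE_CLASS_HINT = re.compile(r"/\*\s*cache-class\s+(\S+)\s*\*/")


class CommentCache:
    """Caches query results under the cache class named in an SQL comment.

    A dict stands in for memcached. Flushing a class bumps its version number,
    so older entries are never read again; memcached would evict them once
    their TTL passes, while this dict simply keeps them.
    """

    def __init__(self, run_query: Callable[[str], Any], clock: Callable[[], float] = time.monotonic):
        self.run_query = run_query  # hypothetical: executes SQL against the database
        self.clock = clock
        self.store: Dict[Tuple, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def _version(self, cache_class: str) -> int:
        return self.store.get(("version", cache_class), (0.0, 0))[1]

    def db_query_cache(self, ttl: float, sql: str) -> Any:
        match = CACHE_CLASS_HINT.search(sql)
        if match is None:
            return self.run_query(sql)  # no cache-class hint: always hit the database
        cache_class = match.group(1)
        key = ("row", cache_class, self._version(cache_class), sql)
        hit = self.store.get(key)
        if hit is not None and hit[0] > self.clock():
            return hit[1]
        value = self.run_query(sql)
        self.store[key] = (self.clock() + ttl, value)
        return value

    def db_flush_cacheclass(self, cache_class: str) -> None:
        # The version entry never expires; bumping it orphans every cached row in the class.
        self.store[("version", cache_class)] = (float("inf"), self._version(cache_class) + 1)
```

The write path then mirrors the slide: run the `DELETE`, call `db_flush_cacheclass("customer://42/buddies")`, and the next buddy-list read misses and refetches from the database.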
15. Case Study: Steering Data Design
<ul>
  <li>Problem: Improve database schemas and data design to meet scalability requirements without downtime</li>
  <li>Solution:
    <ul>
      <li>Measure to find the real problems (harder than it sounds)</li>
      <li>Migrate to a new design that takes advantage of sharding and/or caching</li>
    </ul>
  </li>
</ul>
16. Case Study: Steering Data Design
17. Case Study: Steering Data Design
18. Case Study: Steering Data Design
<ul>
  <li>Problem: You can't bulk-move large, frequently accessed data</li>
  <li>Solution:
    <ul>
      <li>Copy on read
        <ul>
          <li>Use when you are read bound</li>
          <li>Reads check the cache, then the new location, then the old location, copying the data to the new location if it is missing there</li>
          <li>Writes go to the new location if the data has been migrated, otherwise to the old location</li>
        </ul>
      </li>
      <li>Copy on write
        <ul>
          <li>Use when you are write bound</li>
          <li>Reads check the cache, then the new location, then the old location</li>
          <li>Writes go to the new location, copying the data there first if it is missing</li>
        </ul>
      </li>
      <li>Copy all
        <ul>
          <li>Use when the file system fills up</li>
          <li>Reads and writes go to the new location, falling back to the old location if the data is missing</li>
          <li>A cron job copies data a few records at a time</li>
        </ul>
      </li>
    </ul>
  </li>
</ul>
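The three migration strategies can be sketched side by side. This is a minimal Python sketch under stated assumptions: plain dicts stand in for memcached and the two storage locations, each record is a dict of fields that writes update in place, and all method names are hypothetical. For copy all, the sketch assumes a write to a record not yet in the new location first copies it forward, which is one reading of the slide. Concurrency is ignored; real code needs an atomic "insert if absent" when copying.

```python
from typing import Any, Dict


class Migration:
    """Moves records from an old location to a new one without a bulk copy."""

    def __init__(self, cache: Dict[Any, dict], old: Dict[Any, dict], new: Dict[Any, dict]):
        self.cache, self.old, self.new = cache, old, new

    def _copy_if_missing(self, key: Any) -> None:
        # Copy the old record forward unless the new location already has one.
        if key not in self.new:
            self.new[key] = dict(self.old[key])

    # Copy on read (read bound): reads do the migrating.
    def read_copy_on_read(self, key: Any) -> dict:
        if key not in self.cache:
            self._copy_if_missing(key)
            self.cache[key] = dict(self.new[key])
        return self.cache[key]

    def update_copy_on_read(self, key: Any, fields: dict) -> None:
        target = self.new if key in self.new else self.old  # migrated yet?
        target[key].update(fields)
        self.cache.pop(key, None)  # drop the stale cached copy

    # Copy on write (write bound): writes do the migrating.
    def read_copy_on_write(self, key: Any) -> dict:
        if key not in self.cache:
            source = self.new if key in self.new else self.old
            self.cache[key] = dict(source[key])
        return self.cache[key]

    def update_copy_on_write(self, key: Any, fields: dict) -> None:
        self._copy_if_missing(key)
        self.new[key].update(fields)
        self.cache.pop(key, None)

    # Copy all (old file system filling up): everything goes to the new location.
    def read_copy_all(self, key: Any) -> dict:
        return self.new[key] if key in self.new else self.old[key]

    def update_copy_all(self, key: Any, fields: dict) -> None:
        self._copy_if_missing(key)
        self.new[key].update(fields)

    def cron_copy_batch(self, batch_size: int = 100) -> int:
        """Copy up to batch_size not-yet-migrated records; return how many were copied."""
        # Records already in the new location are skipped so newer writes are never overwritten.
        pending = [k for k in self.old if k not in self.new][:batch_size]
        for key in pending:
            self.new[key] = dict(self.old[key])
        return len(pending)
```

Which read/update pair to use follows the slide: copy on read when read bound, copy on write when write bound, and copy all plus the cron batch when the old file system is filling up.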
19. “Thank You for Listening!”