Just-In-Time Scalability: Agile Methods to Support Massive Growth
What is IMVU?

                 
Behind the scenes...

IMVU is LAMP, plus...
• Perlbal
• Memcached
• Solr
• MogileFS
• plus...
   • Audiere
   • Boost
   • Cal3D
   • CFL
   • NSIS
   • Pixomatic
   • Python
   • pywin32
   • SCons
   • wxPython
   • BuildBot
   • eAccelerator
   • Linux (Debian)
   • memcached
   • Nagios
   • Perl
   • Roundup
   • rrd
   • Subversion
   • ADODB
   • b2evolution
   • Coppermine
   • feed2js
   • FreeTag
   • Incutio XML-RPC
   • jrcache
   • JSON-PHP
   • Magpie
   • osCommerce
   • phpBB
   • Phorum
   • SimpleTest
   • Selenium
Before and After Architecture

Before: We started with a small site, a mess of open source, and a
small team that didn't know much about scaling.

After: We ended with a large site, a medium-sized team, and an
architecture that has scaled.

We never stopped. We used a roadmap and a compass, made
weekly changes in direction, regularly shipped code on
Wednesday to handle the next weekend's capacity constraints,
and shipped new features the whole time.
Before and After Architecture (1/4): November

Before and After Architecture (2/4): December

Before and After Architecture (3/4): February

Before and After Architecture (4/4): May
Advanced planning vs. fast response
“Rocket ship”
• Figure out in advance what is going to go wrong
• Build a plan that prevents those things from happening
• Execute your plan
• Get feedback when done

“Driving”
• Continuously figure out what is going to go wrong soon
• Quickly fix it, without breaking something else
• Get feedback along the way
Questions to ask
“Rocket ship”
• Are you sure you know what is going to happen?
• Are you sure you can execute?
• Can you afford it?
• Do you need feedback?

“Driving”
• How do you know you will be able to fix the problem in time?
• How can you be sure you won't cause collateral damage?
• How can you be sure you won't code yourself into a corner?
Continuous Ship
• Deploy new software quickly
   •   At IMVU, time from check-in to production = 20 minutes

• Tell a good change from a bad change (quickly)

• Revert a bad change quickly

• Work in small batches
   •   At IMVU, a large batch = 3 days' worth of work

• Break large projects down into small batches

• Don't have the same problem twice – fix the root cause of each
  class of problems

 IMVU pushes code to production 20-30 times every day
Cluster Immune System
What it looks like to ship one piece of code to production:
 • Run tests locally (SimpleTest, Selenium)
     o Everyone has a complete sandbox

 • Continuous Integration Server (BuildBot)
     o All tests must pass or “shut down the line”
     o Automatic feedback if the team is going too fast

 • Incremental deploy
     o Monitor cluster and business metrics in real-time
     o Reject changes that move metrics out-of-bounds

 • Alerting & Predictive monitoring (Nagios)
     o Monitor all metrics that stakeholders care about
     o If any metric goes out-of-bounds, wake somebody up
     o Use historical trends to predict acceptable bounds

 When customers see a failure:
     o Fix the problem for customers
     o Improve your defenses at each level
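
The incremental-deploy and predictive-monitoring steps above can be sketched roughly as follows. This is an illustration in Python, not IMVU's implementation: the function names, the mean plus or minus k standard deviations rule for "acceptable bounds", and the metric names in the usage note are all assumptions made for the example.

```python
from statistics import mean, stdev


def acceptable_bounds(history, k=3.0):
    """Derive (low, high) from past samples as mean +/- k standard deviations.

    Assumption: the slides only say bounds come from historical trends;
    this particular rule is a stand-in. Needs at least two samples.
    """
    mu = mean(history)
    sigma = stdev(history)
    return mu - k * sigma, mu + k * sigma


def out_of_bounds(metrics, history, k=3.0):
    """Return the sorted names of metrics whose current value is outside bounds."""
    bad = []
    for name, value in metrics.items():
        low, high = acceptable_bounds(history[name], k)
        if not (low <= value <= high):
            bad.append(name)
    return sorted(bad)


def incremental_deploy(stages, read_metrics, history, revert):
    """Roll a change out stage by stage.

    After each stage the current metrics are checked; if any metric has
    left its bounds, the change is reverted and the rollout stops.
    Returns (succeeded, list_of_offending_metrics).
    """
    for stage in stages:
        stage()  # hypothetical callable: push to the next slice of the cluster
        bad = out_of_bounds(read_metrics(), history)
        if bad:
            revert()
            return False, bad
    return True, []
```

Here `stages`, `read_metrics`, and `revert` are hypothetical callables supplied by the caller; for example `history` could map a name such as `"signups_per_min"` to a list of recent samples, so that both cluster and business metrics go through the same check.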
Case Study: Sharding

Problem: Spread write queries across multiple databases

Solution:
• Intercept and redirect queries based on SQL comments
• Move one table or sub-system at a time
   • In our experience, one engineer can horizontally partition one table or
     small sub-system in one week

• New engineers figure this out in about 5 minutes
db_query("INSERT INTO inventory (customers_id, products_id)
          VALUES ($customer_id, $product_id)");

db_query("/*shard customer://$customer_id */
          INSERT INTO inventory (customers_id, products_id)
          VALUES ($customer_id, $product_id)");

• Learning: cross-shard joins & transactions aren't required
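
One way a query layer might act on the shard comment is sketched below, in Python for illustration rather than the PHP of the slides. The regular expression, the `shard_for_query` name, the shard naming scheme, and the modulo mapping from ID to shard are all assumptions; the slides do not say how IMVU mapped customers to databases.

```python
import re

# Matches comments of the form /*shard customer://12345 */
SHARD_COMMENT = re.compile(r"/\*\s*shard\s+(\w+)://(\d+)\s*\*/")


def shard_for_query(sql, num_shards, default="main"):
    """Pick a database name for a query based on its shard comment.

    Queries without a shard comment fall through to the default database,
    which is what lets tables be moved one at a time.
    """
    match = SHARD_COMMENT.search(sql)
    if match is None:
        return default
    entity, entity_id = match.group(1), int(match.group(2))
    # Assumption: simple modulo placement; a lookup table would work as well.
    return f"{entity}_shard_{entity_id % num_shards}"
```

Because un-annotated queries keep going to the default database, adding the comment to the queries of a single table is enough to redirect just that table.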
Case Study: Caching
Problem: Cache frequently read data to memcached

Solution:
• Intercept and cache queries based on SQL comments
db_query_cache(BUDDY_CACHE_TIME,
              "/*shard customer://$customer_id */
               /*cache-class customer://$customer_id/buddies */
               SELECT friend_id, buddy_order FROM customers_friends
               WHERE customers_id=$customer_id");

-----------------

db_query("/*shard customer://$customer_id */
          DELETE FROM customers_friends
          WHERE customers_id = $customer_id
          AND friend_id = $friend_id");
db_flush_cacheclass("customer://$customer_id/buddies");


• Learning: Flushing the cache is critical to users and performance
   – When a customer spends $24.95, they want the benefits immediately

• Learning: Test the cache behavior for critical systems
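
The cache-class idea can be illustrated with a small in-memory stand-in for memcached. This Python sketch is an assumption-laden illustration, not the `db_query_cache` / `db_flush_cacheclass` code from the slides: the `QueryCache` class, its method names, and the per-class dictionary layout are invented for the example.

```python
import re
import time

# Matches comments of the form /*cache-class customer://12345/buddies */
CACHE_CLASS = re.compile(r"/\*\s*cache-class\s+(\S+)\s*\*/")


class QueryCache:
    """Cache query results grouped by the cache class named in a SQL comment."""

    def __init__(self, run_query, clock=time.monotonic):
        self._run_query = run_query  # hypothetical callable that hits the database
        self._clock = clock
        self._entries = {}           # cache class -> {sql: (expires_at, rows)}

    def query_cache(self, ttl, sql):
        match = CACHE_CLASS.search(sql)
        if match is None:
            return self._run_query(sql)  # un-annotated queries are never cached
        bucket = self._entries.setdefault(match.group(1), {})
        now = self._clock()
        hit = bucket.get(sql)
        if hit is not None and hit[0] > now:
            return hit[1]
        rows = self._run_query(sql)
        bucket[sql] = (now + ttl, rows)
        return rows

    def flush_cacheclass(self, cache_class):
        """Drop every cached result in the class, e.g. right after a write."""
        self._entries.pop(cache_class, None)
```

Grouping entries by class means a write only needs to know the class name (such as `customer://7/buddies`) to invalidate every cached read that depends on it, without tracking the individual queries.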
Case Study: Steering Data Design

Problem: Improve database schemas and data design to meet
scalability requirements without downtime

Solution:
• Measure to find the real problems (harder than it sounds)
• Migrate to new design that takes advantage of sharding and/or caching
Case Study: Steering Data Design
Problem: You can’t bulk-move large, frequently accessed data
Solution:
• Copy on read
   – Use when you are read bound
   – Reads check cache, then new location; if missing there, read from old location and copy to new location
   – Writes go to new location if data has been migrated, otherwise old

• Copy on write
   – Use when you are write bound
   – Reads check cache, new location, then old location
   – Writes go to new location, copying the record to the new location first if it is missing there

• Copy all
   – Use when file system fills up
   – Reads & writes go to new location, falling back to old location if missing
   – Cron copies data a few records at a time
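
As a rough sketch of the first and third strategies, the Python below uses plain dictionaries in place of the old and new storage locations. The `CopyOnReadStore` class and `copy_batch` function are hypothetical names, and the cache layer mentioned in the slide is left out to keep the example short.

```python
class CopyOnReadStore:
    """Copy on read: records migrate to the new location the first time they are read."""

    def __init__(self, old, new):
        self.old = old  # dict standing in for the old storage location
        self.new = new  # dict standing in for the new storage location

    def read(self, key):
        if key in self.new:
            return self.new[key]
        value = self.old[key]  # raises KeyError if the record exists nowhere
        self.new[key] = value  # migrate on first read
        return value

    def write(self, key, value):
        # Writes go to the new location only once the record has been migrated.
        if key in self.new:
            self.new[key] = value
        else:
            self.old[key] = value


def copy_batch(old, new, batch_size):
    """Copy all: move up to batch_size not-yet-migrated records per run.

    Intended to be called repeatedly (for example from cron) until it returns 0.
    """
    pending = [key for key in old if key not in new][:batch_size]
    for key in pending:
        new[key] = old[key]
    return len(pending)
```

Small batches keep each run short, so the background copy does not compete heavily with live traffic for the same storage.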
“Thank You for Listening!”
