Scaling with Continuous Deployment

               Web 2.0 Expo
     New York, NY, September 29, 2010

             Brett ...
An online community where members use 3D avatars
           to meet new people, chat, create
            and have fun with...
Who is my audience?



Mix of engineering / product?




                                   2
Who is my audience?



Mix of engineering / product?

 How many from a startup?




                                   3
Who is my audience?



       Mix of engineering / product?

        How many from a startup?

How many believe iterating ...
How quickly can your business iterate?




                                         5
Can I interest you in some
Continuous Deployment?




                             6
In a Nutshell




What is Continuous Deployment?

• Engineer commits code
• 20 minutes later it is live in production
• Re...
Does This Really Work?


“Maybe this is just viable for a single
  developer … your site will be down. A lot.”

“It seems ...
Benefits




• Regressions easy to find, correct

• Releases have zero overhead

• Rapid iteration using real customer met...
Finding and Fixing Problems


                        • Each release has few
                          changes, 1-3 commit...
CD at IMVU: Simple Overview

  Local tests                                                   Rollback
pass, engineer      ...
CD at IMVU: Detailed Overview




                            12
Getting Started – Extreme Basics




1. Continuous integration system
2. Production monitoring and alerting
   – System pe...
Commit to Making Forward Progress




• Require coverage for all new code

• Add coverage for bugs / regressions

• Unders...
Expect Some Hurdles


• Production outages
• New overhead
   – Tests
   – Build systems
• Production outages
• Frustration...
Dealing with SQL


Problems
• Difficult to roll-back schema
• Alter statements lock / impact customers

Solutions
• New sc...
Big Features



• Developed on trunk, not branch
   – “hidden” from customers by A/B experiment
   – 100% control, add QA ...
Test Speed


Slow tests burden to scaling
• Can’t run all tests in sandbox
• Faster to debug on build cluster

If possible...
The cost of failing tests



As the team grows…

• More likely to have test failures
• More people blocked as a result

  ...
Other Issues


• Won’t catch issues that fail slowly
   – SELECT * FROM growing_table WHERE 1


• Some critical areas caus...
Does Continuous Deployment Scale?

• Technical staff ~50 people

• 10 million monthly unique visitors

• Peak ~130K concur...
Newer Scaling Challenges




Biggest challenges come with growth of the
          engineering organization




           ...
SLA for Build Systems




Build systems are a critical service




                                        23
SLA for Build Systems




Build systems are a critical service
        Run them that way




                             ...
Build and Push Times




                   25
Overall Availability




                   26
http://www.flickr.com/photos/onebigchickenman/4869442019/
Build Throughput


• Initial implementation sequential builds
   – Scaled okay to ~20 engineers
   – Like trains running e...
Web Build Software



•   Custom test-file runner with JS GUI
•   PHP SimpleTest
•   Python's built-in unittest
•   Seleni...
Current Systems


• > 15,000 tests

• 86 web build servers
   – 62 Linux

   – 24 Windows

• ~ 10 minutes on build servers...
Conclusion



• Continuous Deployment is possible!

• Starting earlier is easier - baby steps

• The value of being able t...
Questions?




             32
Thank You!


                        IMVU recognized as:

                      Inc. 500
                           http:/...
Upcoming SlideShare
Loading in...5
×

Scaling Continuous Deployment at IMVU

3,146

Published on

Scaling with Continuous Deployment as presented at the Web 2.0 Expo, New York, September 29, 2010

Published in: Technology
0 Comments
5 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
3,146
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
45
Comments
0
Likes
5
Embeds 0
No embeds

No notes for slide

Scaling Continuous Deployment at IMVU

  1. 1. Scaling with Continuous Deployment Web 2.0 Expo New York, NY, September 29, 2010 Brett G. Durrett (@bdurrett) Vice President Engineering & Operations, IMVU, Inc. 0
  2. 2. An online community where members use 3D avatars to meet new people, chat, create and have fun with their friends
  3. 3. Who is my audience? Mix of engineering / product? 2
  4. 4. Who is my audience? Mix of engineering / product? How many from a startup? 3
  5. 5. Who is my audience? Mix of engineering / product? How many from a startup? How many believe iterating on your product is critical to the success of your business? 4
  6. 6. How quickly can your business iterate? 5
  7. 7. Can I interest you in some Continuous Deployment? 6
  8. 8. In a Nutshell What is Continuous Deployment? • Engineer commits code • 20 minutes later it is live in production • Repeat for about 50 commits per day 7
  9. 9. Does This Really Work? “Maybe this is just viable for a single developer … your site will be down. A lot.” “It seems like the author either has no customers or very understanding customers” Responses to February 2009 blog posting about Continuous Deployment at IMVU (at the time IMVU had a $12 million run rate) 8
  10. 10. Benefits • Regressions easy to find, correct • Releases have zero overhead • Rapid iteration using real customer metrics 9
  11. 11. Finding and Fixing Problems • Each release has few changes, 1-3 commits • Production issues correlate with check- in timestamp • No overhead to Identifying cause takes minutes producing a new release to correct issue
  12. 12. CD at IMVU: Simple Overview Local tests Rollback pass, engineer (Blocks) commits code No Lots and lots of Metrics Yes Code deployed tests run good? to all servers All tests Yes Code deployed Metrics No pass? to % of servers still good? No Yes Revert commit (Blocks) Win! 11
  13. 13. CD at IMVU: Detailed Overview 12
  14. 14. Getting Started – Extreme Basics 1. Continuous integration system 2. Production monitoring and alerting – System performance – Business metrics – Trending is nice too  3. Simple deploy / roll-back system 13
  15. 15. Commit to Making Forward Progress • Require coverage for all new code • Add coverage for bugs / regressions • Understand and fix root cause of failures 14
  16. 16. Expect Some Hurdles • Production outages • New overhead – Tests – Build systems • Production outages • Frustration • Production outages (but well worth it)
  17. 17. Dealing with SQL Problems • Difficult to roll-back schema • Alter statements lock / impact customers Solutions • New schema has formal review process • No alter on large tables, create new table – Copy on read – Complete migration with background job 16
  18. 18. Big Features • Developed on trunk, not branch – “hidden” from customers by A/B experiment – 100% control, add QA to experiment • Deployed daily during development • Slow roll-out by increasing experiment % – Experiment closed = fully launched 17
  19. 19. Test Speed Slow tests burden to scaling • Can’t run all tests in sandbox • Faster to debug on build cluster If possible… • Keep tests fast • Keep tests specific 18
  20. 20. The cost of failing tests As the team grows… • More likely to have test failures • More people blocked as a result Intermittent failures very bad Eliminate the root cause 19
  21. 21. Other Issues • Won’t catch issues that fail slowly – SELECT * FROM growing_table WHERE 1 • Some critical areas cause hard lock-ups – MySQL – Memcached • Lack of test coverage of older code – Not an issue if you start with test coverage 20
  22. 22. Does Continuous Deployment Scale? • Technical staff ~50 people • 10 million monthly unique visitors • Peak ~130K concurrent IM client logins • It’s a real business! – $40 million run rate – Profitable and doubled revenue in 2009 21
  23. 23. Newer Scaling Challenges Biggest challenges come with growth of the engineering organization 22
  24. 24. SLA for Build Systems Build systems are a critical service 23
  25. 25. SLA for Build Systems Build systems are a critical service Run them that way 24
  26. 26. Build and Push Times 25
  27. 27. Overall Availability 26
  28. 28. http://www.flickr.com/photos/onebigchickenman/4869442019/
  29. 29. Build Throughput • Initial implementation sequential builds – Scaled okay to ~20 engineers – Like trains running every 20 minutes – One “red” blocks all following builds • Solution: build isolation – Enable testing single build without deploy – “Red” build pulled, allow other builds to pass 28
  30. 30. Web Build Software • Custom test-file runner with JS GUI • PHP SimpleTest • Python's built-in unittest • Selenium Core with in-house API wrapper • YUITest for browser JS unit tests • Erlang Eunit • Buildbot 29
  31. 31. Current Systems • > 15,000 tests • 86 web build servers – 62 Linux – 24 Windows • ~ 10 minutes on build servers • Deploy to cluster of ~700 servers 30
  32. 32. Conclusion • Continuous Deployment is possible! • Starting earlier is easier - baby steps • The value of being able to iterate outweighs the challenges 31
  33. 33. Questions? 32
  34. 34. Thank You! IMVU recognized as: Inc. 500 http://bit.ly/dv52wK Brett G. Durrett Red Herring 100: bdurrett@imvu.com http://bit.ly/bbz5Ex Twitter: @bdurrett Best Place to Work: http://bit.ly/aAVdp8 (and we're hiring) http://www.imvu.com/jobs
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×