The document discusses Disqus' approach to continuous deployment. It describes how code is automatically deployed as soon as tests pass, with the goal of releasing features incrementally. It outlines the workflow, pros and cons, and techniques used to simplify local development and ensure stability through testing. Pain points like test speed and database migrations are addressed along with tools developed in-house like Mule for distributed testing.
3. Continuous Deployment
Shipping new code as soon
as it’s ready
(It’s really just super awesome buildbots)
Friday, April 29, 2011
4. Workflow
1. Developer commits code
2. CI server runs tests automatically
2.1.Build passes, code deploys
2.2.Build fails, block deploy
3. Developer tests feature on production
Friday, April 29, 2011
5. Pros Cons
• Develop features • Culture Shock
incrementally • Stability depends on
• Release frequently test coverage
• Less QA! (maybe) • Initial time
investment
We mostly just care about iteration and stability
Friday, April 29, 2011
7. Development
• Production > Staging > CI > Dev
• Automate testing of complicated
processes and architecture
• Simple is arguably better than complete
• Especially for local development
• python setup.py {develop,test}
• Use Puppet/Chef or build a simple
bootstrap.{py,sh}
Friday, April 29, 2011
9. Bootstrapping Local
• Simplify local setup
• git clone dcramer@disqus:disqus.git
• python setup.py develop
• python manage.py runserver
• Need to test dependancies?
• virtualbox + vagrant up
Friday, April 29, 2011
10. “Under Construction”
• Iterate quickly by hiding features
• Early adopters are free QA
from gargoyle import gargoyle
def my_view(request):
if gargoyle.is_active('awesome', request):
return 'new happy version :D'
else:
return 'old sad version :('
Friday, April 29, 2011
11. Gargoyle
Deploy features to portions of a user base at a
time to ensure smooth, measurable releases
Being users of our product, we actively use
early versions of features before public release
Friday, April 29, 2011
12. Integration
(or as we like to call it)
Friday, April 29, 2011
15. CI Requirements
✓ Developers must know when they’ve
broken something
✓ IRC, Email, IM
✓ Support proper reporting
✓ XUnit, Pylint, Coverage.py
✓ Painless setup
✓ apt-get install jenkins
Friday, April 29, 2011
16. Shortcomings
• False positives lower awareness
• Reporting isn't accurate
• Services fail
• Bad Tests
• Not enough code coverage
• Regressions on untested code
• Test suite takes too long
• Integration tests vs Unit tests
Friday, April 29, 2011
17. Fixing False Positives
• Re-run tests several times on a failure
• Report continually failing tests
• Fix continually failing tests
• Rely less on 3rd parties
• Mock/Dingus
Friday, April 29, 2011
18. Maintaining Coverage
• Raise awareness with reporting
• Fail/alert when coverage drops on a build
• Commit tests with code
• Drive it into your culture
Friday, April 29, 2011
19. Speeding Up Tests
• Write true unit tests
• vs slower integration tests
• Mock 3rd party APIs
• Distributed and parallel testing
• http://github.com/disqus/mule
Friday, April 29, 2011
20. Mule
• Unstable (we’re working on it)
• Mostly Django right now
• Generic interfaces for unittest2
• Works with multi-processing and Celery
• More complex than normal Celery usage
• Full XUnit integration
• Simple workflow
• mule test --runner="python manage.py
mule --worker $TEST"
Friday, April 29, 2011
22. How DISQUS Does It
• Incremental deploy with Fabric
• Drop server from pool
• Pull in requirements on each server
• Isolated virtualenv’s built on each server
• Push server back online
Friday, April 29, 2011
23. Challenges
• PyPi works on server A, but not B
• Scale, or lack of
• CPU cost per server
• Schema changes, data model changes
• Backwards compatibility
Friday, April 29, 2011
24. PyPi is Down
• http://github.com/disqus/chishop
Friday, April 29, 2011
25. Help, we have 100 servers!
• Incremental (ours) vs Fanout
• Push vs Pull
• Twitter uses BitTorrent
• Isolation vs Packaging (Complexity)
Friday, April 29, 2011
26. SQL Schema Changes
1. Add column (NULLable)
2. Add app code to fill column
3. Deploy
4. Backfill column
5. Add app code to read column
6. Deploy
Cached Data Changes
• Have a global version number
• Have a data model cache version
• maybe md5(cls.__dict__)?
Friday, April 29, 2011