10+ Deploys Per Day: Dev and Ops Cooperation at Flickr

  • 239,965 views
Uploaded on

Communications and cooperation between development and operations isn't optional, it's mandatory. Flickr takes the idea of "release early, release often" to an extreme - on a normal day there are 10 …

Communications and cooperation between development and operations isn't optional, it's mandatory. Flickr takes the idea of "release early, release often" to an extreme - on a normal day there are 10 full deployments of the site to our servers. This session discusses why this rate of change works so well, and the culture and technology needed to make it possible.

More in: Technology
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
  • Great presentation. Nice job
    Are you sure you want to
    Your message goes here
  • This is hilarious, you say no finger-pointing, but there is a clear message in this presentation that 'Ops' are the problem, with only a small nod to how developers can help. The general theme is that Ops are the ones causing problems with several slides focussing on this with no equivalent developers slide.
    Are you sure you want to
    Your message goes here
  • http://blip.tv/oreilly-velocity-conference/velocity-09-john-allspaw-10-deploys-per-day-dev-and-ops-cooperation-at-flickr-2297883
    Are you sure you want to
    Your message goes here
  • infra
    Are you sure you want to
    Your message goes here
  • Well made presentation, good work!
    http://www.fungiftideas.org/
    http://www.fungiftideas.org/category/wedding-gift-ideas/
    Are you sure you want to
    Your message goes here
No Downloads

Views

Total Views
239,965
On Slideshare
0
From Embeds
0
Number of Embeds
121

Actions

Shares
Downloads
3,085
Comments
14
Likes
553

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. 10 deploys per day Dev & ops cooperation at Flickr John Allspaw & Paul Hammond Velocity 2009
  • 2. 3 billion photos 40,000 photos per second http://flickr.com/photos/jimmyroq/415506736/
  • 3. Dev versus Ops
  • 4. “It’s not my machines, it’s your code!”
  • 5. “It’s not my code, it’s your machines!”
  • 6. Spock Scotty Little bit weird Pulls levers & turns knobs Sits closer to the boss Easily excited Thinks too hard Yells a lot in emergencies
  • 7. Says “No” all the time Afraid that new fangled things will break the site Fingerpointy
  • 8. Ops stereotype Because the site breaks unexpectedly Because no one tells them anything Because They say “NO” all the time
  • 9. http://www.flickr.com/photos/stewart/461099066/ Traditional thinking Dev’s job is to add new features Ops’ job is to keep the site stable and fast
  • 10. Ops’ job is NOT to keep the site stable and fast
  • 11. Ops’ job is to enable the business (this is dev’s job too)
  • 12. The business requires change
  • 13. But change is the root cause of most outages!
  • 14. Discourage change in the interests of stability or Allow change to happen as often as it needs to
  • 15. Lowering risk of change through tools and culture
  • 16. Dev and Ops
  • 17. Ops who think like devs Devs who think like ops
  • 18. “But that’s me!”
  • 19. You can always think more like them
  • 20. Tools
  • 21. 1. Automated infrastructure If there is only one thing you do…
  • 22. CFengine Chef BCfg2 FAI 1. Automated infrastructure If there is only one thing you do… System Imager Puppet Cobbler
  • 23. Role & configuration management OS imaging
  • 24. 2. Shared version control
  • 25. Everyone knows where to look http://www.flickr.com/photos/thunderchild5/1330744559/
  • 26. 3. One step build
  • 27. 3. One step build and deploy
  • 28. [2009-06-22 16:03:57] [harmes] site deployed (changes...) Who? When? What?
  • 29. Small frequent changes http://www.flickr.com/photos/mauren/2429240906/
  • 30. 4. Feature flags (aka branching in code)
  • 31. 1.0.1 1.0.2 1.0 1.1 1.2 1.1.1 Desktop software
  • 32. r2301 r2302 r2306 Web software
  • 33. http://www.flickr.com/photos/8720628@N04/2188922076/ Always ship trunk
  • 34. Everyone knows exactly where to look http://www.flickr.com/photos/thunderchild5/1330744559/
  • 35. Feature flags #php if ($cfg['enable_feature_video']){ … } {* smarty *} {if $cfg.enable_feature_beehive} … {/if}
  • 36. http://www.flickr.com/photos/healthserviceglasses/3522809727/ Private betas
  • 37. Bucket testing http://www.flickr.com/photos/davidw/2063575447/
  • 38. http://www.flickr.com/photos/jking89/3031204314/ Dark launches
  • 39. Free contingency switches http://www.flickr.com/photos/flattop341/260207875/
  • 40. 5. Shared metrics
  • 41. Application level metrics
  • 42. Application level metrics
  • 43. Adaptive feedback loops RU ok? App System Metrics maybe?
  • 44. 6. IRC and IM robots
  • 45. Dev, Ops, and Robots Having a conversation build deploy logs logs alerts monitors IRC search engine
  • 46. Culture
  • 47. 1. Respect If there is only one thing you do…
  • 48. Don’t stereotype (not all developers are lazy) http://www.flickr.com/photos/aaronjacobs/64368770/
  • 49. http://www.flickr.com/photos/chrisdag/2286198568/ Respect other people’s expertise, opinions and responsibilities
  • 50. http://www.flickr.com/photos/jwheare/2580631103/ Don’t just say “No”
  • 51. http://www.flickr.com/photos/alancleaver/2661424637/ Don’t hide things
  • 52. Developers: Talk to ops about the impact of your code: • what metrics will change, and how? • what are the risks? • what are the signs that something is going wrong? • what are the contingencies? This means you need to work this out before talking to ops
  • 53. 2. Trust
  • 54. Ops needs to trust dev to involve them on feature discussions Dev needs to trust ops to discuss infrastructure changes Everyone needs to trust that everyone else is doing their best for the business http://www.flickr.com/photos/85128884@N00/2650981813/
  • 55. http://www.flickr.com/photos/flattop341/224176602/ Shared runbooks & escalation plans
  • 56. http://www.flickr.com/photos/telstar/2861103147/ Provide knobs and levers
  • 57. http://www.flickr.com/photos/williamhook/3468484351/ Ops: Be transparent, give devs access to systems
  • 58. 3. Healthy attitude about failure
  • 59. http://www.flickr.com/photos/pinksherbet/447190603/ Failure will happen
  • 60. If you think you can prevent failure then you aren’t developing your ability to respond http://www.flickr.com/photos/toms/2323779363/
  • 61. http://www.flickr.com/photos/changereality/2349538868/
  • 62. Fire drills http://www.flickr.com/photos/dnorman/2678090600
  • 63. 4. Avoiding Blame
  • 64. No fingerpointing http://www.flickr.com/photos/rocketjim54/2955889085/
  • 65. Fingerpointyness problem!!! argggh! fixed. freaking out, blaming, whining, figuring it fixing things not talking, covering hiding. out finding fault ass hurt egos time
  • 66. Being productive problem!!! argggh! fixed. figuring it fixing things feeling move out guilty on with life time
  • 67. Developers: Remember that someone else will probably get woken up when your code breaks http://www.flickr.com/photos/alex-s/353218851/
  • 68. http://www.flickr.com/photos/allspaw/2819774755/ Ops: provide constructive feedback on current aches and pains
  • 69. 1. Automated infrastructure 2. Shared version control 3. One step build and deploy 4. Feature flags 5. Shared metrics 6. IRC and IM robots 1. Respect 2. Trust 3. Healthy attitude about failure 4. Avoiding Blame
  • 70. This is not easy You could just carry on shouting at each other…
  • 71. (Thank you)