Cloud Operations Bootcamp: Culture - Jesse Robbins


Transcript

  • 1. Operations Culture. Speaker: Jesse Robbins, CEO ‣ jesse@opscode.com ‣ @jesserobbins ‣ www.opscode.com
  • 2-6. Today ‣ Operations is Culture ‣ Failure Happens ‣ The OODA Loop ‣ Do Fire Drills
  • 7. Operations is Culture
  • 8. “You don’t choose the moment, the moment chooses you. You only get to choose how prepared you are when it does.” - Fire Chief Mike Burtch
  • 9. Cloud Operations is the ability to consistently create and deploy reliable software to an unreliable platform that scales horizontally. http://radar.oreilly.com/2007/10/operations-is-a-competitive-ad.html
  • 10-12. “It’s not my code, it’s your machines!” Spock: little bit weird, sits closer to the boss, thinks too hard. Scotty: pulls levers & turns knobs, easily excited, yells a lot in emergencies. http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr
  • 13. No fingerpointing http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr http://www.flickr.com/photos/rocketjim54/2955889085/ Copyright © 2010 Opscode, Inc - All Rights Reserved
  • 14-19. Fingerpointyness (timeline): problem!!! argggh! → freaking out, blaming, covering ass, not talking, finding fault, whining, hiding, hurt egos → figuring it out → fixing things → fixed http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr
  • 20-24. Being productive (timeline): problem!!! argggh! → figuring it out → fixing things → fixed → feeling guilty → move on with life http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr
  • 25. This will be on the test: FAILURE HAPPENS!
  • 26. Good Book!
  • 27-28. Catastrophic Potential: a chart of Complexity (simple to complex) against Coupling (loose to tight), with the complex, tightly coupled region marked "KEEP OUT!!!". Created by Jesse Robbins; "Catastrophic Potential" adapted from Normal Accidents by Charles Perrow.
  • 29-35. define: Nines (roughly), downtime allowed per year:
      99%         5,256 min (3.65 days)
      99.9%       526 min (8.8 hours)
      99.99%      53 min
      99.999%     5 min
      99.9999%    30 seconds
      99.99999%   3 seconds
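The "nines" figures come from multiplying the fraction of time a service may be down by the number of minutes in a year. A minimal Python sketch, assuming a 365-day year (525,600 minutes); the names `MINUTES_PER_YEAR` and `downtime_per_year` are illustrative, not from the talk:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; assumes a non-leap year


def downtime_per_year(availability_pct: float) -> float:
    """Return allowed downtime in minutes per year for an availability given in percent."""
    unavailable_fraction = 1 - availability_pct / 100
    return unavailable_fraction * MINUTES_PER_YEAR


if __name__ == "__main__":
    for nines in ["99", "99.9", "99.99", "99.999", "99.9999", "99.99999"]:
        minutes = downtime_per_year(float(nines))
        # Show both minutes and seconds, since the higher nines are only seconds long.
        print(f"{nines}%: {minutes:,.1f} min ({minutes * 60:,.0f} s)")
```

The results match the slide's rounded values: roughly 5,256 minutes for 99% and roughly 3 seconds for 99.99999%.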
  • 36. 99.9% * 99.9% * 99.9% = 99.7%
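Slide 36 multiplies availabilities because a service that depends on three 99.9% components in a chain is up only when all three are up at once. A minimal sketch, assuming the components fail independently of one another; `serial_availability` is an illustrative name, not from the talk:

```python
from math import prod


def serial_availability(*components: float) -> float:
    """Availability (0..1) of a chain in which every component must be up.

    Assumes component failures are independent, so the probabilities multiply.
    """
    return prod(components)


if __name__ == "__main__":
    total = serial_availability(0.999, 0.999, 0.999)
    print(f"{total:.4%}")  # three 99.9% parts in series
```

Each dependency added in series lowers the total, which is why the product (about 99.70%) is worse than any single part.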
  • 37. Internet Routing... won’t.
  • 39. http://radar.oreilly.com/2008/10/sprint-blocking-cogent-network.html
  • 40. #googlefail
  • 41. YOU
  • 42. Continuous Power... isn’t.
  • 43. 365 Main SF
  • 44. 365 364.96 Main SF
  • 45-46. http://radar.oreilly.com/2007/07/failure-happens-a-summary-of-t.html
  • 47. Failure happens
      • A single datacenter is the problem, since they all fail at some point
      • Recovery procedures after failure:
          • Power was gone ~45 minutes
          • Most services took hours to come back
          • Some unnamed ones more than 12 hours
  • 48. Geography is a Single Point of Failure
  • 50. Providers are baskets too.
  • 52. Failure Happens. Anyone promising otherwise is either foolish or lying (or both).
  • 53-54. OODA: Observe, Orient, Decide, Act http://en.wikipedia.org/wiki/OODA_loop
  • 55. http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr http://www.flickr.com/photos/dnorman/2678090600
  • 56. Speaker: Jesse Robbins, CEO ‣ jesse@opscode.com ‣ @jesserobbins ‣ www.opscode.com