Cloud Operations Bootcamp: Culture - Jesse Robbins
 

Usage Rights

CC Attribution-ShareAlike License

Cloud Operations Bootcamp: Culture - Jesse Robbins Presentation Transcript

  • Operations Culture ‣ Speaker: Jesse Robbins, CEO ‣ jesse@opscode.com ‣ @jesserobbins ‣ www.opscode.com
  • Today ‣ Operations is Culture ‣ Failure Happens ‣ The OODA Loop ‣ Do Fire Drills
  • Operations is Culture
  • “You don’t choose the moment, the moment chooses you. You only get to choose how prepared you are when it does.” - Fire Chief Mike Burtch
  • Cloud Operations is the ability to consistently create and deploy reliable software to an unreliable platform that scales horizontally. http://radar.oreilly.com/2007/10/operations-is-a-competitive-ad.html
  • “It’s not my code, it’s your machines!” ‣ Spock: Little bit weird; Sits closer to the boss; Thinks too hard ‣ Scotty: Pulls levers & turns knobs; Easily excited; Yells a lot in emergencies http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr
  • No fingerpointing http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr http://www.flickr.com/photos/rocketjim54/2955889085/ Copyright © 2010 Opscode, Inc - All Rights Reserved
  • Fingerpointyness, over time: problem!!! argggh! ‣ freaking out, not talking, finding fault ‣ blaming, covering ass ‣ whining, hiding, hurt egos ‣ figuring it out ‣ fixing things ‣ fixed http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr
  • Being productive, over time: problem!!! argggh! ‣ figuring it out ‣ fixing things ‣ fixed ‣ feeling guilty ‣ move on with life http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr
  • This will be on the test: FAILURE HAPPENS!
  • Good Book!
  • Catastrophic Potential ‣ Axes: Complexity (Simple to Complex) and Coupling (Tight to Loose) ‣ "KEEP OUT!!!" marked at the tightly coupled end ‣ Created by Jesse Robbins; "Catastrophic Potential" adapted from Normal Accidents by Charles Perrow
  • define: Nines (roughly), as downtime per year
      ‣ 99%: 5256 min (3.5 days)
      ‣ 99.9%: 528 min (8.8 hours)
      ‣ 99.99%: 53 min
      ‣ 99.999%: 5 min
      ‣ 99.9999%: 30 seconds
      ‣ 99.99999%: 3 seconds
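The downtime figures on this slide follow from simple arithmetic on the length of a year. The Python sketch below recomputes them, assuming a 365-day year; the name `downtime_per_year_minutes` is illustrative and does not come from the talk. The exact values differ slightly from the slide's rounded figures (99% works out closer to 3.65 days than 3.5).

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; assumes a non-leap year


def downtime_per_year_minutes(availability_percent: float) -> float:
    """Return the downtime budget, in minutes per year, for a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    # Walk through the same levels of "nines" listed on the slide.
    for nines in (99, 99.9, 99.99, 99.999, 99.9999, 99.99999):
        minutes = downtime_per_year_minutes(nines)
        print(f"{nines}%: {minutes:.2f} min/year ({minutes * 60:.0f} s)")
```

Each additional nine cuts the yearly downtime budget by a factor of ten.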
  • 99.9% * 99.9% * 99.9% = 99.7%
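The multiplication on this slide is how availability compounds when a request depends on several components in a row. A minimal Python sketch, assuming the components fail independently (a simplification) and using the hypothetical helper name `serial_availability`:

```python
from math import prod


def serial_availability(*availabilities_percent: float) -> float:
    """Availability (in percent) of a chain in which every component must be up.

    Assumes the components fail independently, which real systems rarely do.
    """
    return 100 * prod(a / 100 for a in availabilities_percent)


if __name__ == "__main__":
    # Three components at "three nines" each, as on the slide.
    combined = serial_availability(99.9, 99.9, 99.9)
    print(f"{combined:.1f}%")  # prints 99.7%
```

Every dependency added to the chain lowers the ceiling on what the whole system can promise.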
  • Internet Routing... won’t.
  • http://radar.oreilly.com/2008/10/sprint-blocking-cogent-network.html
  • #googlefail
  • YOU
  • Continuous Power... isn’t
  • 365 Main SF
  • 365 364.96 Main SF
  • http://radar.oreilly.com/2007/07/failure-happens-a-summary-of-t.html
  • Failure happens ‣ A single datacenter is the problem, since they all fail at some point ‣ Recovery procedures after failure: power was gone ~45 minutes; most services took hours to come back; some unnamed ones took more than 12 hours
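To relate these durations to the nines defined earlier in the deck, the sketch below converts an outage length into yearly availability. It assumes, hypothetically, that the outage is the only downtime in a 365-day year; `availability_percent` is an illustrative name, not something from the talk.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a non-leap year


def availability_percent(outage_minutes: float) -> float:
    """Yearly availability if the given outage were the only downtime that year."""
    return 100 * (1 - outage_minutes / MINUTES_PER_YEAR)


if __name__ == "__main__":
    # Durations taken from the slide: ~45 minutes without power,
    # and more than 12 hours for the slowest services to recover.
    for label, minutes in (("power outage", 45), ("slowest recovery", 12 * 60)):
        print(f"{label}: {availability_percent(minutes):.3f}%")
```

Under those assumptions a 45-minute outage still fits within four nines, while a 12-hour recovery drops a service below three, so recovery time matters more than the length of the power loss itself.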
  • Geography is a Single Point of Failure
  • Providers are baskets too.
  • Failure Happens. Anyone promising otherwise is either foolish or lying (or both).
  • OODA: Observe, Orient, Decide, Act http://en.wikipedia.org/wiki/OODA_loop
  • http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr http://www.flickr.com/photos/dnorman/2678090600
  • Speaker: Jesse Robbins, CEO ‣ jesse@opscode.com ‣ @jesserobbins ‣ www.opscode.com