Operations Culture
Speaker:

Jesse Robbins CEO
  ‣ jesse@opscode.com
  ‣ @jesserobbins
  ‣ www.opscode.com
               ...
Today

‣ Operations is Culture

‣ Failure Happens

‣ The OODA Loop

‣ Do Fire Drills



                          2
Operations is Culture



                        3
“You don’t choose the moment,
   the moment chooses you.

  You only get to choose how
prepared you are when it does.”
	 	...
Cloud Operations
 is the ability to consistently create
 and deploy reliable software to an
   unreliable platform that sc...
“It’s not my code, it’s your machines!”




http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-a...
“It’s not my code, it’s your machines!”



                            Spock Scotty
                   Little bit weird    ...
No fingerpointing




http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr
http://www.flick...
Fingerpointyness

        problem!!!
         argggh!



                 freaking out, blaming,
                  not tal...
Fingerpointyness

        problem!!!
         argggh!



                 freaking out, blaming,                         fi...
Fingerpointyness

        problem!!!
         argggh!                                                                     ...
Being productive

        problem!!!
         argggh!



                  figuring it
                     out




       ...
Being productive

        problem!!!
         argggh!                        fixed


                  figuring it      fixin...
This will be on the test:
  FAILURE HAPPENS!
Good
Book!
Catastrophic Potential
           Simple             Complexity                               Complex


   Tight
Coupling
...
Catastrophic Potential
           Simple             Complexity                               Complex


   Tight          ...
define:
 Nines (roughly, downtime per year)
   99%      5256 min (3.65 days)
   99.9%    526 min (8.8 hours)
   99.99%   53 min
   99.999%  5 min
   9...
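The downtime figures on the "Nines" slide can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming a 365-day year (525,600 minutes) and that the figures refer to downtime per year; the function name `downtime_minutes_per_year` is made up for illustration and does not come from the talk.

```python
# Assumption: a non-leap year, as the "roughly" on the slide suggests.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Minutes of downtime per year permitted at the given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for pct in (99.0, 99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(pct)
    print(f"{pct}%: {minutes:.1f} min/year ({minutes / 60:.1f} hours)")
```

Each extra nine divides the allowed downtime by ten, which is why the budget shrinks from days to minutes so quickly.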
99.9% * 99.9% * 99.9% = 99.7%
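The multiplication above is the availability of a chain in which every component must be up at once. Here is a small sketch of that calculation, assuming the components fail independently (an assumption the slide does not state); `serial_availability` is a hypothetical helper name.

```python
from math import prod


def serial_availability(component_percents):
    """Availability (%) of a chain where every component must be up.

    Assumes independent failures, so the probabilities multiply.
    """
    return 100 * prod(p / 100 for p in component_percents)


combined = serial_availability([99.9, 99.9, 99.9])
print(f"{combined:.2f}%")  # about 99.7%
```

Three "three nines" dependencies in series already cost roughly three times the downtime of one, and every added dependency lowers the total further.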
Internet Routing... won’t.
http://radar.oreilly.com/2008/10/sprint-blocking-cogent-network.html
#googlefail
YOU




Copyright © 2010 Opscode, Inc - All Rights Reserved   21
Continuous Power...
       isn’t
365 Main SF
365 364.96 Main SF
http://radar.oreilly.com/2007/07/failure-happens-a-summary-of-t.html
Failure happens

 A single datacenter is the
 problem
 • Since they all fail at some point

 Recovery procedures after
 fa...
Geography is a
Single Point of Failure
Providers are
baskets too.
Failure Happens.
Anyone promising otherwise
 is either foolish or lying
          (or both).
OODA
Observe, Orient, Decide, Act



                               34
OODA: Observe, Orient, Decide, Act




             http://en.wikipedia.org/wiki/OODA_loop




                           ...
http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr
http://www.flickr.com/photos/dnorman...
Speaker:

Jesse Robbins CEO
  ‣ jesse@opscode.com
  ‣ @jesserobbins
  ‣ www.opscode.com
                        37
Cloud Operations Bootcamp: Culture - Jesse Robbins

  1. 1. Operations Culture Speaker: Jesse Robbins CEO ‣ jesse@opscode.com ‣ @jesserobbins ‣ www.opscode.com 1
  6. 6. Today ‣ Operations is Culture ‣ Failure Happens ‣ The OODA Loop ‣ Do Fire Drills 2
  7. 7. Operations is Culture 3
  8. 8. “You don’t choose the moment, the moment chooses you. You only get to choose how prepared you are when it does.” -Fire Chief Mike Burtch 4
  9. 9. Cloud Operations is the ability to consistently create and deploy reliable software to an unreliable platform that scales horizontally. http://radar.oreilly.com/2007/10/operations-is-a-competitive-ad.html 5
  12. 12. “It’s not my code, it’s your machines!” Spock: little bit weird; sits closer to the boss; thinks too hard. Scotty: pulls levers & turns knobs; easily excited; yells a lot in emergencies. http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr 6
  13. 13. No fingerpointing http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr http://www.flickr.com/photos/rocketjim54/2955889085/ Copyright © 2010 Opscode, Inc - All Rights Reserved 7
  19. 19. Fingerpointyness (stages over time): problem!!! argggh!; freaking out, not talking, finding fault; blaming, covering ass; whining, hiding, hurt egos; figuring it out; fixing things; fixed. http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr
  24. 24. Being productive (stages over time): problem!!! argggh!; figuring it out; fixing things; fixed; feeling guilty; move on with life. http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr
  25. 25. This will be on the test: FAILURE HAPPENS!
  26. 26. Good Book!
  28. 28. Catastrophic Potential Simple Complexity Complex Tight KEEP OUT!!! Coupling Loose Created by Jesse Robbins "Catastrophic Potential" adapted from Normal Accidents by Charles Perrow 12
  35. 35. define: Nines (roughly, downtime per year) 99%: 5256 min (3.65 days); 99.9%: 526 min (8.8 hours); 99.99%: 53 min; 99.999%: 5 min; 99.9999%: 30 seconds; 99.99999%: 3 seconds
  36. 36. 99.9% * 99.9% * 99.9% = 99.7% 14
  37. 37. Internet Routing... won’t.
  39. 39. http://radar.oreilly.com/2008/10/sprint-blocking-cogent-network.html
  40. 40. #googlefail
  41. 41. YOU Copyright © 2010 Opscode, Inc - All Rights Reserved 21
  42. 42. Continuous Power... isn’t
  43. 43. 365 Main SF
  44. 44. 365 364.96 Main SF
  45. 45. http://radar.oreilly.com/2007/07/failure-happens-a-summary-of-t.html
  47. 47. Failure happens A single datacenter is the problem • Since they all fail at some point Recovery procedures after failure • Power was gone ~45 minutes • Most services took hours to come back • Some unnamed ones more than 12 hours
  48. 48. Geography is a Single Point of Failure
  50. 50. Providers are baskets too.
  52. 52. Failure Happens. Anyone promising otherwise is either foolish or lying (or both).
  53. 53. OODA Observe, Orient, Decide, Act 34
  54. 54. OODA: Observe, Orient, Decide, Act http://en.wikipedia.org/wiki/OODA_loop 35
  55. 55. http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr http://www.flickr.com/photos/dnorman/2678090600
  56. 56. Speaker: Jesse Robbins CEO ‣ jesse@opscode.com ‣ @jesserobbins ‣ www.opscode.com 37