Operations Culture
Speaker:

Jesse Robbins CEO
  ‣ jesse@opscode.com
  ‣ @jesserobbins
  ‣ www.opscode.com

Today

‣ Operations is Culture
‣ Failure Happens
‣ The OODA Loop
‣ Do Fire Drills

Operations is Culture



“You don’t choose the moment, the moment chooses you.
You only get to choose how prepared you are when it does.”
    - Fire Chief Mike Burtch




Cloud Operations
is the ability to consistently create and deploy reliable software
to an unreliable platform that scales horizontally.


http://radar.oreilly.com/2007/10/operations-is-a-competitive-ad.html
“It’s not my code, it’s your machines!”

Spock                      Scotty
Little bit weird           Pulls levers & turns knobs
Sits closer to the boss    Easily excited
Thinks too hard            Yells a lot in emergencies

http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr
No fingerpointing

http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr
http://www.flickr.com/photos/rocketjim54/2955889085/
Copyright © 2010 Opscode, Inc - All Rights Reserved
Fingerpointyness (what happens over time)

problem!!! argggh!
‣ freaking out, not talking, finding fault
‣ blaming, covering ass
‣ whining, hiding, hurt egos
‣ figuring it out
‣ fixing things
‣ fixed

http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr
Being productive (what happens over time)

problem!!! argggh!
‣ figuring it out
‣ fixing things
‣ fixed
‣ feeling guilty
‣ move on with life

http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr
This will be on the test:
  FAILURE HAPPENS!
Good Book!
Catastrophic Potential

                        Complexity
Coupling      Simple                  Complex
   Tight                              KEEP OUT!!!
   Loose

Created by Jesse Robbins
"Catastrophic Potential" adapted from Normal Accidents by Charles Perrow
define:
 Nines (roughly), as downtime per year
   99%        5256 min (3.65 days)
   99.9%      526 min (8.8 hours)
   99.99%     53 min
   99.999%    5 min
   99.9999%   30 seconds
   99.99999%  3 seconds
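
The figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming a 365-day year and treating each "nines" level as the fraction of the year the service is up; the function name `downtime_minutes_per_year` is made up for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_minutes_per_year(availability_percent):
    """Minutes of downtime per year allowed at the given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


# Print one row per "nines" level, in minutes and in seconds.
for nines in ["99", "99.9", "99.99", "99.999", "99.9999", "99.99999"]:
    minutes = downtime_minutes_per_year(float(nines))
    print(f"{nines:>9}%  {minutes:10.3f} min  {minutes * 60:12.1f} s")
```

Each extra nine divides the downtime budget by ten, which is why the table drops from days to seconds so quickly.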
99.9% * 99.9% * 99.9% = 99.7%
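
The multiplication on this slide is what happens when a request depends on several components in a row. A minimal sketch, assuming the components fail independently and that all of them must be up at once; `serial_availability` is a hypothetical helper name, not something from the talk.

```python
import math


def serial_availability(*availabilities):
    """Availability of a chain in which every component must be up.

    Assumes failures are independent, so the fractions simply multiply.
    """
    return math.prod(availabilities)


# Three components in series, each at "three nines".
combined = serial_availability(0.999, 0.999, 0.999)
print(f"combined availability: {combined:.1%}")
```

Stacking three-nines components gives less than three nines overall, so every dependency added in series spends part of the downtime budget.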
Internet Routing... won’t.
http://radar.oreilly.com/2008/10/sprint-blocking-cogent-network.html
#googlefail
YOU




Continuous Power... isn’t.
365 Main SF
364.96 Main SF
http://radar.oreilly.com/2007/07/failure-happens-a-summary-of-t.html
Failure happens

 A single datacenter is the problem
 • Since they all fail at some point

 Recovery procedures after failure
 • Power was gone ~45 minutes
 • Most services took hours to come back
 • Some unnamed ones more than 12 hours
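
To put those recovery times in "nines" terms, here is a rough sketch. It assumes a 365-day year and that the outage is the only downtime all year, so the results are best-case figures; `availability_after_outage` is an illustrative name.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def availability_after_outage(outage_minutes):
    """Best-case yearly availability (percent) if this is the only outage."""
    return 100 * (1 - outage_minutes / MINUTES_PER_YEAR)


power_loss = availability_after_outage(45)          # power gone ~45 minutes
slow_recovery = availability_after_outage(12 * 60)  # back after 12 hours
print(f"45 min outage: {power_loss:.3f}%")
print(f"12 h outage:   {slow_recovery:.3f}%")
```

Under these assumptions the power loss alone still fits inside a four-nines year, while a service that needed 12 hours to recover has already fallen below three nines. Recovery time, not the power event itself, sets the number.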
Geography is a
Single Point of Failure
Providers are
baskets too.
Failure Happens.
Anyone promising otherwise
 is either foolish or lying
          (or both).
OODA: Observe, Orient, Decide, Act

http://en.wikipedia.org/wiki/OODA_loop

http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr
http://www.flickr.com/photos/dnorman/2678090600

Cloud Operations Bootcamp: Culture - Jesse Robbins