“The DevOps Cookbook”


         Gene Kim
    IT Revolution Press
  DevOps/Kanban Meetup
      March 22, 2012



                   @RealGeneKim, genek@realgenekim.me
The Downward Spiral
Operations Sees…
• Fragile applications are prone to failure
• Long time required to figure out “which bit got flipped”
• Detective control is a salesperson
• Too much time required to restore service
• Too much firefighting and unplanned work
• Urgent security rework and remediation
• Planned project work cannot complete
• Frustrated customers leave
• Market share goes down
• Business misses Wall Street commitments
• Business makes even larger promises to Wall Street

Dev Sees…
• More urgent, date-driven projects put into the queue
• Even more fragile code (less secure) put into production
• More releases have increasingly “turbulent installs”
• Release cycles lengthen to amortize “cost of deployments”
• Failing bigger deployments more difficult to diagnose
• Most senior and constrained IT ops resources have less time to fix underlying process problems
• Ever increasing backlog of work that could help the business win
• Ever increasing amount of tension between IT Ops, Development, Design…


                       These aren’t IT or Design problems…
                          These are business problems!
My Mission: Figure Out How To Break The IT Core Chronic Conflict
    • Every IT organization is pressured to
      simultaneously:
      – Respond more quickly to urgent business needs
      – Provide stable, secure and predictable IT service

      Words often used to describe process improvement: “hysterical, irrelevant, bureaucratic, bottleneck, difficult to understand, not aligned with the business, immature, shrill, perpetually focused on irrelevant technical minutiae…”



           Source: The authors acknowledge Dr. Eliyahu Goldratt, creator of the Theory of Constraints and author of The Goal, has written
           extensively on the theory and practice of identifying and resolving core, chronic conflicts.
Good News: It Can Be Done

Bad News: You Can’t Do It Alone




Ops




QA And Test




Source: Flickr: vandyll
Development




Infosec




Product Management And Design




Source: Flickr: birdsandanchors
DevOps:
The Shining Beacon Of Hope




[Image-only slides. Sources: John Allspaw (3 slides); Theo Schlossnagle (3 slides); John Jenkins, Amazon.com (1 slide); James Wickett (2 slides)]
The Prescriptive DevOps Cookbook

                • “DevOps Cookbook” Authors
                   – Patrick Debois, Mike Orzen, John Willis, Gene Kim

                • Goals
                   – Codify how to start and finish DevOps transformations
                   – Show how Development, IT Operations and Infosec become dependable partners
                   – Describe in detail how to replicate the transformations described in “When IT Fails: The Novel”




Philosophies And Outcomes:
      The Three Ways




The First Way:
Systems Thinking




The First Way:
    Systems Thinking (Left To Right)
• Never pass defects to downstream work
  centers
• Never allow local optimization to create global
  degradation
• Increase flow: elevate bottlenecks, reduce
  WIP, throttle release of work, reduce batch
  sizes



The First Way:
                     Outcomes
• Determinism in the release process
• Continuation of the Agile and CI/CR processes
• Creating single repository for code and environments
• Packaging responsibility moves to development
• Consistent Dev, QA, Int, and Staging environments, all
  properly built before deployment begins
• Decrease cycle time
   – Reduce deployment times from 6 hours to 45 minutes
   – Refactor deployment process that had 1300+ steps
     spanning 4 weeks
• Faster release cadence


The Second Way:
Amplify Feedback Loops




The Second Way:
Amplify Feedback Loops (Right to Left)
• Protect the integrity of the entire system of
  work, versus completion of tasks
• Expose visual data so everyone can see how
  their decisions affect the entire system




The Second Way:
                   Outcomes
•   Andon cords that stop the production line
•   Kanban to control work
•   Project freeze to reduce work in process
•   Eradicating “quick fixes” that circumvent the
    process
•   Ops user stories are part of the Agile planning
    process
•   Better build and deployment systems
•   More stable environment
•   Happier and more productive staff

The Third Way:
Culture Of Continual Experimentation
            And Learning




The Third Way:
 Culture Of Continual Experimentation
             And Learning
• Foster a culture that rewards:
   – Experimentation (taking risks) and learning from failure
   – Repetition, the prerequisite to mastery
• Why?
   – You need a culture that keeps pushing into the danger
     zone
   – And you need the habits that enable you to survive in the
     danger zone


The Third Way:
                      Outcomes
•   15 minutes spent daily on improving daily work
•   Continual reduction of unplanned work
•   More cycles for planned work
•   Projects completed to pay down technical debt and increase
    flow
•   Elimination of needless complexity
•   More resilient code and environments
•   Balancing nimbleness and practiced repetition
•   Enabling wider range of risk/reward balance



Some Prescriptive Steps




Phase 1: Extend the Agile CI/CR
               Processes
• Assign Ops person into Dev team
• Create one-step Dev, Test and Production
  environment creation procedure
• Create the one-step automated code
  deployment procedure
• Define roles of Dev, QA, Prod Mgmt and
  Infosec
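
The “one-step” procedures above could look something like the following Python sketch: a single entry point takes only an environment name, and the same ordered steps run everywhere. The ENVIRONMENTS table, the create_environment function and the step names are all assumptions made for illustration; a real procedure would call actual provisioning and deployment tooling.

```python
from dataclasses import dataclass, field

# Hypothetical per-environment settings: only the data differs, never the procedure.
ENVIRONMENTS = {
    "dev": {"hosts": 1},
    "test": {"hosts": 2},
    "production": {"hosts": 6},
}

# Hypothetical step names; every environment goes through all of them, in order.
STEPS = ("provision hosts", "apply configuration", "deploy code", "run smoke test")


@dataclass
class Environment:
    name: str
    hosts: int
    log: list = field(default_factory=list)


def create_environment(name):
    """Single entry point: the same ordered steps run for every environment."""
    settings = ENVIRONMENTS[name]  # raises KeyError for an unknown environment
    env = Environment(name=name, hosts=settings["hosts"])
    for step in STEPS:
        # A real procedure would invoke tooling here; this sketch only records the step.
        env.log.append(step)
    return env


dev = create_environment("dev")
prod = create_environment("production")
```

Because Dev, Test and Production share one code path, the procedure itself is exercised many times before it ever runs against production.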


Phase 2: Extend Release Process And
 Create Right -> Left Feedback Loops
• Embed Dev into Ops escalation process
• Invite Dev to post-mortems/root cause analysis
  meeting
• Create necessary rollback procedures (instead of
  fixing forward)
• Create application monitoring/metrics to aid in
  Ops work (e.g., incident/problem management)
• Actively manage flow of work across org
  boundaries
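
As a rough illustration of “rollback procedures (instead of fixing forward)”, the Python sketch below keeps a history of deployed versions and returns to the previous one when a health check fails. The ReleaseHistory class and the health_check callback are hypothetical names, not part of any tool mentioned in this deck.

```python
class ReleaseHistory:
    """Tracks deployed versions so a bad release can be rolled back."""

    def __init__(self, initial_version):
        self._versions = [initial_version]

    @property
    def current(self):
        return self._versions[-1]

    def deploy(self, version, health_check):
        """Deploy `version`; roll back if the health check fails.

        Returns True if the new version stays in place.
        """
        self._versions.append(version)
        if health_check(version):
            return True
        self.rollback()
        return False

    def rollback(self):
        """Return to the previous version; the first version is never dropped."""
        if len(self._versions) > 1:
            self._versions.pop()
        return self.current


history = ReleaseHistory("1.0")
ok = history.deploy("1.1", health_check=lambda version: True)
bad = history.deploy("1.2", health_check=lambda version: False)
```

After the failed deployment of the hypothetical version "1.2", the service is back on "1.1" without anyone patching production by hand.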


Phase 3: Organize Dev and Ops To
     Achieve Organizational Goals
• Allocate 20% of Dev cycles to non-functional
  requirements
• Build Ops user stories and environments in
  Dev that can be reused across all projects
  (e.g., deployment, capacity, security)
• Integrate fault injection and resilience into
  design, development and production (e.g.,
  Chaos Monkey)
• Prioritize backlog to manage technical debt
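
The 20% allocation can be made concrete with a small planning sketch in Python. It assumes a sprint capacity measured in points and a backlog already sorted by priority; the plan_sprint function, the backlog items and the point values are invented for illustration.

```python
def plan_sprint(backlog, capacity, reserved_percent=20):
    """Pick backlog items in priority order, reserving part of the
    capacity for non-functional work.

    backlog: list of (title, points, is_non_functional), highest priority first.
    Returns the list of selected titles.
    """
    reserved = capacity * reserved_percent // 100
    # Separate budgets, so feature work cannot consume the reserved share.
    budgets = {True: reserved, False: capacity - reserved}
    selected = []
    for title, points, is_non_functional in backlog:
        if points <= budgets[is_non_functional]:
            budgets[is_non_functional] -= points
            selected.append(title)
    return selected


# Hypothetical backlog, highest priority first.
backlog = [
    ("checkout redesign", 20, False),
    ("automate deployment", 5, True),
    ("new search filters", 15, False),
    ("add monitoring hooks", 3, True),
    ("loyalty program", 10, False),
    ("patch library versions", 4, True),
]
plan = plan_sprint(backlog, capacity=40)
```

With a capacity of 40 points, 8 points are held back for non-functional work, so the deployment automation and monitoring items get in even though feature requests outnumber them.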

Phase 4: Reflection, Introspection,
     Continual Improvement
• Create improvement cycles (every 2 weeks: improve
  something)
• Create rituals to reward taking risks and
  learning from failure: ensure proper balance
  of risk and reward
• Find bottlenecks and increase capacity when
  needed
• Reflection: given where the organization
  needs to go, where do we need to be going?

When IT Fails: The Novel and The
                 DevOps Cookbook

                              • Coming in July 2012

                              • “In the tradition of the best MBA case studies, this book should be mandatory reading for business and IT graduates alike.” -Paul Muller, VP Software Marketing, Hewlett-Packard

                              • “The greatest IT management book of our generation.” -Branden Williams, CTO Marketing, RSA

Gene Kim, Tripwire founder, Visible Ops co-author




To Join The Movement

• If you would like the “Top 10 Things You Need
  To Know About DevOps,” sample chapters and
  updates on the book:

  Sign up at http://itrevolution.com
  Email genek@realgenekim.me
  Give me your business card



Other Resources

• From the IT Process Institute www.itpi.org
   – Both Visible Ops Handbooks
   – ITPI IT Controls Performance Study

• Rugged Software by Corman, et al:
  http://ruggedsoftware.org
• “Continuous Delivery: Reliable Software
  Releases through Build, Test, and
  Deployment Automation” by
  Humble, Farley
• Follow us…
   – @JoshCorman, @RealGeneKim
   – mailto:genek@realgenekim.me
   – http://realgenekim.me/blog


Meeting The DevOps Leadership Team

• Typically led by Dev, QA, IT Operations and
  Product Management/Design
• Our ultimate goal is to add value at every step
  in the flow of work
  – See the end-to-end value flow
  – Shorten and amplify feedback loops
  – Help break silos (e.g., server, networking,
    database)


Definition: Agile Sprints

• The basic unit of development in Agile
  Scrums, typically between one week and one
  month
• At the end of each sprint, team should have
  potentially deliverable product




Aha Moment: shipping product implies not just code: it’s the environment, too!
Help Dev And Ops Build Code And
            Environments
• Dev and Ops work together in Sprint 0 and 1
  to create code and environments
  – Create environment that Dev deploys into
  – Create downstream environments: QA, Staging,
    Production
  – Create testable migration procedures from Dev all
    the way to production
• Integrate Infosec and QA into daily sprint
  activities

Definition: Andon Cord




Integrate Ops Into Dev

• Embed Ops person into Dev structure
  – Describes non-functional requirements, use cases
    and stories from Ops
  – Responsible for improving “quality at the source”
    (e.g., reducing technical debt, fixing known
    problems, etc.)
  – Has special responsibility for pulling the Andon
    cord when problems like these appear:
     • No ability to restart service without rebooting
     • Configuration settings impossible to find


Integrate Dev Into Ops

• MobBrowser case study: “Waking up
  developers at 3am is a great feedback loop:
  defects get fixed very quickly”

• Goal is to get Dev closer to the customer
  – Infosec can help determine when it’s too close
    (and when separation of duties (SOD) is a requirement)




Keep Shrinking Batch Sizes

• Waterfall projects often have cycle time of one
  year
• Sprints have cycle time of 1 or 2 weeks
• When IT Operations work is sufficiently fast
  and capable, we may decide to decouple
  deployments from sprint boundaries (e.g.,
  Kanbans)



Definition: Kanban Board

• Signaling tool to reduce WIP and increase flow
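
A Kanban board’s signaling role can be sketched in a few lines of Python: each column has a work-in-process (WIP) limit, and a full column refuses new cards. The KanbanBoard class, the column names and the limits below are illustrative assumptions only.

```python
class KanbanBoard:
    """Columns with work-in-process (WIP) limits."""

    def __init__(self, wip_limits):
        # wip_limits maps column name -> maximum number of cards.
        self.wip_limits = dict(wip_limits)
        self.columns = {name: [] for name in self.wip_limits}

    def add(self, card, column):
        """Place a card in a column, refusing if the column is full."""
        if len(self.columns[column]) >= self.wip_limits[column]:
            return False  # the signal: finish something before starting more
        self.columns[column].append(card)
        return True

    def move(self, card, source, target):
        """Pull a card downstream only when the target has capacity."""
        if card not in self.columns[source]:
            raise ValueError(f"{card!r} is not in {source!r}")
        if not self.add(card, target):
            return False
        self.columns[source].remove(card)
        return True


board = KanbanBoard({"ready": 3, "in progress": 2, "done": 100})
for card in ("A", "B", "C"):
    board.add(card, "ready")
moved = [board.move(card, "ready", "in progress") for card in ("A", "B", "C")]
```

The third move is refused because “in progress” is already at its limit of two, which is exactly the signal that reduces WIP.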




IT Operations Increases Process Rigor

• Standardize deployment
• Standardize how unplanned work is
  prosecuted: make it repeatable
• Modify first response: ensure constrained
  resources have all data at hand to diagnose
• Elevate preventive activities to reduce
  incidents


Letter to Development

• Seek the downstream effects of your actions
  – Unplanned work comes at the expense of planned
    work
  – Technical debt retards feature throughput
  – Environment matters as much as the code
• Allocate time for fault modeling, asking “what
  could go wrong?” and implementing
  countermeasures


Letter To QA

• Ensure test plans cover not only code
  functionality, but also:
  – Suitability of the environment the code runs in
  – The end-to-end deployment process
• Help find variance…
  – Functionality, performance, configuration
  – Duration, wait time and handoff errors, rework, …
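
Finding configuration variance between environments can start with something as simple as the Python sketch below, which compares two flat settings mappings. The find_variance function and the sample settings are hypothetical; real environments would need their configuration collected by some other means first.

```python
def find_variance(reference, candidate):
    """Compare two flat configuration mappings.

    Returns {key: (reference_value, candidate_value)} for every key whose
    value differs or that exists on only one side. A missing key is shown
    as None, so a setting explicitly set to None looks the same as missing.
    """
    variance = {}
    for key in sorted(set(reference) | set(candidate)):
        ref_value = reference.get(key)
        cand_value = candidate.get(key)
        if ref_value != cand_value:
            variance[key] = (ref_value, cand_value)
    return variance


# Hypothetical settings for illustration only.
production = {"os": "linux-5.10", "db_pool": 50, "tls": "1.2"}
qa = {"os": "linux-5.10", "db_pool": 10, "debug": True}

drift = find_variance(production, qa)
```

An empty result means the two environments match on every recorded setting; anything else is a candidate explanation for “it passed in QA but failed in production”.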



Letter To IT Operations
               •   “The best way to avoid failure
                   is to fail constantly”
               •   Harden the production
                   environment
               •   Have scheduled drills to “crash
                   the data center”
               •   Create your “chaos monkeys”
                   to introduce faults into the
                   system (e.g., randomly kill
                   processes, take out
                   servers, etc.)
               •   Rehearse and improve
                   responding to unplanned work
                    – Netflix: Hardened AWS service
                    – StackOverflow
                    – Amazon fire drills (Jesse Robbins)
                    – The Monkey (Mac)
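
The idea of creating your own “chaos monkeys” can be tried on a toy model before touching real systems. The Python sketch below is not Netflix’s Chaos Monkey; it is a minimal stand-in, with a hypothetical Cluster class and server names, that randomly removes servers and then checks whether the service still answers.

```python
import random


class Cluster:
    """Toy model of redundant servers behind one service."""

    def __init__(self, server_names):
        self.healthy = set(server_names)

    def kill(self, name):
        self.healthy.discard(name)

    def serves_traffic(self):
        # In this model the service survives while any server is healthy.
        return len(self.healthy) > 0


def chaos_monkey(cluster, kills, rng):
    """Kill up to `kills` randomly chosen healthy servers; return the victims."""
    victims = []
    for _ in range(kills):
        if not cluster.healthy:
            break
        # Sort first so the choice depends only on the seeded generator.
        victim = rng.choice(sorted(cluster.healthy))
        cluster.kill(victim)
        victims.append(victim)
    return victims


cluster = Cluster(["web1", "web2", "web3"])
victims = chaos_monkey(cluster, kills=2, rng=random.Random(0))
survived = cluster.serves_traffic()
```

Passing in a seeded random generator makes a drill repeatable, which helps when rehearsing the response to a specific failure.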



You Don’t Choose Chaos Monkey…
   Chaos Monkey Chooses You




Letter To Product Management




Lesson: Allocate 20% of Dev cycles to paying down technical debt
To Designers

• Help IT Operations codify their work and
  requirements into great and ever increasing
  library of user stories
• Realize that IT processes are likely the largest
  impediment preventing your great ideas from
  making it to market




Source: James Wickett


DevOps Kanban Meet Up 3/22/12


Editor's Notes

  • #3 How each side actively impedes the achievement of the other’s goals.
  • #6 Who are they auditing? IT operations. I love IT operations. Why? Because when the developers screw up, the only people who can save the day are the IT operations people. Memory leak? No problem, we’ll do hourly reboots until you figure that out. Who here is from IT operations? Bad day: not as prepared for the audit as they thought. Spending 30% of their time scrambling, generating presentations for auditors. Or there’s an outage, and the developer is adamant that they didn’t make the change; they’re saying, “it must be the security guys, they’re always causing outages.” Or there are 50 systems behind the load balancer, and six systems are acting funny: what’s different, and who made them different? Or every server is like a snowflake, each having its own personality. We as Tripwire practitioners can help them make sure changes are made visible, authorized, deployed completely and accurately, and find differences. Create and enforce a culture of change management and causality.
  • #8 Who’s introducing variance? Well, it’s often these guys. Show me a developer who isn’t causing an outage, I’ll show you one who is on vacation. Primary measurement is deploying features quickly: get to market. I’ve worked with two of the five largest Internet companies (Google, Microsoft, Yahoo, AOL, Amazon), and I now believe that the biggest differentiator to great time to market is great operations. Bad day: we do 6 weeks of testing, but deployment still fails. Why? QA environment doesn’t match production. Or there’s a failure in testing, and no one can agree whether it’s a code failure or an environment failure. Or changes are made in QA, but no one wrote them down, so they didn’t get replicated downstream in production. Believe it or not, we as Tripwire practitioners can even help them: make sure environments are available when we need them, that they’re configured correctly the first time, document all the changes, replicate them downstream.
  • #9 So who are all these constituencies that we can help, and increase our relevance as Tripwire practitioners and champions? How many people here are in infosec? Goal: protect critical systems and data. Safeguard organizational commitments. Prevent security breaches, help quickly detect and recover from them. Bad day: no security standards. No one is complying. Yes, we’re 3 years behind. “Whaddya gonna do about it?” Vs. we (Tripwire owner) can become more relevant and add value by helping infosec leverage all the configuration guidance out there. Measure variance between production and those known good states. Trust and verify that when management says they’ve trued up the configurations, they’ve actually done it. Why? Now, more than ever, there is an ever increasing number of regulatory and contractual requirements to protect systems and data.
  • #18 Tell story of Amazon, Netflix: they care about availability and security. It’s not a push, it’s a pull: they’re looking for our help (#1 concern: fear of disintermediation and being marginalized).
  • #21 My personal goal is to prescriptively define 1) what does Dev need to do to become a reliable partner, 2) what does IT Operations need to do to become a reliable partner, and then 3) how do they work together to deliver unbelievable value to the business. Of course, the goal is more than happy coexistence. It’s to replicate the Etsy and LinkedIn stories: increase the rate of features that we can put into production, while simultaneously maintaining the reliability, stability, security and survivability of the production environment.