FAIL FAST, FAIL OFTEN
Gordon Haff @ghaff, Technology Evangelist
William Henry @ipbabble, DevOps Strategy Lead
13 July 2016
FAILURE
2
3
FAILURE
4
FAILURE
ALSO FAILURE
5
FAILURES HAVE
CONSEQUENCES
6
THE
INESCAPABLE
CONCLUSION?
7
DON’T
FAIL
8
DON’T
FAIL
9
FAIL
WELL
10
11
Experiment by Peter Skillman, former VP of design at Palm
12
WHAT HE LEARNED
• Kindergarteners do not spend 15
minutes in a bunch of status
transactions trying to figure out who
is going to be CEO of Spaghetti
Corporation.
• They don’t sit around talking about
the problem. They just start
building to determine what works
and what doesn’t.
SOFTWARE =
GREAT MATCH FOR
FAILING WELL
13
14
FIVE PRINCIPLES:
THE RIGHT
scope
approach
workflow
incentives
culture
15
THE RIGHT SCOPE
Constrain the impact of failure
• Enable experimentation
• Stop cascading of failures
• Make deployments incremental,
frequent, and routine events
• Generally decouple activities and
decisions from each other
• Small, autonomous, bounded context
services
16
SMALL
• “Two pizza teams”
• Well-defined functional units
• Organized around business
capabilities (Conway's Law)
17
AUTONOMOUS
• Implementation changes can happen
independently of other services
• Data and functionality exposed only
through service calls over the
network
• Designed to be externalizable
• No back-doors
18
THE RIGHT APPROACH
Continuously experiment, iterate,
and improve
• It’s about the process
• Identify mistakes early
• Establish safety nets
• Fail and move on
19
THE PROCESS
Involves people and communication
• The most effective processes have continuous
communication: think scrums and kanban
• Allows for collaboration that can identify
failures before they happen
• Allows for feedback to continuously improve
and cultivate growth
• Provides transparency
20
DEV LESSONS: BREAKING CODE VIOLENTLY
Build in violent failures to highlight issues
• C/C++ lessons:
• Sanity check using assertions
• Invariant checks
• If ever I’m here in the code and these
conditions aren’t met, then I have no
business being here. Something is
wrong and I should fail violently.
• Involves tracing through the failure
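The slide draws on C/C++ practice. The same idea can be sketched in Python as a minimal illustration. Every name here (`InvariantError`, `check`, `apply_discount`) is hypothetical and not from the talk. An explicit helper is used instead of a bare `assert` because Python drops `assert` statements when run with `-O`.

```python
class InvariantError(AssertionError):
    """Raised when the program reaches a state it should never be in."""


def check(condition: bool, message: str) -> None:
    # Unlike a bare `assert`, this check still runs under `python -O`.
    if not condition:
        raise InvariantError(message)


def apply_discount(price_cents: int, percent: int) -> int:
    # Sanity checks on entry: if these conditions are not met,
    # the caller has no business being here, so fail loudly.
    check(price_cents >= 0, f"negative price: {price_cents}")
    check(0 <= percent <= 100, f"discount out of range: {percent}")

    discounted = price_cents * (100 - percent) // 100

    # Invariant on exit: a discount never raises the price
    # and never produces a negative amount.
    check(0 <= discounted <= price_cents, f"bad result: {discounted}")
    return discounted
```

A call such as `apply_discount(1000, 150)` stops immediately with a traceback that points at the violated condition, which is the starting point for tracing through the failure.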
21
AUTOMATED REGRESSION TESTING
• As products and services evolved, we
discovered that maintaining and incrementally
adding new tests was valuable
• These tests were/are most often based on
experienced failures and bugs
• Scripts were developed to run nightly builds
against various developer changes to test for
regression
• Testing tools evolved - proprietary and open
source
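One way to picture tests "based on experienced failures and bugs" is a small suite where each test pins down a bug that was once seen. The sketch below uses Python's standard `unittest` module. The function `parse_version` and both bugs are invented for illustration and are not from the talk.

```python
import unittest


def parse_version(text: str) -> tuple[int, ...]:
    # Compare numeric components rather than raw strings.
    return tuple(int(part) for part in text.strip().split("."))


class VersionRegressionTests(unittest.TestCase):
    def test_two_digit_minor_sorts_after_single_digit(self):
        # Hypothetical past bug: "1.10" sorted before "1.9"
        # when versions were compared as strings.
        self.assertLess(parse_version("1.9"), parse_version("1.10"))

    def test_surrounding_whitespace_is_ignored(self):
        # Hypothetical past bug: a trailing newline read from a
        # file broke parsing.
        self.assertEqual(parse_version("2.0\n"), (2, 0))


def run_regression_suite() -> bool:
    # A nightly job could call this and fail the build on False.
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(VersionRegressionTests)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

Each fixed bug adds one more test, so the suite grows incrementally and a nightly run catches any change that reintroduces an old failure.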
22
OPS LESSONS: CHAOS MONKEY
Test robustness of recovery using failure
• Platform should provide uninterrupted services
to the customer
• Therefore:
• Should always recover in acceptable
amount of time
• We should have random failures to ensure
that changes have not regressed or caused
new recovery problems
http://understeer.hatenablog.com/entry/2012/02/29/224629
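The reasoning on this slide can be sketched as a toy, in-memory simulation: inject a random failure, confirm service continues, and confirm recovery finishes within a budget. This is not the Chaos Monkey tool or its API. `Cluster`, `chaos_run`, and the "supervisor pass" recovery budget are assumptions made for illustration.

```python
import random


class Cluster:
    """Toy stand-in for a platform: named instances plus a supervisor."""

    def __init__(self, names):
        self.healthy = {name: True for name in names}

    def kill_random_instance(self, rng: random.Random) -> str:
        # Inject a failure into one randomly chosen instance.
        victim = rng.choice(sorted(self.healthy))
        self.healthy[victim] = False
        return victim

    def supervise(self) -> None:
        # One supervisor pass restarts every instance found down.
        for name, ok in self.healthy.items():
            if not ok:
                self.healthy[name] = True

    def serving(self) -> bool:
        # Service is uninterrupted while any instance is healthy.
        return any(self.healthy.values())


def chaos_run(cluster: Cluster, rounds: int, max_passes: int,
              seed: int = 0) -> bool:
    rng = random.Random(seed)  # seeded so a failing run can be replayed
    for _ in range(rounds):
        cluster.kill_random_instance(rng)
        if not cluster.serving():
            return False  # customer-visible outage
        passes = 0
        while not all(cluster.healthy.values()):
            cluster.supervise()
            passes += 1
            if passes > max_passes:
                return False  # recovery exceeded the budget
    return True
```

Running `chaos_run` after every change turns "we should always recover in an acceptable amount of time" into a check that can regress visibly. A single-instance cluster fails it at once, because killing its only instance interrupts service.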
23
THE RIGHT WORKFLOW
Repeatably automate for consistency
• Goal is repeatable automation
• Toyota’s yellow (andon) cord
• Initially pipelines may be very
different
• Different tools
• Traditional vs. “cloud native”
• It’s a journey
• Consolidation evolves naturally
24
DESIRABLE ENTERPRISE CI/CD WORKFLOW
[Workflow diagram. Labels: myRepo, Local Test, Commit/Push, Project Repo, CI (Build, Test, Pass/Fail), Build Repo, Review/Approval, CD (Deliver, Deploy), Release Repo, Monitor, 3rd Party]
25
CI/CD PIPELINE TOOLSET
CI/CD Workflow UI
gerrit
26
OPS LESSONS: RED/GREEN
Configuration as code has built-in failure
[Diagram. Labels: Continuous Integration / Continuous Deployment; Image, Package & Metadata Repository; src repo; Dev./Build, QA, Production in OHC; Events]
27
THE RIGHT INCENTIVES
Align rewards and behavior with desirable outcomes
• Incentives (advancement, money,
recognition) need to reward trust,
cooperation, and innovation
• Peer reward systems also valuable
• Individual has control over their own
success
• But people still have responsibility for
their actions
28
THE RIGHT CULTURE
Build systems and organizations that allow for failing well
• Transparency
• Even good decisions can have bad
outcomes
• Innovation inherently risky
• Cut losses (avoid sunk cost fallacy)
This is why open source is
so successful!
29
30
BUT CULTURE ISN’T SOMETHING YOU JUST CHANGE
• Lack of agreed-to model of what
“right” culture looks like
• Different organizations require
different behaviors
• Culture change is difficult to measure
and quantify
• Culture is very hard to impose
• Culture is an output, not an input
31
CULTURE IS:
emergent
pervasive
the keystone
THANK YOU
CREDITS
33
Tacoma Narrows Bridge: Barney Elliott; The Camera Shop - Screenshot taken from 16MM Kodachrome motion picture
film by Barney Elliott.
Time cover: Time, Inc.
Wipeout, Flickr/CC: https://www.flickr.com/photos/andymorffew/15843725192
Marshmallow challenge: http://marshmallowchallenge.com/Welcome.html
Linux Collaboration Summit: Linux Foundation.
Two pizzas: Flickr/CC https://www.flickr.com/photos/dongkwan/283076601
Frog: Kathy CC/Flickr https://flic.kr/p/b9fFV
Square peg Flickr/CC: https://www.flickr.com/photos/epublicist/3546059144/
