Why Everyone Needs DevOps Now:
My Fourteen Year Journey
Studying High Performing IT
Organizations

Gene Kim
@RealGeneKim, genek@realgenekim.me
Where Did The High Performers Come From?

Visible Ops: Playbook of High Performers
The IT Process Institute has
been studying high-performing
organizations since 1999
What is common to all the high
performers?
What is different between them
and average and low
performers?
How did they become great?

www.ITPI.org

Act 1: IT Ops Fixing Fragile Artifacts

The Product Managers

Act 2: The Developers

IT Ops And Dev At War

Nothing Left For Infosec

The Downward
Spiral…

So, CEOs Don’t Trust IT…
“If IT fails I don't know why… and if IT succeeds I don't know why.”
“By managing inputs and outputs, I can hold any area of the
business accountable – except for IT…”
“Large investments in IT projects that eventually fail, without
warning. And the CIO is the first to say, ‘I told you so.’”
“I can’t hold IT accountable – IT is way too ‘slippery.’”
Source: Gene Kim 2012

The IT Core Chronic Conflict
Every IT organization is pressured to simultaneously:
Respond more quickly to urgent business needs
Provide stable, secure and predictable IT service

Source: The authors acknowledge Dr. Eliyahu Goldratt, creator of the Theory of Constraints and
author of The Goal, has written extensively on the theory and practice of identifying and resolving
core, chronic conflicts.

Every Company Is An IT Company…
95% of all capital projects have an IT component…
50% of all capital spending is technology-related

Where we need
to be…
IT is always in the way
(again…)
We are here…

The Urgency Of This Business Problem
“Of the Fortune 500 companies in 1955, 87% are gone...
“In 1958, the Fortune 500 tenure was 61 years;
now it’s 18 years…”
–Richard Foster, “Creative Destruction”

How Team Obama’s tech efficiency
left Romney IT in dust
Obama campaign’s tech team beat Romney by using opposite strategy—
“insourcing.”
Even taken with the software and Web hosting expenses, the Obama campaign
spent a seventh of what the Romney campaign spent on digital….
In the end, the deciding factor wasn’t what the Obama campaign spent money
on, but what it did with all that money. Insourcing gave the campaign a strategic
flexibility that the Romney campaign lacked….
“This is the difference...between a well run professional machine and a gaggle of
amateurs....
I would be shocked if such a chasm exists next cycle between the parties—these
aren’t mistakes to be repeated if you want to do things like win elections.”
http://arstechnica.com/information-technology/2012/11/how-team-obamas-tech-efficiency-left-romney-it-in-dust/

Build. Measure.
Learn.

Technologies
accelerate
business practice
changes

The massive scope of its
polling effort helped
guide the Obama
campaign in ways that
would be impossible with
conventional polling…
three-day rolling-average
tracking in each state.

“We ran the election 66,000 times
every night,” said a senior official,
describing the computer
simulations the campaign ran to
figure out Obama’s odds of
winning each swing state. “And
every morning we got the spit-out
— here are your chances of
winning these states. And that is
how we allocated resources.”

Surveys used live
interviewers, very large
sample sizes and very short
questionnaires, which
focused on vote preference
and strength of support,
with no more than a handful
of additional substantive
questions.

Hired campaign
staff engineers
from Facebook,
Twitter, Google,
Microsoft, and
technology
startups.

http://www.theatlantic.com/technology/archive/2012/11/when-the-nerds-go-marching-in/265325/
http://www.huffingtonpost.com/2012/11/21/obama-campaign-polls-2012_n_2171242.html
http://swampland.time.com/2012/11/07/inside-the-secret-world-of-quants-and-data-crunchers-who-helped-obama-win/
Act 3:
There Must Be A Better Way…

Sources (image slides): John Allspaw; Theo Schlossnagle; John Jenkins, Amazon.com

Who Is Doing DevOps?
Google, Amazon, Netflix, Etsy, Akamai, Twitter, Facebook,
Pinterest …
BNY Mellon, Bank of America, World Bank, Paychex, Intuit…
The Gap, Nordstrom, REI, Macy’s, GameStop, Target …
Portland State University, Seton Hill University, Kansas State
University…
Who else?

High Performing DevOps Teams
They’re more agile
30x more frequent deployments
8,000x faster lead time than their peers

They’re more reliable
2x the change success rate
12x faster MTTR

Source: Puppet Labs 2012 State Of DevOps: http://puppetlabs.com/2013-state-of-devops-infographic
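
To compare your own team against figures like these, you first need to compute the same measures. A minimal sketch, assuming you record each change as succeeded or failed and each outage as a duration. The function names and the sample data are made up for illustration and do not come from the survey.

```python
from datetime import timedelta


def change_success_rate(outcomes):
    """Fraction of changes that succeeded; `outcomes` is a list of booleans."""
    return sum(outcomes) / len(outcomes)


def mean_time_to_restore(outages):
    """Average outage duration (MTTR); `outages` is a list of timedeltas."""
    return sum(outages, timedelta()) / len(outages)


# Invented sample data, for illustration only.
changes = [True, True, False, True]
outages = [timedelta(minutes=30), timedelta(minutes=90)]

print(change_success_rate(changes))   # 0.75
print(mean_time_to_restore(outages))  # 1:00:00
```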

How Can We Better
Sell DevOps?

Eric Passmore, former SVP Global Engineering, AOL (2007)

The Downward Spiral
Operations Sees…
Fragile applications are prone to
failure
Long time required to figure out “which
bit got flipped”
Detective control is a salesperson
Too much time required to restore
service
Too much firefighting and unplanned
work
Planned project work cannot complete
Frustrated customers leave
Market share goes down
Business misses Wall Street
commitments
Business makes even larger promises
to Wall Street

Dev Sees…
More urgent, date-driven projects
put into the queue
Even more fragile code put into
production
More releases have increasingly
“turbulent installs”
Release cycles lengthen to
amortize “cost of deployments”
Failing bigger deployments more
difficult to diagnose
Most senior and constrained IT
ops resources have less time to
fix underlying process problems
Ever increasing backlog of
infrastructure projects that could
fix root cause and reduce costs
Ever increasing amount of
tension between IT Ops and
Development

These aren’t IT Operations problems…
These are business problems!
Gene Kim, CTO, Tripwire, Inc. (2006)

Anonymous Product Manager / UX (2011)

Anonymous Infosec Officer (2012)

“This book will have a profound
effect on IT, just as The Goal did
for manufacturing.” –Jez
Humble, co-author Continuous
Delivery
“This is the IT swamp draining
manual for anyone who is neck
deep in alligators.” –Adrian
Cockcroft, Cloud Architect at
Netflix
“This is The Goal for our decade,
and is for any IT professional who
wants their life back.” –Charles
Betz, IT architect, author
“Architecture and Patterns for
IT”

The First Way: Flow

Understand the flow of work
Always seek to increase flow
Never unconsciously pass defects downstream
Never allow local optimization to cause global degradation
Achieve profound understanding of the system

“Annual business planning sessions can be maddening. They think IT
Operations is an ‘all you can eat buffet.’”
-Ben Rockwood,
Director Systems Engineering,
Joyent

Define The Work and Make It Visible
Business projects (e.g., new order system)
Internal IT projects (e.g., configuration management, automation,
debt reduction)
Changes (e.g., deploys, improve database performance)
Unplanned work (e.g., site down, site impaired)
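
One way to make the four kinds of work visible is to tag every item with its type and count what is in flight. A minimal sketch: the four categories come from the slide, while the `summarize` helper and the sample board are hypothetical.

```python
from collections import Counter

# The four categories come from the slide; everything else is invented.
WORK_TYPES = ("business project", "internal IT project", "change", "unplanned work")


def summarize(work_items):
    """Count (work_type, title) pairs per category, rejecting unknown types."""
    counts = Counter(work_type for work_type, _title in work_items)
    unknown = set(counts) - set(WORK_TYPES)
    if unknown:
        raise ValueError(f"unclassified work: {sorted(unknown)}")
    return {work_type: counts[work_type] for work_type in WORK_TYPES}


board = [
    ("business project", "new order system"),
    ("change", "deploy release"),
    ("unplanned work", "site down"),
    ("unplanned work", "site impaired"),
]
print(summarize(board))
```

Rejecting unclassified items is a deliberate choice here: work that fits no category is work nobody has agreed to make visible.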

Questions
What is your lead time for changes? (i.e., how long does it take to
go from “code committed” to “code successfully running in
production”)
How much of that is queue time vs. run time?
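
Both questions can be answered from timestamps you probably already have. A minimal sketch, assuming you know when the code was committed, when it was running in production, and the start and end of each stage where someone or something was actively working on it. The stage names and dates below are invented.

```python
from datetime import datetime, timedelta


def lead_time_breakdown(committed, stage_intervals, deployed):
    """Split lead time into run time (active work) and queue time (waiting)."""
    lead = deployed - committed
    run = sum((end - begin for begin, end in stage_intervals), timedelta())
    queue = lead - run
    return lead, run, queue


# Invented example: one change moving through build, test, and deploy.
committed = datetime(2013, 1, 7, 9, 0)
stages = [
    (datetime(2013, 1, 7, 9, 0), datetime(2013, 1, 7, 9, 30)),      # build
    (datetime(2013, 1, 8, 14, 0), datetime(2013, 1, 8, 16, 0)),     # test
    (datetime(2013, 1, 10, 10, 0), datetime(2013, 1, 10, 10, 30)),  # deploy
]
deployed = datetime(2013, 1, 10, 10, 30)

lead, run, queue = lead_time_breakdown(committed, stages, deployed)
print(f"lead time {lead}, run time {run}, queue time {queue}")
print(f"share of lead time spent waiting: {queue / lead:.0%}")  # 96%
```

In this invented case only three hours of a three-day lead time are actual work; the rest is waiting.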

Create One Step Environment Creation Process
Make environments available early in the Development process
Make sure Dev builds the code and environment at the same time
Create a common Dev, QA and Production environment creation
process

If I had a magic wand, I’d change the Agile sprints and definition
of “done”:
“At the end of each sprint, we must have working and shippable
code, demonstrated in an environment that resembles production.”

Deploy Smaller Changes, More Frequently
Decouple feature releases from code deployments
Deploy features in a disabled state, using feature flags
Require all developers check code into trunk daily (at least)
Practice deploying smaller changes, which dramatically reduces
risk and improves MTTR
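
Deploying a feature in a disabled state can be as simple as a lookup before the new code path. A minimal sketch, assuming flags live in a plain dictionary. The flag name `new_checkout` and both checkout paths are hypothetical, and a real system would load flags from configuration rather than hard-code them.

```python
# Hypothetical flag store: the new code is deployed but switched off.
FLAGS = {"new_checkout": False}


def is_enabled(name, flags=FLAGS):
    """Unknown flags count as disabled, so new code stays dark by default."""
    return flags.get(name, False)


def checkout(cart_total, flags=FLAGS):
    if is_enabled("new_checkout", flags):
        return f"new checkout: {cart_total:.2f}"
    return f"legacy checkout: {cart_total:.2f}"


print(checkout(10))                          # legacy checkout: 10.00
print(checkout(10, {"new_checkout": True}))  # new checkout: 10.00
```

Releasing the feature then means flipping the flag, not deploying code, and turning it off again is just as quick.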

Breaking The Bottlenecks In The Flow
Environment creation
Code deployment
Test setup and run
Overly tight architecture
Development
Product management

How organizations achieve high performance
• 89% are using infrastructure version control
• 82% are using automated code deployments

Source: Puppet Labs 2012 State Of DevOps: http://puppetlabs.com/2013-state-of-devops-infographic
Why Dedicated Teams Vs. Shared Services

Leankit Kanban

Blackboard Learn: 2005-Present

Source: David Ashman, Chief Architect, Blackboard, Inc.

Blackboard Learn Building Blocks

Source: David Ashman, Chief Architect, Blackboard, Inc.

The First Way: Outcomes
Creating single repository for code and environments
Determinism in the release process
Consistent Dev, Test and Production environments, all properly built before
deployment begins
A continuous delivery pipeline that can be relied upon and daily Dev code
commits
Free ourselves from the learned behavior of catastrophic deployments
Decreased lead time
Reduce deployment times from 6 hours to 45 minutes
Refactor deployment process that had 1300+ steps spanning 4 weeks
Faster cycle time and release cadence

The Second Way: Feedback

Understand and respond to the needs of all customers, internal
and external
Shorten and amplify all feedback loops: stop the line when
necessary
Create quality at the source
Create and embed knowledge where we need it

Source: John Shook

“We found that when we woke up developers at 2am, defects got
fixed faster than ever”
– Patrick Lightbody,
CEO, BrowserMob

Require That Devs Manage Their Own Code For
6+ Months

Source: Tom Limoncelli, Google

Test Whether Developers Qualify For IT Operations Resources
Types/frequency of pager alerts
Maturity of monitoring
System architecture review
Release process
Defect counts and severity
Production hygiene

Source: Tom Limoncelli, Google

Return Fragile Services Back To Dev

Source: Tom Limoncelli, Google

Feedback And Situational Awareness
“Having a
developer add a
monitoring metric
shouldn’t feel like
a schema
change.”
– John Allspaw,
SVP Tech Ops,
Etsy
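
To show how cheap adding a metric can be, here is a minimal sketch that emits a counter as a single UDP datagram in the StatsD plain-text format (`name:value|c`). The slide does not name a tool, so the choice of StatsD, the metric name, and the host and port are all assumptions made for illustration.

```python
import socket


def format_counter(name, value=1):
    """Build a StatsD-style counter line, e.g. 'logins.failed:1|c'."""
    return f"{name}:{value}|c"


def send_metric(line, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send; host and port are assumed defaults."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


# Adding a new metric is one line at the point of interest:
# send_metric(format_counter("logins.failed"))
print(format_counter("logins.failed"))  # logins.failed:1|c
```

Nothing has to be registered in advance, which is what keeps adding a metric from feeling like a schema change.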

Integrating Into Continuous Delivery
The days of reviewing RFCs in Word docs in change
management meetings are over
Failures must result in automated tests in the continuous
deployment pipeline (Release, Config, Change)
Invite or embed Ops into Dev standups and the scrum teams
(“hey, we can do sprints and scrums, too!”)

Embed Dev Into IT Ops
Embed Dev into IT Ops incident escalation process
Put production monitoring in pre-production environments
Invite Dev to post-mortems/root cause analysis meetings
Have Dev and Infosec cross-train IT Operations
Ensure application monitoring/metrics to aid in Ops and Infosec
work (e.g., incident/problem management)

What’s In It For Infosec And QA?

The Second Way:
Outcomes
Defects and security issues getting fixed faster than ever
Standardized and reusable Ops and Infosec user stories now part
of the Agile process
All groups communicating and coordinating better
Everybody is getting more work done

The Third Way:
Continual Experimentation And Learning

Foster a culture that rewards:
Experimentation (taking risks) and learning from failure
Repetition is the prerequisite to mastery

Why?
You need a culture that keeps pushing into the danger zone
And have the habits that enable you to survive in the danger zone

Break Things Early And Often
“Do painful things more frequently, so you can make it less painful…
We don’t get pushback from Dev, because they know it makes
rollouts smoother.”
– Adrian Cockcroft, Architect, Netflix

Inject Failures Often

You Don’t Choose Chaos Monkey…
Chaos Monkey Chooses You
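
The core idea of random failure injection fits in a few lines. This is a toy sketch and not Netflix's implementation: it only chooses one instance per service group at random, and the group and instance names are invented. Actually terminating the chosen instance is left out.

```python
import random


def choose_victims(groups, rng):
    """Pick one instance at random from every non-empty service group."""
    return {
        name: rng.choice(instances)
        for name, instances in sorted(groups.items())
        if instances
    }


# Invented inventory; a seeded generator makes a drill repeatable.
groups = {"web": ["web-1", "web-2", "web-3"], "db": ["db-1"], "cache": []}
victims = choose_victims(groups, random.Random(42))
print(victims)
```

Because no service can opt out, every team has to design for an instance disappearing at any time.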

Break Things Before Production
Enforce consistency in code, environments and configurations
across the environments
Add your ASSERTs to find misconfigurations, enforce https, etc.
Add static code analysis to automated continuous integration and
testing process
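
A configuration check of this kind can run in the pipeline before anything reaches production. A minimal sketch, assuming configuration is a flat dictionary. The key names (`*_url`, `debug`) and the two rules are hypothetical examples, not a complete policy.

```python
from urllib.parse import urlparse


def find_misconfigurations(config):
    """Return a list of human-readable problems; empty means the config passes."""
    problems = []
    for key, value in sorted(config.items()):
        # Hypothetical rule: every key ending in _url must point at https.
        if key.endswith("_url") and urlparse(value).scheme != "https":
            problems.append(f"{key} must use https: {value}")
    # Hypothetical rule: debug mode must never be on.
    if config.get("debug", False):
        problems.append("debug must be disabled")
    return problems


config = {
    "api_url": "http://example.com",
    "cdn_url": "https://cdn.example.com",
    "debug": True,
}
for problem in find_misconfigurations(config):
    print(problem)
```

Failing the build when the returned list is non-empty turns each rule into an assertion that is checked on every commit.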

Reduce Technical Debt
“The deal with engineering goes like this. Product management
takes 20% of the capacity right off the top and gives this to
engineering to spend as they see fit. Whatever is required to
avoid, ‘we need to stop features to rewrite code.’
“If you’re in really bad shape today, you might need to make this
30% or even more of the resources. I get nervous when I find
teams that think they can get away with much less than 20%.”
– Marty Cagan, Inspired

Allocate 20% Of Cycles To Technical Debt Reduction
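
The arithmetic is simple enough to build into sprint planning. A minimal sketch, assuming capacity is measured in whole story points. The function name and the 40-point sprint are made up, and the 20% default follows the guideline quoted from Marty Cagan.

```python
def split_capacity(total_points, debt_percent=20):
    """Reserve a share of capacity for technical debt before planning features."""
    if not 0 <= debt_percent <= 100:
        raise ValueError("debt_percent must be between 0 and 100")
    debt = total_points * debt_percent // 100
    return {"technical debt": debt, "features": total_points - debt}


print(split_capacity(40))      # {'technical debt': 8, 'features': 32}
print(split_capacity(40, 30))  # {'technical debt': 12, 'features': 28}
```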

Recognize Compounding Technical Debt…

That Gets Worse…

And Fixing It…

Source: Pingdom

An Innovation Culture
“By installing a rampant innovation culture, they now do 165
experiments in the three months of tax season.
Our business result? Conversion rate of the website is up 50
percent. Employee result? Everyone loves it, because now their
ideas can make it to market.”
–Scott Cook, Intuit Founder

Convergence And Evolution Of Ideas
Four Steps To The Epiphany, Steven Blank (2005)
Principles Of Product Development Flow: Second Generation
Lean Product Development, Donald Reinertsen (2009)
Lean Startup, Eric Ries (2011)
Lean UX, Jeff Gothelf (2013)

Performance by DevOps maturity

Organizations that implemented DevOps practices over 12
months ago were 5x more likely to be high performing than
organizations that weren’t implementing DevOps at all.

Source: Puppet Labs 2012 State Of DevOps: http://puppetlabs.com/2013-state-of-devops-infographic
Why Do I Think This Is Important?

The Downward
Spiral…

If I Could Wave A Magic Wand, Everyone Will…
See the suffering downstream, and have confidence that your
intuitions and skills can make a profound and positive difference…
Become conversant with DevOps and recognize the practices
when you see them
Be energized about how practitioners can contribute in this
organizational journey
Leave with some concrete steps to get some great outcomes
Help create a team that starts putting DevOps practices into place

If I Could Wave A Magic Wand, Everyone Will…
Become conversant with DevOps and recognize the practices
when you see them
Be energized about how practitioners can contribute in this
organizational journey
Leave with some concrete steps to get some great outcomes
Become a part of a team that starts putting DevOps practices into
place

“Some books you give to friends,
for the joy of sharing a great
novel.
“Some books you recommend to
your colleagues and employees,
to create common ground.
“Some books you share with your
boss, to plant the seeds of a big
idea.
“The Phoenix Project is all three.”
–Jeremiah Shirk, Integration &
Infrastructure Manager at
Kansas State University

Our Mission: Positively Impact The Lives Of One Million IT
Workers By 2017
Free 170 page excerpt:
http://itrevolution.com/the-phoenixproject-excerpt/
http://slideshare.net/realgenekim
DevOps Defensive Audit Toolkit
Enterprise DevOps Case Studies
Early draft of upcoming “DevOps
Cookbook” (Allspaw, Debois, Edwards,
Humble, Kim, Orzen)
Email me at genek@realgenekim.me
@RealGeneKim

Why Everyone Needs DevOps Now: My Fourteen Year Journey Studying High Performing IT Organizations - Gene Kim, Author of The Phoenix Project

  • 1.
    Why Everyone NeedsDevOps Now: My Fourteen Year Journey Studying High Performing IT Organizations Gene Kim Session ID: @RealGeneKim, genek@realgenekim.me
  • 2.
    Where Did TheHigh Performers Come From? @RealGeneKim
  • 3.
    Visible Ops: Playbookof High Performers The IT Process Institute has been studying high-performing organizations since 1999 What is common to all the high performers? What is different between them and average and low performers? How did they become great? www.ITPI.org @RealGeneKim
  • 4.
    Act I: ITOps Fixing Fragile Artifacts @RealGeneKim
  • 5.
  • 6.
  • 7.
    Act 2: TheDevelopers @RealGeneKim
  • 8.
  • 9.
  • 10.
    IT Ops AndDev At War 10 @RealGeneKim
  • 11.
    Nothing Left ForInfosec @RealGeneKim
  • 12.
  • 13.
  • 14.
  • 15.
    So, CEOs Don’tTrust IT… “If IT fails I don't know why… and if IT succeeds I don't know why.” “By managing inputs and outputs, I can hold any area of the business accountable – except for IT…” “Large investments in IT projects that eventual fail, without warning. And the CIO is the first to say, ‘I told you so.’” “I can’t hold IT accountable – IT is way too ‘slippery.’” Source: Gene Kim 2012 15 @RealGeneKim
  • 16.
    The IT CoreChronic Conflict Every IT organization is pressured to simultaneously: Respond more quickly to urgent business needs Provide stable, secure and predictable IT service 16 Source: The authors acknowledge Dr. Eliyahu Goldratt, creator of the Theory of Constraints and author of The Goal, has written extensively on the theory and practice of identifying and resolving core, chronic conflicts. @RealGeneKim
  • 17.
    Every Company IsAn IT Company… 95% of all capital projects have an IT component… 50% of all capital spending is technology-related Where we need to be… IT is always in the way (again…) We are here… @RealGeneKim
  • 18.
    The Urgency OfThis Business Problem “Of the Fortune 500 companies in 1955, 87% are gone... “In 1958, the Fortune 500 tenure was 61 years; now it’s 18 years…” –Richard Foster, “Creative Destruction” 18 @RealGeneKim
  • 19.
    How Team Obama’stech efficiency left Romney IT in dust Obama campaign’s tech team beat Romney by using opposite strategy— “insourcing.” Even taken with the software and Web hosting expenses, the Obama campaign spent a seventh of what the Romney campaign spent on digital…. In the end, the deciding factor wasn’t what the Obama campaign spent money on, but what it did with all that money. Insourcing gave the campaign a strategic flexibility that the Romney campaign lacked…. “This is the difference...between a well run professional machine and a gaggle of amateurs.... I would be shocked if such a chasm exists next cycle between the parties—these aren’t mistakes to be repeated if you want to do things like win elections.” http://arstechnica.com/information-technology/2012/11/how-team-obamas-tech-efficiency-left-romney-it-in-dust/ 19 | Reimagining the Application Lifecycle
  • 20.
    Build. Measure. Learn. Technologies accelerate businesspractice changes The massivescope of its polling effort helped guide the Obama campaign in ways that would be impossible with conventional polling… three-day rolling-average tracking in each state. “We ran the election 66,000 times every night,” said a senior official, describing the computer simulations the campaign ran to figure out Obama’s odds of winning each swing state. “And every morning we got the spit-out — here are your chances of winning these states. And that is how we allocated resources.” Surveys used live interviewers, very large sample sizes and very short questionnaires, which focused on vote preference and strength of support, with no more than a handful of additional substantive questions. Hired campaign staff engineers from Facebook, Twitter, Google, Microsoft, and technology startups. http://www.theatlantic.com/technology/archive/2012/11/when-the-nerds-go-marching-in/265325/ http://www.huffingtonpost.com/2012/11/21/obama-campaign-polls-2012_n_2171242.html http://swampland.time.com/2012/11/07/inside-the-secret-world-of-quants-and-data-crunchers-who-helped-obama-win/
  • 21.
    Act 3: There MustBe A Better Way… 21
  • 22.
  • 23.
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30.
  • 31.
    Source: John Jenkins,Amazon.com @RealGeneKim
  • 32.
  • 33.
    Who Is DoingDevOps? Google, Amazon, Netflix, Etsy, Akamai, Twitter, Facebook, Pinterest … BNY Mellon, Bank of America, World Bank, Paychex, Intuit… The Gap, Nordstrom, REI, Macy’s, GameStop, Target … Portland State University, Seton Hill University, Kansas State University… Who else? 33 @RealGeneKim
  • 34.
    High Performing DevOpsTeams They’re more agile 30x more frequent deployments 8,000x faster lead time than their peers They’re more reliable 2x the change success rate 12x faster MTTR Source: Puppet Labs 2012 State Of DevOps: http://puppetlabs.com/2013-state-of-devops-infographic @RealGeneKim
  • 35.
  • 36.
    How Can WeBetter Sell DevOps? 36
  • 37.
    Eric Passmore, formerSVP Global Engineering, AOL (2007) 37 @RealGeneKim
  • 38.
    The Downward Spiral OperationsSees… Fragile applications are prone to failure Long time required to figure out “which bit got flipped” Detective control is a salesperson Too much time required to restore service Too much firefighting and unplanned work Planned project work cannot complete Frustrated customers leave Market share goes down Business misses Wall Street commitments Business makes even larger promises to Wall Street Dev Sees… More urgent, date-driven projects put into the queue Even more fragile code put into production More releases have increasingly “turbulent installs” Release cycles lengthen to amortize “cost of deployments” Failing bigger deployments more difficult to diagnose Most senior and constrained IT ops resources have less time to fix underlying process problems Ever increasing backlog of infrastructure projects that could fix root cause and reduce costs Ever increasing amount of tension between IT Ops and Development These aren’t IT Operations problems… These are business problems!
  • 39.
    Gene Kim, CTO,Tripwire, Inc. (2006) 39 @RealGeneKim
  • 40.
    The Downward Spiral OperationsSees… Fragile applications are prone to failure Long time required to figure out “which bit got flipped” Detective control is a salesperson Too much time required to restore service Too much firefighting and unplanned work Planned project work cannot complete Frustrated customers leave Market share goes down Business misses Wall Street commitments Business makes even larger promises to Wall Street Dev Sees… More urgent, date-driven projects put into the queue Even more fragile code put into production More releases have increasingly “turbulent installs” Release cycles lengthen to amortize “cost of deployments” Failing bigger deployments more difficult to diagnose Most senior and constrained IT ops resources have less time to fix underlying process problems Ever increasing backlog of infrastructure projects that could fix root cause and reduce costs Ever increasing amount of tension between IT Ops and Development These aren’t IT Operations problems… These are business problems!
  • 41.
    Anonymous Product Manager/ UX (2011) 41 @RealGeneKim
  • 42.
    The Downward Spiral OperationsSees… Fragile applications are prone to failure Long time required to figure out “which bit got flipped” Detective control is a salesperson Too much time required to restore service Too much firefighting and unplanned work Planned project work cannot complete Frustrated customers leave Market share goes down Business misses Wall Street commitments Business makes even larger promises to Wall Street Dev Sees… More urgent, date-driven projects put into the queue Even more fragile code put into production More releases have increasingly “turbulent installs” Release cycles lengthen to amortize “cost of deployments” Failing bigger deployments more difficult to diagnose Most senior and constrained IT ops resources have less time to fix underlying process problems Ever increasing backlog of infrastructure projects that could fix root cause and reduce costs Ever increasing amount of tension between IT Ops and Development These aren’t IT Operations problems… These are business problems!
  • 43.
    Anonymous Infosec Officer(2012) 43 @RealGeneKim
  • 44.
  • 45.
  • 46.
    “This book willhave a profound effect on IT, just as The Goal did for manufacturing.” –Jez Humble, co-author Continuous Delivery “This is the IT swamp draining manual for anyone who is neck deep in alligators.” –Adrian Cockroft, Cloud Architect at Netflix “This is The Goal for our decade, and is for any IT professional who wants their life back.” –Charles Betz, IT architect, author “Architecture and Patterns for IT” 46 @RealGeneKim
  • 47.
    The First Way:Flow @RealGeneKim
  • 48.
    The First Way:Flow Understand the flow of work Always seek to increase flow Never unconsciously pass defects downstream Never allow local optimization to cause global degradation Achieve profound understanding of the system @RealGeneKim
  • 49.
    “Annual business planningsessions can be madding. They think IT Operations is an ‘all you can eat buffet.’” -Ben Rockwood, Director Systems Engineering, Joyent @RealGeneKim
  • 50.
    Define The Workand Make It Visible Business projects (e.g., new order system) Internal IT projects (e.g., configuration management, automation, debt reduction) Changes (e.g., deploys, improve database performance) Unplanned work (e.g., site down, site impaired) 50 @RealGeneKim
  • 51.
    Questions What is yourlead time for changes? (i.e., how long does it take to go from “code committed” to “code successfully running in production”) How much of that is queue time vs. run time? 51 @RealGeneKim
  • 52.
  • 53.
  • 54.
    Create One StepEnvironment Creation Process Make environments available early in the Development process Make sure Dev builds the code and environment at the same time Create a common Dev, QA and Production environment creation process @RealGeneKim
  • 55.
    If I hada magic wand, I’d change the Agile sprints and definition of “done”: “At the end of each sprint, we must have working and shippable code, demonstrated in an environment that resembles production.” @RealGeneKim
  • 56.
    Deploy Smaller Changes,More Frequently * Decouple feature releases from code deployments Deploy features in a disabled state, using feature flags Require all developers check code into trunk daily (at least) Practice deploying smaller changes, which dramatically reduces risk and improves MTTR 56 @RealGeneKim
  • 57.
    Breaking The BottlenecksIn The Flow Environment creation Code deployment Test setup and run Overly tight architecture Development Product management 57 @RealGeneKim
  • 58.
    How organizations achievehigh performance • 89% are using infrastructure version control • 82% are using automated code deployments 58 Source: Puppet Labs 2012 State Of DevOps: http://puppetlabs.com/2013-state-of-devops-infographic
  • 59.
    Why Dedicated TeamsVs. Shared Services 59 @RealGeneKim
  • 60.
  • 61.
  • 62.
    Blackboard Learn: 2005-Present 62 Source:David Ashman, Chief Architect, Blackboard, Inc. @RealGeneKim
  • 63.
    Blackboard Learn BuildingBlocks 63 Source: David Ashman, Chief Architect, Blackboard, Inc. @RealGeneKim
  • 64.
    The First Way:Outcomes Creating single repository for code and environments Determinism in the release process Consistent Dev, Test and Production environments, all properly built before deployment begins A continuous delivery pipeline that can be relied upon and daily Dev code commits Free ourselves from the learned behavior of catastrophic deployments Decreased lead time Reduce deployment times from 6 hours to 45 minutes Refactor deployment process that had 1300+ steps spanning 4 weeks Faster cycle time and release cadence @RealGeneKim
  • 65.
    The Second Way:Feedback @RealGeneKim
  • 66.
    The Second Way:Feedback Understand and respond to the needs of all customers, internal and external Shorten and amplify all feedback loops: stop the line when necessary Create quality at the source Create and embed knowledge where we need it @RealGeneKim
  • 67.
  • 68.
    “We found thatwhen we woke up developers at 2am, defects got fixed faster than ever” – Patrick Lightbody, CEO, BrowserMob @RealGeneKim
  • 69.
    Require That DevsManage Their Own Code For 6+ Months Source: Tom Limoncelli, Google 69 @RealGeneKim
  • 70.
    Test Whether DevelopersQualify For IT Operations Resources Types/frequency of pager alerts Maturity of monitoring System architecture review Release process Defect counts and severity Production hygiene Source: Tom Limoncelli, Google 70 @RealGeneKim
  • 71.
    Return Fragile ServicesBack To Dev Source: Tom Limoncelli, Google 71 @RealGeneKim
  • 72.
    Feedback And SituationalAwareness “Having a developer add a monitoring metric shouldn’t feel like a schema change.” – John Allspaw, SVP Tech Ops, Etsy 72 @RealGeneKim
  • 73.
  • 74.
  • 75.
    Integrating Into ContinuousDelivery The days of reviewing RFCs in Word docs in change management meetings are over Failures must result in automated tests in the continuous deployment pipeline (Release, Config, Change) Invite or embed Ops into Dev standups and the scrum teams (“hey, we can sprint and scrums, too!”) @RealGeneKim
  • 76.
    Embed Dev IntoIT Ops Embed Dev into IT Ops incident escalation process Put production monitoring in pre-production environments Invite Dev to post-mortems/root cause analysis meeting Have Dev and Infosec cross-train IT Operations Ensure application monitoring/metrics to aid in Ops and Infosec work (e.g., incident/problem management) @RealGeneKim
  • 77.
    What’s In It For Infosec And QA?
  • 78.
    The Second Way: Outcomes
      Defects and security issues getting fixed faster than ever
      Standardized and reusable Ops and Infosec user stories now part of the Agile process
      All groups communicating and coordinating better
      Everybody is getting more work done
  • 79.
    The Third Way: Continual Experimentation And Learning
  • 80.
    The Third Way: Continual Experimentation And Learning
      Foster a culture that rewards:
        Experimentation (taking risks) and learning from failure
        Repetition is the prerequisite to mastery
      Why?
        You need a culture that keeps pushing into the danger zone
        And have the habits that enable you to survive in the danger zone
  • 81.
    Break Things Early And Often
      “Do painful things more frequently, so you can make it less painful… We don’t get pushback from Dev, because they know it makes rollouts smoother.” – Adrian Cockcroft, Architect, Netflix
  • 82.
  • 83.
  • 84.
    You Don’t Choose Chaos Monkey… Chaos Monkey Chooses You
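The idea of deliberately killing instances so that a service learns to survive the loss can be sketched with a toy simulation. This is not Netflix's Chaos Monkey. `Instance`, `Pool`, and `unleash_monkey` are hypothetical names, and nothing here touches real infrastructure.

```python
import random


class Instance:
    """Hypothetical stand-in for a running server."""

    def __init__(self, name):
        self.name = name
        self.alive = True


class Pool:
    """Hypothetical pool that serves requests from any healthy instance."""

    def __init__(self, names):
        self.instances = [Instance(n) for n in names]

    def healthy(self):
        return [i for i in self.instances if i.alive]

    def handle_request(self):
        healthy = self.healthy()
        if not healthy:
            raise RuntimeError("no healthy instances left")
        # Route to the first healthy instance; real balancers are smarter.
        return healthy[0].name


def unleash_monkey(pool, rng):
    """Kill one randomly chosen healthy instance; return its name or None."""
    candidates = pool.healthy()
    if not candidates:
        return None
    victim = rng.choice(candidates)
    victim.alive = False
    return victim.name
```

A drill built on this sketch would call `unleash_monkey(pool, random.Random())` and then confirm that `pool.handle_request()` still succeeds. Passing the random generator in explicitly makes a drill reproducible when a seed is supplied.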
  • 85.
    Break Things Before Production
      Enforce consistency in code, environments and configurations across the environments
      Add your ASSERTs to find misconfigurations, enforce https, etc.
      Add static code analysis to automated continuous integration and testing process
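The suggestion to add ASSERTs that find misconfigurations and enforce https can be sketched as a check that runs in the pipeline before deployment. The config keys (`base_url`, `debug`, `db_host`) and both function names are assumptions made for illustration, not settings from any particular system.

```python
from urllib.parse import urlparse

# Hypothetical set of keys every environment is expected to define.
REQUIRED_KEYS = ("base_url", "debug", "db_host")


def check_config(config):
    """Return a list of problems found in an environment config dict."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in config:
            problems.append(f"missing required key: {key}")
    url = config.get("base_url", "")
    if url and urlparse(url).scheme != "https":
        problems.append(f"base_url must use https: {url}")
    if config.get("debug") is True:
        problems.append("debug must be disabled")
    return problems


def assert_config(config):
    """Fail loudly so the pipeline stops before a bad config ships."""
    problems = check_config(config)
    if problems:
        # Raised explicitly so the check still runs under `python -O`,
        # which strips plain `assert` statements.
        raise AssertionError("; ".join(problems))
```

Running the same check against the Dev, Test and Production configs is one way to surface drift between environments early.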
  • 86.
    Reduce Technical Debt
      “The deal with engineering goes like this. Product management takes 20% of the capacity right off the top and gives this to engineering to spend as they see fit. Whatever is required to avoid, ‘we need to stop features to rewrite code.’
      “If you’re in really bad shape today, you might need to make this 30% or even more of the resources. I get nervous when I find teams that think they can get away with much less than 20%.” – Marty Cagan, Inspired
  • 87.
    Allocate 20% Of Cycles To Technical Debt Reduction
  • 88.
    Recognize Compounding Technical Debt…
  • 89.
  • 90.
    And Fixing It…
      Source: Pingdom
  • 91.
    An Innovation Culture
      “By installing a rampant innovation culture, they now do 165 experiments in the three months of tax season. Our business result? Conversion rate of the website is up 50 percent. Employee result? Everyone loves it, because now their ideas can make it to market.” – Scott Cook, Intuit Founder
  • 92.
    Convergence And Evolution Of Ideas
      Four Steps To The Epiphany, Steven Blank (2005)
      Principles Of Product Development Flow: Second Generation Lean Product Development, Donald Reinertsen (2009)
      Lean Startup, Eric Ries (2011)
      Lean UX, Jeff Gothelf (2013)
  • 93.
    Performance by DevOps maturity
      Organizations that implemented DevOps practices over 12 months ago were 5x more likely to be high performing than organizations that weren’t implementing DevOps at all.
      Source: Puppet Labs 2012 State Of DevOps: http://puppetlabs.com/2013-state-of-devops-infographic
  • 94.
    Why Do I Think This Is Important?
  • 95.
  • 96.
  • 97.
  • 98.
    If I Could Wave A Magic Wand, Everyone Will…
      See the suffering downstream, and have confidence that your intuitions and skills can make a profound and positive difference…
      Become conversant with DevOps and recognize the practices when you see them
      Be energized about how practitioners can contribute in this organizational journey
      Leave with some concrete steps to get some great outcomes
      Help create a team that starts putting DevOps practices into place
  • 99.
    If I Could Wave A Magic Wand, Everyone Will…
      Become conversant with DevOps and recognize the practices when you see them
      Be energized about how practitioners can contribute in this organizational journey
      Leave with some concrete steps to get some great outcomes
      Become a part of a team that starts putting DevOps practices into place
  • 100.
    “Some books you give to friends, for the joy of sharing a great novel.
    “Some books you recommend to your colleagues and employees, to create common ground.
    “Some books you share with your boss, to plant the seeds of a big idea.
    “The Phoenix Project is all three.” – Jeremiah Shirk, Integration & Infrastructure Manager at Kansas State University
  • 101.
    Our Mission: Positively Impact The Lives Of One Million IT Workers By 2017
      Free 170-page excerpt: http://itrevolution.com/the-phoenixproject-excerpt/
      http://slideshare.net/realgenekim
      DevOps Defensive Audit Toolkit
      Enterprise DevOps Case Studies
      Early draft of upcoming “DevOps Cookbook” (Allspaw, DeBois, Edwards, Humble, Kim, Orzen)
      Email me at genek@realgenekim.me